linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/9] A course adjustment, maybe
@ 2022-10-05 14:55 Chuck Lever
  2022-10-05 14:55 ` [PATCH RFC 1/9] nfsd: fix nfsd_file_unhash_and_dispose Chuck Lever
                   ` (8 more replies)
  0 siblings, 9 replies; 18+ messages in thread
From: Chuck Lever @ 2022-10-05 14:55 UTC (permalink / raw)
  To: linux-nfs

I'm proposing this series as the first NFSD-related patchset to go
into v6.2 (for-next), though I haven't opened that yet.

For quite some time, we've been encouraged to disable filecache
garbage collection for NFSv4 files, and I think I found a surgical
way to do just that. That is presented in "NFSD: Add an NFSD_FILE_GC
flag to enable nfsd_file garbage collection".

To make that fit, I've dropped Jeff's fix for nfsd_file_close(), but
incorporated his proposed replacement logic into nfsd_file_put().
The justification for that can be found in the patch descriptions.

Jeff's other two patches are included here because I intend to get
them merged into v6.1 soon thus they will become part of the base
for NFSD for-next. For the moment I'm leaving out Fixes tags because
I'd like to see these get some testing before they are applied to
v6.0 -- and again, we're not yet 100% sure these fix a misbehavior
that has been hit in the field.

Comments and opinions are welcome.

---

Chuck Lever (7):
      NFSD: Pass the target nfsd_file to nfsd_commit()
      NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately"
      NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection
      NFSD: Use const pointers as parameters to fh_ helpers.
      NFSD: Use rhashtable for managing nfs4_file objects
      NFSD: Clean up nfs4_preprocess_stateid_op() call sites
      NFSD: Trace delegation revocations

Jeff Layton (2):
      nfsd: fix nfsd_file_unhash_and_dispose
      nfsd: rework hashtable handling in nfsd_do_file_acquire


 fs/nfsd/filecache.c | 165 ++++++++++++++++---------------
 fs/nfsd/filecache.h |   4 +-
 fs/nfsd/nfs3proc.c  |  10 +-
 fs/nfsd/nfs4proc.c  |  42 ++++----
 fs/nfsd/nfs4state.c | 235 ++++++++++++++++++++++++++++++--------------
 fs/nfsd/nfsfh.h     |  10 +-
 fs/nfsd/state.h     |   5 +-
 fs/nfsd/trace.h     |  58 ++++++++++-
 fs/nfsd/vfs.c       |  19 ++--
 fs/nfsd/vfs.h       |   3 +-
 10 files changed, 344 insertions(+), 207 deletions(-)

--
Chuck Lever


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH RFC 1/9] nfsd: fix nfsd_file_unhash_and_dispose
  2022-10-05 14:55 [PATCH RFC 0/9] A course adjustment, maybe Chuck Lever
@ 2022-10-05 14:55 ` Chuck Lever
  2022-10-05 14:55 ` [PATCH RFC 2/9] nfsd: rework hashtable handling in nfsd_do_file_acquire Chuck Lever
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2022-10-05 14:55 UTC (permalink / raw)
  To: linux-nfs

From: Jeff Layton <jlayton@kernel.org>

nfsd_file_unhash_and_dispose() is called for two reasons:

We're either shutting down and purging the filecache, or we've gotten a
notification about a file delete, so we want to go ahead and unhash it
so that it'll get cleaned up when we close.

We're either walking the hashtable or doing a lookup in it and we
don't take a reference in either case. What we want to do in both cases
is to try and unhash the object and put it on the dispose list if that
was successful. If it's no longer hashed, then we don't want to touch
it, with the assumption being that something else is already cleaning
up the sentinel reference.

Instead of trying to selectively decrement the refcount in this
function, just unhash it, and if that was successful, move it to the
dispose list. Then, the disposal routine will just clean that up as
usual.

Also, just make this a void function, drop the WARN_ON_ONCE, and the
comments about deadlocking since the nature of the purported deadlock
is no longer clear.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   36 +++++++-----------------------------
 1 file changed, 7 insertions(+), 29 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index eeed4ae5b4ad..844c41832d50 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -405,22 +405,15 @@ nfsd_file_unhash(struct nfsd_file *nf)
 	return false;
 }
 
-/*
- * Return true if the file was unhashed.
- */
-static bool
+static void
 nfsd_file_unhash_and_dispose(struct nfsd_file *nf, struct list_head *dispose)
 {
 	trace_nfsd_file_unhash_and_dispose(nf);
-	if (!nfsd_file_unhash(nf))
-		return false;
-	/* keep final reference for nfsd_file_lru_dispose */
-	if (refcount_dec_not_one(&nf->nf_ref))
-		return true;
-
-	nfsd_file_lru_remove(nf);
-	list_add(&nf->nf_lru, dispose);
-	return true;
+	if (nfsd_file_unhash(nf)) {
+		/* caller must call nfsd_file_dispose_list() later */
+		nfsd_file_lru_remove(nf);
+		list_add(&nf->nf_lru, dispose);
+	}
 }
 
 static void
@@ -562,8 +555,6 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
  * @lock: LRU list lock (unused)
  * @arg: dispose list
  *
- * Note this can deadlock with nfsd_file_cache_purge.
- *
  * Return values:
  *   %LRU_REMOVED: @item was removed from the LRU
  *   %LRU_ROTATE: @item is to be moved to the LRU tail
@@ -748,8 +739,6 @@ nfsd_file_close_inode(struct inode *inode)
  *
  * Walk the LRU list and close any entries that have not been used since
  * the last scan.
- *
- * Note this can deadlock with nfsd_file_cache_purge.
  */
 static void
 nfsd_file_delayed_close(struct work_struct *work)
@@ -891,16 +880,12 @@ nfsd_file_cache_init(void)
 	goto out;
 }
 
-/*
- * Note this can deadlock with nfsd_file_lru_cb.
- */
 static void
 __nfsd_file_cache_purge(struct net *net)
 {
 	struct rhashtable_iter iter;
 	struct nfsd_file *nf;
 	LIST_HEAD(dispose);
-	bool del;
 
 	rhashtable_walk_enter(&nfsd_file_rhash_tbl, &iter);
 	do {
@@ -910,14 +895,7 @@ __nfsd_file_cache_purge(struct net *net)
 		while (!IS_ERR_OR_NULL(nf)) {
 			if (net && nf->nf_net != net)
 				continue;
-			del = nfsd_file_unhash_and_dispose(nf, &dispose);
-
-			/*
-			 * Deadlock detected! Something marked this entry as
-			 * unhased, but hasn't removed it from the hash list.
-			 */
-			WARN_ON_ONCE(!del);
-
+			nfsd_file_unhash_and_dispose(nf, &dispose);
 			nf = rhashtable_walk_next(&iter);
 		}
 



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC 2/9] nfsd: rework hashtable handling in nfsd_do_file_acquire
  2022-10-05 14:55 [PATCH RFC 0/9] A course adjustment, maybe Chuck Lever
  2022-10-05 14:55 ` [PATCH RFC 1/9] nfsd: fix nfsd_file_unhash_and_dispose Chuck Lever
@ 2022-10-05 14:55 ` Chuck Lever
  2022-10-05 14:55 ` [PATCH RFC 3/9] NFSD: Pass the target nfsd_file to nfsd_commit() Chuck Lever
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2022-10-05 14:55 UTC (permalink / raw)
  To: linux-nfs

From: Jeff Layton <jlayton@kernel.org>

nfsd_file is RCU-freed, so we need to hold the rcu_read_lock long enough
to get a reference after finding it in the hash. Take the
rcu_read_lock() and call rhashtable_lookup directly.

Switch to using rhashtable_lookup_insert_key as well, and use the usual
retry mechanism if we hit an -EEXIST. Rename the "retry" bool to
open_retry, and eliminiate the insert_err goto target.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   52 ++++++++++++++++++++++-----------------------------
 1 file changed, 22 insertions(+), 30 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 844c41832d50..a2adfc247648 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -1042,9 +1042,10 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		.need	= may_flags & NFSD_FILE_MAY_MASK,
 		.net	= SVC_NET(rqstp),
 	};
-	struct nfsd_file *nf, *new;
-	bool retry = true;
+	bool open_retry = true;
+	struct nfsd_file *nf;
 	__be32 status;
+	int ret;
 
 	status = fh_verify(rqstp, fhp, S_IFREG,
 				may_flags|NFSD_MAY_OWNER_OVERRIDE);
@@ -1054,35 +1055,33 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	key.cred = get_current_cred();
 
 retry:
-	/* Avoid allocation if the item is already in cache */
-	nf = rhashtable_lookup_fast(&nfsd_file_rhash_tbl, &key,
-				    nfsd_file_rhash_params);
+	rcu_read_lock();
+	nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key,
+			       nfsd_file_rhash_params);
 	if (nf)
 		nf = nfsd_file_get(nf);
+	rcu_read_unlock();
 	if (nf)
 		goto wait_for_construction;
 
-	new = nfsd_file_alloc(&key, may_flags);
-	if (!new) {
+	nf = nfsd_file_alloc(&key, may_flags);
+	if (!nf) {
 		status = nfserr_jukebox;
 		goto out_status;
 	}
 
-	nf = rhashtable_lookup_get_insert_key(&nfsd_file_rhash_tbl,
-					      &key, &new->nf_rhash,
-					      nfsd_file_rhash_params);
-	if (!nf) {
-		nf = new;
-		goto open_file;
-	}
-	if (IS_ERR(nf))
-		goto insert_err;
-	nf = nfsd_file_get(nf);
-	if (nf == NULL) {
-		nf = new;
+	ret = rhashtable_lookup_insert_key(&nfsd_file_rhash_tbl,
+					   &key, &nf->nf_rhash,
+					   nfsd_file_rhash_params);
+	if (likely(ret == 0))
 		goto open_file;
-	}
-	nfsd_file_slab_free(&new->nf_rcu);
+
+	nfsd_file_slab_free(&nf->nf_rcu);
+	if (ret == -EEXIST)
+		goto retry;
+	trace_nfsd_file_insert_err(rqstp, key.inode, may_flags, ret);
+	status = nfserr_jukebox;
+	goto out_status;
 
 wait_for_construction:
 	wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE);
@@ -1090,11 +1089,11 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	/* Did construction of this file fail? */
 	if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
 		trace_nfsd_file_cons_err(rqstp, key.inode, may_flags, nf);
-		if (!retry) {
+		if (!open_retry) {
 			status = nfserr_jukebox;
 			goto out;
 		}
-		retry = false;
+		open_retry = false;
 		nfsd_file_put_noref(nf);
 		goto retry;
 	}
@@ -1142,13 +1141,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	smp_mb__after_atomic();
 	wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
 	goto out;
-
-insert_err:
-	nfsd_file_slab_free(&new->nf_rcu);
-	trace_nfsd_file_insert_err(rqstp, key.inode, may_flags, PTR_ERR(nf));
-	nf = NULL;
-	status = nfserr_jukebox;
-	goto out_status;
 }
 
 /**



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC 3/9] NFSD: Pass the target nfsd_file to nfsd_commit()
  2022-10-05 14:55 [PATCH RFC 0/9] A course adjustment, maybe Chuck Lever
  2022-10-05 14:55 ` [PATCH RFC 1/9] nfsd: fix nfsd_file_unhash_and_dispose Chuck Lever
  2022-10-05 14:55 ` [PATCH RFC 2/9] nfsd: rework hashtable handling in nfsd_do_file_acquire Chuck Lever
@ 2022-10-05 14:55 ` Chuck Lever
  2022-10-05 14:56 ` [PATCH RFC 4/9] NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately" Chuck Lever
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2022-10-05 14:55 UTC (permalink / raw)
  To: linux-nfs

In a moment I'm going to introduce separate nfsd_file types, one of
which is garbage-collected; the other, not. The garbage-collected
variety is to be used by NFSv2 and v3, and the non-garbage-collected
variety is to be used by NFSv4.

nfsd_commit() is invoked by both NFSv3 and NFSv4 consumers. We want
nfsd_commit() to find and use the correct variety of cached
nfsd_file object for the NFS version that is in use.

This is a refactoring change. No behavior change is expected.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs3proc.c |   10 +++++++++-
 fs/nfsd/nfs4proc.c |   11 ++++++++++-
 fs/nfsd/vfs.c      |   15 ++++-----------
 fs/nfsd/vfs.h      |    3 ++-
 4 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index a41cca619338..d12823371857 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -13,6 +13,7 @@
 #include "cache.h"
 #include "xdr3.h"
 #include "vfs.h"
+#include "filecache.h"
 
 #define NFSDDBG_FACILITY		NFSDDBG_PROC
 
@@ -770,6 +771,7 @@ nfsd3_proc_commit(struct svc_rqst *rqstp)
 {
 	struct nfsd3_commitargs *argp = rqstp->rq_argp;
 	struct nfsd3_commitres *resp = rqstp->rq_resp;
+	struct nfsd_file *nf;
 
 	dprintk("nfsd: COMMIT(3)   %s %u@%Lu\n",
 				SVCFH_fmt(&argp->fh),
@@ -777,8 +779,14 @@ nfsd3_proc_commit(struct svc_rqst *rqstp)
 				(unsigned long long) argp->offset);
 
 	fh_copy(&resp->fh, &argp->fh);
-	resp->status = nfsd_commit(rqstp, &resp->fh, argp->offset,
+	resp->status = nfsd_file_acquire(rqstp, &resp->fh, NFSD_MAY_WRITE |
+					 NFSD_MAY_NOT_BREAK_LEASE, &nf);
+	if (resp->status)
+		goto out;
+	resp->status = nfsd_commit(rqstp, &resp->fh, nf, argp->offset,
 				   argp->count, resp->verf);
+	nfsd_file_put(nf);
+out:
 	return rpc_success;
 }
 
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index a72ab97f77ef..d0d976f847ca 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -739,10 +739,19 @@ nfsd4_commit(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	     union nfsd4_op_u *u)
 {
 	struct nfsd4_commit *commit = &u->commit;
+	struct nfsd_file *nf;
+	__be32 status;
 
-	return nfsd_commit(rqstp, &cstate->current_fh, commit->co_offset,
+	status = nfsd_file_acquire(rqstp, &cstate->current_fh, NFSD_MAY_WRITE |
+				   NFSD_MAY_NOT_BREAK_LEASE, &nf);
+	if (status != nfs_ok)
+		return status;
+
+	status = nfsd_commit(rqstp, &cstate->current_fh, nf, commit->co_offset,
 			     commit->co_count,
 			     (__be32 *)commit->co_verf.data);
+	nfsd_file_put(nf);
+	return status;
 }
 
 static __be32
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index fc17b0ac8729..44f210ba17cc 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1108,6 +1108,7 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
  * nfsd_commit - Commit pending writes to stable storage
  * @rqstp: RPC request being processed
  * @fhp: NFS filehandle
+ * @nf: target file
  * @offset: raw offset from beginning of file
  * @count: raw count of bytes to sync
  * @verf: filled in with the server's current write verifier
@@ -1124,19 +1125,13 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
  *   An nfsstat value in network byte order.
  */
 __be32
-nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset,
-	    u32 count, __be32 *verf)
+nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
+	    u64 offset, u32 count, __be32 *verf)
 {
+	__be32			err = nfs_ok;
 	u64			maxbytes;
 	loff_t			start, end;
 	struct nfsd_net		*nn;
-	struct nfsd_file	*nf;
-	__be32			err;
-
-	err = nfsd_file_acquire(rqstp, fhp,
-			NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &nf);
-	if (err)
-		goto out;
 
 	/*
 	 * Convert the client-provided (offset, count) range to a
@@ -1177,8 +1172,6 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset,
 	} else
 		nfsd_copy_write_verifier(verf, nn);
 
-	nfsd_file_put(nf);
-out:
 	return err;
 }
 
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index c95cd414b4bb..04dfd80570f0 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -88,7 +88,8 @@ __be32		nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *);
 __be32		nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp,
 				struct svc_fh *resfhp, struct nfsd_attrs *iap);
 __be32		nfsd_commit(struct svc_rqst *rqst, struct svc_fh *fhp,
-				u64 offset, u32 count, __be32 *verf);
+				struct nfsd_file *nf, u64 offset, u32 count,
+				__be32 *verf);
 #ifdef CONFIG_NFSD_V4
 __be32		nfsd_getxattr(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			    char *name, void **bufp, int *lenp);



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC 4/9] NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately"
  2022-10-05 14:55 [PATCH RFC 0/9] A course adjustment, maybe Chuck Lever
                   ` (2 preceding siblings ...)
  2022-10-05 14:55 ` [PATCH RFC 3/9] NFSD: Pass the target nfsd_file to nfsd_commit() Chuck Lever
@ 2022-10-05 14:56 ` Chuck Lever
  2022-10-05 14:56 ` [PATCH RFC 5/9] NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection Chuck Lever
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2022-10-05 14:56 UTC (permalink / raw)
  To: linux-nfs

This reverts commit 5e138c4a750dc140d881dab4a8804b094bbc08d2.

That commit attempted to make files available to other users as soon
as all NFSv4 clients were done with them, rather than waiting until
the filecache LRU had garbage collected them.

It gets the reference counting wrong, for one thing.

But it also misses that DELEGRETURN should release a file in the
same fashion. In fact, any nfsd_file_put() on an file held open
by an NFSv4 client needs potentially to release the file
immediately...

Clear the way for implementing that idea.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   18 ------------------
 fs/nfsd/filecache.h |    1 -
 fs/nfsd/nfs4state.c |    4 ++--
 3 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index a2adfc247648..b7aa523c2010 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -444,24 +444,6 @@ nfsd_file_put(struct nfsd_file *nf)
 		nfsd_file_put_noref(nf);
 }
 
-/**
- * nfsd_file_close - Close an nfsd_file
- * @nf: nfsd_file to close
- *
- * If this is the final reference for @nf, free it immediately.
- * This reflects an on-the-wire CLOSE or DELEGRETURN into the
- * VFS and exported filesystem.
- */
-void nfsd_file_close(struct nfsd_file *nf)
-{
-	nfsd_file_put(nf);
-	if (refcount_dec_if_one(&nf->nf_ref)) {
-		nfsd_file_unhash(nf);
-		nfsd_file_lru_remove(nf);
-		nfsd_file_free(nf);
-	}
-}
-
 struct nfsd_file *
 nfsd_file_get(struct nfsd_file *nf)
 {
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 8e8c0c47d67d..f81c198f4ed6 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -52,7 +52,6 @@ void nfsd_file_cache_shutdown(void);
 int nfsd_file_cache_start_net(struct net *net);
 void nfsd_file_cache_shutdown_net(struct net *net);
 void nfsd_file_put(struct nfsd_file *nf);
-void nfsd_file_close(struct nfsd_file *nf);
 struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
 void nfsd_file_close_inode_sync(struct inode *inode);
 bool nfsd_file_is_cached(struct inode *inode);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index c5d199d7e6b4..2b850de288cf 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -820,9 +820,9 @@ static void __nfs4_file_put_access(struct nfs4_file *fp, int oflag)
 			swap(f2, fp->fi_fds[O_RDWR]);
 		spin_unlock(&fp->fi_lock);
 		if (f1)
-			nfsd_file_close(f1);
+			nfsd_file_put(f1);
 		if (f2)
-			nfsd_file_close(f2);
+			nfsd_file_put(f2);
 	}
 }
 



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC 5/9] NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection
  2022-10-05 14:55 [PATCH RFC 0/9] A course adjustment, maybe Chuck Lever
                   ` (3 preceding siblings ...)
  2022-10-05 14:56 ` [PATCH RFC 4/9] NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately" Chuck Lever
@ 2022-10-05 14:56 ` Chuck Lever
  2022-10-06 15:59   ` Jeff Layton
  2022-10-05 14:56 ` [PATCH RFC 6/9] NFSD: Use const pointers as parameters to fh_ helpers Chuck Lever
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2022-10-05 14:56 UTC (permalink / raw)
  To: linux-nfs

NFSv4 operations manage the lifetime of nfsd_file items they use by
means of NFSv4 OPEN and CLOSE. Hence there's no need for them to be
garbage collected.

Introduce a mechanism to enable garbage collection for nfsd_file
items used only by NFSv2/3 callers.

Note that the change in nfsd_file_put() ensures that both CLOSE and
DELEGRETURN will actually close out and free an nfsd_file on last
reference of a non-garbage-collected file.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=394
Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   61 +++++++++++++++++++++++++++++++++++++++++++++------
 fs/nfsd/filecache.h |    3 +++
 fs/nfsd/nfs3proc.c  |    4 ++-
 fs/nfsd/trace.h     |    3 ++-
 fs/nfsd/vfs.c       |    4 ++-
 5 files changed, 63 insertions(+), 12 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index b7aa523c2010..01c27deabf83 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -63,6 +63,7 @@ struct nfsd_file_lookup_key {
 	struct net			*net;
 	const struct cred		*cred;
 	unsigned char			need;
+	unsigned char			gc:1;
 	enum nfsd_file_lookup_type	type;
 };
 
@@ -162,6 +163,8 @@ static int nfsd_file_obj_cmpfn(struct rhashtable_compare_arg *arg,
 			return 1;
 		if (!nfsd_match_cred(nf->nf_cred, key->cred))
 			return 1;
+		if (test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc)
+			return 1;
 		if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0)
 			return 1;
 		break;
@@ -297,6 +300,8 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may)
 		nf->nf_flags = 0;
 		__set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
 		__set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
+		if (key->gc)
+			__set_bit(NFSD_FILE_GC, &nf->nf_flags);
 		nf->nf_inode = key->inode;
 		/* nf_ref is pre-incremented for hash table */
 		refcount_set(&nf->nf_ref, 2);
@@ -428,16 +433,27 @@ nfsd_file_put_noref(struct nfsd_file *nf)
 	}
 }
 
+static void
+nfsd_file_unhash_and_put(struct nfsd_file *nf)
+{
+	if (nfsd_file_unhash(nf))
+		nfsd_file_put_noref(nf);
+}
+
 void
 nfsd_file_put(struct nfsd_file *nf)
 {
 	might_sleep();
 
-	nfsd_file_lru_add(nf);
+	if (test_bit(NFSD_FILE_GC, &nf->nf_flags) == 1)
+		nfsd_file_lru_add(nf);
+	else if (refcount_read(&nf->nf_ref) == 2)
+		nfsd_file_unhash_and_put(nf);
+
 	if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
 		nfsd_file_flush(nf);
 		nfsd_file_put_noref(nf);
-	} else if (nf->nf_file) {
+	} else if (nf->nf_file && test_bit(NFSD_FILE_GC, &nf->nf_flags) == 1) {
 		nfsd_file_put_noref(nf);
 		nfsd_file_schedule_laundrette();
 	} else
@@ -1017,12 +1033,14 @@ nfsd_file_is_cached(struct inode *inode)
 
 static __be32
 nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
-		     unsigned int may_flags, struct nfsd_file **pnf, bool open)
+		     unsigned int may_flags, struct nfsd_file **pnf,
+		     bool open, int want_gc)
 {
 	struct nfsd_file_lookup_key key = {
 		.type	= NFSD_FILE_KEY_FULL,
 		.need	= may_flags & NFSD_FILE_MAY_MASK,
 		.net	= SVC_NET(rqstp),
+		.gc	= want_gc,
 	};
 	bool open_retry = true;
 	struct nfsd_file *nf;
@@ -1117,14 +1135,35 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	 * then unhash.
 	 */
 	if (status != nfs_ok || key.inode->i_nlink == 0)
-		if (nfsd_file_unhash(nf))
-			nfsd_file_put_noref(nf);
+		nfsd_file_unhash_and_put(nf);
 	clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags);
 	smp_mb__after_atomic();
 	wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
 	goto out;
 }
 
+/**
+ * nfsd_file_acquire_gc - Get a struct nfsd_file with an open file
+ * @rqstp: the RPC transaction being executed
+ * @fhp: the NFS filehandle of the file to be opened
+ * @may_flags: NFSD_MAY_ settings for the file
+ * @pnf: OUT: new or found "struct nfsd_file" object
+ *
+ * The nfsd_file object returned by this API is reference-counted
+ * and garbage-collected. The object is retained for a few
+ * seconds after the final nfsd_file_put() in case the caller
+ * wants to re-use it.
+ *
+ * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
+ * network byte order is returned.
+ */
+__be32
+nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
+		     unsigned int may_flags, struct nfsd_file **pnf)
+{
+	return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, 1);
+}
+
 /**
  * nfsd_file_acquire - Get a struct nfsd_file with an open file
  * @rqstp: the RPC transaction being executed
@@ -1132,6 +1171,10 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
  * @may_flags: NFSD_MAY_ settings for the file
  * @pnf: OUT: new or found "struct nfsd_file" object
  *
+ * The nfsd_file_object returned by this API is reference-counted
+ * but not garbage-collected. The object is released one RCU grace
+ * period after the final nfsd_file_put().
+ *
  * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
  * network byte order is returned.
  */
@@ -1139,7 +1182,7 @@ __be32
 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		  unsigned int may_flags, struct nfsd_file **pnf)
 {
-	return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true);
+	return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, 0);
 }
 
 /**
@@ -1149,6 +1192,10 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
  * @may_flags: NFSD_MAY_ settings for the file
  * @pnf: OUT: new or found "struct nfsd_file" object
  *
+ * The nfsd_file_object returned by this API is reference-counted
+ * but not garbage-collected. The object is released immediately
+ * one RCU grace period after the final nfsd_file_put().
+ *
  * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
  * network byte order is returned.
  */
@@ -1156,7 +1203,7 @@ __be32
 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		 unsigned int may_flags, struct nfsd_file **pnf)
 {
-	return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false);
+	return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false, 0);
 }
 
 /*
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index f81c198f4ed6..0f6546bcd3e0 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -38,6 +38,7 @@ struct nfsd_file {
 #define NFSD_FILE_HASHED	(0)
 #define NFSD_FILE_PENDING	(1)
 #define NFSD_FILE_REFERENCED	(2)
+#define NFSD_FILE_GC		(3)
 	unsigned long		nf_flags;
 	struct inode		*nf_inode;	/* don't deref */
 	refcount_t		nf_ref;
@@ -55,6 +56,8 @@ void nfsd_file_put(struct nfsd_file *nf);
 struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
 void nfsd_file_close_inode_sync(struct inode *inode);
 bool nfsd_file_is_cached(struct inode *inode);
+__be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
+		  unsigned int may_flags, struct nfsd_file **nfp);
 __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		  unsigned int may_flags, struct nfsd_file **nfp);
 __be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index d12823371857..6a5ad6c91d8e 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -779,8 +779,8 @@ nfsd3_proc_commit(struct svc_rqst *rqstp)
 				(unsigned long long) argp->offset);
 
 	fh_copy(&resp->fh, &argp->fh);
-	resp->status = nfsd_file_acquire(rqstp, &resp->fh, NFSD_MAY_WRITE |
-					 NFSD_MAY_NOT_BREAK_LEASE, &nf);
+	resp->status = nfsd_file_acquire_gc(rqstp, &resp->fh, NFSD_MAY_WRITE |
+					    NFSD_MAY_NOT_BREAK_LEASE, &nf);
 	if (resp->status)
 		goto out;
 	resp->status = nfsd_commit(rqstp, &resp->fh, nf, argp->offset,
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 9ebd67d461f9..4921144880d3 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -742,7 +742,8 @@ DEFINE_CLID_EVENT(confirmed_r);
 	__print_flags(val, "|",						\
 		{ 1 << NFSD_FILE_HASHED,	"HASHED" },		\
 		{ 1 << NFSD_FILE_PENDING,	"PENDING" },		\
-		{ 1 << NFSD_FILE_REFERENCED,	"REFERENCED"})
+		{ 1 << NFSD_FILE_REFERENCED,	"REFERENCED"},		\
+		{ 1 << NFSD_FILE_GC,		"GC"})
 
 DECLARE_EVENT_CLASS(nfsd_file_class,
 	TP_PROTO(struct nfsd_file *nf),
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 44f210ba17cc..89d682a56fc4 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1060,7 +1060,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	__be32 err;
 
 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
-	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
+	err = nfsd_file_acquire_gc(rqstp, fhp, NFSD_MAY_READ, &nf);
 	if (err)
 		return err;
 
@@ -1092,7 +1092,7 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
 
 	trace_nfsd_write_start(rqstp, fhp, offset, *cnt);
 
-	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE, &nf);
+	err = nfsd_file_acquire_gc(rqstp, fhp, NFSD_MAY_WRITE, &nf);
 	if (err)
 		goto out;
 



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC 6/9] NFSD: Use const pointers as parameters to fh_ helpers.
  2022-10-05 14:55 [PATCH RFC 0/9] A course adjustment, maybe Chuck Lever
                   ` (4 preceding siblings ...)
  2022-10-05 14:56 ` [PATCH RFC 5/9] NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection Chuck Lever
@ 2022-10-05 14:56 ` Chuck Lever
  2022-10-06 15:59   ` Jeff Layton
  2022-10-05 14:56 ` [PATCH RFC 7/9] NFSD: Use rhashtable for managing nfs4_file objects Chuck Lever
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2022-10-05 14:56 UTC (permalink / raw)
  To: linux-nfs

Enable callers to use const pointers where they are able to.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfsfh.h |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index c3ae6414fc5c..513e028b0bbe 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -220,7 +220,7 @@ __be32	fh_update(struct svc_fh *);
 void	fh_put(struct svc_fh *);
 
 static __inline__ struct svc_fh *
-fh_copy(struct svc_fh *dst, struct svc_fh *src)
+fh_copy(struct svc_fh *dst, const struct svc_fh *src)
 {
 	WARN_ON(src->fh_dentry);
 
@@ -229,7 +229,7 @@ fh_copy(struct svc_fh *dst, struct svc_fh *src)
 }
 
 static inline void
-fh_copy_shallow(struct knfsd_fh *dst, struct knfsd_fh *src)
+fh_copy_shallow(struct knfsd_fh *dst, const struct knfsd_fh *src)
 {
 	dst->fh_size = src->fh_size;
 	memcpy(&dst->fh_raw, &src->fh_raw, src->fh_size);
@@ -243,7 +243,8 @@ fh_init(struct svc_fh *fhp, int maxsize)
 	return fhp;
 }
 
-static inline bool fh_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2)
+static inline bool fh_match(const struct knfsd_fh *fh1,
+			    const struct knfsd_fh *fh2)
 {
 	if (fh1->fh_size != fh2->fh_size)
 		return false;
@@ -252,7 +253,8 @@ static inline bool fh_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2)
 	return true;
 }
 
-static inline bool fh_fsid_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2)
+static inline bool fh_fsid_match(const struct knfsd_fh *fh1,
+				 const struct knfsd_fh *fh2)
 {
 	if (fh1->fh_fsid_type != fh2->fh_fsid_type)
 		return false;



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC 7/9] NFSD: Use rhashtable for managing nfs4_file objects
  2022-10-05 14:55 [PATCH RFC 0/9] A course adjustment, maybe Chuck Lever
                   ` (5 preceding siblings ...)
  2022-10-05 14:56 ` [PATCH RFC 6/9] NFSD: Use const pointers as parameters to fh_ helpers Chuck Lever
@ 2022-10-05 14:56 ` Chuck Lever
  2022-10-05 15:11   ` Chuck Lever III
  2022-10-05 14:56 ` [PATCH RFC 8/9] NFSD: Clean up nfs4_preprocess_stateid_op() call sites Chuck Lever
  2022-10-05 14:56 ` [PATCH RFC 9/9] NFSD: Trace delegation revocations Chuck Lever
  8 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2022-10-05 14:56 UTC (permalink / raw)
  To: linux-nfs

fh_match() is expensive to use for hash chains that contain more
than a few objects. With common workloads, I see multiple thousands
of objects stored in file_hashtbl[], which always has only 256
buckets.

Replace it with an rhashtable, which dynamically resizes its bucket
array to keep hash chains short.

This also enables the removal of the use of state_lock to serialize
operations on the new rhashtable.

The result is an improvement in the latency of NFSv4 operations
and the reduction of nfsd CPU utilization due to the cache misses
of walking long hash chains in file_hashtbl.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4state.c |  229 +++++++++++++++++++++++++++++++++++----------------
 fs/nfsd/state.h     |    5 -
 2 files changed, 158 insertions(+), 76 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 2b850de288cf..06499b9481a6 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -44,7 +44,9 @@
 #include <linux/jhash.h>
 #include <linux/string_helpers.h>
 #include <linux/fsnotify.h>
+#include <linux/rhashtable.h>
 #include <linux/nfs_ssc.h>
+
 #include "xdr4.h"
 #include "xdr4cb.h"
 #include "vfs.h"
@@ -84,6 +86,7 @@ static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
 static void nfs4_free_ol_stateid(struct nfs4_stid *stid);
 void nfsd4_end_grace(struct nfsd_net *nn);
 static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps);
+static void unhash_nfs4_file(struct nfs4_file *fp);
 
 /* Locking: */
 
@@ -577,11 +580,8 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu)
 void
 put_nfs4_file(struct nfs4_file *fi)
 {
-	might_lock(&state_lock);
-
-	if (refcount_dec_and_lock(&fi->fi_ref, &state_lock)) {
-		hlist_del_rcu(&fi->fi_hash);
-		spin_unlock(&state_lock);
+	if (refcount_dec_and_test(&fi->fi_ref)) {
+		unhash_nfs4_file(fi);
 		WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate));
 		WARN_ON_ONCE(!list_empty(&fi->fi_delegations));
 		call_rcu(&fi->fi_rcu, nfsd4_free_file_rcu);
@@ -695,19 +695,85 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername)
 	return ret & OWNER_HASH_MASK;
 }
 
-/* hash table for nfs4_file */
-#define FILE_HASH_BITS                   8
-#define FILE_HASH_SIZE                  (1 << FILE_HASH_BITS)
+static struct rhashtable nfs4_file_rhashtbl ____cacheline_aligned_in_smp;
 
-static unsigned int file_hashval(struct svc_fh *fh)
+/*
+ * The returned hash value is based solely on the address of an in-code
+ * inode, a pointer to a slab-allocated object. The entropy in such a
+ * pointer is concentrated in its middle bits.
+ */
+static u32 nfs4_file_inode_hash(const struct inode *inode, u32 seed)
+{
+	unsigned long ptr = (unsigned long)inode;
+	u32 k;
+
+	k = ptr >> L1_CACHE_SHIFT;
+	k &= 0x00ffffff;
+	return jhash2(&k, 1, seed);
+}
+
+/**
+ * nfs4_file_key_hashfn - Compute the hash value of a lookup key
+ * @data: key on which to compute the hash value
+ * @len: rhash table's key_len parameter (unused)
+ * @seed: rhash table's random seed of the day
+ *
+ * Return value:
+ *   Computed 32-bit hash value
+ */
+static u32 nfs4_file_key_hashfn(const void *data, u32 len, u32 seed)
 {
-	struct inode *inode = d_inode(fh->fh_dentry);
+	const struct svc_fh *fhp = data;
 
-	/* XXX: why not (here & in file cache) use inode? */
-	return (unsigned int)hash_long(inode->i_ino, FILE_HASH_BITS);
+	return nfs4_file_inode_hash(d_inode(fhp->fh_dentry), seed);
 }
 
-static struct hlist_head file_hashtbl[FILE_HASH_SIZE];
+/**
+ * nfs4_file_obj_hashfn - Compute the hash value of an nfs4_file object
+ * @data: object on which to compute the hash value
+ * @len: rhash table's key_len parameter (unused)
+ * @seed: rhash table's random seed of the day
+ *
+ * Return value:
+ *   Computed 32-bit hash value
+ */
+static u32 nfs4_file_obj_hashfn(const void *data, u32 len, u32 seed)
+{
+	const struct nfs4_file *fi = data;
+
+	return nfs4_file_inode_hash(fi->fi_inode, seed);
+}
+
+/**
+ * nfs4_file_obj_cmpfn - Match a cache item against search criteria
+ * @arg: search criteria
+ * @ptr: cache item to check
+ *
+ * Return values:
+ *   %0 - Item matches search criteria
+ *   %1 - Item does not match search criteria
+ */
+static int nfs4_file_obj_cmpfn(struct rhashtable_compare_arg *arg,
+			       const void *ptr)
+{
+	const struct svc_fh *fhp = arg->key;
+	const struct nfs4_file *fi = ptr;
+
+	return fh_match(&fi->fi_fhandle, &fhp->fh_handle) ? 0 : 1;
+}
+
+static const struct rhashtable_params nfs4_file_rhash_params = {
+	.key_len		= sizeof_field(struct nfs4_file, fi_inode),
+	.key_offset		= offsetof(struct nfs4_file, fi_inode),
+	.head_offset		= offsetof(struct nfs4_file, fi_rhash),
+	.hashfn			= nfs4_file_key_hashfn,
+	.obj_hashfn		= nfs4_file_obj_hashfn,
+	.obj_cmpfn		= nfs4_file_obj_cmpfn,
+
+	/* Reduce resizing churn on light workloads */
+	.min_size		= 512,		/* buckets */
+	.automatic_shrinking	= true,
+};
 
 /*
  * Check if courtesy clients have conflicting access and resolve it if possible
@@ -4251,11 +4317,8 @@ static struct nfs4_file *nfsd4_alloc_file(void)
 }
 
 /* OPEN Share state helper functions */
-static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval,
-				struct nfs4_file *fp)
+static void init_nfs4_file(const struct svc_fh *fh, struct nfs4_file *fp)
 {
-	lockdep_assert_held(&state_lock);
-
 	refcount_set(&fp->fi_ref, 1);
 	spin_lock_init(&fp->fi_lock);
 	INIT_LIST_HEAD(&fp->fi_stateids);
@@ -4273,7 +4336,6 @@ static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval,
 	INIT_LIST_HEAD(&fp->fi_lo_states);
 	atomic_set(&fp->fi_lo_recalls, 0);
 #endif
-	hlist_add_head_rcu(&fp->fi_hash, &file_hashtbl[hashval]);
 }
 
 void
@@ -4626,71 +4688,84 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net)
 		nfs4_put_stid(&last->st_stid);
 }
 
-/* search file_hashtbl[] for file */
-static struct nfs4_file *
-find_file_locked(struct svc_fh *fh, unsigned int hashval)
+static struct nfs4_file *find_nfs4_file(const struct svc_fh *fhp)
 {
-	struct nfs4_file *fp;
+	struct nfs4_file *fi;
 
-	hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash,
-				lockdep_is_held(&state_lock)) {
-		if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) {
-			if (refcount_inc_not_zero(&fp->fi_ref))
-				return fp;
-		}
-	}
-	return NULL;
+	rcu_read_lock();
+	fi = rhashtable_lookup(&nfs4_file_rhashtbl, fhp,
+			       nfs4_file_rhash_params);
+	if (fi)
+		if (!refcount_inc_not_zero(&fi->fi_ref))
+			fi = NULL;
+	rcu_read_unlock();
+	return fi;
 }
 
-static struct nfs4_file *insert_file(struct nfs4_file *new, struct svc_fh *fh,
-				     unsigned int hashval)
+static void check_nfs4_file_aliases_locked(struct nfs4_file *new,
+					   const struct svc_fh *fhp)
 {
-	struct nfs4_file *fp;
-	struct nfs4_file *ret = NULL;
-	bool alias_found = false;
+	struct rhashtable *ht = &nfs4_file_rhashtbl;
+	struct rhash_lock_head __rcu *const *bkt;
+	struct rhashtable_compare_arg arg = {
+		.ht	= ht,
+		.key	= fhp,
+	};
+	struct bucket_table *tbl;
+	struct rhash_head *he;
+	unsigned int hash;
 
-	spin_lock(&state_lock);
-	hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash,
-				 lockdep_is_held(&state_lock)) {
-		if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) {
-			if (refcount_inc_not_zero(&fp->fi_ref))
-				ret = fp;
-		} else if (d_inode(fh->fh_dentry) == fp->fi_inode)
-			fp->fi_aliased = alias_found = true;
-	}
-	if (likely(ret == NULL)) {
-		nfsd4_init_file(fh, hashval, new);
-		new->fi_aliased = alias_found;
-		ret = new;
+	/*
+	 * rhashtable guarantees small buckets, thus this loop stays
+	 * efficient.
+	 */
+	rcu_read_lock();
+	tbl = rht_dereference_rcu(ht->tbl, ht);
+	hash = rht_key_hashfn(ht, tbl, fhp, nfs4_file_rhash_params);
+	bkt = rht_bucket(tbl, hash);
+	rht_for_each_rcu_from(he, rht_ptr_rcu(bkt), tbl, hash) {
+		struct nfs4_file *fi;
+
+		fi = rht_obj(ht, he);
+		if (nfs4_file_obj_cmpfn(&arg, fi) == 0)
+			continue;
+		if (d_inode(fhp->fh_dentry) == fi->fi_inode) {
+			fi->fi_aliased = true;
+			new->fi_aliased = true;
+		}
 	}
-	spin_unlock(&state_lock);
-	return ret;
+	rcu_read_unlock();
 }
 
-static struct nfs4_file * find_file(struct svc_fh *fh)
+static noinline struct nfs4_file *
+find_or_hash_nfs4_file(struct nfs4_file *new, const struct svc_fh *fhp)
 {
-	struct nfs4_file *fp;
-	unsigned int hashval = file_hashval(fh);
+	struct nfs4_file *fi;
 
-	rcu_read_lock();
-	fp = find_file_locked(fh, hashval);
-	rcu_read_unlock();
-	return fp;
-}
+	init_nfs4_file(fhp, new);
 
-static struct nfs4_file *
-find_or_add_file(struct nfs4_file *new, struct svc_fh *fh)
-{
-	struct nfs4_file *fp;
-	unsigned int hashval = file_hashval(fh);
+	fi = rhashtable_lookup_get_insert_key(&nfs4_file_rhashtbl,
+					      fhp, &new->fi_rhash,
+					      nfs4_file_rhash_params);
+	if (!fi) {
+		fi = new;
+		goto check_aliases;
+	}
+	if (IS_ERR(fi))		/* or BUG? */
+		return NULL;
+	if (!refcount_inc_not_zero(&fi->fi_ref))
+		fi = new;
 
-	rcu_read_lock();
-	fp = find_file_locked(fh, hashval);
-	rcu_read_unlock();
-	if (fp)
-		return fp;
+check_aliases:
+	check_nfs4_file_aliases_locked(fi, fhp);
+
+	return fi;
+}
 
-	return insert_file(new, fh, hashval);
+static void unhash_nfs4_file(struct nfs4_file *fi)
+{
+	rhashtable_remove_fast(&nfs4_file_rhashtbl, &fi->fi_rhash,
+			       nfs4_file_rhash_params);
 }
 
 /*
@@ -4703,9 +4778,10 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type)
 	struct nfs4_file *fp;
 	__be32 ret = nfs_ok;
 
-	fp = find_file(current_fh);
+	fp = find_nfs4_file(current_fh);
 	if (!fp)
 		return ret;
+
 	/* Check for conflicting share reservations */
 	spin_lock(&fp->fi_lock);
 	if (fp->fi_share_deny & deny_type)
@@ -5548,7 +5624,9 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
 	 * and check for delegations in the process of being recalled.
 	 * If not found, create the nfs4_file struct
 	 */
-	fp = find_or_add_file(open->op_file, current_fh);
+	fp = find_or_hash_nfs4_file(open->op_file, current_fh);
+	if (unlikely(!fp))
+		return nfserr_jukebox;
 	if (fp != open->op_file) {
 		status = nfs4_check_deleg(cl, open, &dp);
 		if (status)
@@ -7905,10 +7983,16 @@ nfs4_state_start(void)
 {
 	int ret;
 
-	ret = nfsd4_create_callback_queue();
+	ret = rhashtable_init(&nfs4_file_rhashtbl, &nfs4_file_rhash_params);
 	if (ret)
 		return ret;
 
+	ret = nfsd4_create_callback_queue();
+	if (ret) {
+		rhashtable_destroy(&nfs4_file_rhashtbl);
+		return ret;
+	}
+
 	set_max_delegations();
 	return 0;
 }
@@ -7939,6 +8023,7 @@ nfs4_state_shutdown_net(struct net *net)
 
 	nfsd4_client_tracking_exit(net);
 	nfs4_state_destroy_net(net);
+	rhashtable_destroy(&nfs4_file_rhashtbl);
 #ifdef CONFIG_NFSD_V4_2_INTER_SSC
 	nfsd4_ssc_shutdown_umount(nn);
 #endif
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index ae596dbf8667..879f085bc39e 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -536,16 +536,13 @@ struct nfs4_clnt_odstate {
  * inode can have multiple filehandles associated with it, so there is
  * (potentially) a many to one relationship between this struct and struct
  * inode.
- *
- * These are hashed by filehandle in the file_hashtbl, which is protected by
- * the global state_lock spinlock.
  */
 struct nfs4_file {
 	refcount_t		fi_ref;
 	struct inode *		fi_inode;
 	bool			fi_aliased;
 	spinlock_t		fi_lock;
-	struct hlist_node       fi_hash;	/* hash on fi_fhandle */
+	struct rhash_head	fi_rhash;
 	struct list_head        fi_stateids;
 	union {
 		struct list_head	fi_delegations;



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC 8/9] NFSD: Clean up nfs4_preprocess_stateid_op() call sites
  2022-10-05 14:55 [PATCH RFC 0/9] A course adjustment, maybe Chuck Lever
                   ` (6 preceding siblings ...)
  2022-10-05 14:56 ` [PATCH RFC 7/9] NFSD: Use rhashtable for managing nfs4_file objects Chuck Lever
@ 2022-10-05 14:56 ` Chuck Lever
  2022-10-06 16:05   ` Jeff Layton
  2022-10-05 14:56 ` [PATCH RFC 9/9] NFSD: Trace delegation revocations Chuck Lever
  8 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2022-10-05 14:56 UTC (permalink / raw)
  To: linux-nfs

Remove the lame-duck dprintk()s around nfs4_preprocess_stateid_op()
call sites.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4proc.c |   31 +++++++------------------------
 1 file changed, 7 insertions(+), 24 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index d0d976f847ca..42b81e88ea14 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -951,12 +951,7 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					&read->rd_stateid, RD_STATE,
 					&read->rd_nf, NULL);
-	if (status) {
-		dprintk("NFSD: nfsd4_read: couldn't process stateid!\n");
-		goto out;
-	}
-	status = nfs_ok;
-out:
+
 	read->rd_rqstp = rqstp;
 	read->rd_fhp = &cstate->current_fh;
 	return status;
@@ -1125,10 +1120,8 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		status = nfs4_preprocess_stateid_op(rqstp, cstate,
 				&cstate->current_fh, &setattr->sa_stateid,
 				WR_STATE, NULL, NULL);
-		if (status) {
-			dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n");
+		if (status)
 			return status;
-		}
 	}
 	err = fh_want_write(&cstate->current_fh);
 	if (err)
@@ -1176,10 +1169,8 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 			       write->wr_offset, cnt);
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 						stateid, WR_STATE, &nf, NULL);
-	if (status) {
-		dprintk("NFSD: nfsd4_write: couldn't process stateid!\n");
+	if (status)
 		return status;
-	}
 
 	write->wr_how_written = write->wr_stable_how;
 
@@ -1210,17 +1201,13 @@ nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->save_fh,
 					    src_stateid, RD_STATE, src, NULL);
-	if (status) {
-		dprintk("NFSD: %s: couldn't process src stateid!\n", __func__);
+	if (status)
 		goto out;
-	}
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					    dst_stateid, WR_STATE, dst, NULL);
-	if (status) {
-		dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__);
+	if (status)
 		goto out_put_src;
-	}
 
 	/* fix up for NFS-specific error code */
 	if (!S_ISREG(file_inode((*src)->nf_file)->i_mode) ||
@@ -1955,10 +1942,8 @@ nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					    &fallocate->falloc_stateid,
 					    WR_STATE, &nf, NULL);
-	if (status != nfs_ok) {
-		dprintk("NFSD: nfsd4_fallocate: couldn't process stateid!\n");
+	if (status != nfs_ok)
 		return status;
-	}
 
 	status = nfsd4_vfs_fallocate(rqstp, &cstate->current_fh, nf->nf_file,
 				     fallocate->falloc_offset,
@@ -2014,10 +1999,8 @@ nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					    &seek->seek_stateid,
 					    RD_STATE, &nf, NULL);
-	if (status) {
-		dprintk("NFSD: nfsd4_seek: couldn't process stateid!\n");
+	if (status)
 		return status;
-	}
 
 	switch (seek->seek_whence) {
 	case NFS4_CONTENT_DATA:



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC 9/9] NFSD: Trace delegation revocations
  2022-10-05 14:55 [PATCH RFC 0/9] A course adjustment, maybe Chuck Lever
                   ` (7 preceding siblings ...)
  2022-10-05 14:56 ` [PATCH RFC 8/9] NFSD: Clean up nfs4_preprocess_stateid_op() call sites Chuck Lever
@ 2022-10-05 14:56 ` Chuck Lever
  2022-10-06 16:06   ` Jeff Layton
  8 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2022-10-05 14:56 UTC (permalink / raw)
  To: linux-nfs

Revocation is an exceptional event. Generate a trace record when it
occurs so that other activity can be triggered.

Example:

nfsd-1104  [005]  1912.002544: nfsd_stid_revoke:        client 633c9343:4e82788d stateid 00000003:00000001 ref=2 type=DELEG

Trace infrastructure is provided for subsequent additional tracing
related to nfs4_stid activity.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4state.c |    2 ++
 fs/nfsd/trace.h     |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 06499b9481a6..349302afa7eb 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1404,6 +1404,8 @@ static void revoke_delegation(struct nfs4_delegation *dp)
 
 	WARN_ON(!list_empty(&dp->dl_recall_lru));
 
+	trace_nfsd_stid_revoke(&dp->dl_stid);
+
 	if (clp->cl_minorversion) {
 		dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID;
 		refcount_inc(&dp->dl_stid.sc_count);
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 4921144880d3..23fb39c957af 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -561,6 +561,61 @@ DEFINE_EVENT(nfsd_stateseqid_class, nfsd_##name, \
 DEFINE_STATESEQID_EVENT(preprocess);
 DEFINE_STATESEQID_EVENT(open_confirm);
 
+TRACE_DEFINE_ENUM(NFS4_OPEN_STID);
+TRACE_DEFINE_ENUM(NFS4_LOCK_STID);
+TRACE_DEFINE_ENUM(NFS4_DELEG_STID);
+TRACE_DEFINE_ENUM(NFS4_CLOSED_STID);
+TRACE_DEFINE_ENUM(NFS4_REVOKED_DELEG_STID);
+TRACE_DEFINE_ENUM(NFS4_CLOSED_DELEG_STID);
+TRACE_DEFINE_ENUM(NFS4_LAYOUT_STID);
+
+#define show_stid_type(x)						\
+	__print_flags(x, "|",						\
+		{ NFS4_OPEN_STID,		"OPEN" },		\
+		{ NFS4_LOCK_STID,		"LOCK" },		\
+		{ NFS4_DELEG_STID,		"DELEG" },		\
+		{ NFS4_CLOSED_STID,		"CLOSED" },		\
+		{ NFS4_REVOKED_DELEG_STID,	"REVOKED" },		\
+		{ NFS4_CLOSED_DELEG_STID,	"CLOSED_DELEG" },	\
+		{ NFS4_LAYOUT_STID,		"LAYOUT" })
+
+DECLARE_EVENT_CLASS(nfsd_stid_class,
+	TP_PROTO(
+		const struct nfs4_stid *stid
+	),
+	TP_ARGS(stid),
+	TP_STRUCT__entry(
+		__field(unsigned long, sc_type)
+		__field(int, sc_count)
+		__field(u32, cl_boot)
+		__field(u32, cl_id)
+		__field(u32, si_id)
+		__field(u32, si_generation)
+	),
+	TP_fast_assign(
+		const stateid_t *stp = &stid->sc_stateid;
+
+		__entry->sc_type = stid->sc_type;
+		__entry->sc_count = refcount_read(&stid->sc_count);
+		__entry->cl_boot = stp->si_opaque.so_clid.cl_boot;
+		__entry->cl_id = stp->si_opaque.so_clid.cl_id;
+		__entry->si_id = stp->si_opaque.so_id;
+		__entry->si_generation = stp->si_generation;
+	),
+	TP_printk("client %08x:%08x stateid %08x:%08x ref=%d type=%s",
+		__entry->cl_boot, __entry->cl_id,
+		__entry->si_id, __entry->si_generation,
+		__entry->sc_count, show_stid_type(__entry->sc_type)
+	)
+);
+
+#define DEFINE_STID_EVENT(name)					\
+DEFINE_EVENT(nfsd_stid_class, nfsd_stid_##name,			\
+	TP_PROTO(const struct nfs4_stid *stid),			\
+	TP_ARGS(stid))
+
+DEFINE_STID_EVENT(revoke);
+
 DECLARE_EVENT_CLASS(nfsd_clientid_class,
 	TP_PROTO(const clientid_t *clid),
 	TP_ARGS(clid),



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 7/9] NFSD: Use rhashtable for managing nfs4_file objects
  2022-10-05 14:56 ` [PATCH RFC 7/9] NFSD: Use rhashtable for managing nfs4_file objects Chuck Lever
@ 2022-10-05 15:11   ` Chuck Lever III
  2022-10-06 16:12     ` Jeff Layton
  0 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever III @ 2022-10-05 15:11 UTC (permalink / raw)
  To: Linux NFS Mailing List



> On Oct 5, 2022, at 10:56 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> fh_match() is expensive to use for hash chains that contain more
> than a few objects. With common workloads, I see multiple thousands
> of objects stored in file_hashtbl[], which always has only 256
> buckets.
> 
> Replace it with an rhashtable, which dynamically resizes its bucket
> array to keep hash chains short.
> 
> This also enables the removal of the use of state_lock to serialize
> operations on the new rhashtable.
> 
> The result is an improvement in the latency of NFSv4 operations
> and the reduction of nfsd CPU utilization due to the cache misses
> of walking long hash chains in file_hashtbl.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/nfs4state.c |  229 +++++++++++++++++++++++++++++++++++----------------
> fs/nfsd/state.h     |    5 -
> 2 files changed, 158 insertions(+), 76 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 2b850de288cf..06499b9481a6 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -44,7 +44,9 @@
> #include <linux/jhash.h>
> #include <linux/string_helpers.h>
> #include <linux/fsnotify.h>
> +#include <linux/rhashtable.h>
> #include <linux/nfs_ssc.h>
> +
> #include "xdr4.h"
> #include "xdr4cb.h"
> #include "vfs.h"
> @@ -84,6 +86,7 @@ static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
> static void nfs4_free_ol_stateid(struct nfs4_stid *stid);
> void nfsd4_end_grace(struct nfsd_net *nn);
> static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps);
> +static void unhash_nfs4_file(struct nfs4_file *fp);
> 
> /* Locking: */
> 
> @@ -577,11 +580,8 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu)
> void
> put_nfs4_file(struct nfs4_file *fi)
> {
> -	might_lock(&state_lock);
> -
> -	if (refcount_dec_and_lock(&fi->fi_ref, &state_lock)) {
> -		hlist_del_rcu(&fi->fi_hash);
> -		spin_unlock(&state_lock);
> +	if (refcount_dec_and_test(&fi->fi_ref)) {
> +		unhash_nfs4_file(fi);
> 		WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate));
> 		WARN_ON_ONCE(!list_empty(&fi->fi_delegations));
> 		call_rcu(&fi->fi_rcu, nfsd4_free_file_rcu);
> @@ -695,19 +695,85 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername)
> 	return ret & OWNER_HASH_MASK;
> }
> 
> -/* hash table for nfs4_file */
> -#define FILE_HASH_BITS                   8
> -#define FILE_HASH_SIZE                  (1 << FILE_HASH_BITS)
> +static struct rhashtable nfs4_file_rhashtbl ____cacheline_aligned_in_smp;
> 
> -static unsigned int file_hashval(struct svc_fh *fh)
> +/*
> + * The returned hash value is based solely on the address of an in-code
> + * inode, a pointer to a slab-allocated object. The entropy in such a
> + * pointer is concentrated in its middle bits.
> + */
> +static u32 nfs4_file_inode_hash(const struct inode *inode, u32 seed)
> +{
> +	unsigned long ptr = (unsigned long)inode;
> +	u32 k;
> +
> +	k = ptr >> L1_CACHE_SHIFT;
> +	k &= 0x00ffffff;
> +	return jhash2(&k, 1, seed);
> +}
> +
> +/**
> + * nfs4_file_key_hashfn - Compute the hash value of a lookup key
> + * @data: key on which to compute the hash value
> + * @len: rhash table's key_len parameter (unused)
> + * @seed: rhash table's random seed of the day
> + *
> + * Return value:
> + *   Computed 32-bit hash value
> + */
> +static u32 nfs4_file_key_hashfn(const void *data, u32 len, u32 seed)
> {
> -	struct inode *inode = d_inode(fh->fh_dentry);
> +	const struct svc_fh *fhp = data;
> 
> -	/* XXX: why not (here & in file cache) use inode? */
> -	return (unsigned int)hash_long(inode->i_ino, FILE_HASH_BITS);
> +	return nfs4_file_inode_hash(d_inode(fhp->fh_dentry), seed);
> }
> 
> -static struct hlist_head file_hashtbl[FILE_HASH_SIZE];
> +/**
> + * nfs4_file_obj_hashfn - Compute the hash value of an nfs4_file object
> + * @data: object on which to compute the hash value
> + * @len: rhash table's key_len parameter (unused)
> + * @seed: rhash table's random seed of the day
> + *
> + * Return value:
> + *   Computed 32-bit hash value
> + */
> +static u32 nfs4_file_obj_hashfn(const void *data, u32 len, u32 seed)
> +{
> +	const struct nfs4_file *fi = data;
> +
> +	return nfs4_file_inode_hash(fi->fi_inode, seed);
> +}
> +
> +/**
> + * nfs4_file_obj_cmpfn - Match a cache item against search criteria
> + * @arg: search criteria
> + * @ptr: cache item to check
> + *
> + * Return values:
> + *   %0 - Item matches search criteria
> + *   %1 - Item does not match search criteria
> + */
> +static int nfs4_file_obj_cmpfn(struct rhashtable_compare_arg *arg,
> +			       const void *ptr)
> +{
> +	const struct svc_fh *fhp = arg->key;
> +	const struct nfs4_file *fi = ptr;
> +
> +	return fh_match(&fi->fi_fhandle, &fhp->fh_handle) ? 0 : 1;
> +}
> +
> +static const struct rhashtable_params nfs4_file_rhash_params = {
> +	.key_len		= sizeof_field(struct nfs4_file, fi_inode),
> +	.key_offset		= offsetof(struct nfs4_file, fi_inode),
> +	.head_offset		= offsetof(struct nfs4_file, fi_rhash),
> +	.hashfn			= nfs4_file_key_hashfn,
> +	.obj_hashfn		= nfs4_file_obj_hashfn,
> +	.obj_cmpfn		= nfs4_file_obj_cmpfn,
> +
> +	/* Reduce resizing churn on light workloads */
> +	.min_size		= 512,		/* buckets */
> +	.automatic_shrinking	= true,
> +};
> 
> /*
>  * Check if courtesy clients have conflicting access and resolve it if possible
> @@ -4251,11 +4317,8 @@ static struct nfs4_file *nfsd4_alloc_file(void)
> }
> 
> /* OPEN Share state helper functions */
> -static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval,
> -				struct nfs4_file *fp)
> +static void init_nfs4_file(const struct svc_fh *fh, struct nfs4_file *fp)
> {
> -	lockdep_assert_held(&state_lock);
> -
> 	refcount_set(&fp->fi_ref, 1);
> 	spin_lock_init(&fp->fi_lock);
> 	INIT_LIST_HEAD(&fp->fi_stateids);
> @@ -4273,7 +4336,6 @@ static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval,
> 	INIT_LIST_HEAD(&fp->fi_lo_states);
> 	atomic_set(&fp->fi_lo_recalls, 0);
> #endif
> -	hlist_add_head_rcu(&fp->fi_hash, &file_hashtbl[hashval]);
> }
> 
> void
> @@ -4626,71 +4688,84 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net)
> 		nfs4_put_stid(&last->st_stid);
> }
> 
> -/* search file_hashtbl[] for file */
> -static struct nfs4_file *
> -find_file_locked(struct svc_fh *fh, unsigned int hashval)
> +static struct nfs4_file *find_nfs4_file(const struct svc_fh *fhp)
> {
> -	struct nfs4_file *fp;
> +	struct nfs4_file *fi;
> 
> -	hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash,
> -				lockdep_is_held(&state_lock)) {
> -		if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) {
> -			if (refcount_inc_not_zero(&fp->fi_ref))
> -				return fp;
> -		}
> -	}
> -	return NULL;
> +	rcu_read_lock();
> +	fi = rhashtable_lookup(&nfs4_file_rhashtbl, fhp,
> +			       nfs4_file_rhash_params);
> +	if (fi)
> +		if (!refcount_inc_not_zero(&fi->fi_ref))
> +			fi = NULL;
> +	rcu_read_unlock();
> +	return fi;
> }
> 
> -static struct nfs4_file *insert_file(struct nfs4_file *new, struct svc_fh *fh,
> -				     unsigned int hashval)
> +static void check_nfs4_file_aliases_locked(struct nfs4_file *new,
> +					   const struct svc_fh *fhp)
> {
> -	struct nfs4_file *fp;
> -	struct nfs4_file *ret = NULL;
> -	bool alias_found = false;
> +	struct rhashtable *ht = &nfs4_file_rhashtbl;
> +	struct rhash_lock_head __rcu *const *bkt;
> +	struct rhashtable_compare_arg arg = {
> +		.ht	= ht,
> +		.key	= fhp,
> +	};
> +	struct bucket_table *tbl;
> +	struct rhash_head *he;
> +	unsigned int hash;
> 
> -	spin_lock(&state_lock);
> -	hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash,
> -				 lockdep_is_held(&state_lock)) {
> -		if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) {
> -			if (refcount_inc_not_zero(&fp->fi_ref))
> -				ret = fp;
> -		} else if (d_inode(fh->fh_dentry) == fp->fi_inode)
> -			fp->fi_aliased = alias_found = true;
> -	}
> -	if (likely(ret == NULL)) {
> -		nfsd4_init_file(fh, hashval, new);
> -		new->fi_aliased = alias_found;
> -		ret = new;
> +	/*
> +	 * rhashtable guarantees small buckets, thus this loop stays
> +	 * efficient.
> +	 */
> +	rcu_read_lock();
> +	tbl = rht_dereference_rcu(ht->tbl, ht);
> +	hash = rht_key_hashfn(ht, tbl, fhp, nfs4_file_rhash_params);
> +	bkt = rht_bucket(tbl, hash);
> +	rht_for_each_rcu_from(he, rht_ptr_rcu(bkt), tbl, hash) {
> +		struct nfs4_file *fi;
> +
> +		fi = rht_obj(ht, he);
> +		if (nfs4_file_obj_cmpfn(&arg, fi) == 0)
> +			continue;
> +		if (d_inode(fhp->fh_dentry) == fi->fi_inode) {
> +			fi->fi_aliased = true;
> +			new->fi_aliased = true;
> +		}
> 	}
> -	spin_unlock(&state_lock);
> -	return ret;
> +	rcu_read_unlock();
> }
> 
> -static struct nfs4_file * find_file(struct svc_fh *fh)
> +static noinline struct nfs4_file *
> +find_or_hash_nfs4_file(struct nfs4_file *new, const struct svc_fh *fhp)
> {
> -	struct nfs4_file *fp;
> -	unsigned int hashval = file_hashval(fh);
> +	struct nfs4_file *fi;
> 
> -	rcu_read_lock();
> -	fp = find_file_locked(fh, hashval);
> -	rcu_read_unlock();
> -	return fp;
> -}
> +	init_nfs4_file(fhp, new);
> 
> -static struct nfs4_file *
> -find_or_add_file(struct nfs4_file *new, struct svc_fh *fh)
> -{
> -	struct nfs4_file *fp;
> -	unsigned int hashval = file_hashval(fh);
> +	fi = rhashtable_lookup_get_insert_key(&nfs4_file_rhashtbl,
> +					      fhp, &new->fi_rhash,
> +					      nfs4_file_rhash_params);
> +	if (!fi) {
> +		fi = new;
> +		goto check_aliases;
> +	}
> +	if (IS_ERR(fi))		/* or BUG? */
> +		return NULL;
> +	if (!refcount_inc_not_zero(&fi->fi_ref))
> +		fi = new;

Ah, hrm. Given what we just had to do to nfsd_file_do_acquire(),
maybe this needs the same fix to hang onto the RCU read lock
while dicking with the nfs4_file object's reference count?


> -	rcu_read_lock();
> -	fp = find_file_locked(fh, hashval);
> -	rcu_read_unlock();
> -	if (fp)
> -		return fp;
> +check_aliases:
> +	check_nfs4_file_aliases_locked(fi, fhp);
> +
> +	return fi;
> +}
> 
> -	return insert_file(new, fh, hashval);
> +static void unhash_nfs4_file(struct nfs4_file *fi)
> +{
> +	rhashtable_remove_fast(&nfs4_file_rhashtbl, &fi->fi_rhash,
> +			       nfs4_file_rhash_params);
> }
> 
> /*
> @@ -4703,9 +4778,10 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type)
> 	struct nfs4_file *fp;
> 	__be32 ret = nfs_ok;
> 
> -	fp = find_file(current_fh);
> +	fp = find_nfs4_file(current_fh);
> 	if (!fp)
> 		return ret;
> +
> 	/* Check for conflicting share reservations */
> 	spin_lock(&fp->fi_lock);
> 	if (fp->fi_share_deny & deny_type)
> @@ -5548,7 +5624,9 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
> 	 * and check for delegations in the process of being recalled.
> 	 * If not found, create the nfs4_file struct
> 	 */
> -	fp = find_or_add_file(open->op_file, current_fh);
> +	fp = find_or_hash_nfs4_file(open->op_file, current_fh);
> +	if (unlikely(!fp))
> +		return nfserr_jukebox;
> 	if (fp != open->op_file) {
> 		status = nfs4_check_deleg(cl, open, &dp);
> 		if (status)
> @@ -7905,10 +7983,16 @@ nfs4_state_start(void)
> {
> 	int ret;
> 
> -	ret = nfsd4_create_callback_queue();
> +	ret = rhashtable_init(&nfs4_file_rhashtbl, &nfs4_file_rhash_params);
> 	if (ret)
> 		return ret;
> 
> +	ret = nfsd4_create_callback_queue();
> +	if (ret) {
> +		rhashtable_destroy(&nfs4_file_rhashtbl);
> +		return ret;
> +	}
> +
> 	set_max_delegations();
> 	return 0;
> }
> @@ -7939,6 +8023,7 @@ nfs4_state_shutdown_net(struct net *net)
> 
> 	nfsd4_client_tracking_exit(net);
> 	nfs4_state_destroy_net(net);
> +	rhashtable_destroy(&nfs4_file_rhashtbl);
> #ifdef CONFIG_NFSD_V4_2_INTER_SSC
> 	nfsd4_ssc_shutdown_umount(nn);
> #endif
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index ae596dbf8667..879f085bc39e 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -536,16 +536,13 @@ struct nfs4_clnt_odstate {
>  * inode can have multiple filehandles associated with it, so there is
>  * (potentially) a many to one relationship between this struct and struct
>  * inode.
> - *
> - * These are hashed by filehandle in the file_hashtbl, which is protected by
> - * the global state_lock spinlock.
>  */
> struct nfs4_file {
> 	refcount_t		fi_ref;
> 	struct inode *		fi_inode;
> 	bool			fi_aliased;
> 	spinlock_t		fi_lock;
> -	struct hlist_node       fi_hash;	/* hash on fi_fhandle */
> +	struct rhash_head	fi_rhash;
> 	struct list_head        fi_stateids;
> 	union {
> 		struct list_head	fi_delegations;
> 
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 5/9] NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection
  2022-10-05 14:56 ` [PATCH RFC 5/9] NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection Chuck Lever
@ 2022-10-06 15:59   ` Jeff Layton
  2022-10-06 16:06     ` Chuck Lever III
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2022-10-06 15:59 UTC (permalink / raw)
  To: Chuck Lever, linux-nfs

On Wed, 2022-10-05 at 10:56 -0400, Chuck Lever wrote:
> NFSv4 operations manage the lifetime of nfsd_file items they use by
> means of NFSv4 OPEN and CLOSE. Hence there's no need for them to be
> garbage collected.
> 
> Introduce a mechanism to enable garbage collection for nfsd_file
> items used only by NFSv2/3 callers.
> 
> Note that the change in nfsd_file_put() ensures that both CLOSE and
> DELEGRETURN will actually close out and free an nfsd_file on last
> reference of a non-garbage-collected file.
> 
> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=394
> Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/filecache.c |   61 +++++++++++++++++++++++++++++++++++++++++++++------
>  fs/nfsd/filecache.h |    3 +++
>  fs/nfsd/nfs3proc.c  |    4 ++-
>  fs/nfsd/trace.h     |    3 ++-
>  fs/nfsd/vfs.c       |    4 ++-
>  5 files changed, 63 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index b7aa523c2010..01c27deabf83 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -63,6 +63,7 @@ struct nfsd_file_lookup_key {
>  	struct net			*net;
>  	const struct cred		*cred;
>  	unsigned char			need;
> +	unsigned char			gc:1;
>  	enum nfsd_file_lookup_type	type;
>  };
>  
> @@ -162,6 +163,8 @@ static int nfsd_file_obj_cmpfn(struct rhashtable_compare_arg *arg,
>  			return 1;
>  		if (!nfsd_match_cred(nf->nf_cred, key->cred))
>  			return 1;
> +		if (test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc)
> +			return 1;
>  		if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0)
>  			return 1;
>  		break;
> @@ -297,6 +300,8 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may)
>  		nf->nf_flags = 0;
>  		__set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
>  		__set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
> +		if (key->gc)
> +			__set_bit(NFSD_FILE_GC, &nf->nf_flags);
>  		nf->nf_inode = key->inode;
>  		/* nf_ref is pre-incremented for hash table */
>  		refcount_set(&nf->nf_ref, 2);
> @@ -428,16 +433,27 @@ nfsd_file_put_noref(struct nfsd_file *nf)
>  	}
>  }
>  
> +static void
> +nfsd_file_unhash_and_put(struct nfsd_file *nf)
> +{
> +	if (nfsd_file_unhash(nf))
> +		nfsd_file_put_noref(nf);
> +}
> +
>  void
>  nfsd_file_put(struct nfsd_file *nf)
>  {
>  	might_sleep();
>  
> -	nfsd_file_lru_add(nf);
> +	if (test_bit(NFSD_FILE_GC, &nf->nf_flags) == 1)
> +		nfsd_file_lru_add(nf);
> +	else if (refcount_read(&nf->nf_ref) == 2)
> +		nfsd_file_unhash_and_put(nf);
> +
>  	if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
>  		nfsd_file_flush(nf);
>  		nfsd_file_put_noref(nf);
> -	} else if (nf->nf_file) {
> +	} else if (nf->nf_file && test_bit(NFSD_FILE_GC, &nf->nf_flags) == 1) {
>  		nfsd_file_put_noref(nf);
>  		nfsd_file_schedule_laundrette();
>  	} else
> @@ -1017,12 +1033,14 @@ nfsd_file_is_cached(struct inode *inode)
>  
>  static __be32
>  nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> -		     unsigned int may_flags, struct nfsd_file **pnf, bool open)
> +		     unsigned int may_flags, struct nfsd_file **pnf,
> +		     bool open, int want_gc)

want_gc should be a bool

>  {
>  	struct nfsd_file_lookup_key key = {
>  		.type	= NFSD_FILE_KEY_FULL,
>  		.need	= may_flags & NFSD_FILE_MAY_MASK,
>  		.net	= SVC_NET(rqstp),
> +		.gc	= want_gc,
>  	};
>  	bool open_retry = true;
>  	struct nfsd_file *nf;
> @@ -1117,14 +1135,35 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	 * then unhash.
>  	 */
>  	if (status != nfs_ok || key.inode->i_nlink == 0)
> -		if (nfsd_file_unhash(nf))
> -			nfsd_file_put_noref(nf);
> +		nfsd_file_unhash_and_put(nf);
>  	clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags);
>  	smp_mb__after_atomic();
>  	wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
>  	goto out;
>  }
>  
> +/**
> + * nfsd_file_acquire_gc - Get a struct nfsd_file with an open file
> + * @rqstp: the RPC transaction being executed
> + * @fhp: the NFS filehandle of the file to be opened
> + * @may_flags: NFSD_MAY_ settings for the file
> + * @pnf: OUT: new or found "struct nfsd_file" object
> + *
> + * The nfsd_file object returned by this API is reference-counted
> + * and garbage-collected. The object is retained for a few
> + * seconds after the final nfsd_file_put() in case the caller
> + * wants to re-use it.
> + *
> + * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
> + * network byte order is returned.
> + */
> +__be32
> +nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
> +		     unsigned int may_flags, struct nfsd_file **pnf)
> +{
> +	return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, 1);
> +}
> +
>  /**
>   * nfsd_file_acquire - Get a struct nfsd_file with an open file
>   * @rqstp: the RPC transaction being executed
> @@ -1132,6 +1171,10 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>   * @may_flags: NFSD_MAY_ settings for the file
>   * @pnf: OUT: new or found "struct nfsd_file" object
>   *
> + * The nfsd_file_object returned by this API is reference-counted
> + * but not garbage-collected. The object is released one RCU grace
> + * period after the final nfsd_file_put().
> + *
>   * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
>   * network byte order is returned.
>   */
> @@ -1139,7 +1182,7 @@ __be32
>  nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		  unsigned int may_flags, struct nfsd_file **pnf)
>  {
> -	return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true);
> +	return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, 0);
>  }
>  
>  /**
> @@ -1149,6 +1192,10 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>   * @may_flags: NFSD_MAY_ settings for the file
>   * @pnf: OUT: new or found "struct nfsd_file" object
>   *
> + * The nfsd_file_object returned by this API is reference-counted
> + * but not garbage-collected. The object is released immediately
> + * one RCU grace period after the final nfsd_file_put().
> + *
>   * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
>   * network byte order is returned.
>   */
> @@ -1156,7 +1203,7 @@ __be32
>  nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		 unsigned int may_flags, struct nfsd_file **pnf)
>  {
> -	return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false);
> +	return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false, 0);
>  }
>  
>  /*
> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> index f81c198f4ed6..0f6546bcd3e0 100644
> --- a/fs/nfsd/filecache.h
> +++ b/fs/nfsd/filecache.h
> @@ -38,6 +38,7 @@ struct nfsd_file {
>  #define NFSD_FILE_HASHED	(0)
>  #define NFSD_FILE_PENDING	(1)
>  #define NFSD_FILE_REFERENCED	(2)
> +#define NFSD_FILE_GC		(3)
>  	unsigned long		nf_flags;
>  	struct inode		*nf_inode;	/* don't deref */
>  	refcount_t		nf_ref;
> @@ -55,6 +56,8 @@ void nfsd_file_put(struct nfsd_file *nf);
>  struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
>  void nfsd_file_close_inode_sync(struct inode *inode);
>  bool nfsd_file_is_cached(struct inode *inode);
> +__be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
> +		  unsigned int may_flags, struct nfsd_file **nfp);
>  __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		  unsigned int may_flags, struct nfsd_file **nfp);
>  __be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> index d12823371857..6a5ad6c91d8e 100644
> --- a/fs/nfsd/nfs3proc.c
> +++ b/fs/nfsd/nfs3proc.c
> @@ -779,8 +779,8 @@ nfsd3_proc_commit(struct svc_rqst *rqstp)
>  				(unsigned long long) argp->offset);
>  
>  	fh_copy(&resp->fh, &argp->fh);
> -	resp->status = nfsd_file_acquire(rqstp, &resp->fh, NFSD_MAY_WRITE |
> -					 NFSD_MAY_NOT_BREAK_LEASE, &nf);
> +	resp->status = nfsd_file_acquire_gc(rqstp, &resp->fh, NFSD_MAY_WRITE |
> +					    NFSD_MAY_NOT_BREAK_LEASE, &nf);
>  	if (resp->status)
>  		goto out;
>  	resp->status = nfsd_commit(rqstp, &resp->fh, nf, argp->offset,
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index 9ebd67d461f9..4921144880d3 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -742,7 +742,8 @@ DEFINE_CLID_EVENT(confirmed_r);
>  	__print_flags(val, "|",						\
>  		{ 1 << NFSD_FILE_HASHED,	"HASHED" },		\
>  		{ 1 << NFSD_FILE_PENDING,	"PENDING" },		\
> -		{ 1 << NFSD_FILE_REFERENCED,	"REFERENCED"})
> +		{ 1 << NFSD_FILE_REFERENCED,	"REFERENCED"},		\
> +		{ 1 << NFSD_FILE_GC,		"GC"})
>  
>  DECLARE_EVENT_CLASS(nfsd_file_class,
>  	TP_PROTO(struct nfsd_file *nf),
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 44f210ba17cc..89d682a56fc4 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1060,7 +1060,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	__be32 err;
>  
>  	trace_nfsd_read_start(rqstp, fhp, offset, *count);
> -	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
> +	err = nfsd_file_acquire_gc(rqstp, fhp, NFSD_MAY_READ, &nf);
>  	if (err)
>  		return err;
>  
> @@ -1092,7 +1092,7 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
>  
>  	trace_nfsd_write_start(rqstp, fhp, offset, *cnt);
>  
> -	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE, &nf);
> +	err = nfsd_file_acquire_gc(rqstp, fhp, NFSD_MAY_WRITE, &nf);
>  	if (err)
>  		goto out;
>  
> 
> 

Looks reasonable otherwise though. I like this approach.

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 6/9] NFSD: Use const pointers as parameters to fh_ helpers.
  2022-10-05 14:56 ` [PATCH RFC 6/9] NFSD: Use const pointers as parameters to fh_ helpers Chuck Lever
@ 2022-10-06 15:59   ` Jeff Layton
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Layton @ 2022-10-06 15:59 UTC (permalink / raw)
  To: Chuck Lever, linux-nfs

On Wed, 2022-10-05 at 10:56 -0400, Chuck Lever wrote:
> Enable callers to use const pointers where they are able to.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfsfh.h |   10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> index c3ae6414fc5c..513e028b0bbe 100644
> --- a/fs/nfsd/nfsfh.h
> +++ b/fs/nfsd/nfsfh.h
> @@ -220,7 +220,7 @@ __be32	fh_update(struct svc_fh *);
>  void	fh_put(struct svc_fh *);
>  
>  static __inline__ struct svc_fh *
> -fh_copy(struct svc_fh *dst, struct svc_fh *src)
> +fh_copy(struct svc_fh *dst, const struct svc_fh *src)
>  {
>  	WARN_ON(src->fh_dentry);
>  
> @@ -229,7 +229,7 @@ fh_copy(struct svc_fh *dst, struct svc_fh *src)
>  }
>  
>  static inline void
> -fh_copy_shallow(struct knfsd_fh *dst, struct knfsd_fh *src)
> +fh_copy_shallow(struct knfsd_fh *dst, const struct knfsd_fh *src)
>  {
>  	dst->fh_size = src->fh_size;
>  	memcpy(&dst->fh_raw, &src->fh_raw, src->fh_size);
> @@ -243,7 +243,8 @@ fh_init(struct svc_fh *fhp, int maxsize)
>  	return fhp;
>  }
>  
> -static inline bool fh_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2)
> +static inline bool fh_match(const struct knfsd_fh *fh1,
> +			    const struct knfsd_fh *fh2)
>  {
>  	if (fh1->fh_size != fh2->fh_size)
>  		return false;
> @@ -252,7 +253,8 @@ static inline bool fh_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2)
>  	return true;
>  }
>  
> -static inline bool fh_fsid_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2)
> +static inline bool fh_fsid_match(const struct knfsd_fh *fh1,
> +				 const struct knfsd_fh *fh2)
>  {
>  	if (fh1->fh_fsid_type != fh2->fh_fsid_type)
>  		return false;
> 
> 

Always a good idea.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 8/9] NFSD: Clean up nfs4_preprocess_stateid_op() call sites
  2022-10-05 14:56 ` [PATCH RFC 8/9] NFSD: Clean up nfs4_preprocess_stateid_op() call sites Chuck Lever
@ 2022-10-06 16:05   ` Jeff Layton
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Layton @ 2022-10-06 16:05 UTC (permalink / raw)
  To: Chuck Lever, linux-nfs

On Wed, 2022-10-05 at 10:56 -0400, Chuck Lever wrote:
> Remove the lame-duck dprintk()s around nfs4_preprocess_stateid_op()
> call sites.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfs4proc.c |   31 +++++++------------------------
>  1 file changed, 7 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index d0d976f847ca..42b81e88ea14 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -951,12 +951,7 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
>  					&read->rd_stateid, RD_STATE,
>  					&read->rd_nf, NULL);
> -	if (status) {
> -		dprintk("NFSD: nfsd4_read: couldn't process stateid!\n");
> -		goto out;
> -	}
> -	status = nfs_ok;
> -out:
> +
>  	read->rd_rqstp = rqstp;
>  	read->rd_fhp = &cstate->current_fh;
>  	return status;
> @@ -1125,10 +1120,8 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  		status = nfs4_preprocess_stateid_op(rqstp, cstate,
>  				&cstate->current_fh, &setattr->sa_stateid,
>  				WR_STATE, NULL, NULL);
> -		if (status) {
> -			dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n");
> +		if (status)
>  			return status;
> -		}
>  	}
>  	err = fh_want_write(&cstate->current_fh);
>  	if (err)
> @@ -1176,10 +1169,8 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  			       write->wr_offset, cnt);
>  	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
>  						stateid, WR_STATE, &nf, NULL);
> -	if (status) {
> -		dprintk("NFSD: nfsd4_write: couldn't process stateid!\n");
> +	if (status)
>  		return status;
> -	}
>  
>  	write->wr_how_written = write->wr_stable_how;
>  
> @@ -1210,17 +1201,13 @@ nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  
>  	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->save_fh,
>  					    src_stateid, RD_STATE, src, NULL);
> -	if (status) {
> -		dprintk("NFSD: %s: couldn't process src stateid!\n", __func__);
> +	if (status)
>  		goto out;
> -	}
>  
>  	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
>  					    dst_stateid, WR_STATE, dst, NULL);
> -	if (status) {
> -		dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__);
> +	if (status)
>  		goto out_put_src;
> -	}
>  
>  	/* fix up for NFS-specific error code */
>  	if (!S_ISREG(file_inode((*src)->nf_file)->i_mode) ||
> @@ -1955,10 +1942,8 @@ nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
>  					    &fallocate->falloc_stateid,
>  					    WR_STATE, &nf, NULL);
> -	if (status != nfs_ok) {
> -		dprintk("NFSD: nfsd4_fallocate: couldn't process stateid!\n");
> +	if (status != nfs_ok)
>  		return status;
> -	}
>  
>  	status = nfsd4_vfs_fallocate(rqstp, &cstate->current_fh, nf->nf_file,
>  				     fallocate->falloc_offset,
> @@ -2014,10 +1999,8 @@ nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
>  					    &seek->seek_stateid,
>  					    RD_STATE, &nf, NULL);
> -	if (status) {
> -		dprintk("NFSD: nfsd4_seek: couldn't process stateid!\n");
> +	if (status)
>  		return status;
> -	}
>  
>  	switch (seek->seek_whence) {
>  	case NFS4_CONTENT_DATA:
> 
> 

I agree that these are not particularly helpful. It might be a good idea
to add a conditional tracepoint in nfs4_preprocess_stateid_op though,
for when it returns an error, but I don't care enough to stand in the
way of this decluttering.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 9/9] NFSD: Trace delegation revocations
  2022-10-05 14:56 ` [PATCH RFC 9/9] NFSD: Trace delegation revocations Chuck Lever
@ 2022-10-06 16:06   ` Jeff Layton
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Layton @ 2022-10-06 16:06 UTC (permalink / raw)
  To: Chuck Lever, linux-nfs

On Wed, 2022-10-05 at 10:56 -0400, Chuck Lever wrote:
> Revocation is an exceptional event. Generate a trace record when it
> occurs so that other activity can be triggered.
> 
> Example:
> 
> nfsd-1104  [005]  1912.002544: nfsd_stid_revoke:        client 633c9343:4e82788d stateid 00000003:00000001 ref=2 type=DELEG
> 
> Trace infrastructure is provided for subsequent additional tracing
> related to nfs4_stid activity.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfs4state.c |    2 ++
>  fs/nfsd/trace.h     |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 06499b9481a6..349302afa7eb 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1404,6 +1404,8 @@ static void revoke_delegation(struct nfs4_delegation *dp)
>  
>  	WARN_ON(!list_empty(&dp->dl_recall_lru));
>  
> +	trace_nfsd_stid_revoke(&dp->dl_stid);
> +
>  	if (clp->cl_minorversion) {
>  		dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID;
>  		refcount_inc(&dp->dl_stid.sc_count);
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index 4921144880d3..23fb39c957af 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -561,6 +561,61 @@ DEFINE_EVENT(nfsd_stateseqid_class, nfsd_##name, \
>  DEFINE_STATESEQID_EVENT(preprocess);
>  DEFINE_STATESEQID_EVENT(open_confirm);
>  
> +TRACE_DEFINE_ENUM(NFS4_OPEN_STID);
> +TRACE_DEFINE_ENUM(NFS4_LOCK_STID);
> +TRACE_DEFINE_ENUM(NFS4_DELEG_STID);
> +TRACE_DEFINE_ENUM(NFS4_CLOSED_STID);
> +TRACE_DEFINE_ENUM(NFS4_REVOKED_DELEG_STID);
> +TRACE_DEFINE_ENUM(NFS4_CLOSED_DELEG_STID);
> +TRACE_DEFINE_ENUM(NFS4_LAYOUT_STID);
> +
> +#define show_stid_type(x)						\
> +	__print_flags(x, "|",						\
> +		{ NFS4_OPEN_STID,		"OPEN" },		\
> +		{ NFS4_LOCK_STID,		"LOCK" },		\
> +		{ NFS4_DELEG_STID,		"DELEG" },		\
> +		{ NFS4_CLOSED_STID,		"CLOSED" },		\
> +		{ NFS4_REVOKED_DELEG_STID,	"REVOKED" },		\
> +		{ NFS4_CLOSED_DELEG_STID,	"CLOSED_DELEG" },	\
> +		{ NFS4_LAYOUT_STID,		"LAYOUT" })
> +
> +DECLARE_EVENT_CLASS(nfsd_stid_class,
> +	TP_PROTO(
> +		const struct nfs4_stid *stid
> +	),
> +	TP_ARGS(stid),
> +	TP_STRUCT__entry(
> +		__field(unsigned long, sc_type)
> +		__field(int, sc_count)
> +		__field(u32, cl_boot)
> +		__field(u32, cl_id)
> +		__field(u32, si_id)
> +		__field(u32, si_generation)
> +	),
> +	TP_fast_assign(
> +		const stateid_t *stp = &stid->sc_stateid;
> +
> +		__entry->sc_type = stid->sc_type;
> +		__entry->sc_count = refcount_read(&stid->sc_count);
> +		__entry->cl_boot = stp->si_opaque.so_clid.cl_boot;
> +		__entry->cl_id = stp->si_opaque.so_clid.cl_id;
> +		__entry->si_id = stp->si_opaque.so_id;
> +		__entry->si_generation = stp->si_generation;
> +	),
> +	TP_printk("client %08x:%08x stateid %08x:%08x ref=%d type=%s",
> +		__entry->cl_boot, __entry->cl_id,
> +		__entry->si_id, __entry->si_generation,
> +		__entry->sc_count, show_stid_type(__entry->sc_type)
> +	)
> +);
> +
> +#define DEFINE_STID_EVENT(name)					\
> +DEFINE_EVENT(nfsd_stid_class, nfsd_stid_##name,			\
> +	TP_PROTO(const struct nfs4_stid *stid),			\
> +	TP_ARGS(stid))
> +
> +DEFINE_STID_EVENT(revoke);
> +
>  DECLARE_EVENT_CLASS(nfsd_clientid_class,
>  	TP_PROTO(const clientid_t *clid),
>  	TP_ARGS(clid),
> 
> 

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 5/9] NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection
  2022-10-06 15:59   ` Jeff Layton
@ 2022-10-06 16:06     ` Chuck Lever III
  0 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever III @ 2022-10-06 16:06 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Linux NFS Mailing List



> On Oct 6, 2022, at 11:59 AM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> On Wed, 2022-10-05 at 10:56 -0400, Chuck Lever wrote:
>> NFSv4 operations manage the lifetime of nfsd_file items they use by
>> means of NFSv4 OPEN and CLOSE. Hence there's no need for them to be
>> garbage collected.
>> 
>> Introduce a mechanism to enable garbage collection for nfsd_file
>> items used only by NFSv2/3 callers.
>> 
>> Note that the change in nfsd_file_put() ensures that both CLOSE and
>> DELEGRETURN will actually close out and free an nfsd_file on last
>> reference of a non-garbage-collected file.
>> 
>> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=394
>> Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> fs/nfsd/filecache.c |   61 +++++++++++++++++++++++++++++++++++++++++++++------
>> fs/nfsd/filecache.h |    3 +++
>> fs/nfsd/nfs3proc.c  |    4 ++-
>> fs/nfsd/trace.h     |    3 ++-
>> fs/nfsd/vfs.c       |    4 ++-
>> 5 files changed, 63 insertions(+), 12 deletions(-)
>> 
>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>> index b7aa523c2010..01c27deabf83 100644
>> --- a/fs/nfsd/filecache.c
>> +++ b/fs/nfsd/filecache.c
>> @@ -63,6 +63,7 @@ struct nfsd_file_lookup_key {
>> 	struct net			*net;
>> 	const struct cred		*cred;
>> 	unsigned char			need;
>> +	unsigned char			gc:1;
>> 	enum nfsd_file_lookup_type	type;
>> };
>> 
>> @@ -162,6 +163,8 @@ static int nfsd_file_obj_cmpfn(struct rhashtable_compare_arg *arg,
>> 			return 1;
>> 		if (!nfsd_match_cred(nf->nf_cred, key->cred))
>> 			return 1;
>> +		if (test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc)
>> +			return 1;
>> 		if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0)
>> 			return 1;
>> 		break;
>> @@ -297,6 +300,8 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may)
>> 		nf->nf_flags = 0;
>> 		__set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
>> 		__set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
>> +		if (key->gc)
>> +			__set_bit(NFSD_FILE_GC, &nf->nf_flags);
>> 		nf->nf_inode = key->inode;
>> 		/* nf_ref is pre-incremented for hash table */
>> 		refcount_set(&nf->nf_ref, 2);
>> @@ -428,16 +433,27 @@ nfsd_file_put_noref(struct nfsd_file *nf)
>> 	}
>> }
>> 
>> +static void
>> +nfsd_file_unhash_and_put(struct nfsd_file *nf)
>> +{
>> +	if (nfsd_file_unhash(nf))
>> +		nfsd_file_put_noref(nf);
>> +}
>> +
>> void
>> nfsd_file_put(struct nfsd_file *nf)
>> {
>> 	might_sleep();
>> 
>> -	nfsd_file_lru_add(nf);
>> +	if (test_bit(NFSD_FILE_GC, &nf->nf_flags) == 1)
>> +		nfsd_file_lru_add(nf);
>> +	else if (refcount_read(&nf->nf_ref) == 2)
>> +		nfsd_file_unhash_and_put(nf);
>> +
>> 	if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
>> 		nfsd_file_flush(nf);
>> 		nfsd_file_put_noref(nf);
>> -	} else if (nf->nf_file) {
>> +	} else if (nf->nf_file && test_bit(NFSD_FILE_GC, &nf->nf_flags) == 1) {
>> 		nfsd_file_put_noref(nf);
>> 		nfsd_file_schedule_laundrette();
>> 	} else
>> @@ -1017,12 +1033,14 @@ nfsd_file_is_cached(struct inode *inode)
>> 
>> static __be32
>> nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> -		     unsigned int may_flags, struct nfsd_file **pnf, bool open)
>> +		     unsigned int may_flags, struct nfsd_file **pnf,
>> +		     bool open, int want_gc)
> 
> want_gc should be a bool

Well, that is a little easier to read, but...

I set key.gc to want_gc. key.gc is an integer field because
I compare it to the result of test_bit(), which returns an
integer 0 or 1.

I think I can trust that bool false is always 0, but can I
trust that bool true is 1 everywhere in the C universe? I'm
never sure about that.


>> {
>> 	struct nfsd_file_lookup_key key = {
>> 		.type	= NFSD_FILE_KEY_FULL,
>> 		.need	= may_flags & NFSD_FILE_MAY_MASK,
>> 		.net	= SVC_NET(rqstp),
>> +		.gc	= want_gc,
>> 	};
>> 	bool open_retry = true;
>> 	struct nfsd_file *nf;
>> @@ -1117,14 +1135,35 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> 	 * then unhash.
>> 	 */
>> 	if (status != nfs_ok || key.inode->i_nlink == 0)
>> -		if (nfsd_file_unhash(nf))
>> -			nfsd_file_put_noref(nf);
>> +		nfsd_file_unhash_and_put(nf);
>> 	clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags);
>> 	smp_mb__after_atomic();
>> 	wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
>> 	goto out;
>> }
>> 
>> +/**
>> + * nfsd_file_acquire_gc - Get a struct nfsd_file with an open file
>> + * @rqstp: the RPC transaction being executed
>> + * @fhp: the NFS filehandle of the file to be opened
>> + * @may_flags: NFSD_MAY_ settings for the file
>> + * @pnf: OUT: new or found "struct nfsd_file" object
>> + *
>> + * The nfsd_file object returned by this API is reference-counted
>> + * and garbage-collected. The object is retained for a few
>> + * seconds after the final nfsd_file_put() in case the caller
>> + * wants to re-use it.
>> + *
>> + * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
>> + * network byte order is returned.
>> + */
>> +__be32
>> +nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> +		     unsigned int may_flags, struct nfsd_file **pnf)
>> +{
>> +	return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, 1);
>> +}
>> +
>> /**
>>  * nfsd_file_acquire - Get a struct nfsd_file with an open file
>>  * @rqstp: the RPC transaction being executed
>> @@ -1132,6 +1171,10 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>  * @may_flags: NFSD_MAY_ settings for the file
>>  * @pnf: OUT: new or found "struct nfsd_file" object
>>  *
>> + * The nfsd_file_object returned by this API is reference-counted
>> + * but not garbage-collected. The object is released one RCU grace
>> + * period after the final nfsd_file_put().
>> + *
>>  * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
>>  * network byte order is returned.
>>  */
>> @@ -1139,7 +1182,7 @@ __be32
>> nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> 		  unsigned int may_flags, struct nfsd_file **pnf)
>> {
>> -	return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true);
>> +	return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, 0);
>> }
>> 
>> /**
>> @@ -1149,6 +1192,10 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>  * @may_flags: NFSD_MAY_ settings for the file
>>  * @pnf: OUT: new or found "struct nfsd_file" object
>>  *
>> + * The nfsd_file_object returned by this API is reference-counted
>> + * but not garbage-collected. The object is released immediately
>> + * one RCU grace period after the final nfsd_file_put().
>> + *
>>  * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
>>  * network byte order is returned.
>>  */
>> @@ -1156,7 +1203,7 @@ __be32
>> nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> 		 unsigned int may_flags, struct nfsd_file **pnf)
>> {
>> -	return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false);
>> +	return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false, 0);
>> }
>> 
>> /*
>> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
>> index f81c198f4ed6..0f6546bcd3e0 100644
>> --- a/fs/nfsd/filecache.h
>> +++ b/fs/nfsd/filecache.h
>> @@ -38,6 +38,7 @@ struct nfsd_file {
>> #define NFSD_FILE_HASHED	(0)
>> #define NFSD_FILE_PENDING	(1)
>> #define NFSD_FILE_REFERENCED	(2)
>> +#define NFSD_FILE_GC		(3)
>> 	unsigned long		nf_flags;
>> 	struct inode		*nf_inode;	/* don't deref */
>> 	refcount_t		nf_ref;
>> @@ -55,6 +56,8 @@ void nfsd_file_put(struct nfsd_file *nf);
>> struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
>> void nfsd_file_close_inode_sync(struct inode *inode);
>> bool nfsd_file_is_cached(struct inode *inode);
>> +__be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> +		  unsigned int may_flags, struct nfsd_file **nfp);
>> __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> 		  unsigned int may_flags, struct nfsd_file **nfp);
>> __be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
>> index d12823371857..6a5ad6c91d8e 100644
>> --- a/fs/nfsd/nfs3proc.c
>> +++ b/fs/nfsd/nfs3proc.c
>> @@ -779,8 +779,8 @@ nfsd3_proc_commit(struct svc_rqst *rqstp)
>> 				(unsigned long long) argp->offset);
>> 
>> 	fh_copy(&resp->fh, &argp->fh);
>> -	resp->status = nfsd_file_acquire(rqstp, &resp->fh, NFSD_MAY_WRITE |
>> -					 NFSD_MAY_NOT_BREAK_LEASE, &nf);
>> +	resp->status = nfsd_file_acquire_gc(rqstp, &resp->fh, NFSD_MAY_WRITE |
>> +					    NFSD_MAY_NOT_BREAK_LEASE, &nf);
>> 	if (resp->status)
>> 		goto out;
>> 	resp->status = nfsd_commit(rqstp, &resp->fh, nf, argp->offset,
>> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
>> index 9ebd67d461f9..4921144880d3 100644
>> --- a/fs/nfsd/trace.h
>> +++ b/fs/nfsd/trace.h
>> @@ -742,7 +742,8 @@ DEFINE_CLID_EVENT(confirmed_r);
>> 	__print_flags(val, "|",						\
>> 		{ 1 << NFSD_FILE_HASHED,	"HASHED" },		\
>> 		{ 1 << NFSD_FILE_PENDING,	"PENDING" },		\
>> -		{ 1 << NFSD_FILE_REFERENCED,	"REFERENCED"})
>> +		{ 1 << NFSD_FILE_REFERENCED,	"REFERENCED"},		\
>> +		{ 1 << NFSD_FILE_GC,		"GC"})
>> 
>> DECLARE_EVENT_CLASS(nfsd_file_class,
>> 	TP_PROTO(struct nfsd_file *nf),
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index 44f210ba17cc..89d682a56fc4 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -1060,7 +1060,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> 	__be32 err;
>> 
>> 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
>> -	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
>> +	err = nfsd_file_acquire_gc(rqstp, fhp, NFSD_MAY_READ, &nf);
>> 	if (err)
>> 		return err;
>> 
>> @@ -1092,7 +1092,7 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
>> 
>> 	trace_nfsd_write_start(rqstp, fhp, offset, *cnt);
>> 
>> -	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE, &nf);
>> +	err = nfsd_file_acquire_gc(rqstp, fhp, NFSD_MAY_WRITE, &nf);
>> 	if (err)
>> 		goto out;
>> 
>> 
>> 
> 
> Looks reasonable otherwise though. I like this approach.
> 
> -- 
> Jeff Layton <jlayton@kernel.org>

--
Chuck Lever




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 7/9] NFSD: Use rhashtable for managing nfs4_file objects
  2022-10-05 15:11   ` Chuck Lever III
@ 2022-10-06 16:12     ` Jeff Layton
  2022-10-06 16:15       ` Chuck Lever III
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2022-10-06 16:12 UTC (permalink / raw)
  To: Chuck Lever III, Linux NFS Mailing List

On Wed, 2022-10-05 at 15:11 +0000, Chuck Lever III wrote:
> 
> > On Oct 5, 2022, at 10:56 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> > 
> > fh_match() is expensive to use for hash chains that contain more
> > than a few objects. With common workloads, I see multiple thousands
> > of objects stored in file_hashtbl[], which always has only 256
> > buckets.
> > 
> > Replace it with an rhashtable, which dynamically resizes its bucket
> > array to keep hash chains short.
> > 
> > This also enables the removal of the use of state_lock to serialize
> > operations on the new rhashtable.
> > 
> > The result is an improvement in the latency of NFSv4 operations
> > and the reduction of nfsd CPU utilization due to the cache misses
> > of walking long hash chains in file_hashtbl.
> > 
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> > fs/nfsd/nfs4state.c |  229 +++++++++++++++++++++++++++++++++++----------------
> > fs/nfsd/state.h     |    5 -
> > 2 files changed, 158 insertions(+), 76 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 2b850de288cf..06499b9481a6 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -44,7 +44,9 @@
> > #include <linux/jhash.h>
> > #include <linux/string_helpers.h>
> > #include <linux/fsnotify.h>
> > +#include <linux/rhashtable.h>
> > #include <linux/nfs_ssc.h>
> > +
> > #include "xdr4.h"
> > #include "xdr4cb.h"
> > #include "vfs.h"
> > @@ -84,6 +86,7 @@ static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
> > static void nfs4_free_ol_stateid(struct nfs4_stid *stid);
> > void nfsd4_end_grace(struct nfsd_net *nn);
> > static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps);
> > +static void unhash_nfs4_file(struct nfs4_file *fp);
> > 
> > /* Locking: */
> > 
> > @@ -577,11 +580,8 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu)
> > void
> > put_nfs4_file(struct nfs4_file *fi)
> > {
> > -	might_lock(&state_lock);
> > -
> > -	if (refcount_dec_and_lock(&fi->fi_ref, &state_lock)) {
> > -		hlist_del_rcu(&fi->fi_hash);
> > -		spin_unlock(&state_lock);
> > +	if (refcount_dec_and_test(&fi->fi_ref)) {
> > +		unhash_nfs4_file(fi);
> > 		WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate));
> > 		WARN_ON_ONCE(!list_empty(&fi->fi_delegations));
> > 		call_rcu(&fi->fi_rcu, nfsd4_free_file_rcu);
> > @@ -695,19 +695,85 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername)
> > 	return ret & OWNER_HASH_MASK;
> > }
> > 
> > -/* hash table for nfs4_file */
> > -#define FILE_HASH_BITS                   8
> > -#define FILE_HASH_SIZE                  (1 << FILE_HASH_BITS)
> > +static struct rhashtable nfs4_file_rhashtbl ____cacheline_aligned_in_smp;
> > 
> > -static unsigned int file_hashval(struct svc_fh *fh)
> > +/*
> > + * The returned hash value is based solely on the address of an in-code
> > + * inode, a pointer to a slab-allocated object. The entropy in such a
> > + * pointer is concentrated in its middle bits.
> > + */
> > +static u32 nfs4_file_inode_hash(const struct inode *inode, u32 seed)
> > +{
> > +	unsigned long ptr = (unsigned long)inode;
> > +	u32 k;
> > +
> > +	k = ptr >> L1_CACHE_SHIFT;
> > +	k &= 0x00ffffff;
> > +	return jhash2(&k, 1, seed);
> > +}
> > +
> > +/**
> > + * nfs4_file_key_hashfn - Compute the hash value of a lookup key
> > + * @data: key on which to compute the hash value
> > + * @len: rhash table's key_len parameter (unused)
> > + * @seed: rhash table's random seed of the day
> > + *
> > + * Return value:
> > + *   Computed 32-bit hash value
> > + */
> > +static u32 nfs4_file_key_hashfn(const void *data, u32 len, u32 seed)
> > {
> > -	struct inode *inode = d_inode(fh->fh_dentry);
> > +	const struct svc_fh *fhp = data;
> > 
> > -	/* XXX: why not (here & in file cache) use inode? */
> > -	return (unsigned int)hash_long(inode->i_ino, FILE_HASH_BITS);
> > +	return nfs4_file_inode_hash(d_inode(fhp->fh_dentry), seed);
> > }
> > 
> > -static struct hlist_head file_hashtbl[FILE_HASH_SIZE];
> > +/**
> > + * nfs4_file_obj_hashfn - Compute the hash value of an nfs4_file object
> > + * @data: object on which to compute the hash value
> > + * @len: rhash table's key_len parameter (unused)
> > + * @seed: rhash table's random seed of the day
> > + *
> > + * Return value:
> > + *   Computed 32-bit hash value
> > + */
> > +static u32 nfs4_file_obj_hashfn(const void *data, u32 len, u32 seed)
> > +{
> > +	const struct nfs4_file *fi = data;
> > +
> > +	return nfs4_file_inode_hash(fi->fi_inode, seed);
> > +}
> > +
> > +/**
> > + * nfs4_file_obj_cmpfn - Match a cache item against search criteria
> > + * @arg: search criteria
> > + * @ptr: cache item to check
> > + *
> > + * Return values:
> > + *   %0 - Item matches search criteria
> > + *   %1 - Item does not match search criteria
> > + */
> > +static int nfs4_file_obj_cmpfn(struct rhashtable_compare_arg *arg,
> > +			       const void *ptr)
> > +{
> > +	const struct svc_fh *fhp = arg->key;
> > +	const struct nfs4_file *fi = ptr;
> > +
> > +	return fh_match(&fi->fi_fhandle, &fhp->fh_handle) ? 0 : 1;
> > +}
> > +
> > +static const struct rhashtable_params nfs4_file_rhash_params = {
> > +	.key_len		= sizeof_field(struct nfs4_file, fi_inode),
> > +	.key_offset		= offsetof(struct nfs4_file, fi_inode),
> > +	.head_offset		= offsetof(struct nfs4_file, fi_rhash),
> > +	.hashfn			= nfs4_file_key_hashfn,
> > +	.obj_hashfn		= nfs4_file_obj_hashfn,
> > +	.obj_cmpfn		= nfs4_file_obj_cmpfn,
> > +
> > +	/* Reduce resizing churn on light workloads */
> > +	.min_size		= 512,		/* buckets */
> > +	.automatic_shrinking	= true,
> > +};
> > 
> > /*
> >  * Check if courtesy clients have conflicting access and resolve it if possible
> > @@ -4251,11 +4317,8 @@ static struct nfs4_file *nfsd4_alloc_file(void)
> > }
> > 
> > /* OPEN Share state helper functions */
> > -static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval,
> > -				struct nfs4_file *fp)
> > +static void init_nfs4_file(const struct svc_fh *fh, struct nfs4_file *fp)
> > {
> > -	lockdep_assert_held(&state_lock);
> > -
> > 	refcount_set(&fp->fi_ref, 1);
> > 	spin_lock_init(&fp->fi_lock);
> > 	INIT_LIST_HEAD(&fp->fi_stateids);
> > @@ -4273,7 +4336,6 @@ static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval,
> > 	INIT_LIST_HEAD(&fp->fi_lo_states);
> > 	atomic_set(&fp->fi_lo_recalls, 0);
> > #endif
> > -	hlist_add_head_rcu(&fp->fi_hash, &file_hashtbl[hashval]);
> > }
> > 
> > void
> > @@ -4626,71 +4688,84 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net)
> > 		nfs4_put_stid(&last->st_stid);
> > }
> > 
> > -/* search file_hashtbl[] for file */
> > -static struct nfs4_file *
> > -find_file_locked(struct svc_fh *fh, unsigned int hashval)
> > +static struct nfs4_file *find_nfs4_file(const struct svc_fh *fhp)
> > {
> > -	struct nfs4_file *fp;
> > +	struct nfs4_file *fi;
> > 
> > -	hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash,
> > -				lockdep_is_held(&state_lock)) {
> > -		if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) {
> > -			if (refcount_inc_not_zero(&fp->fi_ref))
> > -				return fp;
> > -		}
> > -	}
> > -	return NULL;
> > +	rcu_read_lock();
> > +	fi = rhashtable_lookup(&nfs4_file_rhashtbl, fhp,
> > +			       nfs4_file_rhash_params);
> > +	if (fi)
> > +		if (!refcount_inc_not_zero(&fi->fi_ref))
> > +			fi = NULL;
> > +	rcu_read_unlock();
> > +	return fi;
> > }
> > 
> > -static struct nfs4_file *insert_file(struct nfs4_file *new, struct svc_fh *fh,
> > -				     unsigned int hashval)
> > +static void check_nfs4_file_aliases_locked(struct nfs4_file *new,
> > +					   const struct svc_fh *fhp)
> > {
> > -	struct nfs4_file *fp;
> > -	struct nfs4_file *ret = NULL;
> > -	bool alias_found = false;
> > +	struct rhashtable *ht = &nfs4_file_rhashtbl;
> > +	struct rhash_lock_head __rcu *const *bkt;
> > +	struct rhashtable_compare_arg arg = {
> > +		.ht	= ht,
> > +		.key	= fhp,
> > +	};
> > +	struct bucket_table *tbl;
> > +	struct rhash_head *he;
> > +	unsigned int hash;
> > 
> > -	spin_lock(&state_lock);
> > -	hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash,
> > -				 lockdep_is_held(&state_lock)) {
> > -		if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) {
> > -			if (refcount_inc_not_zero(&fp->fi_ref))
> > -				ret = fp;
> > -		} else if (d_inode(fh->fh_dentry) == fp->fi_inode)
> > -			fp->fi_aliased = alias_found = true;
> > -	}
> > -	if (likely(ret == NULL)) {
> > -		nfsd4_init_file(fh, hashval, new);
> > -		new->fi_aliased = alias_found;
> > -		ret = new;
> > +	/*
> > +	 * rhashtable guarantees small buckets, thus this loop stays
> > +	 * efficient.
> > +	 */
> > +	rcu_read_lock();
> > +	tbl = rht_dereference_rcu(ht->tbl, ht);
> > +	hash = rht_key_hashfn(ht, tbl, fhp, nfs4_file_rhash_params);
> > +	bkt = rht_bucket(tbl, hash);
> > +	rht_for_each_rcu_from(he, rht_ptr_rcu(bkt), tbl, hash) {
> > +		struct nfs4_file *fi;
> > +
> > +		fi = rht_obj(ht, he);
> > +		if (nfs4_file_obj_cmpfn(&arg, fi) == 0)
> > +			continue;
> > +		if (d_inode(fhp->fh_dentry) == fi->fi_inode) {
> > +			fi->fi_aliased = true;
> > +			new->fi_aliased = true;
> > +		}
> > 	}
> > -	spin_unlock(&state_lock);
> > -	return ret;
> > +	rcu_read_unlock();
> > }
> > 
> > -static struct nfs4_file * find_file(struct svc_fh *fh)
> > +static noinline struct nfs4_file *
> > +find_or_hash_nfs4_file(struct nfs4_file *new, const struct svc_fh *fhp)
> > {
> > -	struct nfs4_file *fp;
> > -	unsigned int hashval = file_hashval(fh);
> > +	struct nfs4_file *fi;
> > 
> > -	rcu_read_lock();
> > -	fp = find_file_locked(fh, hashval);
> > -	rcu_read_unlock();
> > -	return fp;
> > -}
> > +	init_nfs4_file(fhp, new);
> > 
> > -static struct nfs4_file *
> > -find_or_add_file(struct nfs4_file *new, struct svc_fh *fh)
> > -{
> > -	struct nfs4_file *fp;
> > -	unsigned int hashval = file_hashval(fh);
> > +	fi = rhashtable_lookup_get_insert_key(&nfs4_file_rhashtbl,
> > +					      fhp, &new->fi_rhash,
> > +					      nfs4_file_rhash_params);
> > +	if (!fi) {
> > +		fi = new;
> > +		goto check_aliases;
> > +	}
> > +	if (IS_ERR(fi))		/* or BUG? */
> > +		return NULL;
> > +	if (!refcount_inc_not_zero(&fi->fi_ref))
> > +		fi = new;
> 
> Ah, hrm. Given what we just had to do to nfsd_file_do_acquire(),
> maybe this needs the same fix to hang onto the RCU read lock
> while dicking with the nfs4_file object's reference count?
> 
> 

Yes. Probably we should just merge this patch if you want a fix for
mainline:

    nfsd: rework hashtable handling in nfsd_do_file_acquire


> > -	rcu_read_lock();
> > -	fp = find_file_locked(fh, hashval);
> > -	rcu_read_unlock();
> > -	if (fp)
> > -		return fp;
> > +check_aliases:
> > +	check_nfs4_file_aliases_locked(fi, fhp);
> > +
> > +	return fi;
> > +}
> > 
> > -	return insert_file(new, fh, hashval);
> > +static void unhash_nfs4_file(struct nfs4_file *fi)
> > +{
> > +	rhashtable_remove_fast(&nfs4_file_rhashtbl, &fi->fi_rhash,
> > +			       nfs4_file_rhash_params);
> > }
> > 
> > /*
> > @@ -4703,9 +4778,10 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type)
> > 	struct nfs4_file *fp;
> > 	__be32 ret = nfs_ok;
> > 
> > -	fp = find_file(current_fh);
> > +	fp = find_nfs4_file(current_fh);
> > 	if (!fp)
> > 		return ret;
> > +
> > 	/* Check for conflicting share reservations */
> > 	spin_lock(&fp->fi_lock);
> > 	if (fp->fi_share_deny & deny_type)
> > @@ -5548,7 +5624,9 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
> > 	 * and check for delegations in the process of being recalled.
> > 	 * If not found, create the nfs4_file struct
> > 	 */
> > -	fp = find_or_add_file(open->op_file, current_fh);
> > +	fp = find_or_hash_nfs4_file(open->op_file, current_fh);
> > +	if (unlikely(!fp))
> > +		return nfserr_jukebox;
> > 	if (fp != open->op_file) {
> > 		status = nfs4_check_deleg(cl, open, &dp);
> > 		if (status)
> > @@ -7905,10 +7983,16 @@ nfs4_state_start(void)
> > {
> > 	int ret;
> > 
> > -	ret = nfsd4_create_callback_queue();
> > +	ret = rhashtable_init(&nfs4_file_rhashtbl, &nfs4_file_rhash_params);
> > 	if (ret)
> > 		return ret;
> > 
> > +	ret = nfsd4_create_callback_queue();
> > +	if (ret) {
> > +		rhashtable_destroy(&nfs4_file_rhashtbl);
> > +		return ret;
> > +	}
> > +
> > 	set_max_delegations();
> > 	return 0;
> > }
> > @@ -7939,6 +8023,7 @@ nfs4_state_shutdown_net(struct net *net)
> > 
> > 	nfsd4_client_tracking_exit(net);
> > 	nfs4_state_destroy_net(net);
> > +	rhashtable_destroy(&nfs4_file_rhashtbl);
> > #ifdef CONFIG_NFSD_V4_2_INTER_SSC
> > 	nfsd4_ssc_shutdown_umount(nn);
> > #endif
> > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> > index ae596dbf8667..879f085bc39e 100644
> > --- a/fs/nfsd/state.h
> > +++ b/fs/nfsd/state.h
> > @@ -536,16 +536,13 @@ struct nfs4_clnt_odstate {
> >  * inode can have multiple filehandles associated with it, so there is
> >  * (potentially) a many to one relationship between this struct and struct
> >  * inode.
> > - *
> > - * These are hashed by filehandle in the file_hashtbl, which is protected by
> > - * the global state_lock spinlock.
> >  */
> > struct nfs4_file {
> > 	refcount_t		fi_ref;
> > 	struct inode *		fi_inode;
> > 	bool			fi_aliased;
> > 	spinlock_t		fi_lock;
> > -	struct hlist_node       fi_hash;	/* hash on fi_fhandle */
> > +	struct rhash_head	fi_rhash;
> > 	struct list_head        fi_stateids;
> > 	union {
> > 		struct list_head	fi_delegations;
> > 
> > 
> 
> --
> Chuck Lever
> 
> 
> 

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 7/9] NFSD: Use rhashtable for managing nfs4_file objects
  2022-10-06 16:12     ` Jeff Layton
@ 2022-10-06 16:15       ` Chuck Lever III
  0 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever III @ 2022-10-06 16:15 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Linux NFS Mailing List



> On Oct 6, 2022, at 12:12 PM, Jeff Layton <jlayton@poochiereds.net> wrote:
> 
> On Wed, 2022-10-05 at 15:11 +0000, Chuck Lever III wrote:
>> 
>>> On Oct 5, 2022, at 10:56 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>>> 
>>> fh_match() is expensive to use for hash chains that contain more
>>> than a few objects. With common workloads, I see multiple thousands
>>> of objects stored in file_hashtbl[], which always has only 256
>>> buckets.
>>> 
>>> Replace it with an rhashtable, which dynamically resizes its bucket
>>> array to keep hash chains short.
>>> 
>>> This also enables the removal of the use of state_lock to serialize
>>> operations on the new rhashtable.
>>> 
>>> The result is an improvement in the latency of NFSv4 operations
>>> and the reduction of nfsd CPU utilization due to the cache misses
>>> of walking long hash chains in file_hashtbl.
>>> 
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>> fs/nfsd/nfs4state.c |  229 +++++++++++++++++++++++++++++++++++----------------
>>> fs/nfsd/state.h     |    5 -
>>> 2 files changed, 158 insertions(+), 76 deletions(-)
>>> 
>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>> index 2b850de288cf..06499b9481a6 100644
>>> --- a/fs/nfsd/nfs4state.c
>>> +++ b/fs/nfsd/nfs4state.c
>>> @@ -44,7 +44,9 @@
>>> #include <linux/jhash.h>
>>> #include <linux/string_helpers.h>
>>> #include <linux/fsnotify.h>
>>> +#include <linux/rhashtable.h>
>>> #include <linux/nfs_ssc.h>
>>> +
>>> #include "xdr4.h"
>>> #include "xdr4cb.h"
>>> #include "vfs.h"
>>> @@ -84,6 +86,7 @@ static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
>>> static void nfs4_free_ol_stateid(struct nfs4_stid *stid);
>>> void nfsd4_end_grace(struct nfsd_net *nn);
>>> static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps);
>>> +static void unhash_nfs4_file(struct nfs4_file *fp);
>>> 
>>> /* Locking: */
>>> 
>>> @@ -577,11 +580,8 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu)
>>> void
>>> put_nfs4_file(struct nfs4_file *fi)
>>> {
>>> -	might_lock(&state_lock);
>>> -
>>> -	if (refcount_dec_and_lock(&fi->fi_ref, &state_lock)) {
>>> -		hlist_del_rcu(&fi->fi_hash);
>>> -		spin_unlock(&state_lock);
>>> +	if (refcount_dec_and_test(&fi->fi_ref)) {
>>> +		unhash_nfs4_file(fi);
>>> 		WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate));
>>> 		WARN_ON_ONCE(!list_empty(&fi->fi_delegations));
>>> 		call_rcu(&fi->fi_rcu, nfsd4_free_file_rcu);
>>> @@ -695,19 +695,85 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername)
>>> 	return ret & OWNER_HASH_MASK;
>>> }
>>> 
>>> -/* hash table for nfs4_file */
>>> -#define FILE_HASH_BITS                   8
>>> -#define FILE_HASH_SIZE                  (1 << FILE_HASH_BITS)
>>> +static struct rhashtable nfs4_file_rhashtbl ____cacheline_aligned_in_smp;
>>> 
>>> -static unsigned int file_hashval(struct svc_fh *fh)
>>> +/*
>>> + * The returned hash value is based solely on the address of an in-code
>>> + * inode, a pointer to a slab-allocated object. The entropy in such a
>>> + * pointer is concentrated in its middle bits.
>>> + */
>>> +static u32 nfs4_file_inode_hash(const struct inode *inode, u32 seed)
>>> +{
>>> +	unsigned long ptr = (unsigned long)inode;
>>> +	u32 k;
>>> +
>>> +	k = ptr >> L1_CACHE_SHIFT;
>>> +	k &= 0x00ffffff;
>>> +	return jhash2(&k, 1, seed);
>>> +}
>>> +
>>> +/**
>>> + * nfs4_file_key_hashfn - Compute the hash value of a lookup key
>>> + * @data: key on which to compute the hash value
>>> + * @len: rhash table's key_len parameter (unused)
>>> + * @seed: rhash table's random seed of the day
>>> + *
>>> + * Return value:
>>> + *   Computed 32-bit hash value
>>> + */
>>> +static u32 nfs4_file_key_hashfn(const void *data, u32 len, u32 seed)
>>> {
>>> -	struct inode *inode = d_inode(fh->fh_dentry);
>>> +	const struct svc_fh *fhp = data;
>>> 
>>> -	/* XXX: why not (here & in file cache) use inode? */
>>> -	return (unsigned int)hash_long(inode->i_ino, FILE_HASH_BITS);
>>> +	return nfs4_file_inode_hash(d_inode(fhp->fh_dentry), seed);
>>> }
>>> 
>>> -static struct hlist_head file_hashtbl[FILE_HASH_SIZE];
>>> +/**
>>> + * nfs4_file_obj_hashfn - Compute the hash value of an nfs4_file object
>>> + * @data: object on which to compute the hash value
>>> + * @len: rhash table's key_len parameter (unused)
>>> + * @seed: rhash table's random seed of the day
>>> + *
>>> + * Return value:
>>> + *   Computed 32-bit hash value
>>> + */
>>> +static u32 nfs4_file_obj_hashfn(const void *data, u32 len, u32 seed)
>>> +{
>>> +	const struct nfs4_file *fi = data;
>>> +
>>> +	return nfs4_file_inode_hash(fi->fi_inode, seed);
>>> +}
>>> +
>>> +/**
>>> + * nfs4_file_obj_cmpfn - Match a cache item against search criteria
>>> + * @arg: search criteria
>>> + * @ptr: cache item to check
>>> + *
>>> + * Return values:
>>> + *   %0 - Item matches search criteria
>>> + *   %1 - Item does not match search criteria
>>> + */
>>> +static int nfs4_file_obj_cmpfn(struct rhashtable_compare_arg *arg,
>>> +			       const void *ptr)
>>> +{
>>> +	const struct svc_fh *fhp = arg->key;
>>> +	const struct nfs4_file *fi = ptr;
>>> +
>>> +	return fh_match(&fi->fi_fhandle, &fhp->fh_handle) ? 0 : 1;
>>> +}
>>> +
>>> +static const struct rhashtable_params nfs4_file_rhash_params = {
>>> +	.key_len		= sizeof_field(struct nfs4_file, fi_inode),
>>> +	.key_offset		= offsetof(struct nfs4_file, fi_inode),
>>> +	.head_offset		= offsetof(struct nfs4_file, fi_rhash),
>>> +	.hashfn			= nfs4_file_key_hashfn,
>>> +	.obj_hashfn		= nfs4_file_obj_hashfn,
>>> +	.obj_cmpfn		= nfs4_file_obj_cmpfn,
>>> +
>>> +	/* Reduce resizing churn on light workloads */
>>> +	.min_size		= 512,		/* buckets */
>>> +	.automatic_shrinking	= true,
>>> +};
>>> 
>>> /*
>>> * Check if courtesy clients have conflicting access and resolve it if possible
>>> @@ -4251,11 +4317,8 @@ static struct nfs4_file *nfsd4_alloc_file(void)
>>> }
>>> 
>>> /* OPEN Share state helper functions */
>>> -static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval,
>>> -				struct nfs4_file *fp)
>>> +static void init_nfs4_file(const struct svc_fh *fh, struct nfs4_file *fp)
>>> {
>>> -	lockdep_assert_held(&state_lock);
>>> -
>>> 	refcount_set(&fp->fi_ref, 1);
>>> 	spin_lock_init(&fp->fi_lock);
>>> 	INIT_LIST_HEAD(&fp->fi_stateids);
>>> @@ -4273,7 +4336,6 @@ static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval,
>>> 	INIT_LIST_HEAD(&fp->fi_lo_states);
>>> 	atomic_set(&fp->fi_lo_recalls, 0);
>>> #endif
>>> -	hlist_add_head_rcu(&fp->fi_hash, &file_hashtbl[hashval]);
>>> }
>>> 
>>> void
>>> @@ -4626,71 +4688,84 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net)
>>> 		nfs4_put_stid(&last->st_stid);
>>> }
>>> 
>>> -/* search file_hashtbl[] for file */
>>> -static struct nfs4_file *
>>> -find_file_locked(struct svc_fh *fh, unsigned int hashval)
>>> +static struct nfs4_file *find_nfs4_file(const struct svc_fh *fhp)
>>> {
>>> -	struct nfs4_file *fp;
>>> +	struct nfs4_file *fi;
>>> 
>>> -	hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash,
>>> -				lockdep_is_held(&state_lock)) {
>>> -		if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) {
>>> -			if (refcount_inc_not_zero(&fp->fi_ref))
>>> -				return fp;
>>> -		}
>>> -	}
>>> -	return NULL;
>>> +	rcu_read_lock();
>>> +	fi = rhashtable_lookup(&nfs4_file_rhashtbl, fhp,
>>> +			       nfs4_file_rhash_params);
>>> +	if (fi)
>>> +		if (!refcount_inc_not_zero(&fi->fi_ref))
>>> +			fi = NULL;
>>> +	rcu_read_unlock();
>>> +	return fi;
>>> }
>>> 
>>> -static struct nfs4_file *insert_file(struct nfs4_file *new, struct svc_fh *fh,
>>> -				     unsigned int hashval)
>>> +static void check_nfs4_file_aliases_locked(struct nfs4_file *new,
>>> +					   const struct svc_fh *fhp)
>>> {
>>> -	struct nfs4_file *fp;
>>> -	struct nfs4_file *ret = NULL;
>>> -	bool alias_found = false;
>>> +	struct rhashtable *ht = &nfs4_file_rhashtbl;
>>> +	struct rhash_lock_head __rcu *const *bkt;
>>> +	struct rhashtable_compare_arg arg = {
>>> +		.ht	= ht,
>>> +		.key	= fhp,
>>> +	};
>>> +	struct bucket_table *tbl;
>>> +	struct rhash_head *he;
>>> +	unsigned int hash;
>>> 
>>> -	spin_lock(&state_lock);
>>> -	hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash,
>>> -				 lockdep_is_held(&state_lock)) {
>>> -		if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) {
>>> -			if (refcount_inc_not_zero(&fp->fi_ref))
>>> -				ret = fp;
>>> -		} else if (d_inode(fh->fh_dentry) == fp->fi_inode)
>>> -			fp->fi_aliased = alias_found = true;
>>> -	}
>>> -	if (likely(ret == NULL)) {
>>> -		nfsd4_init_file(fh, hashval, new);
>>> -		new->fi_aliased = alias_found;
>>> -		ret = new;
>>> +	/*
>>> +	 * rhashtable guarantees small buckets, thus this loop stays
>>> +	 * efficient.
>>> +	 */
>>> +	rcu_read_lock();
>>> +	tbl = rht_dereference_rcu(ht->tbl, ht);
>>> +	hash = rht_key_hashfn(ht, tbl, fhp, nfs4_file_rhash_params);
>>> +	bkt = rht_bucket(tbl, hash);
>>> +	rht_for_each_rcu_from(he, rht_ptr_rcu(bkt), tbl, hash) {
>>> +		struct nfs4_file *fi;
>>> +
>>> +		fi = rht_obj(ht, he);
>>> +		if (nfs4_file_obj_cmpfn(&arg, fi) == 0)
>>> +			continue;
>>> +		if (d_inode(fhp->fh_dentry) == fi->fi_inode) {
>>> +			fi->fi_aliased = true;
>>> +			new->fi_aliased = true;
>>> +		}
>>> 	}
>>> -	spin_unlock(&state_lock);
>>> -	return ret;
>>> +	rcu_read_unlock();
>>> }
>>> 
>>> -static struct nfs4_file * find_file(struct svc_fh *fh)
>>> +static noinline struct nfs4_file *
>>> +find_or_hash_nfs4_file(struct nfs4_file *new, const struct svc_fh *fhp)
>>> {
>>> -	struct nfs4_file *fp;
>>> -	unsigned int hashval = file_hashval(fh);
>>> +	struct nfs4_file *fi;
>>> 
>>> -	rcu_read_lock();
>>> -	fp = find_file_locked(fh, hashval);
>>> -	rcu_read_unlock();
>>> -	return fp;
>>> -}
>>> +	init_nfs4_file(fhp, new);
>>> 
>>> -static struct nfs4_file *
>>> -find_or_add_file(struct nfs4_file *new, struct svc_fh *fh)
>>> -{
>>> -	struct nfs4_file *fp;
>>> -	unsigned int hashval = file_hashval(fh);
>>> +	fi = rhashtable_lookup_get_insert_key(&nfs4_file_rhashtbl,
>>> +					      fhp, &new->fi_rhash,
>>> +					      nfs4_file_rhash_params);
>>> +	if (!fi) {
>>> +		fi = new;
>>> +		goto check_aliases;
>>> +	}
>>> +	if (IS_ERR(fi))		/* or BUG? */
>>> +		return NULL;
>>> +	if (!refcount_inc_not_zero(&fi->fi_ref))
>>> +		fi = new;
>> 
>> Ah, hrm. Given what we just had to do to nfsd_file_do_acquire(),
>> maybe this needs the same fix to hang onto the RCU read lock
>> while dicking with the nfs4_file object's reference count?
>> 
>> 
> 
> Yes. Probably we should just merge this patch if you want a fix for
> mainline:
> 
>    nfsd: rework hashtable handling in nfsd_do_file_acquire

It's queued up. I intend to submit it before leaving for Westford.

As for the file_hashtbl, I have fixed that up to be consistent
with the approach in nfsd_file_do_acquire(), and will post a
refresh in a moment.


>>> -	rcu_read_lock();
>>> -	fp = find_file_locked(fh, hashval);
>>> -	rcu_read_unlock();
>>> -	if (fp)
>>> -		return fp;
>>> +check_aliases:
>>> +	check_nfs4_file_aliases_locked(fi, fhp);
>>> +
>>> +	return fi;
>>> +}
>>> 
>>> -	return insert_file(new, fh, hashval);
>>> +static void unhash_nfs4_file(struct nfs4_file *fi)
>>> +{
>>> +	rhashtable_remove_fast(&nfs4_file_rhashtbl, &fi->fi_rhash,
>>> +			       nfs4_file_rhash_params);
>>> }
>>> 
>>> /*
>>> @@ -4703,9 +4778,10 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type)
>>> 	struct nfs4_file *fp;
>>> 	__be32 ret = nfs_ok;
>>> 
>>> -	fp = find_file(current_fh);
>>> +	fp = find_nfs4_file(current_fh);
>>> 	if (!fp)
>>> 		return ret;
>>> +
>>> 	/* Check for conflicting share reservations */
>>> 	spin_lock(&fp->fi_lock);
>>> 	if (fp->fi_share_deny & deny_type)
>>> @@ -5548,7 +5624,9 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
>>> 	 * and check for delegations in the process of being recalled.
>>> 	 * If not found, create the nfs4_file struct
>>> 	 */
>>> -	fp = find_or_add_file(open->op_file, current_fh);
>>> +	fp = find_or_hash_nfs4_file(open->op_file, current_fh);
>>> +	if (unlikely(!fp))
>>> +		return nfserr_jukebox;
>>> 	if (fp != open->op_file) {
>>> 		status = nfs4_check_deleg(cl, open, &dp);
>>> 		if (status)
>>> @@ -7905,10 +7983,16 @@ nfs4_state_start(void)
>>> {
>>> 	int ret;
>>> 
>>> -	ret = nfsd4_create_callback_queue();
>>> +	ret = rhashtable_init(&nfs4_file_rhashtbl, &nfs4_file_rhash_params);
>>> 	if (ret)
>>> 		return ret;
>>> 
>>> +	ret = nfsd4_create_callback_queue();
>>> +	if (ret) {
>>> +		rhashtable_destroy(&nfs4_file_rhashtbl);
>>> +		return ret;
>>> +	}
>>> +
>>> 	set_max_delegations();
>>> 	return 0;
>>> }
>>> @@ -7939,6 +8023,7 @@ nfs4_state_shutdown_net(struct net *net)
>>> 
>>> 	nfsd4_client_tracking_exit(net);
>>> 	nfs4_state_destroy_net(net);
>>> +	rhashtable_destroy(&nfs4_file_rhashtbl);
>>> #ifdef CONFIG_NFSD_V4_2_INTER_SSC
>>> 	nfsd4_ssc_shutdown_umount(nn);
>>> #endif
>>> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
>>> index ae596dbf8667..879f085bc39e 100644
>>> --- a/fs/nfsd/state.h
>>> +++ b/fs/nfsd/state.h
>>> @@ -536,16 +536,13 @@ struct nfs4_clnt_odstate {
>>> * inode can have multiple filehandles associated with it, so there is
>>> * (potentially) a many to one relationship between this struct and struct
>>> * inode.
>>> - *
>>> - * These are hashed by filehandle in the file_hashtbl, which is protected by
>>> - * the global state_lock spinlock.
>>> */
>>> struct nfs4_file {
>>> 	refcount_t		fi_ref;
>>> 	struct inode *		fi_inode;
>>> 	bool			fi_aliased;
>>> 	spinlock_t		fi_lock;
>>> -	struct hlist_node       fi_hash;	/* hash on fi_fhandle */
>>> +	struct rhash_head	fi_rhash;
>>> 	struct list_head        fi_stateids;
>>> 	union {
>>> 		struct list_head	fi_delegations;
>>> 
>>> 
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
> 
> -- 
> Jeff Layton <jlayton@poochiereds.net>

--
Chuck Lever




^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-10-06 16:15 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-05 14:55 [PATCH RFC 0/9] A course adjustment, maybe Chuck Lever
2022-10-05 14:55 ` [PATCH RFC 1/9] nfsd: fix nfsd_file_unhash_and_dispose Chuck Lever
2022-10-05 14:55 ` [PATCH RFC 2/9] nfsd: rework hashtable handling in nfsd_do_file_acquire Chuck Lever
2022-10-05 14:55 ` [PATCH RFC 3/9] NFSD: Pass the target nfsd_file to nfsd_commit() Chuck Lever
2022-10-05 14:56 ` [PATCH RFC 4/9] NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately" Chuck Lever
2022-10-05 14:56 ` [PATCH RFC 5/9] NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection Chuck Lever
2022-10-06 15:59   ` Jeff Layton
2022-10-06 16:06     ` Chuck Lever III
2022-10-05 14:56 ` [PATCH RFC 6/9] NFSD: Use const pointers as parameters to fh_ helpers Chuck Lever
2022-10-06 15:59   ` Jeff Layton
2022-10-05 14:56 ` [PATCH RFC 7/9] NFSD: Use rhashtable for managing nfs4_file objects Chuck Lever
2022-10-05 15:11   ` Chuck Lever III
2022-10-06 16:12     ` Jeff Layton
2022-10-06 16:15       ` Chuck Lever III
2022-10-05 14:56 ` [PATCH RFC 8/9] NFSD: Clean up nfs4_preprocess_stateid_op() call sites Chuck Lever
2022-10-06 16:05   ` Jeff Layton
2022-10-05 14:56 ` [PATCH RFC 9/9] NFSD: Trace delegation revocations Chuck Lever
2022-10-06 16:06   ` Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).