* [PATCH RFC 00/30] Overhaul NFSD filecache
@ 2022-06-22 14:12 Chuck Lever
  2022-06-22 14:12 ` [PATCH RFC 01/30] NFSD: Report filecache LRU size Chuck Lever
                   ` (31 more replies)
  0 siblings, 32 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:12 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

This series overhauls the NFSD filecache, a cache of server-side
"struct file" objects recently used by NFS clients. This overhaul
immediately improves how the cache scales with the number of open
files, and prepares for further improvements.

There are three categories of patches in this series:

1. Add observability of cache operation so we can see what we're
doing as changes are made to the code.

2. Improve the scalability of filecache garbage collection,
addressing several bugs along the way.

3. Improve the scalability of the filecache hash table by converting
it to use rhashtable.

The series as it stands survives typical test workloads. Running
stress tests such as generic/531 is the next step.

These patches are also available in the linux-nfs-bugzilla-386
branch of

  https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git 

---

Chuck Lever (30):
      NFSD: Report filecache LRU size
      NFSD: Report count of calls to nfsd_file_acquire()
      NFSD: Report count of freed filecache items
      NFSD: Report average age of filecache items
      NFSD: Add nfsd_file_lru_dispose_list() helper
      NFSD: Refactor nfsd_file_gc()
      NFSD: Refactor nfsd_file_lru_scan()
      NFSD: Report the number of items evicted by the LRU walk
      NFSD: Record number of flush calls
      NFSD: Report filecache item construction failures
      NFSD: Zero counters when the filecache is re-initialized
      NFSD: Hook up the filecache stat file
      NFSD: WARN when freeing an item still linked via nf_lru
      NFSD: Trace filecache LRU activity
      NFSD: Leave open files out of the filecache LRU
      NFSD: Fix the filecache LRU shrinker
      NFSD: Never call nfsd_file_gc() in foreground paths
      NFSD: No longer record nf_hashval in the trace log
      NFSD: Remove lockdep assertion from unhash_and_release_locked()
      NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode
      NFSD: Refactor __nfsd_file_close_inode()
      NFSD: nfsd_file_hash_remove can compute hashval
      NFSD: Remove nfsd_file::nf_hashval
      NFSD: Remove stale comment from nfsd_file_acquire()
      NFSD: Clean up "open file" case in nfsd_file_acquire()
      NFSD: Document nfsd_file_cache_purge() API contract
      NFSD: Replace the "init once" mechanism
      NFSD: Set up an rhashtable for the filecache
      NFSD: Convert the filecache to use rhashtable
      NFSD: Clean up unused code after rhashtable conversion


 fs/nfsd/filecache.c | 677 +++++++++++++++++++++++++++-----------------
 fs/nfsd/filecache.h |   6 +-
 fs/nfsd/nfsctl.c    |  10 +
 fs/nfsd/trace.h     | 117 ++++++--
 4 files changed, 522 insertions(+), 288 deletions(-)

--
Chuck Lever


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH RFC 01/30] NFSD: Report filecache LRU size
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
@ 2022-06-22 14:12 ` Chuck Lever
  2022-06-22 14:12 ` [PATCH RFC 02/30] NFSD: Report count of calls to nfsd_file_acquire() Chuck Lever
                   ` (30 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:12 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Surface the NFSD filecache's LRU list length to help field
troubleshooters monitor filecache issues.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 9cb2d590c036..932db96f854a 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -1089,6 +1089,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 
 	seq_printf(m, "total entries: %u\n", count);
 	seq_printf(m, "longest chain: %u\n", longest);
+	seq_printf(m, "lru entries:   %lu\n", list_lru_count(&nfsd_file_lru));
 	seq_printf(m, "cache hits:    %lu\n", hits);
 	return 0;
 }




* [PATCH RFC 02/30] NFSD: Report count of calls to nfsd_file_acquire()
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
  2022-06-22 14:12 ` [PATCH RFC 01/30] NFSD: Report filecache LRU size Chuck Lever
@ 2022-06-22 14:12 ` Chuck Lever
  2022-06-22 14:13 ` [PATCH RFC 03/30] NFSD: Report count of freed filecache items Chuck Lever
                   ` (29 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:12 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Count the number of successful acquisitions that did not create a
file (i.e., acquisitions that do not result in a compulsory cache
miss). This count can be compared directly with the reported hit
count to compute a hit ratio.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 932db96f854a..128e8934f12a 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -43,6 +43,7 @@ struct nfsd_fcache_bucket {
 };
 
 static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
+static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
 
 struct nfsd_fcache_disposal {
 	struct work_struct work;
@@ -975,6 +976,8 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	}
 out:
 	if (status == nfs_ok) {
+		if (open)
+			this_cpu_inc(nfsd_file_acquisitions);
 		*pnf = nf;
 	} else {
 		nfsd_file_put(nf);
@@ -1067,8 +1070,8 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
  */
 static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 {
+	unsigned long hits = 0, acquisitions = 0;
 	unsigned int i, count = 0, longest = 0;
-	unsigned long hits = 0;
 
 	/*
 	 * No need for spinlocks here since we're not terribly interested in
@@ -1084,13 +1087,16 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 	}
 	mutex_unlock(&nfsd_mutex);
 
-	for_each_possible_cpu(i)
+	for_each_possible_cpu(i) {
 		hits += per_cpu(nfsd_file_cache_hits, i);
+		acquisitions += per_cpu(nfsd_file_acquisitions, i);
+	}
 
 	seq_printf(m, "total entries: %u\n", count);
 	seq_printf(m, "longest chain: %u\n", longest);
 	seq_printf(m, "lru entries:   %lu\n", list_lru_count(&nfsd_file_lru));
 	seq_printf(m, "cache hits:    %lu\n", hits);
+	seq_printf(m, "acquisitions:  %lu\n", acquisitions);
 	return 0;
 }
 



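The hit ratio mentioned in the commit message above can be derived from the two summed counters. The sketch below is a userspace model, not kernel code: plain arrays stand in for the per-CPU variables, and `SKETCH_NR_CPUS`, `sum_counter()` and `hit_ratio_percent()` are hypothetical names. It assumes the ratio is simply hits divided by acquisitions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4	/* hypothetical number of possible CPUs */

/*
 * Add up one counter across every CPU slot, the way
 * nfsd_file_cache_stats_show() walks for_each_possible_cpu().
 */
static unsigned long sum_counter(const unsigned long *percpu)
{
	unsigned long total = 0;
	size_t i;

	assert(percpu != NULL);
	for (i = 0; i < SKETCH_NR_CPUS; i++)
		total += percpu[i];
	return total;
}

/*
 * Hit ratio as a whole-number percentage. Returns 0 before the
 * first acquisition to avoid dividing by zero.
 */
static unsigned long hit_ratio_percent(unsigned long hits,
				       unsigned long acquisitions)
{
	if (acquisitions == 0)
		return 0;
	return hits * 100 / acquisitions;
}
```

Because the per-CPU counters are read without locking, the two sums are not taken at the same instant, so a ratio computed this way is approximate.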

* [PATCH RFC 03/30] NFSD: Report count of freed filecache items
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
  2022-06-22 14:12 ` [PATCH RFC 01/30] NFSD: Report filecache LRU size Chuck Lever
  2022-06-22 14:12 ` [PATCH RFC 02/30] NFSD: Report count of calls to nfsd_file_acquire() Chuck Lever
@ 2022-06-22 14:13 ` Chuck Lever
  2022-06-22 14:13 ` [PATCH RFC 04/30] NFSD: Report average age of " Chuck Lever
                   ` (28 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:13 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Surface the count of freed nfsd_file items.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 128e8934f12a..f735f91e576b 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -44,6 +44,7 @@ struct nfsd_fcache_bucket {
 
 static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
 static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
+static DEFINE_PER_CPU(unsigned long, nfsd_file_releases);
 
 struct nfsd_fcache_disposal {
 	struct work_struct work;
@@ -202,6 +203,8 @@ nfsd_file_free(struct nfsd_file *nf)
 {
 	bool flush = false;
 
+	this_cpu_inc(nfsd_file_releases);
+
 	trace_nfsd_file_put_final(nf);
 	if (nf->nf_mark)
 		nfsd_file_mark_put(nf->nf_mark);
@@ -1070,7 +1073,7 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
  */
 static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 {
-	unsigned long hits = 0, acquisitions = 0;
+	unsigned long hits = 0, acquisitions = 0, releases = 0;
 	unsigned int i, count = 0, longest = 0;
 
 	/*
@@ -1090,6 +1093,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 	for_each_possible_cpu(i) {
 		hits += per_cpu(nfsd_file_cache_hits, i);
 		acquisitions += per_cpu(nfsd_file_acquisitions, i);
+		releases += per_cpu(nfsd_file_releases, i);
 	}
 
 	seq_printf(m, "total entries: %u\n", count);
@@ -1097,6 +1101,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 	seq_printf(m, "lru entries:   %lu\n", list_lru_count(&nfsd_file_lru));
 	seq_printf(m, "cache hits:    %lu\n", hits);
 	seq_printf(m, "acquisitions:  %lu\n", acquisitions);
+	seq_printf(m, "releases:      %lu\n", releases);
 	return 0;
 }
 




* [PATCH RFC 04/30] NFSD: Report average age of filecache items
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (2 preceding siblings ...)
  2022-06-22 14:13 ` [PATCH RFC 03/30] NFSD: Report count of freed filecache items Chuck Lever
@ 2022-06-22 14:13 ` Chuck Lever
  2022-06-22 14:13 ` [PATCH RFC 05/30] NFSD: Add nfsd_file_lru_dispose_list() helper Chuck Lever
                   ` (27 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:13 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Report the mean time, in milliseconds, that items stay in the
filecache before they are freed, to help assess how efficient the
cache is.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |    9 +++++++++
 fs/nfsd/filecache.h |    1 +
 2 files changed, 10 insertions(+)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index f735f91e576b..6f48528c6284 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -61,6 +61,7 @@ static struct list_lru			nfsd_file_lru;
 static long				nfsd_file_lru_flags;
 static struct fsnotify_group		*nfsd_file_fsnotify_group;
 static atomic_long_t			nfsd_filecache_count;
+static atomic_long_t			nfsd_file_total_age;
 static struct delayed_work		nfsd_filecache_laundrette;
 
 static void nfsd_file_gc(void);
@@ -178,6 +179,7 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval,
 	if (nf) {
 		INIT_HLIST_NODE(&nf->nf_node);
 		INIT_LIST_HEAD(&nf->nf_lru);
+		nf->nf_birthtime = ktime_get();
 		nf->nf_file = NULL;
 		nf->nf_cred = get_current_cred();
 		nf->nf_net = net;
@@ -201,9 +203,11 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval,
 static bool
 nfsd_file_free(struct nfsd_file *nf)
 {
+	s64 age = ktime_to_ms(ktime_sub(ktime_get(), nf->nf_birthtime));
 	bool flush = false;
 
 	this_cpu_inc(nfsd_file_releases);
+	atomic_long_add(age, &nfsd_file_total_age);
 
 	trace_nfsd_file_put_final(nf);
 	if (nf->nf_mark)
@@ -1102,6 +1106,11 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 	seq_printf(m, "cache hits:    %lu\n", hits);
 	seq_printf(m, "acquisitions:  %lu\n", acquisitions);
 	seq_printf(m, "releases:      %lu\n", releases);
+	if (releases)
+		seq_printf(m, "mean age (ms): %ld\n",
+			atomic_long_read(&nfsd_file_total_age) / releases);
+	else
+		seq_printf(m, "mean age (ms): -\n");
 	return 0;
 }
 
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 1da0c79a5580..d0c42619dc10 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -46,6 +46,7 @@ struct nfsd_file {
 	refcount_t		nf_ref;
 	unsigned char		nf_may;
 	struct nfsd_file_mark	*nf_mark;
+	ktime_t			nf_birthtime;
 };
 
 int nfsd_file_cache_init(void);



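The mean reported by this patch is the accumulated lifetime of freed items divided by the number of releases, so items still in the cache do not contribute until they are freed. A userspace sketch of that arithmetic and of the "-" placeholder follows; `mean_age_ms()` and `format_mean_age()` are hypothetical names, and plain integers stand in for the atomic_long_t and per-CPU counters.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Mean lifetime in milliseconds of the items freed so far.
 * total_age_ms models nfsd_file_total_age; releases models the
 * summed nfsd_file_releases counter. Returns -1 when nothing has
 * been freed yet.
 */
static long mean_age_ms(long total_age_ms, unsigned long releases)
{
	assert(total_age_ms >= 0);
	if (releases == 0)
		return -1;
	return total_age_ms / (long)releases;
}

/* Format the stats line, printing "-" when there is no mean yet. */
static int format_mean_age(char *buf, size_t len, long total_age_ms,
			   unsigned long releases)
{
	long mean = mean_age_ms(total_age_ms, releases);

	if (mean < 0)
		return snprintf(buf, len, "mean age (ms): -\n");
	return snprintf(buf, len, "mean age (ms): %ld\n", mean);
}
```

Integer division truncates: 10 ms accumulated over 4 releases reports a mean of 2.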

* [PATCH RFC 05/30] NFSD: Add nfsd_file_lru_dispose_list() helper
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (3 preceding siblings ...)
  2022-06-22 14:13 ` [PATCH RFC 04/30] NFSD: Report average age of " Chuck Lever
@ 2022-06-22 14:13 ` Chuck Lever
  2022-06-22 14:13 ` [PATCH RFC 06/30] NFSD: Refactor nfsd_file_gc() Chuck Lever
                   ` (26 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:13 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Refactor the invariant part of nfsd_file_lru_walk_list() into a
separate helper function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 6f48528c6284..763a08196dcd 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -457,11 +457,31 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
 	return LRU_SKIP;
 }
 
+/*
+ * Unhash items on @dispose immediately, then queue them on the
+ * disposal workqueue to finish releasing them in the background.
+ *
+ * cel: Note that between the time list_lru_shrink_walk runs and
+ * now, these items are in the hash table but marked unhashed.
+ * Why release these outside of lru_cb ? There's no lock ordering
+ * problem since lru_cb currently takes no lock.
+ */
+static void nfsd_file_gc_dispose_list(struct list_head *dispose)
+{
+	struct nfsd_file *nf;
+
+	list_for_each_entry(nf, dispose, nf_lru) {
+		spin_lock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+		nfsd_file_do_unhash(nf);
+		spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+	}
+	nfsd_file_dispose_list_delayed(dispose);
+}
+
 static unsigned long
 nfsd_file_lru_walk_list(struct shrink_control *sc)
 {
 	LIST_HEAD(head);
-	struct nfsd_file *nf;
 	unsigned long ret;
 
 	if (sc)
@@ -471,12 +491,7 @@ nfsd_file_lru_walk_list(struct shrink_control *sc)
 		ret = list_lru_walk(&nfsd_file_lru,
 				nfsd_file_lru_cb,
 				&head, LONG_MAX);
-	list_for_each_entry(nf, &head, nf_lru) {
-		spin_lock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
-		nfsd_file_do_unhash(nf);
-		spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
-	}
-	nfsd_file_dispose_list_delayed(&head);
+	nfsd_file_gc_dispose_list(&head);
 	return ret;
 }
 



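The helper added here separates garbage collection into two phases: the LRU walk only moves evictable items onto a private dispose list, and those items are unhashed and released afterwards, outside the walk. A minimal userspace model of that pattern follows. It assumes an item is evictable when its reference count is zero, which is a simplification of what nfsd_file_lru_cb() actually checks; `struct item`, `lru_walk()` and `dispose_list()` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cache item; "refs" stands in for nf_ref. */
struct item {
	struct item *next;
	int refs;
};

/*
 * Phase 1: walk the LRU and move every unreferenced item onto a
 * private dispose list. Nothing is freed here, so a real walker
 * could run this part while holding the LRU lock.
 * Returns the number of items moved (the "evicted" count).
 */
static unsigned long lru_walk(struct item **lru, struct item **dispose)
{
	unsigned long evicted = 0;
	struct item **pp = lru;

	assert(lru != NULL && dispose != NULL);
	while (*pp) {
		struct item *it = *pp;

		if (it->refs == 0) {
			*pp = it->next;		/* unlink from the LRU */
			it->next = *dispose;	/* push onto dispose list */
			*dispose = it;
			evicted++;
		} else {
			pp = &it->next;
		}
	}
	return evicted;
}

/* Phase 2: release everything on the dispose list, outside the walk. */
static void dispose_list(struct item **dispose)
{
	while (*dispose) {
		struct item *it = *dispose;

		*dispose = it->next;
		free(it);
	}
}
```

Keeping the expensive release work out of the walk means the walk itself stays short, and the count it returns is exactly what the later tracepoints and eviction counter report.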

* [PATCH RFC 06/30] NFSD: Refactor nfsd_file_gc()
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (4 preceding siblings ...)
  2022-06-22 14:13 ` [PATCH RFC 05/30] NFSD: Add nfsd_file_lru_dispose_list() helper Chuck Lever
@ 2022-06-22 14:13 ` Chuck Lever
  2022-06-22 14:13 ` [PATCH RFC 07/30] NFSD: Refactor nfsd_file_lru_scan() Chuck Lever
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:13 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Refactor nfsd_file_gc() to use the new list_lru helper.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 763a08196dcd..930f1448173f 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -498,7 +498,11 @@ nfsd_file_lru_walk_list(struct shrink_control *sc)
 static void
 nfsd_file_gc(void)
 {
-	nfsd_file_lru_walk_list(NULL);
+	LIST_HEAD(dispose);
+
+	list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb,
+		      &dispose, LONG_MAX);
+	nfsd_file_gc_dispose_list(&dispose);
 }
 
 static void




* [PATCH RFC 07/30] NFSD: Refactor nfsd_file_lru_scan()
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (5 preceding siblings ...)
  2022-06-22 14:13 ` [PATCH RFC 06/30] NFSD: Refactor nfsd_file_gc() Chuck Lever
@ 2022-06-22 14:13 ` Chuck Lever
  2022-06-22 14:13 ` [PATCH RFC 08/30] NFSD: Report the number of items evicted by the LRU walk Chuck Lever
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:13 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   25 +++++++------------------
 1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 930f1448173f..b1e7588d578a 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -478,23 +478,6 @@ static void nfsd_file_gc_dispose_list(struct list_head *dispose)
 	nfsd_file_dispose_list_delayed(dispose);
 }
 
-static unsigned long
-nfsd_file_lru_walk_list(struct shrink_control *sc)
-{
-	LIST_HEAD(head);
-	unsigned long ret;
-
-	if (sc)
-		ret = list_lru_shrink_walk(&nfsd_file_lru, sc,
-				nfsd_file_lru_cb, &head);
-	else
-		ret = list_lru_walk(&nfsd_file_lru,
-				nfsd_file_lru_cb,
-				&head, LONG_MAX);
-	nfsd_file_gc_dispose_list(&head);
-	return ret;
-}
-
 static void
 nfsd_file_gc(void)
 {
@@ -521,7 +504,13 @@ nfsd_file_lru_count(struct shrinker *s, struct shrink_control *sc)
 static unsigned long
 nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc)
 {
-	return nfsd_file_lru_walk_list(sc);
+	LIST_HEAD(dispose);
+	unsigned long ret;
+
+	ret = list_lru_shrink_walk(&nfsd_file_lru, sc,
+				   nfsd_file_lru_cb, &dispose);
+	nfsd_file_gc_dispose_list(&dispose);
+	return ret;
 }
 
 static struct shrinker	nfsd_file_shrinker = {




* [PATCH RFC 08/30] NFSD: Report the number of items evicted by the LRU walk
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (6 preceding siblings ...)
  2022-06-22 14:13 ` [PATCH RFC 07/30] NFSD: Refactor nfsd_file_lru_scan() Chuck Lever
@ 2022-06-22 14:13 ` Chuck Lever
  2022-06-22 14:13 ` [PATCH RFC 09/30] NFSD: Record number of flush calls Chuck Lever
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:13 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   14 +++++++++++---
 fs/nfsd/trace.h     |   29 +++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index b1e7588d578a..d597acfdab28 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -45,6 +45,7 @@ struct nfsd_fcache_bucket {
 static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
 static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
 static DEFINE_PER_CPU(unsigned long, nfsd_file_releases);
+static DEFINE_PER_CPU(unsigned long, nfsd_file_evictions);
 
 struct nfsd_fcache_disposal {
 	struct work_struct work;
@@ -482,9 +483,12 @@ static void
 nfsd_file_gc(void)
 {
 	LIST_HEAD(dispose);
+	unsigned long ret;
 
-	list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb,
-		      &dispose, LONG_MAX);
+	ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb,
+			    &dispose, LONG_MAX);
+	this_cpu_add(nfsd_file_evictions, ret);
+	trace_nfsd_file_gc_evicted(ret, list_lru_count(&nfsd_file_lru));
 	nfsd_file_gc_dispose_list(&dispose);
 }
 
@@ -509,6 +513,8 @@ nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc)
 
 	ret = list_lru_shrink_walk(&nfsd_file_lru, sc,
 				   nfsd_file_lru_cb, &dispose);
+	this_cpu_add(nfsd_file_evictions, ret);
+	trace_nfsd_file_shrinker_evicted(ret, list_lru_count(&nfsd_file_lru));
 	nfsd_file_gc_dispose_list(&dispose);
 	return ret;
 }
@@ -1085,7 +1091,7 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
  */
 static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 {
-	unsigned long hits = 0, acquisitions = 0, releases = 0;
+	unsigned long hits = 0, acquisitions = 0, releases = 0, evictions = 0;
 	unsigned int i, count = 0, longest = 0;
 
 	/*
@@ -1106,6 +1112,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 		hits += per_cpu(nfsd_file_cache_hits, i);
 		acquisitions += per_cpu(nfsd_file_acquisitions, i);
 		releases += per_cpu(nfsd_file_releases, i);
+		evictions += per_cpu(nfsd_file_evictions, i);
 	}
 
 	seq_printf(m, "total entries: %u\n", count);
@@ -1114,6 +1121,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 	seq_printf(m, "cache hits:    %lu\n", hits);
 	seq_printf(m, "acquisitions:  %lu\n", acquisitions);
 	seq_printf(m, "releases:      %lu\n", releases);
+	seq_printf(m, "evictions:     %lu\n", evictions);
 	if (releases)
 		seq_printf(m, "mean age (ms): %ld\n",
 			atomic_long_read(&nfsd_file_total_age) / releases);
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index a60ead3b227a..c055c6361bd5 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -851,6 +851,35 @@ TRACE_EVENT(nfsd_file_fsnotify_handle_event,
 			__entry->nlink, __entry->mode, __entry->mask)
 );
 
+DECLARE_EVENT_CLASS(nfsd_file_lruwalk_class,
+	TP_PROTO(
+		unsigned long evicted,
+		unsigned long remaining
+	),
+	TP_ARGS(evicted, remaining),
+	TP_STRUCT__entry(
+		__field(unsigned long, evicted)
+		__field(unsigned long, remaining)
+	),
+	TP_fast_assign(
+		__entry->evicted = evicted;
+		__entry->remaining = remaining;
+	),
+	TP_printk("%lu entries evicted, %lu remaining",
+		__entry->evicted, __entry->remaining)
+);
+
+#define DEFINE_NFSD_FILE_LRUWALK_EVENT(name)				\
+DEFINE_EVENT(nfsd_file_lruwalk_class, name,				\
+	TP_PROTO(							\
+		unsigned long evicted,					\
+		unsigned long remaining					\
+	),								\
+	TP_ARGS(evicted, remaining))
+
+DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_gc_evicted);
+DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_shrinker_evicted);
+
 #include "cache.h"
 
 TRACE_DEFINE_ENUM(RC_DROPIT);




* [PATCH RFC 09/30] NFSD: Record number of flush calls
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (7 preceding siblings ...)
  2022-06-22 14:13 ` [PATCH RFC 08/30] NFSD: Report the number of items evicted by the LRU walk Chuck Lever
@ 2022-06-22 14:13 ` Chuck Lever
  2022-06-22 14:13 ` [PATCH RFC 10/30] NFSD: Report filecache item construction failures Chuck Lever
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:13 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index d597acfdab28..cae7fa2343c1 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -45,6 +45,7 @@ struct nfsd_fcache_bucket {
 static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
 static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
 static DEFINE_PER_CPU(unsigned long, nfsd_file_releases);
+static DEFINE_PER_CPU(unsigned long, nfsd_file_pages_flushed);
 static DEFINE_PER_CPU(unsigned long, nfsd_file_evictions);
 
 struct nfsd_fcache_disposal {
@@ -249,7 +250,12 @@ nfsd_file_check_write_error(struct nfsd_file *nf)
 static void
 nfsd_file_flush(struct nfsd_file *nf)
 {
-	if (nf->nf_file && vfs_fsync(nf->nf_file, 1) != 0)
+	struct file *file = nf->nf_file;
+
+	if (!file || !(file->f_mode & FMODE_WRITE))
+		return;
+	this_cpu_add(nfsd_file_pages_flushed, file->f_mapping->nrpages);
+	if (vfs_fsync(file, 1) != 0)
 		nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id));
 }
 
@@ -1091,7 +1097,8 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
  */
 static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 {
-	unsigned long hits = 0, acquisitions = 0, releases = 0, evictions = 0;
+	unsigned long releases = 0, pages_flushed = 0, evictions = 0;
+	unsigned long hits = 0, acquisitions = 0;
 	unsigned int i, count = 0, longest = 0;
 
 	/*
@@ -1113,6 +1120,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 		acquisitions += per_cpu(nfsd_file_acquisitions, i);
 		releases += per_cpu(nfsd_file_releases, i);
 		evictions += per_cpu(nfsd_file_evictions, i);
+		pages_flushed += per_cpu(nfsd_file_pages_flushed, i);
 	}
 
 	seq_printf(m, "total entries: %u\n", count);
@@ -1127,6 +1135,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 			atomic_long_read(&nfsd_file_total_age) / releases);
 	else
 		seq_printf(m, "mean age (ms): -\n");
+	seq_printf(m, "pages flushed: %lu\n", pages_flushed);
 	return 0;
 }
 




* [PATCH RFC 10/30] NFSD: Report filecache item construction failures
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (8 preceding siblings ...)
  2022-06-22 14:13 ` [PATCH RFC 09/30] NFSD: Record number of flush calls Chuck Lever
@ 2022-06-22 14:13 ` Chuck Lever
  2022-06-22 14:13 ` [PATCH RFC 11/30] NFSD: Zero counters when the filecache is re-initialized Chuck Lever
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:13 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

My guess is that construction failures are exceptionally rare, but
it's worth counting them to see how nfsd_file_acquire() behaves
when the cache is full.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index cae7fa2343c1..a2a78163bf8d 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -47,6 +47,7 @@ static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
 static DEFINE_PER_CPU(unsigned long, nfsd_file_releases);
 static DEFINE_PER_CPU(unsigned long, nfsd_file_pages_flushed);
 static DEFINE_PER_CPU(unsigned long, nfsd_file_evictions);
+static DEFINE_PER_CPU(unsigned long, nfsd_file_cons_fails);
 
 struct nfsd_fcache_disposal {
 	struct work_struct work;
@@ -975,6 +976,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	/* Did construction of this file fail? */
 	if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
+		this_cpu_inc(nfsd_file_cons_fails);
 		if (!retry) {
 			status = nfserr_jukebox;
 			goto out;
@@ -1098,7 +1100,7 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 {
 	unsigned long releases = 0, pages_flushed = 0, evictions = 0;
-	unsigned long hits = 0, acquisitions = 0;
+	unsigned long hits = 0, acquisitions = 0, cons_fails = 0;
 	unsigned int i, count = 0, longest = 0;
 
 	/*
@@ -1121,6 +1123,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 		releases += per_cpu(nfsd_file_releases, i);
 		evictions += per_cpu(nfsd_file_evictions, i);
 		pages_flushed += per_cpu(nfsd_file_pages_flushed, i);
+		cons_fails += per_cpu(nfsd_file_cons_fails, i);
 	}
 
 	seq_printf(m, "total entries: %u\n", count);
@@ -1136,6 +1139,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 	else
 		seq_printf(m, "mean age (ms): -\n");
 	seq_printf(m, "pages flushed: %lu\n", pages_flushed);
+	seq_printf(m, "cons fails:    %lu\n", cons_fails);
 	return 0;
 }
 




* [PATCH RFC 11/30] NFSD: Zero counters when the filecache is re-initialized
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (9 preceding siblings ...)
  2022-06-22 14:13 ` [PATCH RFC 10/30] NFSD: Report filecache item construction failures Chuck Lever
@ 2022-06-22 14:13 ` Chuck Lever
  2022-06-22 14:14 ` [PATCH RFC 12/30] NFSD: Hook up the filecache stat file Chuck Lever
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:13 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

If nfsd_file_cache_init() is called after a shutdown, be sure the
stat counters are reset.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index a2a78163bf8d..0cf2e44e874f 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -832,6 +832,8 @@ nfsd_file_cache_shutdown_net(struct net *net)
 void
 nfsd_file_cache_shutdown(void)
 {
+	int i;
+
 	set_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags);
 
 	lease_unregister_notifier(&nfsd_file_lease_notifier);
@@ -855,6 +857,15 @@ nfsd_file_cache_shutdown(void)
 	nfsd_file_hashtbl = NULL;
 	destroy_workqueue(nfsd_filecache_wq);
 	nfsd_filecache_wq = NULL;
+
+	for_each_possible_cpu(i) {
+		per_cpu(nfsd_file_cache_hits, i) = 0;
+		per_cpu(nfsd_file_acquisitions, i) = 0;
+		per_cpu(nfsd_file_releases, i) = 0;
+		per_cpu(nfsd_file_evictions, i) = 0;
+		per_cpu(nfsd_file_pages_flushed, i) = 0;
+		per_cpu(nfsd_file_cons_fails, i) = 0;
+	}
 }
 
 static bool



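Because this_cpu_write() always writes the copy that belongs to the CPU it runs on, resetting every CPU's counter from a single CPU requires addressing each copy by index; in the kernel that is `per_cpu(var, i) = 0`. The userspace model below contrasts the two approaches, with an array standing in for one per-CPU variable. All names are hypothetical.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 4	/* hypothetical number of possible CPUs */

/* Models this_cpu_write(): only the calling CPU's own slot changes. */
static void this_cpu_write_sketch(unsigned long *percpu, int self,
				  unsigned long val)
{
	assert(self >= 0 && self < SKETCH_NR_CPUS);
	percpu[self] = val;
}

/*
 * Wrong: the loop visits every CPU number, but each iteration
 * writes the caller's slot, so the other CPUs keep their counts.
 */
static void reset_current_only(unsigned long *percpu, int self)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		this_cpu_write_sketch(percpu, self, 0);
}

/* Right: address each CPU's slot by index, like per_cpu(var, cpu) = 0. */
static void reset_all(unsigned long *percpu)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		percpu[cpu] = 0;
}
```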

* [PATCH RFC 12/30] NFSD: Hook up the filecache stat file
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (10 preceding siblings ...)
  2022-06-22 14:13 ` [PATCH RFC 11/30] NFSD: Zero counters when the filecache is re-initialized Chuck Lever
@ 2022-06-22 14:14 ` Chuck Lever
  2022-06-22 14:14 ` [PATCH RFC 13/30] NFSD: WARN when freeing an item still linked via nf_lru Chuck Lever
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:14 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

There has always been code for exporting filecache metrics via
/proc, but it was never hooked up. Let's surface these metrics to
improve observability of the filecache.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfsctl.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 0621c2faf242..631bf8422c0f 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -25,6 +25,7 @@
 #include "state.h"
 #include "netns.h"
 #include "pnfs.h"
+#include "filecache.h"
 
 /*
  *	We have a single directory with several nodes in it.
@@ -46,6 +47,7 @@ enum {
 	NFSD_MaxBlkSize,
 	NFSD_MaxConnections,
 	NFSD_SupportedEnctypes,
+	NFSD_Filecache,
 	/*
 	 * The below MUST come last.  Otherwise we leave a hole in nfsd_files[]
 	 * with !CONFIG_NFSD_V4 and simple_fill_super() goes oops
@@ -229,6 +231,13 @@ static const struct file_operations reply_cache_stats_operations = {
 	.release	= single_release,
 };
 
+static const struct file_operations filecache_ops = {
+	.open		= nfsd_file_cache_stats_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 /*----------------------------------------------------------------------------*/
 /*
  * payload - write methods
@@ -1371,6 +1380,7 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc)
 		[NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
 		[NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
 		[NFSD_MaxConnections] = {"max_connections", &transaction_ops, S_IWUSR|S_IRUGO},
+		[NFSD_Filecache] = {"filecache", &filecache_ops, S_IRUGO},
 #if defined(CONFIG_SUNRPC_GSS) || defined(CONFIG_SUNRPC_GSS_MODULE)
 		[NFSD_SupportedEnctypes] = {"supported_krb5_enctypes", &supported_enctypes_ops, S_IRUGO},
 #endif /* CONFIG_SUNRPC_GSS or CONFIG_SUNRPC_GSS_MODULE */



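The stats file reads counters that the filecache updates on hot paths with this_cpu_inc() and this_cpu_add() (for example, nfsd_file_cache_hits and nfsd_file_evictions elsewhere in this series), presumably summing them only when the file is read. The minimal userspace sketch below models that pattern under stated assumptions: a fixed CPU count and the hypothetical model_* names stand in for the kernel's per-CPU machinery, and no real concurrency is simulated.

```c
#include <assert.h>

/* Hypothetical fixed CPU count for this model. */
#define MODEL_NR_CPUS 4

/*
 * One slot per CPU. Each CPU writes only its own slot, so the hot path
 * needs no lock or atomic read-modify-write.
 */
struct model_percpu_counter {
	unsigned long slot[MODEL_NR_CPUS];
};

/* Stand-in for this_cpu_add(); here the caller names the CPU explicitly. */
static void model_cpu_add(struct model_percpu_counter *c, int cpu,
			  unsigned long n)
{
	c->slot[cpu] += n;
}

/* Reader side: sum every slot, as a stats "show" routine would. */
static unsigned long model_counter_sum(const struct model_percpu_counter *c)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += c->slot[cpu];
	return sum;
}
```

The design trades a cheap, contention-free write path for a slightly more expensive read, which suits counters that are bumped constantly but read only when someone opens the stats file.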
^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH RFC 13/30] NFSD: WARN when freeing an item still linked via nf_lru
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (11 preceding siblings ...)
  2022-06-22 14:14 ` [PATCH RFC 12/30] NFSD: Hook up the filecache stat file Chuck Lever
@ 2022-06-22 14:14 ` Chuck Lever
  2022-06-22 14:14 ` [PATCH RFC 14/30] NFSD: Trace filecache LRU activity Chuck Lever
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:14 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Add a guardrail to prevent freeing memory that is still on a list,
whether that is a dispose list or the LRU list.

Freeing such an item is a sign of a bug. Detecting this class of bug,
then WARNing and leaking the item instead of freeing it, keeps the
bug from endangering system stability, which is especially helpful
while debugging.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 0cf2e44e874f..6bb37d3abbaa 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -221,6 +221,14 @@ nfsd_file_free(struct nfsd_file *nf)
 		fput(nf->nf_file);
 		flush = true;
 	}
+
+	/*
+	 * If this item is still linked via nf_lru, that's a bug.
+	 * WARN and leak it to preserve system stability.
+	 */
+	if (WARN_ON_ONCE(!list_empty(&nf->nf_lru)))
+		return flush;
+
 	call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
 	return flush;
 }
@@ -350,7 +358,7 @@ nfsd_file_dispose_list(struct list_head *dispose)
 
 	while(!list_empty(dispose)) {
 		nf = list_first_entry(dispose, struct nfsd_file, nf_lru);
-		list_del(&nf->nf_lru);
+		list_del_init(&nf->nf_lru);
 		nfsd_file_flush(nf);
 		nfsd_file_put_noref(nf);
 	}
@@ -364,7 +372,7 @@ nfsd_file_dispose_list_sync(struct list_head *dispose)
 
 	while(!list_empty(dispose)) {
 		nf = list_first_entry(dispose, struct nfsd_file, nf_lru);
-		list_del(&nf->nf_lru);
+		list_del_init(&nf->nf_lru);
 		nfsd_file_flush(nf);
 		if (!refcount_dec_and_test(&nf->nf_ref))
 			continue;



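WARN_ON_ONCE(!list_empty(&nf->nf_lru)) can only tell "still linked" from "already removed" if removal leaves the node pointing at itself. The kernel's list_del() instead poisons the node's pointers, so list_empty() would report false even for an item that was correctly removed; that is why the dispose loops switch to list_del_init(). A minimal userspace re-creation of the relevant list primitives (not <linux/list.h> itself; all model_* names are hypothetical) shows the difference:

```c
#include <assert.h>
#include <stdbool.h>

/* Circular doubly linked list node in the style of struct list_head. */
struct model_list {
	struct model_list *next, *prev;
};

static void model_list_init(struct model_list *h)
{
	h->next = h;
	h->prev = h;
}

static bool model_list_empty(const struct model_list *h)
{
	return h->next == h;
}

/* Insert @item right after @head. */
static void model_list_add(struct model_list *item, struct model_list *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

static void model_unlink(struct model_list *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
}

/* Like list_del(): unlink, leaving @item's own pointers stale
 * (the kernel goes further and poisons them). */
static void model_list_del(struct model_list *item)
{
	model_unlink(item);
}

/* Like list_del_init(): unlink, then point @item at itself so that
 * model_list_empty(item) is true afterwards. */
static void model_list_del_init(struct model_list *item)
{
	model_unlink(item);
	model_list_init(item);
}

/* Hypothetical free-path guard mirroring the patch: only free an
 * item whose list node is no longer linked anywhere. */
static bool model_safe_to_free(const struct model_list *node)
{
	return model_list_empty(node);
}
```

After model_list_del() the guard still refuses to free the node, which is exactly the false positive the switch to list_del_init() avoids.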

* [PATCH RFC 14/30] NFSD: Trace filecache LRU activity
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (12 preceding siblings ...)
  2022-06-22 14:14 ` [PATCH RFC 13/30] NFSD: WARN when freeing an item still linked via nf_lru Chuck Lever
@ 2022-06-22 14:14 ` Chuck Lever
  2022-06-22 14:14 ` [PATCH RFC 15/30] NFSD: Leave open files out of the filecache LRU Chuck Lever
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:14 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Observe the operation of garbage collection and the lifetime of
filecache items.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   44 +++++++++++++++++++++++++++++++-------------
 fs/nfsd/trace.h     |   39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+), 13 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 6bb37d3abbaa..1f65065cd325 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -268,6 +268,18 @@ nfsd_file_flush(struct nfsd_file *nf)
 		nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id));
 }
 
+static void nfsd_file_lru_add(struct nfsd_file *nf)
+{
+	if (list_lru_add(&nfsd_file_lru, &nf->nf_lru))
+		trace_nfsd_file_lru_add(nf);
+}
+
+static void nfsd_file_lru_remove(struct nfsd_file *nf)
+{
+	if (list_lru_del(&nfsd_file_lru, &nf->nf_lru))
+		trace_nfsd_file_lru_del(nf);
+}
+
 static void
 nfsd_file_do_unhash(struct nfsd_file *nf)
 {
@@ -287,8 +299,7 @@ nfsd_file_unhash(struct nfsd_file *nf)
 {
 	if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
 		nfsd_file_do_unhash(nf);
-		if (!list_empty(&nf->nf_lru))
-			list_lru_del(&nfsd_file_lru, &nf->nf_lru);
+		nfsd_file_lru_remove(nf);
 		return true;
 	}
 	return false;
@@ -451,26 +462,33 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
 	 * counter. Here we check the counter and then test and clear the flag.
 	 * That order is deliberate to ensure that we can do this locklessly.
 	 */
-	if (refcount_read(&nf->nf_ref) > 1)
-		goto out_skip;
+	if (refcount_read(&nf->nf_ref) > 1) {
+		trace_nfsd_file_gc_in_use(nf);
+		return LRU_SKIP;
+	}
 
 	/*
 	 * Don't throw out files that are still undergoing I/O or
 	 * that have uncleared errors pending.
 	 */
-	if (nfsd_file_check_writeback(nf))
-		goto out_skip;
+	if (nfsd_file_check_writeback(nf)) {
+		trace_nfsd_file_gc_writeback(nf);
+		return LRU_SKIP;
+	}
 
-	if (test_and_clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags))
-		goto out_skip;
+	if (test_and_clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags)) {
+		trace_nfsd_file_gc_referenced(nf);
+		return LRU_SKIP;
+	}
 
-	if (!test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags))
-		goto out_skip;
+	if (!test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
+		trace_nfsd_file_gc_hashed(nf);
+		return LRU_SKIP;
+	}
 
 	list_lru_isolate_move(lru, &nf->nf_lru, head);
+	trace_nfsd_file_gc_disposed(nf);
 	return LRU_REMOVED;
-out_skip:
-	return LRU_SKIP;
 }
 
 /*
@@ -1040,7 +1058,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	refcount_inc(&nf->nf_ref);
 	__set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
 	__set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
-	list_lru_add(&nfsd_file_lru, &nf->nf_lru);
+	nfsd_file_lru_add(nf);
 	hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head);
 	++nfsd_file_hashtbl[hashval].nfb_count;
 	nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount,
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index c055c6361bd5..e56fe3dfa44c 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -851,6 +851,45 @@ TRACE_EVENT(nfsd_file_fsnotify_handle_event,
 			__entry->nlink, __entry->mode, __entry->mask)
 );
 
+DECLARE_EVENT_CLASS(nfsd_file_gc_class,
+	TP_PROTO(
+		const struct nfsd_file *nf
+	),
+	TP_ARGS(nf),
+	TP_STRUCT__entry(
+		__field(void *, nf_inode)
+		__field(void *, nf_file)
+		__field(int, nf_ref)
+		__field(unsigned long, nf_flags)
+	),
+	TP_fast_assign(
+		__entry->nf_inode = nf->nf_inode;
+		__entry->nf_file = nf->nf_file;
+		__entry->nf_ref = refcount_read(&nf->nf_ref);
+		__entry->nf_flags = nf->nf_flags;
+	),
+	TP_printk("inode=%p ref=%d nf_flags=%s nf_file=%p",
+		__entry->nf_inode, __entry->nf_ref,
+		show_nf_flags(__entry->nf_flags),
+		__entry->nf_file
+	)
+);
+
+#define DEFINE_NFSD_FILE_GC_EVENT(name)					\
+DEFINE_EVENT(nfsd_file_gc_class, name,					\
+	TP_PROTO(							\
+		const struct nfsd_file *nf				\
+	),								\
+	TP_ARGS(nf))
+
+DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add);
+DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del);
+DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_in_use);
+DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_writeback);
+DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_referenced);
+DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_hashed);
+DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_disposed);
+
 DECLARE_EVENT_CLASS(nfsd_file_lruwalk_class,
 	TP_PROTO(
 		unsigned long evicted,



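DECLARE_EVENT_CLASS() defines a record layout, field assignment, and output format once; each DEFINE_EVENT() then names a tracepoint that reuses that class, which keeps the seven closely related GC events above cheap to add. The real macros generate far more (ring-buffer records, tracefs plumbing); the following is only a loose userspace analogy using token pasting, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of the fields nfsd_file_gc_class records. */
struct model_nf {
	int ref;
	unsigned long flags;
};

/* The most recent record, so the sketch can be checked. */
static const char *model_last_event;
static int model_last_ref;

/* Shared "class" body: every event in the family records the same fields. */
static void model_gc_class_record(const char *name, const struct model_nf *nf)
{
	model_last_event = name;
	model_last_ref = nf->ref;
}

/* Loose analogue of DEFINE_EVENT(): stamp out a named entry point
 * that reuses the class body. */
#define MODEL_DEFINE_GC_EVENT(name)					\
	static void model_trace_##name(const struct model_nf *nf)	\
	{								\
		model_gc_class_record(#name, nf);			\
	}

MODEL_DEFINE_GC_EVENT(nfsd_file_gc_in_use)
MODEL_DEFINE_GC_EVENT(nfsd_file_gc_disposed)
```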

* [PATCH RFC 15/30] NFSD: Leave open files out of the filecache LRU
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (13 preceding siblings ...)
  2022-06-22 14:14 ` [PATCH RFC 14/30] NFSD: Trace filecache LRU activity Chuck Lever
@ 2022-06-22 14:14 ` Chuck Lever
  2022-06-22 14:14 ` [PATCH RFC 16/30] NFSD: Fix the filecache LRU shrinker Chuck Lever
                   ` (16 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:14 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

There have been reports of problems when running fstests generic/531
against Linux NFS servers with NFSv4. The NFS server that hosts the
test's SCRATCH_DEV suffers from CPU soft lock-ups during the test.
Analysis shows that:

fs/nfsd/filecache.c
        ret = list_lru_walk(&nfsd_file_lru,
                        nfsd_file_lru_cb,
                        &head, LONG_MAX);

causes nfsd_file_gc() to walk the entire length of the filecache LRU
list every time it is called (which is quite frequently). The walk
holds a spinlock for its entire duration, which prevents other nfsd
threads from accessing the filecache.

What's more, for NFSv4 workloads, none of the items that are visited
during this walk may be evicted, since they are all files that are
held OPEN by NFS clients.

Address this by ensuring that open files are not kept on the LRU
list.

Reported-by: Frank van der Linden <fllinden@amazon.com>
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=386
Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   24 +++++++++++++++++++-----
 fs/nfsd/trace.h     |    2 ++
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 1f65065cd325..65085853cc42 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -270,6 +270,7 @@ nfsd_file_flush(struct nfsd_file *nf)
 
 static void nfsd_file_lru_add(struct nfsd_file *nf)
 {
+	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
 	if (list_lru_add(&nfsd_file_lru, &nf->nf_lru))
 		trace_nfsd_file_lru_add(nf);
 }
@@ -299,7 +300,6 @@ nfsd_file_unhash(struct nfsd_file *nf)
 {
 	if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
 		nfsd_file_do_unhash(nf);
-		nfsd_file_lru_remove(nf);
 		return true;
 	}
 	return false;
@@ -320,6 +320,7 @@ nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *disp
 	if (refcount_dec_not_one(&nf->nf_ref))
 		return true;
 
+	nfsd_file_lru_remove(nf);
 	list_add(&nf->nf_lru, dispose);
 	return true;
 }
@@ -331,6 +332,7 @@ nfsd_file_put_noref(struct nfsd_file *nf)
 
 	if (refcount_dec_and_test(&nf->nf_ref)) {
 		WARN_ON(test_bit(NFSD_FILE_HASHED, &nf->nf_flags));
+		nfsd_file_lru_remove(nf);
 		nfsd_file_free(nf);
 	}
 }
@@ -340,7 +342,7 @@ nfsd_file_put(struct nfsd_file *nf)
 {
 	might_sleep();
 
-	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
+	nfsd_file_lru_add(nf);
 	if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
 		nfsd_file_flush(nf);
 		nfsd_file_put_noref(nf);
@@ -440,8 +442,18 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
 	}
 }
 
-/*
+/**
+ * nfsd_file_lru_cb - Examine an entry on the LRU list
+ * @item: LRU entry to examine
+ * @lru: controlling LRU
+ * @lock: LRU list lock (unused)
+ * @arg: dispose list
+ *
  * Note this can deadlock with nfsd_file_cache_purge.
+ *
+ * Return values:
+ *   %LRU_REMOVED: @item was removed from the LRU
+ *   %LRU_SKIP: @item cannot be evicted
  */
 static enum lru_status
 nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
@@ -463,8 +475,9 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
 	 * That order is deliberate to ensure that we can do this locklessly.
 	 */
 	if (refcount_read(&nf->nf_ref) > 1) {
+		list_lru_isolate(lru, &nf->nf_lru);
 		trace_nfsd_file_gc_in_use(nf);
-		return LRU_SKIP;
+		return LRU_REMOVED;
 	}
 
 	/*
@@ -1023,6 +1036,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		goto retry;
 	}
 
+	nfsd_file_lru_remove(nf);
 	this_cpu_inc(nfsd_file_cache_hits);
 
 	if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) {
@@ -1058,7 +1072,6 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	refcount_inc(&nf->nf_ref);
 	__set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
 	__set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
-	nfsd_file_lru_add(nf);
 	hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head);
 	++nfsd_file_hashtbl[hashval].nfb_count;
 	nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount,
@@ -1083,6 +1096,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	 */
 	if (status != nfs_ok || inode->i_nlink == 0) {
 		bool do_free;
+		nfsd_file_lru_remove(nf);
 		spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
 		do_free = nfsd_file_unhash(nf);
 		spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index e56fe3dfa44c..954838616c51 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -883,7 +883,9 @@ DEFINE_EVENT(nfsd_file_gc_class, name,					\
 	TP_ARGS(nf))
 
 DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add);
+DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add_disposed);
 DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del);
+DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del_disposed);
 DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_in_use);
 DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_writeback);
 DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_referenced);



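The key change in the callback is that an in-use item (refcount above one) is isolated from the LRU and reported as LRU_REMOVED instead of LRU_SKIP, while nfsd_file_put() re-adds the item, marked referenced, when a user lets go of it. The userspace model below compares the two policies on an LRU that holds only pinned items. The array-based LRU and all model_* names are assumptions, and the writeback and hashed checks are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_LRU_MAX 16

/* Hypothetical, simplified cache item; ref > 1 means someone besides
 * the cache holds it (for example, an NFSv4 OPEN). */
struct model_item {
	int ref;
	bool referenced;
	bool on_lru;
};

/* Array stand-in for list_lru; index 0 is the oldest entry. */
struct model_lru {
	struct model_item *items[MODEL_LRU_MAX];
	size_t count;
};

/* Mirrors nfsd_file_lru_add() after this patch: mark referenced, then add. */
static void model_lru_add(struct model_lru *lru, struct model_item *it)
{
	it->referenced = true;
	if (!it->on_lru && lru->count < MODEL_LRU_MAX) {
		lru->items[lru->count++] = it;
		it->on_lru = true;
	}
}

/*
 * One GC pass over the whole LRU; returns how many entries it visited.
 * If @isolate_in_use is true, in-use items leave the LRU (this patch);
 * if false, they are skipped and stay on it (the old behavior).
 */
static size_t model_gc_walk(struct model_lru *lru, bool isolate_in_use,
			    size_t *evicted)
{
	size_t visited = lru->count, kept = 0;

	*evicted = 0;
	for (size_t i = 0; i < visited; i++) {
		struct model_item *it = lru->items[i];

		if (it->ref > 1 && !isolate_in_use) {
			lru->items[kept++] = it;	/* skip: stays on LRU */
			continue;
		}
		if (it->ref > 1) {
			it->on_lru = false;		/* isolate: leaves LRU */
			continue;
		}
		if (it->referenced) {
			it->referenced = false;		/* keep for one more pass */
			lru->items[kept++] = it;
			continue;
		}
		it->on_lru = false;			/* evict */
		(*evicted)++;
	}
	lru->count = kept;
	return visited;
}
```

Under the old policy every walk revisits every pinned item; after the change, the second walk has nothing to examine until a put returns an item to the LRU.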

* [PATCH RFC 16/30] NFSD: Fix the filecache LRU shrinker
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (14 preceding siblings ...)
  2022-06-22 14:14 ` [PATCH RFC 15/30] NFSD: Leave open files out of the filecache LRU Chuck Lever
@ 2022-06-22 14:14 ` Chuck Lever
  2022-06-22 14:14 ` [PATCH RFC 17/30] NFSD: Never call nfsd_file_gc() in foreground paths Chuck Lever
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:14 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Without LRU item rotation, the shrinker visits only a few items on
the end of the LRU list, and those would always be long-term OPEN
files for NFSv4 workloads. That makes the filecache shrinker
completely ineffective.

Adopt the same strategy as the inode LRU by using LRU_ROTATE.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 65085853cc42..deb842f45117 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -453,6 +453,7 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
  *
  * Return values:
  *   %LRU_REMOVED: @item was removed from the LRU
+ *   %LRU_ROTATE: @item is to be moved to the LRU tail
  *   %LRU_SKIP: @item cannot be evicted
  */
 static enum lru_status
@@ -491,7 +492,7 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
 
 	if (test_and_clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags)) {
 		trace_nfsd_file_gc_referenced(nf);
-		return LRU_SKIP;
+		return LRU_ROTATE;
 	}
 
 	if (!test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
@@ -528,13 +529,14 @@ static void nfsd_file_gc_dispose_list(struct list_head *dispose)
 static void
 nfsd_file_gc(void)
 {
+	unsigned long max = list_lru_count(&nfsd_file_lru);
 	LIST_HEAD(dispose);
 	unsigned long ret;
 
 	ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb,
-			    &dispose, LONG_MAX);
+			    &dispose, max);
 	this_cpu_add(nfsd_file_evictions, ret);
-	trace_nfsd_file_gc_evicted(ret, list_lru_count(&nfsd_file_lru));
+	trace_nfsd_file_gc_evicted(ret, max);
 	nfsd_file_gc_dispose_list(&dispose);
 }
 



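With LRU_SKIP, a referenced entry keeps its place near the head of the list, so a scan limited to the first few entries keeps landing on it. LRU_ROTATE moves it to the tail instead, which gives the classic second-chance behavior. This userspace model (hypothetical model_* names, an array instead of list_lru) scans two entries at a time over an LRU whose two head entries were recently referenced:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_LRU_MAX 16

/* Hypothetical item: an id for checking, plus the referenced flag. */
struct model_item {
	int id;
	bool referenced;
};

/* Array stand-in for list_lru; index 0 is the head (scanned first). */
struct model_lru {
	struct model_item items[MODEL_LRU_MAX];
	size_t count;
};

/*
 * Scan up to @nr_to_scan entries from the head. Referenced entries have
 * their flag cleared and are either rotated to the tail (@rotate true,
 * like LRU_ROTATE) or left in place (@rotate false, like LRU_SKIP).
 * Unreferenced entries are evicted. Returns the number evicted.
 */
static size_t model_scan(struct model_lru *lru, size_t nr_to_scan, bool rotate)
{
	struct model_item out[MODEL_LRU_MAX];
	struct model_item rotated[MODEL_LRU_MAX];
	size_t n_out = 0, n_rot = 0, evicted = 0;
	size_t nr = nr_to_scan < lru->count ? nr_to_scan : lru->count;

	for (size_t i = 0; i < nr; i++) {
		struct model_item it = lru->items[i];

		if (!it.referenced) {
			evicted++;
			continue;
		}
		it.referenced = false;
		if (rotate)
			rotated[n_rot++] = it;	/* goes to the tail */
		else
			out[n_out++] = it;	/* stays near the head */
	}
	for (size_t i = nr; i < lru->count; i++)	/* unscanned entries */
		out[n_out++] = lru->items[i];
	for (size_t i = 0; i < n_rot; i++)
		out[n_out++] = rotated[i];

	for (size_t i = 0; i < n_out; i++)
		lru->items[i] = out[i];
	lru->count = n_out;
	return evicted;
}
```

With rotation, the idle entries 3 and 4 are evicted first. With skipping, the recently referenced entries 1 and 2 merely lose their flag and are then evicted ahead of the idle ones. In this model nr_to_scan plays roughly the role of the max bound that nfsd_file_gc() now captures from list_lru_count() before the walk.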

* [PATCH RFC 17/30] NFSD: Never call nfsd_file_gc() in foreground paths
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (15 preceding siblings ...)
  2022-06-22 14:14 ` [PATCH RFC 16/30] NFSD: Fix the filecache LRU shrinker Chuck Lever
@ 2022-06-22 14:14 ` Chuck Lever
  2022-06-22 14:14 ` [PATCH RFC 18/30] NFSD: No longer record nf_hashval in the trace log Chuck Lever
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:14 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

The checks in nfsd_file_acquire() and nfsd_file_put() that directly
invoke filecache garbage collection are intended to keep cache
occupancy between a low- and high-watermark. The reason to limit the
capacity of the filecache is to keep filecache lookups reasonably
fast.

However, invoking garbage collection at those points has several
undesirable effects. Files that are held open by NFSv4
clients often push the occupancy of the filecache over these
watermarks. At that point:

- Every call to nfsd_file_acquire() and nfsd_file_put() results in
  an LRU walk. This has the same effect on lookup latency as long
  chains in the hash table.
- Garbage collection will then run on every nfsd thread, causing a
  lot of unnecessary lock contention.
- Limiting cache capacity pushes out files used only by NFSv3
  clients, which are the type of files the filecache is supposed to
  help.

To address those negative impacts, remove the direct calls to the
garbage collector. Subsequent patches will address maintaining
lookup efficiency as cache capacity increases.

Suggested-by: Wang Yugui <wangyugui@e16-tech.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index deb842f45117..9d2e4b042b46 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -29,8 +29,6 @@
 #define NFSD_LAUNDRETTE_DELAY		     (2 * HZ)
 
 #define NFSD_FILE_SHUTDOWN		     (1)
-#define NFSD_FILE_LRU_THRESHOLD		     (4096UL)
-#define NFSD_FILE_LRU_LIMIT		     (NFSD_FILE_LRU_THRESHOLD << 2)
 
 /* We only care about NFSD_MAY_READ/WRITE for this cache */
 #define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE)
@@ -67,8 +65,6 @@ static atomic_long_t			nfsd_filecache_count;
 static atomic_long_t			nfsd_file_total_age;
 static struct delayed_work		nfsd_filecache_laundrette;
 
-static void nfsd_file_gc(void);
-
 static void
 nfsd_file_schedule_laundrette(void)
 {
@@ -351,9 +347,6 @@ nfsd_file_put(struct nfsd_file *nf)
 		nfsd_file_schedule_laundrette();
 	} else
 		nfsd_file_put_noref(nf);
-
-	if (atomic_long_read(&nfsd_filecache_count) >= NFSD_FILE_LRU_LIMIT)
-		nfsd_file_gc();
 }
 
 struct nfsd_file *
@@ -1079,8 +1072,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount,
 			nfsd_file_hashtbl[hashval].nfb_count);
 	spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
-	if (atomic_long_inc_return(&nfsd_filecache_count) >= NFSD_FILE_LRU_THRESHOLD)
-		nfsd_file_gc();
+	atomic_long_inc(&nfsd_filecache_count);
 
 	nf->nf_mark = nfsd_file_mark_find_or_create(nf);
 	if (nf->nf_mark) {




* [PATCH RFC 18/30] NFSD: No longer record nf_hashval in the trace log
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (16 preceding siblings ...)
  2022-06-22 14:14 ` [PATCH RFC 17/30] NFSD: Never call nfsd_file_gc() in foreground paths Chuck Lever
@ 2022-06-22 14:14 ` Chuck Lever
  2022-06-22 14:14 ` [PATCH RFC 19/30] NFSD: Remove lockdep assertion from unhash_and_release_locked() Chuck Lever
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:14 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

I'm about to replace nfsd_file_hashtbl with an rhashtable. The
individual hash values will no longer be visible or relevant, so
remove them from the tracepoints.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   15 ++++++++-------
 fs/nfsd/trace.h     |   45 +++++++++++++++++++++------------------------
 2 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 9d2e4b042b46..d620f18924a1 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -598,7 +598,7 @@ nfsd_file_close_inode_sync(struct inode *inode)
 	LIST_HEAD(dispose);
 
 	__nfsd_file_close_inode(inode, hashval, &dispose);
-	trace_nfsd_file_close_inode_sync(inode, hashval, !list_empty(&dispose));
+	trace_nfsd_file_close_inode_sync(inode, !list_empty(&dispose));
 	nfsd_file_dispose_list_sync(&dispose);
 }
 
@@ -618,7 +618,7 @@ nfsd_file_close_inode(struct inode *inode)
 	LIST_HEAD(dispose);
 
 	__nfsd_file_close_inode(inode, hashval, &dispose);
-	trace_nfsd_file_close_inode(inode, hashval, !list_empty(&dispose));
+	trace_nfsd_file_close_inode(inode, !list_empty(&dispose));
 	nfsd_file_dispose_list_delayed(&dispose);
 }
 
@@ -972,7 +972,7 @@ nfsd_file_is_cached(struct inode *inode)
 		}
 	}
 	rcu_read_unlock();
-	trace_nfsd_file_is_cached(inode, hashval, (int)ret);
+	trace_nfsd_file_is_cached(inode, (int)ret);
 	return ret;
 }
 
@@ -1004,9 +1004,8 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	new = nfsd_file_alloc(inode, may_flags, hashval, net);
 	if (!new) {
-		trace_nfsd_file_acquire(rqstp, hashval, inode, may_flags,
-					NULL, nfserr_jukebox);
-		return nfserr_jukebox;
+		status = nfserr_jukebox;
+		goto out_status;
 	}
 
 	spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
@@ -1059,8 +1058,10 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		nf = NULL;
 	}
 
-	trace_nfsd_file_acquire(rqstp, hashval, inode, may_flags, nf, status);
+out_status:
+	trace_nfsd_file_acquire(rqstp, inode, may_flags, nf, status);
 	return status;
+
 open_file:
 	nf = new;
 	/* Take reference for the hashtable */
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 954838616c51..c64336016d2c 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -704,7 +704,6 @@ DECLARE_EVENT_CLASS(nfsd_file_class,
 	TP_PROTO(struct nfsd_file *nf),
 	TP_ARGS(nf),
 	TP_STRUCT__entry(
-		__field(unsigned int, nf_hashval)
 		__field(void *, nf_inode)
 		__field(int, nf_ref)
 		__field(unsigned long, nf_flags)
@@ -712,15 +711,13 @@ DECLARE_EVENT_CLASS(nfsd_file_class,
 		__field(struct file *, nf_file)
 	),
 	TP_fast_assign(
-		__entry->nf_hashval = nf->nf_hashval;
 		__entry->nf_inode = nf->nf_inode;
 		__entry->nf_ref = refcount_read(&nf->nf_ref);
 		__entry->nf_flags = nf->nf_flags;
 		__entry->nf_may = nf->nf_may;
 		__entry->nf_file = nf->nf_file;
 	),
-	TP_printk("hash=0x%x inode=%p ref=%d flags=%s may=%s file=%p",
-		__entry->nf_hashval,
+	TP_printk("inode=%p ref=%d flags=%s may=%s nf_file=%p",
 		__entry->nf_inode,
 		__entry->nf_ref,
 		show_nf_flags(__entry->nf_flags),
@@ -740,15 +737,18 @@ DEFINE_NFSD_FILE_EVENT(nfsd_file_put);
 DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_release_locked);
 
 TRACE_EVENT(nfsd_file_acquire,
-	TP_PROTO(struct svc_rqst *rqstp, unsigned int hash,
-		 struct inode *inode, unsigned int may_flags,
-		 struct nfsd_file *nf, __be32 status),
+	TP_PROTO(
+		struct svc_rqst *rqstp,
+		struct inode *inode,
+		unsigned int may_flags,
+		struct nfsd_file *nf,
+		__be32 status
+	),
 
-	TP_ARGS(rqstp, hash, inode, may_flags, nf, status),
+	TP_ARGS(rqstp, inode, may_flags, nf, status),
 
 	TP_STRUCT__entry(
 		__field(u32, xid)
-		__field(unsigned int, hash)
 		__field(void *, inode)
 		__field(unsigned long, may_flags)
 		__field(int, nf_ref)
@@ -760,7 +760,6 @@ TRACE_EVENT(nfsd_file_acquire,
 
 	TP_fast_assign(
 		__entry->xid = be32_to_cpu(rqstp->rq_xid);
-		__entry->hash = hash;
 		__entry->inode = inode;
 		__entry->may_flags = may_flags;
 		__entry->nf_ref = nf ? refcount_read(&nf->nf_ref) : 0;
@@ -770,8 +769,8 @@ TRACE_EVENT(nfsd_file_acquire,
 		__entry->status = be32_to_cpu(status);
 	),
 
-	TP_printk("xid=0x%x hash=0x%x inode=%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=%p status=%u",
-			__entry->xid, __entry->hash, __entry->inode,
+	TP_printk("xid=0x%x inode=%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=%p status=%u",
+			__entry->xid, __entry->inode,
 			show_nfsd_may_flags(__entry->may_flags),
 			__entry->nf_ref, show_nf_flags(__entry->nf_flags),
 			show_nfsd_may_flags(__entry->nf_may),
@@ -782,7 +781,6 @@ TRACE_EVENT(nfsd_file_open,
 	TP_PROTO(struct nfsd_file *nf, __be32 status),
 	TP_ARGS(nf, status),
 	TP_STRUCT__entry(
-		__field(unsigned int, nf_hashval)
 		__field(void *, nf_inode)	/* cannot be dereferenced */
 		__field(int, nf_ref)
 		__field(unsigned long, nf_flags)
@@ -790,15 +788,13 @@ TRACE_EVENT(nfsd_file_open,
 		__field(void *, nf_file)	/* cannot be dereferenced */
 	),
 	TP_fast_assign(
-		__entry->nf_hashval = nf->nf_hashval;
 		__entry->nf_inode = nf->nf_inode;
 		__entry->nf_ref = refcount_read(&nf->nf_ref);
 		__entry->nf_flags = nf->nf_flags;
 		__entry->nf_may = nf->nf_may;
 		__entry->nf_file = nf->nf_file;
 	),
-	TP_printk("hash=0x%x inode=%p ref=%d flags=%s may=%s file=%p",
-		__entry->nf_hashval,
+	TP_printk("inode=%p ref=%d flags=%s may=%s file=%p",
 		__entry->nf_inode,
 		__entry->nf_ref,
 		show_nf_flags(__entry->nf_flags),
@@ -807,26 +803,27 @@ TRACE_EVENT(nfsd_file_open,
 )
 
 DECLARE_EVENT_CLASS(nfsd_file_search_class,
-	TP_PROTO(struct inode *inode, unsigned int hash, int found),
-	TP_ARGS(inode, hash, found),
+	TP_PROTO(
+		struct inode *inode,
+		int found
+	),
+	TP_ARGS(inode, found),
 	TP_STRUCT__entry(
 		__field(struct inode *, inode)
-		__field(unsigned int, hash)
 		__field(int, found)
 	),
 	TP_fast_assign(
 		__entry->inode = inode;
-		__entry->hash = hash;
 		__entry->found = found;
 	),
-	TP_printk("hash=0x%x inode=%p found=%d", __entry->hash,
-			__entry->inode, __entry->found)
+	TP_printk("inode=%p found=%d",
+		__entry->inode, __entry->found)
 );
 
 #define DEFINE_NFSD_FILE_SEARCH_EVENT(name)				\
 DEFINE_EVENT(nfsd_file_search_class, name,				\
-	TP_PROTO(struct inode *inode, unsigned int hash, int found),	\
-	TP_ARGS(inode, hash, found))
+	TP_PROTO(struct inode *inode, int found),			\
+	TP_ARGS(inode, found))
 
 DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode_sync);
 DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode);




* [PATCH RFC 19/30] NFSD: Remove lockdep assertion from unhash_and_release_locked()
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (17 preceding siblings ...)
  2022-06-22 14:14 ` [PATCH RFC 18/30] NFSD: No longer record nf_hashval in the trace log Chuck Lever
@ 2022-06-22 14:14 ` Chuck Lever
  2022-06-22 14:14 ` [PATCH RFC 20/30] NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode Chuck Lever
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:14 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

IIUC, holding the hash bucket lock is needed only in
nfsd_file_unhash, and there is already a lockdep assertion there.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index d620f18924a1..304faa28afd7 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -307,8 +307,6 @@ nfsd_file_unhash(struct nfsd_file *nf)
 static bool
 nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *dispose)
 {
-	lockdep_assert_held(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
-
 	trace_nfsd_file_unhash_and_release_locked(nf);
 	if (!nfsd_file_unhash(nf))
 		return false;




* [PATCH RFC 20/30] NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (18 preceding siblings ...)
  2022-06-22 14:14 ` [PATCH RFC 19/30] NFSD: Remove lockdep assertion from unhash_and_release_locked() Chuck Lever
@ 2022-06-22 14:14 ` Chuck Lever
  2022-06-22 14:15 ` [PATCH RFC 21/30] NFSD: Refactor __nfsd_file_close_inode() Chuck Lever
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:14 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Remove an unnecessary usage of nf_hashval.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 304faa28afd7..16679a80f20e 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -280,13 +280,17 @@ static void nfsd_file_lru_remove(struct nfsd_file *nf)
 static void
 nfsd_file_do_unhash(struct nfsd_file *nf)
 {
-	lockdep_assert_held(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+	struct inode *inode = nf->nf_inode;
+	unsigned int hashval = (unsigned int)hash_long(inode->i_ino,
+				NFSD_FILE_HASH_BITS);
+
+	lockdep_assert_held(&nfsd_file_hashtbl[hashval].nfb_lock);
 
 	trace_nfsd_file_unhash(nf);
 
 	if (nfsd_file_check_write_error(nf))
 		nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id));
-	--nfsd_file_hashtbl[nf->nf_hashval].nfb_count;
+	--nfsd_file_hashtbl[hashval].nfb_count;
 	hlist_del_rcu(&nf->nf_node);
 	atomic_long_dec(&nfsd_filecache_count);
 }



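Because the bucket index depends only on the inode number, nfsd_file_do_unhash() can recompute it from nf->nf_inode rather than reading the cached nf_hashval. The userspace sketch below re-creates the generic 64-bit hash_64(), to which hash_long() resolves when BITS_PER_LONG is 64 (some architectures supply their own version). The value 12 for NFSD_FILE_HASH_BITS is an assumption, since its definition is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit golden-ratio multiplier used by the generic hash_64(). */
#define MODEL_GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Assumed value; check NFSD_FILE_HASH_BITS in fs/nfsd/filecache.c. */
#define MODEL_NFSD_FILE_HASH_BITS 12

/* Multiply by the golden ratio and keep the top @bits bits (1..64). */
static uint32_t model_hash_64(uint64_t val, unsigned int bits)
{
	return (uint32_t)((val * MODEL_GOLDEN_RATIO_64) >> (64 - bits));
}

/* The bucket depends only on i_ino, so it can always be recomputed. */
static unsigned int model_bucket(uint64_t i_ino)
{
	return model_hash_64(i_ino, MODEL_NFSD_FILE_HASH_BITS);
}
```

For example, inode number 1 maps to bucket 0x61C (1564), the top 12 bits of the multiplier itself.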

* [PATCH RFC 21/30] NFSD: Refactor __nfsd_file_close_inode()
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (19 preceding siblings ...)
  2022-06-22 14:14 ` [PATCH RFC 20/30] NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode Chuck Lever
@ 2022-06-22 14:15 ` Chuck Lever
  2022-06-22 14:15 ` [PATCH RFC 22/30] NFSD: nfsd_file_hash_remove can compute hashval Chuck Lever
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:15 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

The code that computes the hashval is the same in both callers, so
move it into __nfsd_file_close_inode().

To prevent them from going stale, reframe the documenting comments
to remove descriptions of the underlying hash table structure, which
is about to be replaced.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 16679a80f20e..0387b2028a9b 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -568,10 +568,15 @@ static struct shrinker	nfsd_file_shrinker = {
 	.seeks = 1,
 };
 
+/*
+ * Find all cache items that match the inode and move them to @dispose.
+ * This process is atomic wrt nfsd_file_insert().
+ */
 static void
-__nfsd_file_close_inode(struct inode *inode, unsigned int hashval,
-			struct list_head *dispose)
+__nfsd_file_close_inode(struct inode *inode, struct list_head *dispose)
 {
+	unsigned int		hashval = (unsigned int)hash_long(inode->i_ino,
+						NFSD_FILE_HASH_BITS);
 	struct nfsd_file	*nf;
 	struct hlist_node	*tmp;
 
@@ -587,19 +592,14 @@ __nfsd_file_close_inode(struct inode *inode, unsigned int hashval,
  * nfsd_file_close_inode_sync - attempt to forcibly close a nfsd_file
  * @inode: inode of the file to attempt to remove
  *
- * Walk the whole hash bucket, looking for any files that correspond to "inode".
- * If any do, then unhash them and put the hashtable reference to them and
- * destroy any that had their last reference put. Also ensure that any of the
- * fputs also have their final __fput done as well.
+ * Unhash and put, then flush and fput all cache items associated with @inode.
  */
 void
 nfsd_file_close_inode_sync(struct inode *inode)
 {
-	unsigned int		hashval = (unsigned int)hash_long(inode->i_ino,
-						NFSD_FILE_HASH_BITS);
 	LIST_HEAD(dispose);
 
-	__nfsd_file_close_inode(inode, hashval, &dispose);
+	__nfsd_file_close_inode(inode, &dispose);
 	trace_nfsd_file_close_inode_sync(inode, !list_empty(&dispose));
 	nfsd_file_dispose_list_sync(&dispose);
 }
@@ -608,18 +608,14 @@ nfsd_file_close_inode_sync(struct inode *inode)
  * nfsd_file_close_inode - attempt a delayed close of a nfsd_file
  * @inode: inode of the file to attempt to remove
  *
- * Walk the whole hash bucket, looking for any files that correspond to "inode".
- * If any do, then unhash them and put the hashtable reference to them and
- * destroy any that had their last reference put.
+ * Unhash and put all cache items associated with @inode.
  */
 static void
 nfsd_file_close_inode(struct inode *inode)
 {
-	unsigned int		hashval = (unsigned int)hash_long(inode->i_ino,
-						NFSD_FILE_HASH_BITS);
 	LIST_HEAD(dispose);
 
-	__nfsd_file_close_inode(inode, hashval, &dispose);
+	__nfsd_file_close_inode(inode, &dispose);
 	trace_nfsd_file_close_inode(inode, !list_empty(&dispose));
 	nfsd_file_dispose_list_delayed(&dispose);
 }




* [PATCH RFC 22/30] NFSD: nfsd_file_hash_remove can compute hashval
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (20 preceding siblings ...)
  2022-06-22 14:15 ` [PATCH RFC 21/30] NFSD: Refactor __nfsd_file_close_inode() Chuck Lever
@ 2022-06-22 14:15 ` Chuck Lever
  2022-06-22 14:15 ` [PATCH RFC 23/30] NFSD: Remove nfsd_file::nf_hashval Chuck Lever
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:15 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

nfsd_file_hash_remove() can compute the hash value from nf->nf_inode,
so remove an unnecessary use of nf_hashval.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 0387b2028a9b..fa793413bc1f 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -295,6 +295,18 @@ nfsd_file_do_unhash(struct nfsd_file *nf)
 	atomic_long_dec(&nfsd_filecache_count);
 }
 
+static void
+nfsd_file_hash_remove(struct nfsd_file *nf)
+{
+	struct inode *inode = nf->nf_inode;
+	unsigned int hashval = (unsigned int)hash_long(inode->i_ino,
+				NFSD_FILE_HASH_BITS);
+
+	spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
+	nfsd_file_do_unhash(nf);
+	spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+}
+
 static bool
 nfsd_file_unhash(struct nfsd_file *nf)
 {
@@ -513,11 +525,8 @@ static void nfsd_file_gc_dispose_list(struct list_head *dispose)
 {
 	struct nfsd_file *nf;
 
-	list_for_each_entry(nf, dispose, nf_lru) {
-		spin_lock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
-		nfsd_file_do_unhash(nf);
-		spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
-	}
+	list_for_each_entry(nf, dispose, nf_lru)
+		nfsd_file_hash_remove(nf);
 	nfsd_file_dispose_list_delayed(dispose);
 }
 




* [PATCH RFC 23/30] NFSD: Remove nfsd_file::nf_hashval
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (21 preceding siblings ...)
  2022-06-22 14:15 ` [PATCH RFC 22/30] NFSD: nfsd_file_hash_remove can compute hashval Chuck Lever
@ 2022-06-22 14:15 ` Chuck Lever
  2022-06-22 14:15 ` [PATCH RFC 24/30] NFSD: Remove stale comment from nfsd_file_acquire() Chuck Lever
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:15 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Nothing uses this field any more: its value can always be computed
from nf_inode.
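A minimal userspace sketch of the same idea: the item stores only its
key, and both insertion and removal recompute the bucket from it, so no
cached hash value can drift out of sync with the key. struct item,
bucket_of() and the fixed 16-bucket table are hypothetical stand-ins
for struct nfsd_file and nfsd_file_hashtbl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS_BITS	4
#define NBUCKETS	(1u << NBUCKETS_BITS)
#define GOLDEN_RATIO_64	0x61C8864680B583EBull

/* Hypothetical cache item: note there is no cached hash value field. */
struct item {
	uint64_t	ino;	/* stands in for nf_inode */
	struct item	*next;
};

static struct item *buckets[NBUCKETS];

/* Recompute the bucket from the key whenever it is needed. */
static unsigned int bucket_of(const struct item *it)
{
	return (unsigned int)((it->ino * GOLDEN_RATIO_64) >> (64 - NBUCKETS_BITS));
}

static void item_insert(struct item *it)
{
	unsigned int b = bucket_of(it);

	it->next = buckets[b];
	buckets[b] = it;
}

static void item_remove(struct item *it)
{
	struct item **pp = &buckets[bucket_of(it)];

	for (; *pp; pp = &(*pp)->next) {
		if (*pp == it) {
			*pp = it->next;
			it->next = NULL;
			return;
		}
	}
}

static struct item *item_find(uint64_t ino)
{
	struct item key = { .ino = ino };
	struct item *it;

	for (it = buckets[bucket_of(&key)]; it; it = it->next)
		if (it->ino == ino)
			return it;
	return NULL;
}
```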

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |    6 ++----
 fs/nfsd/filecache.h |    1 -
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index fa793413bc1f..23c51b95d2a2 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -169,8 +169,7 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf)
 }
 
 static struct nfsd_file *
-nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval,
-		struct net *net)
+nfsd_file_alloc(struct inode *inode, unsigned int may, struct net *net)
 {
 	struct nfsd_file *nf;
 
@@ -184,7 +183,6 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval,
 		nf->nf_net = net;
 		nf->nf_flags = 0;
 		nf->nf_inode = inode;
-		nf->nf_hashval = hashval;
 		refcount_set(&nf->nf_ref, 1);
 		nf->nf_may = may & NFSD_FILE_MAY_MASK;
 		if (may & NFSD_MAY_NOT_BREAK_LEASE) {
@@ -1009,7 +1007,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (nf)
 		goto wait_for_construction;
 
-	new = nfsd_file_alloc(inode, may_flags, hashval, net);
+	new = nfsd_file_alloc(inode, may_flags, net);
 	if (!new) {
 		status = nfserr_jukebox;
 		goto out_status;
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index d0c42619dc10..31dc65f82c75 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -42,7 +42,6 @@ struct nfsd_file {
 #define NFSD_FILE_REFERENCED	(4)
 	unsigned long		nf_flags;
 	struct inode		*nf_inode;
-	unsigned int		nf_hashval;
 	refcount_t		nf_ref;
 	unsigned char		nf_may;
 	struct nfsd_file_mark	*nf_mark;




* [PATCH RFC 24/30] NFSD: Remove stale comment from nfsd_file_acquire()
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (22 preceding siblings ...)
  2022-06-22 14:15 ` [PATCH RFC 23/30] NFSD: Remove nfsd_file::nf_hashval Chuck Lever
@ 2022-06-22 14:15 ` Chuck Lever
  2022-06-22 14:15 ` [PATCH RFC 25/30] NFSD: Clean up "open file" case in nfsd_file_acquire() Chuck Lever
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:15 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

I tried the change suggested by the comment, and things broke. IMO
that demonstrates the necessity of leaving the fh_verify() call in
place.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 23c51b95d2a2..ae813e6f645f 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -992,7 +992,6 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	unsigned int hashval;
 	bool retry = true;
 
-	/* FIXME: skip this if fh_dentry is already set? */
 	status = fh_verify(rqstp, fhp, S_IFREG,
 				may_flags|NFSD_MAY_OWNER_OVERRIDE);
 	if (status != nfs_ok)




* [PATCH RFC 25/30] NFSD: Clean up "open file" case in nfsd_file_acquire()
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (23 preceding siblings ...)
  2022-06-22 14:15 ` [PATCH RFC 24/30] NFSD: Remove stale comment from nfsd_file_acquire() Chuck Lever
@ 2022-06-22 14:15 ` Chuck Lever
  2022-06-22 14:15 ` [PATCH RFC 26/30] NFSD: Document nfsd_file_cache_purge() API contract Chuck Lever
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:15 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Refactor a little to prepare for changes to nfsd_file_find_locked().
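The restructured acquire flow follows a common lookup / allocate /
re-check pattern. The sketch below models it in userspace under
stated assumptions: a single bucket guarded by a C11 atomic_flag
spinlock stands in for nfb_lock, struct entry and acquire() are
hypothetical, and the first lookup takes the lock instead of relying
on RCU as nfsd_do_file_acquire() does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical single-bucket cache; "key" stands in for the inode. */
struct entry {
	int		key;
	int		ref;
	struct entry	*next;
};

static struct entry *head;			/* protected by bucket_lock */
static atomic_flag bucket_lock = ATOMIC_FLAG_INIT;
static atomic_int nr_allocs;

static void bucket_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&bucket_lock,
						 memory_order_acquire))
		;	/* spin until the holder releases it */
}

static void bucket_lock_release(void)
{
	atomic_flag_clear_explicit(&bucket_lock, memory_order_release);
}

/* Caller holds bucket_lock. Takes a reference on a match. */
static struct entry *find_locked(int key)
{
	struct entry *e;

	for (e = head; e; e = e->next) {
		if (e->key == key) {
			e->ref++;
			return e;
		}
	}
	return NULL;
}

static struct entry *acquire(int key)
{
	struct entry *e, *new;

	/* First look: avoid allocating if the entry is already cached. */
	bucket_lock_acquire();
	e = find_locked(key);
	bucket_lock_release();
	if (e == NULL) {
		new = calloc(1, sizeof(*new));	/* allocate outside the lock */
		if (!new)
			return NULL;
		atomic_fetch_add(&nr_allocs, 1);
		new->key = key;

		/* Second look: another thread may have inserted meanwhile. */
		bucket_lock_acquire();
		e = find_locked(key);
		if (e == NULL) {
			new->ref = 2;	/* one for the cache, one for the caller */
			new->next = head;
			head = new;
			bucket_lock_release();
			return new;
		}
		bucket_lock_release();
		free(new);		/* lost the race: use the existing entry */
	}
	/* ... the real code then waits for construction to finish ... */
	return e;
}
```

With the "not found" handling nested inside one if block, the common
hit path falls straight through to the wait, which is what lets a
later patch swap the locked re-check for a single atomic insert.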

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index ae813e6f645f..8b8d765a0df0 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -1003,23 +1003,22 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	rcu_read_lock();
 	nf = nfsd_file_find_locked(inode, may_flags, hashval, net);
 	rcu_read_unlock();
-	if (nf)
-		goto wait_for_construction;
+	if (nf == NULL) {
+		new = nfsd_file_alloc(inode, may_flags, net);
+		if (!new) {
+			status = nfserr_jukebox;
+			goto out_status;
+		}
 
-	new = nfsd_file_alloc(inode, may_flags, net);
-	if (!new) {
-		status = nfserr_jukebox;
-		goto out_status;
-	}
+		spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
+		nf = nfsd_file_find_locked(inode, may_flags, hashval, net);
+		if (nf == NULL)
+			goto open_file;
+		spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
 
-	spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
-	nf = nfsd_file_find_locked(inode, may_flags, hashval, net);
-	if (nf == NULL)
-		goto open_file;
-	spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
-	nfsd_file_slab_free(&new->nf_rcu);
+		nfsd_file_slab_free(&new->nf_rcu);
+	}
 
-wait_for_construction:
 	wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE);
 
 	/* Did construction of this file fail? */




* [PATCH RFC 26/30] NFSD: Document nfsd_file_cache_purge() API contract
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (24 preceding siblings ...)
  2022-06-22 14:15 ` [PATCH RFC 25/30] NFSD: Clean up "open file" case in nfsd_file_acquire() Chuck Lever
@ 2022-06-22 14:15 ` Chuck Lever
  2022-06-22 14:15 ` [PATCH RFC 27/30] NFSD: Replace the "init once" mechanism Chuck Lever
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:15 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

In particular, document that the caller must hold nfsd_mutex, and
assert it with lockdep_assert_held().
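For readers less familiar with lockdep, here is a rough userspace
analogue of the contract being documented: the purge routine checks,
rather than merely states, that its caller holds the mutex.
struct checked_mutex and cache_purge() are hypothetical. Unlike
lockdep_assert_held(), this check only verifies that some thread holds
the lock, not that the current one does, and it compiles away when
NDEBUG is defined.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mutex that remembers whether it is held. */
struct checked_mutex {
	atomic_flag	locked;
	bool		held;
};

static struct checked_mutex big_mutex = {
	.locked	= ATOMIC_FLAG_INIT,
	.held	= false,
};

static void cm_lock(struct checked_mutex *m)
{
	while (atomic_flag_test_and_set_explicit(&m->locked,
						 memory_order_acquire))
		;	/* spin */
	m->held = true;
}

static void cm_unlock(struct checked_mutex *m)
{
	m->held = false;
	atomic_flag_clear_explicit(&m->locked, memory_order_release);
}

/* Rough analogue of lockdep_assert_held(). */
static void cm_assert_held(const struct checked_mutex *m)
{
	assert(m->held);
}

static int nr_purged;

/* API contract: the caller must hold big_mutex. */
static void cache_purge(void)
{
	cm_assert_held(&big_mutex);
	nr_purged++;	/* ... remove every cache item ... */
}
```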

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 8b8d765a0df0..943db8cc87af 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -786,7 +786,10 @@ nfsd_file_cache_init(void)
 	goto out;
 }
 
-/*
+/**
+ * nfsd_file_cache_purge - Remove all cache items associated with @net
+ * @net: target net namespace
+ *
  * Note this can deadlock with nfsd_file_lru_cb.
  */
 void
@@ -798,6 +801,8 @@ nfsd_file_cache_purge(struct net *net)
 	LIST_HEAD(dispose);
 	bool del;
 
+	lockdep_assert_held(&nfsd_mutex);
+
 	if (!nfsd_file_hashtbl)
 		return;
 
@@ -1000,6 +1005,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	inode = d_inode(fhp->fh_dentry);
 	hashval = (unsigned int)hash_long(inode->i_ino, NFSD_FILE_HASH_BITS);
 retry:
+	/* Avoid allocation if the item is already in cache */
 	rcu_read_lock();
 	nf = nfsd_file_find_locked(inode, may_flags, hashval, net);
 	rcu_read_unlock();




* [PATCH RFC 27/30] NFSD: Replace the "init once" mechanism
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (25 preceding siblings ...)
  2022-06-22 14:15 ` [PATCH RFC 26/30] NFSD: Document nfsd_file_cache_purge() API contract Chuck Lever
@ 2022-06-22 14:15 ` Chuck Lever
  2022-06-22 14:15 ` [PATCH RFC 28/30] NFSD: Set up an rhashtable for the filecache Chuck Lever
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:15 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

A subsequent patch replaces the nfsd_file_hashtbl global with an
rhashtable. Replace the one or two spots that check whether the hash
table is available: the existing SHUTDOWN flag is easily repurposed
as an NFSD_FILE_CACHE_UP flag for this purpose.

Document that this mechanism relies on callers holding nfsd_mutex
to prevent init, shutdown, and purging from running concurrently.
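A userspace sketch of how a single "up" bit can make init and shutdown
idempotent, assuming C11 atomics in place of the kernel's
test_and_set_bit(), test_and_clear_bit() and test_bit(). All names are
hypothetical, and the counters merely stand in for the real setup,
purge and teardown work. As in the patch, shutdown clears the bit
first and then calls an internal purge that skips the flag check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CACHE_UP_BIT	0	/* models NFSD_FILE_CACHE_UP */

static atomic_ulong cache_flags;
static int nr_setups, nr_teardowns, nr_purges;

/* Atomically update the bit and report its previous value. */
static bool test_and_set_flag(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

static bool test_and_clear_flag(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_and(addr, ~mask) & mask) != 0;
}

static bool test_flag(int nr, atomic_ulong *addr)
{
	return (atomic_load(addr) & (1UL << nr)) != 0;
}

static int cache_init(void)
{
	if (test_and_set_flag(CACHE_UP_BIT, &cache_flags))
		return 0;	/* already up: nothing to do */
	nr_setups++;		/* ... allocate the hash table, LRU, etc. ... */
	return 0;
}

/* Internal purge: assumes the table exists. */
static void __cache_purge(void)
{
	nr_purges++;
}

/* Public purge: only touches the table while the cache is up. */
static void cache_purge(void)
{
	if (test_flag(CACHE_UP_BIT, &cache_flags))
		__cache_purge();
}

static void cache_shutdown(void)
{
	if (!test_and_clear_flag(CACHE_UP_BIT, &cache_flags))
		return;		/* never started, or already shut down */
	__cache_purge();
	nr_teardowns++;		/* ... free the hash table, LRU, etc. ... */
}
```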

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   49 ++++++++++++++++++++++++++-----------------------
 1 file changed, 26 insertions(+), 23 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 943db8cc87af..75cb1f52152c 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -28,7 +28,7 @@
 #define NFSD_FILE_HASH_SIZE                  (1 << NFSD_FILE_HASH_BITS)
 #define NFSD_LAUNDRETTE_DELAY		     (2 * HZ)
 
-#define NFSD_FILE_SHUTDOWN		     (1)
+#define NFSD_FILE_CACHE_UP		     (0)
 
 /* We only care about NFSD_MAY_READ/WRITE for this cache */
 #define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE)
@@ -59,7 +59,7 @@ static struct kmem_cache		*nfsd_file_slab;
 static struct kmem_cache		*nfsd_file_mark_slab;
 static struct nfsd_fcache_bucket	*nfsd_file_hashtbl;
 static struct list_lru			nfsd_file_lru;
-static long				nfsd_file_lru_flags;
+static unsigned long			nfsd_file_flags;
 static struct fsnotify_group		*nfsd_file_fsnotify_group;
 static atomic_long_t			nfsd_filecache_count;
 static atomic_long_t			nfsd_file_total_age;
@@ -68,9 +68,8 @@ static struct delayed_work		nfsd_filecache_laundrette;
 static void
 nfsd_file_schedule_laundrette(void)
 {
-	long count = atomic_long_read(&nfsd_filecache_count);
-
-	if (count == 0 || test_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags))
+	if ((atomic_long_read(&nfsd_filecache_count) == 0) ||
+	    test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 0)
 		return;
 
 	queue_delayed_work(system_wq, &nfsd_filecache_laundrette,
@@ -701,9 +700,8 @@ nfsd_file_cache_init(void)
 	int		ret = -ENOMEM;
 	unsigned int	i;
 
-	clear_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags);
-
-	if (nfsd_file_hashtbl)
+	lockdep_assert_held(&nfsd_mutex);
+	if (test_and_set_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1)
 		return 0;
 
 	nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", 0, 0);
@@ -786,14 +784,8 @@ nfsd_file_cache_init(void)
 	goto out;
 }
 
-/**
- * nfsd_file_cache_purge - Remove all cache items associated with @net
- * @net: target net namespace
- *
- * Note this can deadlock with nfsd_file_lru_cb.
- */
-void
-nfsd_file_cache_purge(struct net *net)
+static void
+__nfsd_file_cache_purge(struct net *net)
 {
 	unsigned int		i;
 	struct nfsd_file	*nf;
@@ -801,11 +793,6 @@ nfsd_file_cache_purge(struct net *net)
 	LIST_HEAD(dispose);
 	bool del;
 
-	lockdep_assert_held(&nfsd_mutex);
-
-	if (!nfsd_file_hashtbl)
-		return;
-
 	for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
 		struct nfsd_fcache_bucket *nfb = &nfsd_file_hashtbl[i];
 
@@ -866,6 +853,20 @@ nfsd_file_cache_start_net(struct net *net)
 	return nn->fcache_disposal ? 0 : -ENOMEM;
 }
 
+/**
+ * nfsd_file_cache_purge - Remove all cache items associated with @net
+ * @net: target net namespace
+ *
+ * Note this can deadlock with nfsd_file_lru_cb.
+ */
+void
+nfsd_file_cache_purge(struct net *net)
+{
+	lockdep_assert_held(&nfsd_mutex);
+	if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1)
+		__nfsd_file_cache_purge(net);
+}
+
 void
 nfsd_file_cache_shutdown_net(struct net *net)
 {
@@ -878,7 +879,9 @@ nfsd_file_cache_shutdown(void)
 {
 	int i;
 
-	set_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags);
+	lockdep_assert_held(&nfsd_mutex);
+	if (test_and_clear_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 0)
+		return;
 
 	lease_unregister_notifier(&nfsd_file_lease_notifier);
 	unregister_shrinker(&nfsd_file_shrinker);
@@ -887,7 +890,7 @@ nfsd_file_cache_shutdown(void)
 	 * calling nfsd_file_cache_purge
 	 */
 	cancel_delayed_work_sync(&nfsd_filecache_laundrette);
-	nfsd_file_cache_purge(NULL);
+	__nfsd_file_cache_purge(NULL);
 	list_lru_destroy(&nfsd_file_lru);
 	rcu_barrier();
 	fsnotify_put_group(nfsd_file_fsnotify_group);




* [PATCH RFC 28/30] NFSD: Set up an rhashtable for the filecache
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (26 preceding siblings ...)
  2022-06-22 14:15 ` [PATCH RFC 27/30] NFSD: Replace the "init once" mechanism Chuck Lever
@ 2022-06-22 14:15 ` Chuck Lever
  2022-06-23 22:56   ` Al Viro
  2022-06-22 14:15 ` [PATCH RFC 29/30] NFSD: Convert the filecache to use rhashtable Chuck Lever
                   ` (3 subsequent siblings)
  31 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:15 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Add code to initialize and tear down an rhashtable. The rhashtable
is not used yet.
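The two comparison helpers can be modeled in userspace to show how the
lookup key's type selects how strict a match is. This is a simplified
sketch: uids and gids are plain ints, group lists are fixed-size
arrays, and the net namespace and NFSD_FILE_HASHED checks from
nfsd_file_obj_cmpfn() are omitted. As with an rhashtable obj_cmpfn, a
return value of 0 means "match".

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's cred and group_info. */
struct group_info_model {
	int	ngroups;
	int	gid[8];
};

struct cred_model {
	int				fsuid;
	int				fsgid;
	const struct group_info_model	*group_info;
};

/* Same logic as nfsd_match_cred(): equal fsuid and fsgid, and an
 * identical supplementary group list, compared in order. */
static int match_cred(const struct cred_model *c1, const struct cred_model *c2)
{
	int i;

	if (c1->fsuid != c2->fsuid || c1->fsgid != c2->fsgid)
		return 0;
	if (c1->group_info == NULL || c2->group_info == NULL)
		return c1->group_info == c2->group_info;
	if (c1->group_info->ngroups != c2->group_info->ngroups)
		return 0;
	for (i = 0; i < c1->group_info->ngroups; i++)
		if (c1->group_info->gid[i] != c2->group_info->gid[i])
			return 0;
	return 1;
}

enum { KEY_INODE, KEY_FULL };

struct lookup_key {
	const void		*inode;
	const struct cred_model	*cred;
	unsigned char		type;
	unsigned char		need;
};

struct cached_file {
	const void		*inode;
	const struct cred_model	*cred;
	unsigned char		may;
};

/* Returns 0 on match, 1 otherwise. */
static int file_cmp(const struct lookup_key *key, const struct cached_file *f)
{
	if (f->inode != key->inode)
		return 1;
	if (key->type == KEY_INODE)
		return 0;	/* inode-only lookups ignore the rest */
	if (f->may != key->need || !match_cred(f->cred, key->cred))
		return 1;
	return 0;
}
```

Note that two creds holding the same groups in a different order do
not match, just as in nfsd_match_cred().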

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |  131 +++++++++++++++++++++++++++++++++++++++++++--------
 fs/nfsd/filecache.h |    1 
 2 files changed, 111 insertions(+), 21 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 75cb1f52152c..a491519598fc 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -13,6 +13,7 @@
 #include <linux/fsnotify_backend.h>
 #include <linux/fsnotify.h>
 #include <linux/seq_file.h>
+#include <linux/rhashtable.h>
 
 #include "vfs.h"
 #include "nfsd.h"
@@ -65,6 +66,107 @@ static atomic_long_t			nfsd_filecache_count;
 static atomic_long_t			nfsd_file_total_age;
 static struct delayed_work		nfsd_filecache_laundrette;
 
+static struct rhashtable nfsd_file_rhash_tbl ____cacheline_aligned_in_smp;
+
+struct nfsd_file_lookup_key {
+	struct inode *inode;
+	struct net *net;
+	const struct cred *cred;
+	unsigned char type;
+	unsigned char need;
+};
+
+enum {
+	NFSD_FILE_KEY_INODE,
+	NFSD_FILE_KEY_FULL,
+};
+
+static bool
+nfsd_match_cred(const struct cred *c1, const struct cred *c2)
+{
+	int i;
+
+	if (!uid_eq(c1->fsuid, c2->fsuid))
+		return false;
+	if (!gid_eq(c1->fsgid, c2->fsgid))
+		return false;
+	if (c1->group_info == NULL || c2->group_info == NULL)
+		return c1->group_info == c2->group_info;
+	if (c1->group_info->ngroups != c2->group_info->ngroups)
+		return false;
+	for (i = 0; i < c1->group_info->ngroups; i++) {
+		if (!gid_eq(c1->group_info->gid[i], c2->group_info->gid[i]))
+			return false;
+	}
+	return true;
+}
+
+/**
+ * nfsd_file_obj_hashfn - Compute the hash value of an nfsd_file
+ * @data: object on which to compute the hash value
+ * @len: rhash table's key_len parameter (unused)
+ * @seed: rhash table's random seed of the day
+ *
+ * Return value:
+ *   Computed 32-bit hash value
+ */
+static u32 nfsd_file_obj_hashfn(const void *data, u32 len, u32 seed)
+{
+	const struct nfsd_file *nf = data;
+
+	return jhash2((const u32 *)&nf->nf_inode,
+		      sizeof_field(struct nfsd_file, nf_inode) / sizeof(u32),
+		      seed);
+}
+
+/**
+ * nfsd_file_obj_cmpfn - Match a cache item against search criteria
+ * @arg: search criteria
+ * @ptr: cache item to check
+ *
+ * Return values:
+ *   %0 - Item matches search criteria
+ *   %1 - Item does not match search criteria
+ */
+static int nfsd_file_obj_cmpfn(struct rhashtable_compare_arg *arg,
+			       const void *ptr)
+{
+	const struct nfsd_file_lookup_key *key = arg->key;
+	const struct nfsd_file *nf = ptr;
+
+	switch (key->type) {
+	case NFSD_FILE_KEY_INODE:
+		if (nf->nf_inode != key->inode)
+			return 1;
+		break;
+	case NFSD_FILE_KEY_FULL:
+		if (nf->nf_inode != key->inode)
+			return 1;
+		if (nf->nf_may != key->need)
+			return 1;
+		if (nf->nf_net != key->net)
+			return 1;
+		if (!nfsd_match_cred(nf->nf_cred, key->cred))
+			return 1;
+		if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags))
+			return 1;
+		break;
+	}
+
+	return 0;
+}
+
+static const struct rhashtable_params nfsd_file_rhash_params = {
+	.key_len		= sizeof_field(struct nfsd_file, nf_inode),
+	.key_offset		= offsetof(struct nfsd_file, nf_inode),
+	.head_offset		= offsetof(struct nfsd_file, nf_rhash),
+	.obj_hashfn		= nfsd_file_obj_hashfn,
+	.obj_cmpfn		= nfsd_file_obj_cmpfn,
+	.max_size		= 131072,	/* buckets */
+	.min_size		= 1024,		/* buckets */
+	.automatic_shrinking	= true,
+};
+
 static void
 nfsd_file_schedule_laundrette(void)
 {
@@ -697,13 +799,18 @@ static const struct fsnotify_ops nfsd_file_fsnotify_ops = {
 int
 nfsd_file_cache_init(void)
 {
-	int		ret = -ENOMEM;
+	int		ret;
 	unsigned int	i;
 
 	lockdep_assert_held(&nfsd_mutex);
 	if (test_and_set_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1)
 		return 0;
 
+	ret = rhashtable_init(&nfsd_file_rhash_tbl, &nfsd_file_rhash_params);
+	if (ret)
+		return ret;
+
+	ret = -ENOMEM;
 	nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", 0, 0);
 	if (!nfsd_filecache_wq)
 		goto out;
@@ -781,6 +888,7 @@ nfsd_file_cache_init(void)
 	nfsd_file_hashtbl = NULL;
 	destroy_workqueue(nfsd_filecache_wq);
 	nfsd_filecache_wq = NULL;
+	rhashtable_destroy(&nfsd_file_rhash_tbl);
 	goto out;
 }
 
@@ -904,6 +1012,7 @@ nfsd_file_cache_shutdown(void)
 	nfsd_file_hashtbl = NULL;
 	destroy_workqueue(nfsd_filecache_wq);
 	nfsd_filecache_wq = NULL;
+	rhashtable_destroy(&nfsd_file_rhash_tbl);
 
 	for_each_possible_cpu(i) {
 		this_cpu_write(nfsd_file_cache_hits, 0);
@@ -915,26 +1024,6 @@ nfsd_file_cache_shutdown(void)
 	}
 }
 
-static bool
-nfsd_match_cred(const struct cred *c1, const struct cred *c2)
-{
-	int i;
-
-	if (!uid_eq(c1->fsuid, c2->fsuid))
-		return false;
-	if (!gid_eq(c1->fsgid, c2->fsgid))
-		return false;
-	if (c1->group_info == NULL || c2->group_info == NULL)
-		return c1->group_info == c2->group_info;
-	if (c1->group_info->ngroups != c2->group_info->ngroups)
-		return false;
-	for (i = 0; i < c1->group_info->ngroups; i++) {
-		if (!gid_eq(c1->group_info->gid[i], c2->group_info->gid[i]))
-			return false;
-	}
-	return true;
-}
-
 static struct nfsd_file *
 nfsd_file_find_locked(struct inode *inode, unsigned int may_flags,
 			unsigned int hashval, struct net *net)
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 31dc65f82c75..7fc017e7b09e 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -29,6 +29,7 @@ struct nfsd_file_mark {
  * never be dereferenced, only used for comparison.
  */
 struct nfsd_file {
+	struct rhash_head	nf_rhash;
 	struct hlist_node	nf_node;
 	struct list_head	nf_lru;
 	struct rcu_head		nf_rcu;




* [PATCH RFC 29/30] NFSD: Convert the filecache to use rhashtable
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (27 preceding siblings ...)
  2022-06-22 14:15 ` [PATCH RFC 28/30] NFSD: Set up an rhashtable for the filecache Chuck Lever
@ 2022-06-22 14:15 ` Chuck Lever
  2022-06-23  0:38   ` Dave Chinner
  2022-06-22 14:16 ` [PATCH RFC 30/30] NFSD: Clean up unusued code after rhashtable conversion Chuck Lever
                   ` (2 subsequent siblings)
  31 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:15 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Enable the filecache hash table to start small and then grow with
the workload. Smaller server deployments should benefit from lower
memory utilization, and larger deployments should see improved
scaling with the number of open files.
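To illustrate "start small, then grow", here is a minimal userspace
chained hash table that doubles its bucket array once it is more than
75% full, which is also the threshold at which rhashtable grows by
default. MIN_BUCKETS and MAX_BUCKETS loosely play the roles of the
min_size and max_size parameters. The table is single-threaded, never
shrinks, and, unlike rhashtable, rehashes synchronously inside insert
instead of in a deferred worker.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical limits, playing the role of min_size/max_size. */
#define MIN_BUCKETS	4u
#define MAX_BUCKETS	64u

struct node {
	uint64_t	key;
	struct node	*next;
};

struct table {
	struct node	**buckets;
	unsigned int	size;	/* number of buckets, a power of two */
	unsigned int	nelems;
};

static unsigned int bucket(uint64_t key, unsigned int size)
{
	return (unsigned int)(key * 0x61C8864680B583EBull >> 32) & (size - 1);
}

static int table_init(struct table *t)
{
	t->size = MIN_BUCKETS;
	t->nelems = 0;
	t->buckets = calloc(t->size, sizeof(*t->buckets));
	return t->buckets ? 0 : -1;
}

/* Double the bucket array and rehash every node into it. */
static int table_grow(struct table *t)
{
	unsigned int i, nsize = t->size * 2;
	struct node **nb = calloc(nsize, sizeof(*nb));

	if (!nb)
		return -1;
	for (i = 0; i < t->size; i++) {
		struct node *n = t->buckets[i], *next;

		for (; n; n = next) {
			unsigned int b = bucket(n->key, nsize);

			next = n->next;
			n->next = nb[b];
			nb[b] = n;
		}
	}
	free(t->buckets);
	t->buckets = nb;
	t->size = nsize;
	return 0;
}

static int table_insert(struct table *t, struct node *n)
{
	unsigned int b;

	/* Grow once the table would be more than 75% full, up to the cap. */
	if (t->nelems + 1 > t->size * 3 / 4 && t->size < MAX_BUCKETS &&
	    table_grow(t) != 0)
		return -1;
	b = bucket(n->key, t->size);
	n->next = t->buckets[b];
	t->buckets[b] = n;
	t->nelems++;
	return 0;
}

static struct node *table_find(const struct table *t, uint64_t key)
{
	struct node *n;

	for (n = t->buckets[bucket(key, t->size)]; n; n = n->next)
		if (n->key == key)
			return n;
	return NULL;
}
```

Inserting 20 keys into this table grows it from 4 to 8, 16 and then
32 buckets, while a server with only a handful of open files keeps
the small initial array.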

I know this is a big and messy patch, but there's no good way to
rip out and replace a data structure like this.
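The new nfsd_file_insert() relies on insert-or-return-existing
semantics. A minimal userspace model of that contract: NULL means the
new object was inserted, with its reference count preset to 2 (one for
the table, one for the caller, as nfsd_file_alloc() now does); a valid
pointer is the existing match; an error pointer reports failure. The
fixed four-slot table, struct obj and the ERR_PTR/IS_ERR copies are
assumptions for illustration. Also, unlike the patch, which calls
nfsd_file_get() after the lookup, this model takes the caller's
reference inside the insert.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copies of the kernel's error-pointer convention. */
#define MAX_ERRNO	4095
#define ERR_PTR(err)	((void *)(intptr_t)(err))
#define IS_ERR(ptr)	((uintptr_t)(ptr) >= (uintptr_t)-MAX_ERRNO)

#define CAPACITY	4	/* hypothetical fixed-size table */

struct obj {
	int	key;
	int	ref;
};

static struct obj *slots[CAPACITY];

/*
 * Insert @new unless an object with the same key is already present.
 * Returns NULL if @new was inserted, the existing object (with an extra
 * reference taken for the caller) on a match, or an ERR_PTR if full.
 */
static struct obj *obj_insert(struct obj *new)
{
	int i, free_slot = -1;

	for (i = 0; i < CAPACITY; i++) {
		if (slots[i] && slots[i]->key == new->key) {
			slots[i]->ref++;
			return slots[i];
		}
		if (!slots[i] && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return ERR_PTR(-ENOMEM);
	/* One reference for the table, one for the caller. */
	new->ref = 2;
	slots[free_slot] = new;
	return NULL;
}
```

A caller that gets back a non-NULL, non-error pointer frees its own
unused object and continues with the returned one, which is the shape
of the nfsd_do_file_acquire() changes below.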

Suggested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |  259 ++++++++++++++++++++++++---------------------------
 fs/nfsd/trace.h     |    2 
 2 files changed, 125 insertions(+), 136 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index a491519598fc..14b607e544bf 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -62,7 +62,6 @@ static struct nfsd_fcache_bucket	*nfsd_file_hashtbl;
 static struct list_lru			nfsd_file_lru;
 static unsigned long			nfsd_file_flags;
 static struct fsnotify_group		*nfsd_file_fsnotify_group;
-static atomic_long_t			nfsd_filecache_count;
 static atomic_long_t			nfsd_file_total_age;
 static struct delayed_work		nfsd_filecache_laundrette;
 
@@ -170,7 +169,7 @@ static const struct rhashtable_params nfsd_file_rhash_params = {
 static void
 nfsd_file_schedule_laundrette(void)
 {
-	if ((atomic_long_read(&nfsd_filecache_count) == 0) ||
+	if ((atomic_read(&nfsd_file_rhash_tbl.nelems) == 0) ||
 	    test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 0)
 		return;
 
@@ -282,9 +281,10 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, struct net *net)
 		nf->nf_file = NULL;
 		nf->nf_cred = get_current_cred();
 		nf->nf_net = net;
-		nf->nf_flags = 0;
+		nf->nf_flags = BIT(NFSD_FILE_HASHED) | BIT(NFSD_FILE_PENDING);
 		nf->nf_inode = inode;
-		refcount_set(&nf->nf_ref, 1);
+		/* nf_ref is pre-incremented for hash table */
+		refcount_set(&nf->nf_ref, 2);
 		nf->nf_may = may & NFSD_FILE_MAY_MASK;
 		if (may & NFSD_MAY_NOT_BREAK_LEASE) {
 			if (may & NFSD_MAY_WRITE)
@@ -377,40 +377,21 @@ static void nfsd_file_lru_remove(struct nfsd_file *nf)
 }
 
 static void
-nfsd_file_do_unhash(struct nfsd_file *nf)
+nfsd_file_hash_remove(struct nfsd_file *nf)
 {
-	struct inode *inode = nf->nf_inode;
-	unsigned int hashval = (unsigned int)hash_long(inode->i_ino,
-				NFSD_FILE_HASH_BITS);
-
-	lockdep_assert_held(&nfsd_file_hashtbl[hashval].nfb_lock);
-
 	trace_nfsd_file_unhash(nf);
 
 	if (nfsd_file_check_write_error(nf))
 		nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id));
-	--nfsd_file_hashtbl[hashval].nfb_count;
-	hlist_del_rcu(&nf->nf_node);
-	atomic_long_dec(&nfsd_filecache_count);
-}
-
-static void
-nfsd_file_hash_remove(struct nfsd_file *nf)
-{
-	struct inode *inode = nf->nf_inode;
-	unsigned int hashval = (unsigned int)hash_long(inode->i_ino,
-				NFSD_FILE_HASH_BITS);
-
-	spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
-	nfsd_file_do_unhash(nf);
-	spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+	rhashtable_remove_fast(&nfsd_file_rhash_tbl, &nf->nf_rhash,
+			       nfsd_file_rhash_params);
 }
 
 static bool
 nfsd_file_unhash(struct nfsd_file *nf)
 {
 	if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
-		nfsd_file_do_unhash(nf);
+		nfsd_file_hash_remove(nf);
 		return true;
 	}
 	return false;
@@ -420,9 +401,9 @@ nfsd_file_unhash(struct nfsd_file *nf)
  * Return true if the file was unhashed.
  */
 static bool
-nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *dispose)
+nfsd_file_unhash_and_dispose(struct nfsd_file *nf, struct list_head *dispose)
 {
-	trace_nfsd_file_unhash_and_release_locked(nf);
+	trace_nfsd_file_unhash_and_dispose(nf);
 	if (!nfsd_file_unhash(nf))
 		return false;
 	/* keep final reference for nfsd_file_lru_dispose */
@@ -683,17 +664,21 @@ static struct shrinker	nfsd_file_shrinker = {
 static void
 __nfsd_file_close_inode(struct inode *inode, struct list_head *dispose)
 {
-	unsigned int		hashval = (unsigned int)hash_long(inode->i_ino,
-						NFSD_FILE_HASH_BITS);
-	struct nfsd_file	*nf;
-	struct hlist_node	*tmp;
-
-	spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
-	hlist_for_each_entry_safe(nf, tmp, &nfsd_file_hashtbl[hashval].nfb_head, nf_node) {
-		if (inode == nf->nf_inode)
-			nfsd_file_unhash_and_release_locked(nf, dispose);
-	}
-	spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+	struct nfsd_file_lookup_key key = {
+		.type	= NFSD_FILE_KEY_INODE,
+		.inode	= inode,
+	};
+	struct nfsd_file *nf;
+
+	rcu_read_lock();
+	do {
+		nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key,
+				       nfsd_file_rhash_params);
+		if (!nf)
+			break;
+		nfsd_file_unhash_and_dispose(nf, dispose);
+	} while (1);
+	rcu_read_unlock();
 }
 
 /**
@@ -895,30 +880,39 @@ nfsd_file_cache_init(void)
 static void
 __nfsd_file_cache_purge(struct net *net)
 {
-	unsigned int		i;
-	struct nfsd_file	*nf;
-	struct hlist_node	*next;
+	struct rhashtable_iter iter;
+	struct nfsd_file *nf;
 	LIST_HEAD(dispose);
 	bool del;
 
-	for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
-		struct nfsd_fcache_bucket *nfb = &nfsd_file_hashtbl[i];
+	lockdep_assert_held(&nfsd_mutex);
+	if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 0)
+		return;
+
+	rhashtable_walk_enter(&nfsd_file_rhash_tbl, &iter);
+	do {
+		rhashtable_walk_start(&iter);
 
-		spin_lock(&nfb->nfb_lock);
-		hlist_for_each_entry_safe(nf, next, &nfb->nfb_head, nf_node) {
+		nf = rhashtable_walk_next(&iter);
+		while (!IS_ERR_OR_NULL(nf)) {
 			if (net && nf->nf_net != net)
 				continue;
-			del = nfsd_file_unhash_and_release_locked(nf, &dispose);
+			del = nfsd_file_unhash_and_dispose(nf, &dispose);
 
 			/*
 			 * Deadlock detected! Something marked this entry as
 			 * unhased, but hasn't removed it from the hash list.
 			 */
 			WARN_ON_ONCE(!del);
+
+			nf = rhashtable_walk_next(&iter);
 		}
-		spin_unlock(&nfb->nfb_lock);
-		nfsd_file_dispose_list(&dispose);
-	}
+
+		rhashtable_walk_stop(&iter);
+	} while (nf == ERR_PTR(-EAGAIN));
+	rhashtable_walk_exit(&iter);
+
+	nfsd_file_dispose_list(&dispose);
 }
 
 static struct nfsd_fcache_disposal *
@@ -1025,55 +1019,73 @@ nfsd_file_cache_shutdown(void)
 }
 
 static struct nfsd_file *
-nfsd_file_find_locked(struct inode *inode, unsigned int may_flags,
-			unsigned int hashval, struct net *net)
+nfsd_file_find(struct inode *inode, unsigned int may_flags, struct net *net)
 {
+	struct nfsd_file_lookup_key key = {
+		.type	= NFSD_FILE_KEY_FULL,
+		.inode	= inode,
+		.need	= may_flags & NFSD_FILE_MAY_MASK,
+		.net	= net,
+		.cred	= current_cred(),
+	};
 	struct nfsd_file *nf;
-	unsigned char need = may_flags & NFSD_FILE_MAY_MASK;
 
-	hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head,
-				 nf_node, lockdep_is_held(&nfsd_file_hashtbl[hashval].nfb_lock)) {
-		if (nf->nf_may != need)
-			continue;
-		if (nf->nf_inode != inode)
-			continue;
-		if (nf->nf_net != net)
-			continue;
-		if (!nfsd_match_cred(nf->nf_cred, current_cred()))
-			continue;
-		if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags))
-			continue;
-		if (nfsd_file_get(nf) != NULL)
-			return nf;
-	}
-	return NULL;
+	rcu_read_lock();
+	nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key,
+			       nfsd_file_rhash_params);
+	if (nf)
+		nf = nfsd_file_get(nf);
+	rcu_read_unlock();
+	return nf;
+}
+
+/*
+ * Atomically insert a new nfsd_file item into nfsd_file_rhash_tbl.
+ *
+ * Return values:
+ *   %NULL: @new was inserted successfully
+ *   %A valid pointer: @new was not inserted, a matching item is returned
+ *   %ERR_PTR: an unexpected error occurred during insertion
+ */
+static struct nfsd_file *nfsd_file_insert(struct nfsd_file *new)
+{
+	struct nfsd_file_lookup_key key = {
+		.type	= NFSD_FILE_KEY_FULL,
+		.inode	= new->nf_inode,
+		.need	= new->nf_may,
+		.net	= new->nf_net,
+		.cred	= current_cred(),
+	};
+	struct nfsd_file *nf;
+
+	nf = rhashtable_lookup_get_insert_key(&nfsd_file_rhash_tbl,
+					      &key, &new->nf_rhash,
+					      nfsd_file_rhash_params);
+	if (!nf)
+		return nf;
+	return nfsd_file_get(nf);
 }
 
 /**
- * nfsd_file_is_cached - are there any cached open files for this fh?
- * @inode: inode of the file to check
+ * nfsd_file_is_cached - are there any cached open files for this inode?
+ * @inode: inode to check
  *
- * Scan the hashtable for open files that match this fh. Returns true if there
- * are any, and false if not.
+ * Return values:
+ *   %true: filecache contains at least one file matching this inode
+ *   %false: filecache contains no files matching this inode
  */
 bool
 nfsd_file_is_cached(struct inode *inode)
 {
-	bool			ret = false;
-	struct nfsd_file	*nf;
-	unsigned int		hashval;
-
-        hashval = (unsigned int)hash_long(inode->i_ino, NFSD_FILE_HASH_BITS);
-
-	rcu_read_lock();
-	hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head,
-				 nf_node) {
-		if (inode == nf->nf_inode) {
-			ret = true;
-			break;
-		}
-	}
-	rcu_read_unlock();
+	struct nfsd_file_lookup_key key = {
+		.type	= NFSD_FILE_KEY_INODE,
+		.inode	= inode,
+	};
+	bool ret = false;
+
+	if (rhashtable_lookup_fast(&nfsd_file_rhash_tbl, &key,
+				   nfsd_file_rhash_params) != NULL)
+		ret = true;
 	trace_nfsd_file_is_cached(inode, (int)ret);
 	return ret;
 }
@@ -1086,7 +1098,6 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	struct net *net = SVC_NET(rqstp);
 	struct nfsd_file *nf, *new;
 	struct inode *inode;
-	unsigned int hashval;
 	bool retry = true;
 
 	status = fh_verify(rqstp, fhp, S_IFREG,
@@ -1095,12 +1106,9 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		return status;
 
 	inode = d_inode(fhp->fh_dentry);
-	hashval = (unsigned int)hash_long(inode->i_ino, NFSD_FILE_HASH_BITS);
 retry:
 	/* Avoid allocation if the item is already in cache */
-	rcu_read_lock();
-	nf = nfsd_file_find_locked(inode, may_flags, hashval, net);
-	rcu_read_unlock();
+	nf = nfsd_file_find(inode, may_flags, net);
 	if (nf == NULL) {
 		new = nfsd_file_alloc(inode, may_flags, net);
 		if (!new) {
@@ -1108,18 +1116,20 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			goto out_status;
 		}
 
-		spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
-		nf = nfsd_file_find_locked(inode, may_flags, hashval, net);
-		if (nf == NULL)
+		nf = nfsd_file_insert(new);
+		if (nf == NULL) {
+			nf = new;
 			goto open_file;
-		spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+		}
 
 		nfsd_file_slab_free(&new->nf_rcu);
+		if (IS_ERR(nf)) {
+			status = nfserr_jukebox;
+			goto out_status;
+		}
 	}
 
 	wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE);
-
-	/* Did construction of this file fail? */
 	if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
 		this_cpu_inc(nfsd_file_cons_fails);
 		if (!retry) {
@@ -1128,6 +1138,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		}
 		retry = false;
 		nfsd_file_put_noref(nf);
+		cond_resched();
 		goto retry;
 	}
 
@@ -1164,18 +1175,6 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	return status;
 
 open_file:
-	nf = new;
-	/* Take reference for the hashtable */
-	refcount_inc(&nf->nf_ref);
-	__set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
-	__set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
-	hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head);
-	++nfsd_file_hashtbl[hashval].nfb_count;
-	nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount,
-			nfsd_file_hashtbl[hashval].nfb_count);
-	spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
-	atomic_long_inc(&nfsd_filecache_count);
-
 	nf->nf_mark = nfsd_file_mark_find_or_create(nf);
 	if (nf->nf_mark) {
 		if (open) {
@@ -1190,15 +1189,9 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	 * If construction failed, or we raced with a call to unlink()
 	 * then unhash.
 	 */
-	if (status != nfs_ok || inode->i_nlink == 0) {
-		bool do_free;
-		nfsd_file_lru_remove(nf);
-		spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
-		do_free = nfsd_file_unhash(nf);
-		spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
-		if (do_free)
+	if (status != nfs_ok || inode->i_nlink == 0)
+		if (nfsd_file_unhash(nf))
 			nfsd_file_put_noref(nf);
-	}
 	clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags);
 	smp_mb__after_atomic();
 	wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
@@ -1248,21 +1241,15 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 {
 	unsigned long releases = 0, pages_flushed = 0, evictions = 0;
 	unsigned long hits = 0, acquisitions = 0, cons_fails = 0;
-	unsigned int i, count = 0, longest = 0;
+	struct bucket_table *tbl;
+	int i;
 
-	/*
-	 * No need for spinlocks here since we're not terribly interested in
-	 * accuracy. We do take the nfsd_mutex simply to ensure that we
-	 * don't end up racing with server shutdown
-	 */
+	/* Serialize with server shutdown */
 	mutex_lock(&nfsd_mutex);
-	if (nfsd_file_hashtbl) {
-		for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
-			count += nfsd_file_hashtbl[i].nfb_count;
-			longest = max(longest, nfsd_file_hashtbl[i].nfb_count);
-		}
-	}
-	mutex_unlock(&nfsd_mutex);
+
+	rcu_read_lock();
+	tbl = rht_dereference_rcu(nfsd_file_rhash_tbl.tbl, &nfsd_file_rhash_tbl);
+	rcu_read_unlock();
 
 	for_each_possible_cpu(i) {
 		hits += per_cpu(nfsd_file_cache_hits, i);
@@ -1273,8 +1260,8 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 		cons_fails += per_cpu(nfsd_file_cons_fails, i);
 	}
 
-	seq_printf(m, "total entries: %u\n", count);
-	seq_printf(m, "longest chain: %u\n", longest);
+	seq_printf(m, "total entries: %d\n", atomic_read(&nfsd_file_rhash_tbl.nelems));
+	seq_printf(m, "hash buckets:  %u\n", tbl->size);
 	seq_printf(m, "lru entries:   %lu\n", list_lru_count(&nfsd_file_lru));
 	seq_printf(m, "cache hits:    %lu\n", hits);
 	seq_printf(m, "acquisitions:  %lu\n", acquisitions);
@@ -1287,6 +1274,8 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
 		seq_printf(m, "mean age (ms): -\n");
 	seq_printf(m, "pages flushed: %lu\n", pages_flushed);
 	seq_printf(m, "cons fails:    %lu\n", cons_fails);
+
+	mutex_unlock(&nfsd_mutex);
 	return 0;
 }
 
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index c64336016d2c..ac2712271b08 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -734,7 +734,7 @@ DEFINE_NFSD_FILE_EVENT(nfsd_file_alloc);
 DEFINE_NFSD_FILE_EVENT(nfsd_file_put_final);
 DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash);
 DEFINE_NFSD_FILE_EVENT(nfsd_file_put);
-DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_release_locked);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_dispose);
 
 TRACE_EVENT(nfsd_file_acquire,
 	TP_PROTO(




* [PATCH RFC 30/30] NFSD: Clean up unusued code after rhashtable conversion
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (28 preceding siblings ...)
  2022-06-22 14:15 ` [PATCH RFC 29/30] NFSD: Convert the filecache to use rhashtable Chuck Lever
@ 2022-06-22 14:16 ` Chuck Lever
  2022-06-22 18:36 ` [PATCH RFC 00/30] Overhaul NFSD filecache Wang Yugui
  2022-06-23 20:27 ` Frank van der Linden
  31 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2022-06-22 14:16 UTC (permalink / raw)
  To: linux-nfs, netdev; +Cc: david, tgraf, jlayton

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   31 +------------------------------
 fs/nfsd/filecache.h |    3 +--
 2 files changed, 2 insertions(+), 32 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 14b607e544bf..88c5d8393981 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -24,9 +24,6 @@
 
 #define NFSDDBG_FACILITY	NFSDDBG_FH
 
-/* FIXME: dynamically size this for the machine somehow? */
-#define NFSD_FILE_HASH_BITS                   12
-#define NFSD_FILE_HASH_SIZE                  (1 << NFSD_FILE_HASH_BITS)
 #define NFSD_LAUNDRETTE_DELAY		     (2 * HZ)
 
 #define NFSD_FILE_CACHE_UP		     (0)
@@ -34,13 +31,6 @@
 /* We only care about NFSD_MAY_READ/WRITE for this cache */
 #define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE)
 
-struct nfsd_fcache_bucket {
-	struct hlist_head	nfb_head;
-	spinlock_t		nfb_lock;
-	unsigned int		nfb_count;
-	unsigned int		nfb_maxcount;
-};
-
 static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
 static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
 static DEFINE_PER_CPU(unsigned long, nfsd_file_releases);
@@ -58,7 +48,6 @@ static struct workqueue_struct *nfsd_filecache_wq __read_mostly;
 
 static struct kmem_cache		*nfsd_file_slab;
 static struct kmem_cache		*nfsd_file_mark_slab;
-static struct nfsd_fcache_bucket	*nfsd_file_hashtbl;
 static struct list_lru			nfsd_file_lru;
 static unsigned long			nfsd_file_flags;
 static struct fsnotify_group		*nfsd_file_fsnotify_group;
@@ -275,7 +264,6 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, struct net *net)
 
 	nf = kmem_cache_alloc(nfsd_file_slab, GFP_KERNEL);
 	if (nf) {
-		INIT_HLIST_NODE(&nf->nf_node);
 		INIT_LIST_HEAD(&nf->nf_lru);
 		nf->nf_birthtime = ktime_get();
 		nf->nf_file = NULL;
@@ -784,8 +772,7 @@ static const struct fsnotify_ops nfsd_file_fsnotify_ops = {
 int
 nfsd_file_cache_init(void)
 {
-	int		ret;
-	unsigned int	i;
+	int ret;
 
 	lockdep_assert_held(&nfsd_mutex);
 	if (test_and_set_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1)
@@ -800,13 +787,6 @@ nfsd_file_cache_init(void)
 	if (!nfsd_filecache_wq)
 		goto out;
 
-	nfsd_file_hashtbl = kvcalloc(NFSD_FILE_HASH_SIZE,
-				sizeof(*nfsd_file_hashtbl), GFP_KERNEL);
-	if (!nfsd_file_hashtbl) {
-		pr_err("nfsd: unable to allocate nfsd_file_hashtbl\n");
-		goto out_err;
-	}
-
 	nfsd_file_slab = kmem_cache_create("nfsd_file",
 				sizeof(struct nfsd_file), 0, 0, NULL);
 	if (!nfsd_file_slab) {
@@ -850,11 +830,6 @@ nfsd_file_cache_init(void)
 		goto out_notifier;
 	}
 
-	for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
-		INIT_HLIST_HEAD(&nfsd_file_hashtbl[i].nfb_head);
-		spin_lock_init(&nfsd_file_hashtbl[i].nfb_lock);
-	}
-
 	INIT_DELAYED_WORK(&nfsd_filecache_laundrette, nfsd_file_gc_worker);
 out:
 	return ret;
@@ -869,8 +844,6 @@ nfsd_file_cache_init(void)
 	nfsd_file_slab = NULL;
 	kmem_cache_destroy(nfsd_file_mark_slab);
 	nfsd_file_mark_slab = NULL;
-	kvfree(nfsd_file_hashtbl);
-	nfsd_file_hashtbl = NULL;
 	destroy_workqueue(nfsd_filecache_wq);
 	nfsd_filecache_wq = NULL;
 	rhashtable_destroy(&nfsd_file_rhash_tbl);
@@ -1002,8 +975,6 @@ nfsd_file_cache_shutdown(void)
 	fsnotify_wait_marks_destroyed();
 	kmem_cache_destroy(nfsd_file_mark_slab);
 	nfsd_file_mark_slab = NULL;
-	kvfree(nfsd_file_hashtbl);
-	nfsd_file_hashtbl = NULL;
 	destroy_workqueue(nfsd_filecache_wq);
 	nfsd_filecache_wq = NULL;
 	rhashtable_destroy(&nfsd_file_rhash_tbl);
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 7fc017e7b09e..5ce3fdf3b729 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -24,13 +24,12 @@ struct nfsd_file_mark {
 
 /*
  * A representation of a file that has been opened by knfsd. These are hashed
- * in the hashtable by inode pointer value. Note that this object doesn't
+ * in an rhashtable by inode pointer value. Note that this object doesn't
  * hold a reference to the inode by itself, so the nf_inode pointer should
  * never be dereferenced, only used for comparison.
  */
 struct nfsd_file {
 	struct rhash_head	nf_rhash;
-	struct hlist_node	nf_node;
 	struct list_head	nf_lru;
 	struct rcu_head		nf_rcu;
 	struct file		*nf_file;




* Re: [PATCH RFC 00/30] Overhaul NFSD filecache
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (29 preceding siblings ...)
  2022-06-22 14:16 ` [PATCH RFC 30/30] NFSD: Clean up unusued code after rhashtable conversion Chuck Lever
@ 2022-06-22 18:36 ` Wang Yugui
  2022-06-22 19:04   ` Chuck Lever III
  2022-06-23 20:27 ` Frank van der Linden
  31 siblings, 1 reply; 51+ messages in thread
From: Wang Yugui @ 2022-06-22 18:36 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, netdev, david, tgraf, jlayton

[-- Attachment #1: Type: text/plain, Size: 3794 bytes --]

Hi,

fstests generic/531 triggered a panic on kernel 5.19.0-rc3 with this
patchset.

[  405.478056] BUG: kernel NULL pointer dereference, address: 0000000000000049

[  405.608016] Call Trace:
[  405.608016]  <TASK>
[  405.613020]  nfs4_get_vfs_file+0x325/0x410 [nfsd]
[  405.618018]  nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
[  405.623016]  ? inode_get_bytes+0x38/0x40
[  405.623016]  ? nfsd_permission+0x97/0xf0 [nfsd]
[  405.628022]  ? fh_verify+0x1cc/0x6f0 [nfsd]
[  405.633025]  nfsd4_open+0x640/0xb30 [nfsd]
[  405.638025]  nfsd4_proc_compound+0x3bd/0x710 [nfsd]
[  405.643017]  nfsd_dispatch+0x143/0x270 [nfsd]
[  405.648019]  svc_process_common+0x3bf/0x5b0 [sunrpc]

More detail is in the attached file (531.dmesg).

local.config of fstests:
	export NFS_MOUNT_OPTIONS="-o rw,relatime,vers=4.2,nconnect=8"
Local change made to generic/531:
	max_allowable_files=$(( 1 * 1024 * 1024 / $nr_cpus / 2 ))

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/06/23

> This series overhauls the NFSD filecache, a cache of server-side
> "struct file" objects recently used by NFS clients. The purposes of
> this overhaul are an immediate improvement in cache scalability in
> the number of open files, and preparation for further improvements.
> 
> There are three categories of patches in this series:
> 
> 1. Add observability of cache operation so we can see what we're
> doing as changes are made to the code.
> 
> 2. Improve the scalability of filecache garbage collection,
> addressing several bugs along the way.
> 
> 3. Improve the scalability of the filecache hash table by converting
> it to use rhashtable.
> 
> The series as it stands survives typical test workloads. Running
> stress-tests like generic/531 is the next step.
> 
> These patches are also available in the linux-nfs-bugzilla-386
> branch of
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git 
> 
> ---
> 
> Chuck Lever (30):
>       NFSD: Report filecache LRU size
>       NFSD: Report count of calls to nfsd_file_acquire()
>       NFSD: Report count of freed filecache items
>       NFSD: Report average age of filecache items
>       NFSD: Add nfsd_file_lru_dispose_list() helper
>       NFSD: Refactor nfsd_file_gc()
>       NFSD: Refactor nfsd_file_lru_scan()
>       NFSD: Report the number of items evicted by the LRU walk
>       NFSD: Record number of flush calls
>       NFSD: Report filecache item construction failures
>       NFSD: Zero counters when the filecache is re-initialized
>       NFSD: Hook up the filecache stat file
>       NFSD: WARN when freeing an item still linked via nf_lru
>       NFSD: Trace filecache LRU activity
>       NFSD: Leave open files out of the filecache LRU
>       NFSD: Fix the filecache LRU shrinker
>       NFSD: Never call nfsd_file_gc() in foreground paths
>       NFSD: No longer record nf_hashval in the trace log
>       NFSD: Remove lockdep assertion from unhash_and_release_locked()
>       NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode
>       NFSD: Refactor __nfsd_file_close_inode()
>       NFSD: nfsd_file_hash_remove can compute hashval
>       NFSD: Remove nfsd_file::nf_hashval
>       NFSD: Remove stale comment from nfsd_file_acquire()
>       NFSD: Clean up "open file" case in nfsd_file_acquire()
>       NFSD: Document nfsd_file_cache_purge() API contract
>       NFSD: Replace the "init once" mechanism
>       NFSD: Set up an rhashtable for the filecache
>       NFSD: Convert the filecache to use rhashtable
>       NFSD: Clean up unusued code after rhashtable conversion
> 
> 
>  fs/nfsd/filecache.c | 677 +++++++++++++++++++++++++++-----------------
>  fs/nfsd/filecache.h |   6 +-
>  fs/nfsd/nfsctl.c    |  10 +
>  fs/nfsd/trace.h     | 117 ++++++--
>  4 files changed, 522 insertions(+), 288 deletions(-)
> 
> --
> Chuck Lever


[-- Attachment #2: 531.dmesg --]
[-- Type: application/octet-stream, Size: 55461 bytes --]


T7610 login: 
[  390.362351] run fstests generic/531 at 2022-06-23 02:29:34
[  405.478056] BUG: kernel NULL pointer dereference, address: 0000000000000049
[  405.483037] #PF: supervisor read access in kernel mode
[  405.488015] #PF: error_code(0x0000) - not-present page
[  405.493024] PGD 12283d067 P4D 12283d067 PUD 1518d2067 PMD 0
[  405.498024] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  405.503014] CPU: 33 PID: 2044 Comm: nfsd Tainted: G           OE     5.19.0-3.1.el7.x86_64 #1
[  405.513022] Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
[  405.518022] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
[  405.523019] Code: e8 a4 a3 da c9 e8 af 24 e3 c9 4c 89 ea 48 c7 c7 80 5a fb c1 48 8d 74 24 58 e8 9b 63 21 ca 49 89 c7 4d 85 ff 0f 84 db 00 00 00 <41> 8b 4f 50 49 8d 7f 50 85 c9 0f 84 cb 00 00 00 8d 51 01 89 c8 f0
[  405.543017] RSP: 0018:ffffbd590e9bbb78 EFLAGS: 00010286
[  405.548019] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffff98b320271b80
[  405.558023] RDX: ffffbd590e9bbbd0 RSI: 00000000fffffe01 RDI: ffff98b24d264d40
[  405.563021] RBP: ffff98d2481ec000 R08: ffff98d2f30f3dd0 R09: 0000000000000000
[  405.573020] R10: ffffbd590e9bbd88 R11: 0000000000008000 R12: 000000000000000e
[  405.578022] R13: ffff98d2c0075208 R14: ffffffffc1fb5a80 R15: fffffffffffffff9
[  405.583018] FS:  0000000000000000(0000) GS:ffff98e16fb40000(0000) knlGS:0000000000000000
[  405.593018] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  405.598019] CR2: 0000000000000049 CR3: 00000020d4850006 CR4: 00000000001706e0
[  405.608016] Call Trace:
[  405.608016]  <TASK>
[  405.613020]  nfs4_get_vfs_file+0x325/0x410 [nfsd]
[  405.618018]  nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
[  405.623016]  ? inode_get_bytes+0x38/0x40
[  405.623016]  ? nfsd_permission+0x97/0xf0 [nfsd]
[  405.628022]  ? fh_verify+0x1cc/0x6f0 [nfsd]
[  405.633025]  nfsd4_open+0x640/0xb30 [nfsd]
[  405.638025]  nfsd4_proc_compound+0x3bd/0x710 [nfsd]
[  405.643017]  nfsd_dispatch+0x143/0x270 [nfsd]
[  405.648019]  svc_process_common+0x3bf/0x5b0 [sunrpc]
[  405.653026]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
[  405.658021]  ? nfsd_svc+0x350/0x350 [nfsd]
[  405.663019]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  405.668017]  svc_process+0xb7/0xf0 [sunrpc]
[  405.673022]  nfsd+0xd5/0x190 [nfsd]
[  405.678019]  kthread+0xe3/0x110
[  405.678019]  ? kthread_complete_and_exit+0x20/0x20
[  405.688019]  ret_from_fork+0x1f/0x30
[  405.693017]  </TASK>
[  405.693017] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill ib_core sunrpc dm_multipath dm_mod intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel sb_edac snd_intel_dspcfg x86_pkg_temp_thermal snd_intel_sdw_acpi intel_powerclamp coretemp snd_hda_codec mei_wdt kvm_intel snd_hda_core dcdbas snd_hwdep iTCO_wdt iTCO_vendor_support dell_smm_hwmon snd_seq kvm btrfs(OE) snd_seq_device irqbypass snd_pcm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl blake2b_generic raid6_pq mei_me snd_timer intel_cstate i2c_i801 zstd_compress snd intel_uncore pcspkr lpc_ich mei i2c_smbus soundcore xfs amdgpu iommu_v2 gpu_sched drm_buddy radeon i2c_algo_bit drm_ttm_helper ttm drm_display_helper sd_mod drm_kms_helper t10_pi syscopyarea sysfillrect sr_mod cdrom sysimgblt sg fb_sys_fops ahci mpt3sas bnx2x libahci drm raid_class libata e1000e
[  405.693017]  crc32c_intel mdio cec scsi_transport_sas wmi i2c_dev ipmi_devintf ipmi_msghandler
[  405.793022] Unloaded tainted modules: pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1
[  405.798022]  acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
[  405.898029]  acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[  406.013024] CR2: 0000000000000049
[  406.018020] ---[ end trace 0000000000000000 ]---
[  406.023013] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
[  406.023013] Code: e8 a4 a3 da c9 e8 af 24 e3 c9 4c 89 ea 48 c7 c7 80 5a fb c1 48 8d 74 24 58 e8 9b 63 21 ca 49 89 c7 4d 85 ff 0f 84 db 00 00 00 <41> 8b 4f 50 49 8d 7f 50 85 c9 0f 84 cb 00 00 00 8d 51 01 89 c8 f0
[  406.054067] RSP: 0018:ffffbd590e9bbb78 EFLAGS: 00010286
[  406.064016] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffff98b320271b80
[  406.069011] RDX: ffffbd590e9bbbd0 RSI: 00000000fffffe01 RDI: ffff98b24d264d40
[  406.069011] RBP: ffff98d2481ec000 R08: ffff98d2f30f3dd0 R09: 0000000000000000
[  406.074015] R10: ffffbd590e9bbd88 R11: 0000000000008000 R12: 000000000000000e
[  406.079021] R13: ffff98d2c0075208 R14: ffffffffc1fb5a80 R15: fffffffffffffff9
[  406.079021] FS:  0000000000000000(0000) GS:ffff98e16fb40000(0000) knlGS:0000000000000000
[  406.084009] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  406.089015] CR2: 0000000000000049 CR3: 00000020d4850006 CR4: 00000000001706e0
[  406.089015] Kernel panic - not syncing: Fatal exception
[  405.479004] BUG: kernel NULL pointer dereference, address: 0000000000000049
[  405.479004] #PF: supervisor read access in kernel mode
[  405.479004] #PF: error_code(0x0000) - not-present page
[  405.479004] PGD 15475e067 P4D 15475e067 PUD 15475f067 PMD 0
[  405.479004] Oops: 0000 [#2] PREEMPT SMP NOPTI
[  405.479004] CPU: 18 PID: 2046 Comm: nfsd Tainted: G      D    OE     5.19.0-3.1.el7.x86_64 #1
[  405.479004] Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
[  405.479004] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
[  405.479004] Code: e8 a4 a3 da c9 e8 af 24 e3 c9 4c 89 ea 48 c7 c7 80 5a fb c1 48 8d 74 24 58 e8 9b 63 21 ca 49 89 c7 4d 85 ff 0f 84 db 00 00 00 <41> 8b 4f 50 49 8d 7f 50 85 c9 0f 84 cb 00 00 00 8d 51 01 89 c8 f0
[  405.479004] RSP: 0018:ffffbd590ec3bb78 EFLAGS: 00010286
[  405.479004] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffff98b3202ad338
[  405.479004] RDX: ffffbd590ec3bbd0 RSI: 00000000fffffe01 RDI: ffff98b24d260000
[  405.479004] RBP: ffff98d1b8a38000 R08: ffff98d3995e8000 R09: 0000000000000000
[  405.479004] R10: ffffbd590ec3bd88 R11: 0000000000008000 R12: 000000000000000c
[  405.479004] R13: ffff98d32dc42340 R14: ffffffffc1fb5a80 R15: fffffffffffffff9
[  405.479004] FS:  0000000000000000(0000) GS:ffff98e16fa00000(0000) knlGS:0000000000000000
[  405.479004] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  405.479004] CR2: 0000000000000049 CR3: 000000010e2aa006 CR4: 00000000001706e0
[  405.479004] Call Trace:
[  405.479004]  <TASK>
[  405.479004]  nfs4_get_vfs_file+0x325/0x410 [nfsd]
[  405.479004]  ? native_queued_spin_lock_slowpath+0x257/0x290
[  405.479004]  nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
[  405.479004]  ? inode_get_bytes+0x38/0x40
[  405.479004]  ? nfsd_permission+0x97/0xf0 [nfsd]
[  405.479004]  ? fh_verify+0x1cc/0x6f0 [nfsd]
[  405.479004]  nfsd4_open+0x640/0xb30 [nfsd]
[  405.479004]  nfsd4_proc_compound+0x3bd/0x710 [nfsd]
[  405.479004]  nfsd_dispatch+0x143/0x270 [nfsd]
[  405.479004]  svc_process_common+0x3bf/0x5b0 [sunrpc]
[  405.479004]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
[  405.479004]  ? nfsd_svc+0x350/0x350 [nfsd]
[  405.479004]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  405.479004]  svc_process+0xb7/0xf0 [sunrpc]
[  405.479004]  nfsd+0xd5/0x190 [nfsd]
[  405.479004]  kthread+0xe3/0x110
[  405.479004]  ? kthread_complete_and_exit+0x20/0x20
[  405.479004]  ret_from_fork+0x1f/0x30
[  405.479004]  </TASK>
[  405.479004] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill ib_core sunrpc dm_multipath dm_mod intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel sb_edac snd_intel_dspcfg x86_pkg_temp_thermal snd_intel_sdw_acpi intel_powerclamp coretemp snd_hda_codec mei_wdt kvm_intel snd_hda_core dcdbas snd_hwdep iTCO_wdt iTCO_vendor_support dell_smm_hwmon snd_seq kvm btrfs(OE) snd_seq_device irqbypass snd_pcm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl blake2b_generic raid6_pq mei_me snd_timer intel_cstate i2c_i801 zstd_compress snd intel_uncore pcspkr lpc_ich mei i2c_smbus soundcore xfs amdgpu iommu_v2 gpu_sched drm_buddy radeon i2c_algo_bit drm_ttm_helper ttm drm_display_helper sd_mod drm_kms_helper t10_pi syscopyarea sysfillrect sr_mod cdrom sysimgblt sg fb_sys_fops ahci mpt3sas bnx2x libahci drm raid_class libata e1000e
[  405.479004]  crc32c_intel mdio cec scsi_transport_sas wmi i2c_dev ipmi_devintf ipmi_msghandler
[  405.479004] Unloaded tainted modules: pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1
[  405.479004]  acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
[  405.479004]  acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[  405.479004] CR2: 0000000000000049
[  405.479004] ---[ end trace 0000000000000000 ]---
[  405.479004] BUG: kernel NULL pointer dereference, address: 0000000000000049
[  405.479004] #PF: supervisor read access in kernel mode
[  405.479004] #PF: error_code(0x0000) - not-present page
[  405.479004] PGD 10a458067 P4D 10a458067 PUD 10a457067 PMD 0
[  405.479004] Oops: 0000 [#3] PREEMPT SMP NOPTI
[  405.479004] CPU: 36 PID: 2039 Comm: nfsd Tainted: G      D    OE     5.19.0-3.1.el7.x86_64 #1
[  405.479004] Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
[  405.479004] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
[  405.479004] Code: e8 a4 a3 da c9 e8 af 24 e3 c9 4c 89 ea 48 c7 c7 80 5a fb c1 48 8d 74 24 58 e8 9b 63 21 ca 49 89 c7 4d 85 ff 0f 84 db 00 00 00 <41> 8b 4f 50 49 8d 7f 50 85 c9 0f 84 cb 00 00 00 8d 51 01 89 c8 f0
[  405.479004] RSP: 0018:ffffbd590e63fb78 EFLAGS: 00010286
[  405.479004] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffff98b320299658
[  405.479004] RDX: ffffbd590e63fbd0 RSI: 00000000fffffe01 RDI: ffff98b29041b380
[  405.479004] RBP: ffff98d2481d8000 R08: ffff98b339352d00 R09: 0000000000000000
[  405.479004] R10: ffffbd590e63fd88 R11: 0000000000008000 R12: 000000000000000d
[  405.479004] R13: ffff98d21be40b60 R14: ffffffffc1fb5a80 R15: fffffffffffffff9
[  405.479004] FS:  0000000000000000(0000) GS:ffff98e16fc00000(0000) knlGS:0000000000000000
[  405.479004] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  405.479004] CR2: 0000000000000049 CR3: 000000209c690003 CR4: 00000000001706e0
[  405.479004] Call Trace:
[  405.479004]  <TASK>
[  405.479004]  nfs4_get_vfs_file+0x325/0x410 [nfsd]
[  405.479004]  ? kmem_cache_alloc+0x172/0x2e0
[  405.479004]  ? security_prepare_creds+0x46/0xa0
[  405.479004]  nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
[  405.479004]  ? inode_get_bytes+0x38/0x40
[  405.479004]  ? nfsd_permission+0x97/0xf0 [nfsd]
[  405.479004]  ? fh_verify+0x1cc/0x6f0 [nfsd]
[  405.479004]  nfsd4_open+0x640/0xb30 [nfsd]
[  405.479004]  nfsd4_proc_compound+0x3bd/0x710 [nfsd]
[  405.479004]  nfsd_dispatch+0x143/0x270 [nfsd]
[  405.479004]  svc_process_common+0x3bf/0x5b0 [sunrpc]
[  405.479004]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
[  405.479004]  ? nfsd_svc+0x350/0x350 [nfsd]
[  405.479004]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  405.479004]  svc_process+0xb7/0xf0 [sunrpc]
[  405.479004]  nfsd+0xd5/0x190 [nfsd]
[  405.479004]  kthread+0xe3/0x110
[  405.479004]  ? kthread_complete_and_exit+0x20/0x20
[  405.479004]  ret_from_fork+0x1f/0x30
[  405.479004]  </TASK>
[  405.479004] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill ib_core sunrpc dm_multipath dm_mod intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel sb_edac snd_intel_dspcfg x86_pkg_temp_thermal snd_intel_sdw_acpi intel_powerclamp coretemp snd_hda_codec mei_wdt kvm_intel snd_hda_core dcdbas snd_hwdep iTCO_wdt iTCO_vendor_support dell_smm_hwmon snd_seq kvm btrfs(OE) snd_seq_device irqbypass snd_pcm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl blake2b_generic raid6_pq mei_me snd_timer intel_cstate i2c_i801 zstd_compress snd intel_uncore pcspkr lpc_ich mei i2c_smbus soundcore xfs amdgpu iommu_v2 gpu_sched drm_buddy radeon i2c_algo_bit drm_ttm_helper ttm drm_display_helper sd_mod drm_kms_helper t10_pi syscopyarea sysfillrect sr_mod cdrom sysimgblt sg fb_sys_fops ahci mpt3sas bnx2x libahci drm raid_class libata e1000e
[  405.479004]  crc32c_intel mdio cec scsi_transport_sas wmi i2c_dev ipmi_devintf ipmi_msghandler
[  405.479004] Unloaded tainted modules: pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1
[  405.479004]  acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1
[  405.479004] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
[  405.479004]  acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
[  405.479004] Code: e8 a4 a3 da c9 e8 af 24 e3 c9 4c 89 ea 48 c7 c7 80 5a fb c1 48 8d 74 24 58 e8 9b 63 21 ca 49 89 c7 4d 85 ff 0f 84 db 00 00 00 <41> 8b 4f 50 49 8d 7f 50 85 c9 0f 84 cb 00 00 00 8d 51 01 89 c8 f0
[  405.479004]  acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1
[  405.479004] RSP: 0018:ffffbd590e9bbb78 EFLAGS: 00010286
[  405.479004]  acpi_cpufreq():1
[  405.479004]
[  405.479004]  pcc_cpufreq():1
[  405.479004] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffff98b320271b80
[  405.479004]  acpi_cpufreq():1
[  405.479004] RDX: ffffbd590e9bbbd0 RSI: 00000000fffffe01 RDI: ffff98b24d264d40
[  405.479004]  acpi_cpufreq():1 fjes():1
[  405.479004] RBP: ffff98d2481ec000 R08: ffff98d2f30f3dd0 R09: 0000000000000000
[  405.479004]  fjes():1
[  405.479004] R10: ffffbd590e9bbd88 R11: 0000000000008000 R12: 000000000000000e
[  405.479004]  fjes():1 fjes():1
[  405.479004] R13: ffff98d2c0075208 R14: ffffffffc1fb5a80 R15: fffffffffffffff9
[  405.479004]  fjes():1 fjes():1
[  405.479004] FS:  0000000000000000(0000) GS:ffff98e16fa00000(0000) knlGS:0000000000000000
[  405.479004]
[  405.479004] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  405.479004] CR2: 0000000000000049
[  405.479004] CR2: 0000000000000049 CR3: 000000010e2aa006 CR4: 00000000001706e0
[  405.479004] ---[ end trace 0000000000000000 ]---
[  405.479001] BUG: kernel NULL pointer dereference, address: 0000000000000049
[  405.479001] #PF: supervisor read access in kernel mode
[  405.479001] #PF: error_code(0x0000) - not-present page
[  405.479001] PGD 13cf26067 P4D 13cf26067 PUD 155b8a067 PMD 0
[  405.479001] Oops: 0000 [#4] PREEMPT SMP NOPTI
[  405.479001] CPU: 24 PID: 2042 Comm: nfsd Tainted: G      D    OE     5.19.0-3.1.el7.x86_64 #1
[  405.479001] Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
[  405.479001] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
[  405.479001] Code: e8 a4 a3 da c9 e8 af 24 e3 c9 4c 89 ea 48 c7 c7 80 5a fb c1 48 8d 74 24 58 e8 9b 63 21 ca 49 89 c7 4d 85 ff 0f 84 db 00 00 00 <41> 8b 4f 50 49 8d 7f 50 85 c9 0f 84 cb 00 00 00 8d 51 01 89 c8 f0
[  405.479001] RSP: 0018:ffffbd590e6e7b78 EFLAGS: 00010286
[  405.479001] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffff98b3202bc0f8
[  405.479001] RDX: ffffbd590e6e7bd0 RSI: 00000000fffffe01 RDI: ffff98b2904199c0
[  405.479001] RBP: ffff98d2481e4000 R08: ffff98d287957000 R09: 0000000000000000
[  405.479001] R10: ffffbd590e6e7d88 R11: 0000000000008000 R12: 000000000000000e
[  405.479001] R13: ffff98b30c91f820 R14: ffffffffc1fb5a80 R15: fffffffffffffff9
[  405.479001] FS:  0000000000000000(0000) GS:ffff98d12fb80000(0000) knlGS:0000000000000000
[  405.479001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  405.479001] CR2: 0000000000000049 CR3: 0000000124d10004 CR4: 00000000001706e0
[  405.479001] Call Trace:
[  405.479001]  <TASK>
[  405.479001]  nfs4_get_vfs_file+0x325/0x410 [nfsd]
[  405.479001]  ? kmem_cache_alloc+0x172/0x2e0
[  405.479001]  ? security_prepare_creds+0x46/0xa0
[  405.479001]  nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
[  405.479001]  ? inode_get_bytes+0x38/0x40
[  405.479001]  ? nfsd_permission+0x97/0xf0 [nfsd]
[  405.479001]  ? fh_verify+0x1cc/0x6f0 [nfsd]
[  405.479001]  nfsd4_open+0x640/0xb30 [nfsd]
[  405.479001]  nfsd4_proc_compound+0x3bd/0x710 [nfsd]
[  405.479001]  nfsd_dispatch+0x143/0x270 [nfsd]
[  405.479001]  svc_process_common+0x3bf/0x5b0 [sunrpc]
[  405.479001]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
[  405.479001]  ? nfsd_svc+0x350/0x350 [nfsd]
[  405.479001]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  405.479001]  svc_process+0xb7/0xf0 [sunrpc]
[  405.479001]  nfsd+0xd5/0x190 [nfsd]
[  405.479001]  kthread+0xe3/0x110
[  405.479001]  ? kthread_complete_and_exit+0x20/0x20
[  405.479001]  ret_from_fork+0x1f/0x30
[  405.479001]  </TASK>
[  405.479001] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill ib_core sunrpc dm_multipath dm_mod intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel sb_edac snd_intel_dspcfg x86_pkg_temp_thermal snd_intel_sdw_acpi intel_powerclamp coretemp snd_hda_codec mei_wdt kvm_intel snd_hda_core dcdbas snd_hwdep iTCO_wdt iTCO_vendor_support dell_smm_hwmon snd_seq kvm btrfs(OE) snd_seq_device irqbypass snd_pcm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl blake2b_generic raid6_pq mei_me snd_timer intel_cstate i2c_i801 zstd_compress snd intel_uncore pcspkr lpc_ich mei i2c_smbus soundcore xfs amdgpu iommu_v2 gpu_sched drm_buddy radeon i2c_algo_bit drm_ttm_helper ttm drm_display_helper sd_mod drm_kms_helper t10_pi syscopyarea sysfillrect sr_mod cdrom sysimgblt sg fb_sys_fops ahci mpt3sas bnx2x libahci drm raid_class libata e1000e
[  405.479001]  crc32c_intel mdio cec scsi_transport_sas wmi i2c_dev ipmi_devintf ipmi_msghandler
[  405.479001] Unloaded tainted modules: pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1
[  405.479004] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
[  405.479001]  pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1
[  405.479004] Code: e8 a4 a3 da c9 e8 af 24 e3 c9 4c 89 ea 48 c7 c7 80 5a fb c1 48 8d 74 24 58 e8 9b 63 21 ca 49 89 c7 4d 85 ff 0f 84 db 00 00 00 <41> 8b 4f 50 49 8d 7f 50 85 c9 0f 84 cb 00 00 00 8d 51 01 89 c8 f0
[  405.479001]  acpi_cpufreq():1
[  405.479004] RSP: 0018:ffffbd590e9bbb78 EFLAGS: 00010286
[  405.479001]  acpi_cpufreq():1
[  405.479004]
[  405.479001]  acpi_cpufreq():1
[  405.479004] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffff98b320271b80
[  405.479004] RDX: ffffbd590e9bbbd0 RSI: 00000000fffffe01 RDI: ffff98b24d264d40
[  405.479001]  acpi_cpufreq():1
[  405.479004] RBP: ffff98d2481ec000 R08: ffff98d2f30f3dd0 R09: 0000000000000000
[  405.479001]  pcc_cpufreq():1
[  405.479004] R10: ffffbd590e9bbd88 R11: 0000000000008000 R12: 000000000000000e
[  405.479001]  acpi_cpufreq():1
[  405.479004] R13: ffff98d2c0075208 R14: ffffffffc1fb5a80 R15: fffffffffffffff9
[  405.479001]  acpi_cpufreq():1
[  405.479004] FS:  0000000000000000(0000) GS:ffff98e16fc00000(0000) knlGS:0000000000000000
[  405.479001]  acpi_cpufreq():1
[  405.479004] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  405.479001]  acpi_cpufreq():1
[  405.479004] CR2: 0000000000000049 CR3: 000000209c690003 CR4: 00000000001706e0
[  405.479001]  pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1
[  405.479001]  acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[  405.479001] CR2: 0000000000000049
[  405.479001] ---[ end trace 0000000000000000 ]---
[  405.479208] BUG: kernel NULL pointer dereference, address: 0000000000000049
[  405.479208] #PF: supervisor read access in kernel mode
[  405.479208] #PF: error_code(0x0000) - not-present page
[  405.479208] PGD 70bda7067 P4D 70bda7067 PUD 121f33067 PMD 0
[  405.479208] Oops: 0000 [#5] PREEMPT SMP NOPTI
[  405.479208] CPU: 4 PID: 2040 Comm: nfsd Tainted: G      D    OE     5.19.0-3.1.el7.x86_64 #1
[  405.479208] Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
[  405.479208] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
[  405.479208] Code: e8 a4 a3 da c9 e8 af 24 e3 c9 4c 89 ea 48 c7 c7 80 5a fb c1 48 8d 74 24 58 e8 9b 63 21 ca 49 89 c7 4d 85 ff 0f 84 db 00 00 00 <41> 8b 4f 50 49 8d 7f 50 85 c9 0f 84 cb 00 00 00 8d 51 01 89 c8 f0
[  405.479208] RSP: 0018:ffffbd590e6afb78 EFLAGS: 00010286
[  405.479208] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffff98b320226328
[  405.479208] RDX: ffffbd590e6afbd0 RSI: 00000000fffffe01 RDI: ffff98b290418000
[  405.479208] RBP: ffff98d2481dc000 R08: ffff98b24b6a4820 R09: 0000000000000000
[  405.479208] R10: ffffbd590e6afd88 R11: 0000000000008000 R12: 000000000000000d
[  405.479208] R13: ffff98b2a868ea28 R14: ffffffffc1fb5a80 R15: fffffffffffffff9
[  405.479208] FS:  0000000000000000(0000) GS:ffff98d12f900000(0000) knlGS:0000000000000000
[  405.479208] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  405.479208] CR2: 0000000000000049 CR3: 000000010d290002 CR4: 00000000001706e0
[  405.479208] Call Trace:
[  405.479208]  <TASK>
[  405.479208]  nfs4_get_vfs_file+0x325/0x410 [nfsd]
[  405.479208]  ? kmem_cache_alloc+0x172/0x2e0
[  405.479208]  ? security_prepare_creds+0x46/0xa0
[  405.479208]  nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
[  405.479208]  ? inode_get_bytes+0x38/0x40
[  405.479208]  ? nfsd_permission+0x97/0xf0 [nfsd]
[  405.479208]  ? fh_verify+0x1cc/0x6f0 [nfsd]
[  405.479208]  nfsd4_open+0x640/0xb30 [nfsd]
[  405.479208]  nfsd4_proc_compound+0x3bd/0x710 [nfsd]
[  405.479208]  nfsd_dispatch+0x143/0x270 [nfsd]
[  405.479208]  svc_process_common+0x3bf/0x5b0 [sunrpc]
[  405.479208]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
[  405.479208]  ? nfsd_svc+0x350/0x350 [nfsd]
[  405.479208]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  405.479208]  svc_process+0xb7/0xf0 [sunrpc]
[  405.479208]  nfsd+0xd5/0x190 [nfsd]
[  405.479208]  kthread+0xe3/0x110
[  405.479208]  ? kthread_complete_and_exit+0x20/0x20
[  405.479208]  ret_from_fork+0x1f/0x30
[  405.479208]  </TASK>
[  405.479208] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill ib_core sunrpc dm_multipath dm_mod intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel sb_edac snd_intel_dspcfg x86_pkg_temp_thermal snd_intel_sdw_acpi intel_powerclamp coretemp snd_hda_codec mei_wdt kvm_intel snd_hda_core dcdbas snd_hwdep iTCO_wdt iTCO_vendor_support dell_smm_hwmon snd_seq kvm btrfs(OE) snd_seq_device irqbypass snd_pcm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl blake2b_generic raid6_pq mei_me snd_timer intel_cstate i2c_i801 zstd_compress snd intel_uncore pcspkr lpc_ich mei i2c_smbus soundcore xfs amdgpu iommu_v2 gpu_sched drm_buddy radeon i2c_algo_bit drm_ttm_helper ttm drm_display_helper sd_mod drm_kms_helper t10_pi syscopyarea sysfillrect sr_mod cdrom sysimgblt sg fb_sys_fops ahci mpt3sas bnx2x libahci drm raid_class libata e1000e
[  405.479208]  crc32c_intel mdio cec scsi_transport_sas wmi i2c_dev ipmi_devintf ipmi_msghandler
[  405.479208] Unloaded tainted modules: pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1
[  405.479208]  acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
[  405.479208]  acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[  405.479208] CR2: 0000000000000049
[  405.479208] ---[ end trace 0000000000000000 ]---
[  405.479216] BUG: kernel NULL pointer dereference, address: 0000000000000049
[  405.479216] #PF: supervisor read access in kernel mode
[  405.479216] #PF: error_code(0x0000) - not-present page
[  405.479216] PGD 2078761067 P4D 2078761067 PUD 217874c067 PMD 0
[  405.479216] Oops: 0000 [#6] PREEMPT SMP NOPTI
[  405.479216] CPU: 31 PID: 2041 Comm: nfsd Tainted: G      D    OE     5.19.0-3.1.el7.x86_64 #1
[  405.479216] Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
[  405.479216] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
[  405.479216] Code: e8 a4 a3 da c9 e8 af 24 e3 c9 4c 89 ea 48 c7 c7 80 5a fb c1 48 8d 74 24 58 e8 9b 63 21 ca 49 89 c7 4d 85 ff 0f 84 db 00 00 00 <41> 8b 4f 50 49 8d 7f 50 85 c9 0f 84 cb 00 00 00 8d 51 01 89 c8 f0
[  405.479216] RSP: 0018:ffffbd590e6cfb78 EFLAGS: 00010286
[  405.479216] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffff98b3202f4ad0
[  405.479216] RDX: ffffbd590e6cfbd0 RSI: 00000000fffffe01 RDI: ffff98b29041cd40
[  405.479216] RBP: ffff98d2481e0000 R08: ffff98d1c2ddd958 R09: 0000000000000000
[  405.479216] R10: ffffbd590e6cfd88 R11: 0000000000008000 R12: 000000000000000d
[  405.479216] R13: ffff98d39a6dc4e0 R14: ffffffffc1fb5a80 R15: fffffffffffffff9
[  405.479216] FS:  0000000000000000(0000) GS:ffff98e16fac0000(0000) knlGS:0000000000000000
[  405.479216] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  405.479216] CR2: 0000000000000049 CR3: 000000207e006005 CR4: 00000000001706e0
[  405.479216] Call Trace:
[  405.479216]  <TASK>
[  405.479216]  nfs4_get_vfs_file+0x325/0x410 [nfsd]
[  405.479216]  ? kmem_cache_alloc+0x172/0x2e0
[  405.479216]  ? security_prepare_creds+0x46/0xa0
[  405.479216]  nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
[  405.479216]  ? inode_get_bytes+0x38/0x40
[  405.479216]  ? nfsd_permission+0x97/0xf0 [nfsd]
[  405.479216]  ? fh_verify+0x1cc/0x6f0 [nfsd]
[  405.479216]  nfsd4_open+0x640/0xb30 [nfsd]
[  405.479216]  nfsd4_proc_compound+0x3bd/0x710 [nfsd]
[  405.479216]  nfsd_dispatch+0x143/0x270 [nfsd]
[  405.479216]  svc_process_common+0x3bf/0x5b0 [sunrpc]
[  405.479216]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
[  405.479216]  ? nfsd_svc+0x350/0x350 [nfsd]
[  405.479216]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  405.479216]  svc_process+0xb7/0xf0 [sunrpc]
[  405.479216]  nfsd+0xd5/0x190 [nfsd]
[  405.479216]  kthread+0xe3/0x110
[  405.479216]  ? kthread_complete_and_exit+0x20/0x20
[  405.479216]  ret_from_fork+0x1f/0x30
[  405.479216]  </TASK>
[  405.479216] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill ib_core sunrpc dm_multipath dm_mod intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel sb_edac snd_intel_dspcfg x86_pkg_temp_thermal snd_intel_sdw_acpi intel_powerclamp coretemp snd_hda_codec mei_wdt kvm_intel snd_hda_core dcdbas snd_hwdep iTCO_wdt iTCO_vendor_support dell_smm_hwmon snd_seq kvm btrfs(OE) snd_seq_device irqbypass snd_pcm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl blake2b_generic raid6_pq mei_me snd_timer intel_cstate i2c_i801 zstd_compress snd intel_uncore pcspkr lpc_ich mei i2c_smbus soundcore xfs amdgpu iommu_v2 gpu_sched drm_buddy radeon i2c_algo_bit drm_ttm_helper ttm drm_display_helper sd_mod drm_kms_helper t10_pi syscopyarea sysfillrect sr_mod cdrom sysimgblt sg fb_sys_fops ahci mpt3sas bnx2x libahci drm raid_class libata e1000e
[  405.479216]  crc32c_intel mdio cec scsi_transport_sas wmi i2c_dev ipmi_devintf ipmi_msghandler
[  405.479216] Unloaded tainted modules: pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1
[  405.479216]  acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
[  405.479216]  acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[  405.479216] CR2: 0000000000000049
[  405.479216] ---[ end trace 0000000000000000 ]---
[  406.239649] BUG: kernel NULL pointer dereference, address: 0000000000000049
[  406.239649] #PF: supervisor read access in kernel mode
[  406.239649] #PF: error_code(0x0000) - not-present page
[  406.239649] PGD 217b4ef067 P4D 217b4ef067 PUD 207e0e8067 PMD 0
[  406.239649] Oops: 0000 [#7] PREEMPT SMP NOPTI
[  406.239649] CPU: 2 PID: 2045 Comm: nfsd Tainted: G      D    OE     5.19.0-3.1.el7.x86_64 #1
[  406.239649] Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
[  406.239649] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
[  406.239649] Code: e8 a4 a3 da c9 e8 af 24 e3 c9 4c 89 ea 48 c7 c7 80 5a fb c1 48 8d 74 24 58 e8 9b 63 21 ca 49 89 c7 4d 85 ff 0f 84 db 00 00 00 <41> 8b 4f 50 49 8d 7f 50 85 c9 0f 84 cb 00 00 00 8d 51 01 89 c8 f0
[  406.239649] RSP: 0018:ffffbd590e9f3b78 EFLAGS: 00010286
[  406.239649] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffff98b32029abd8
[  406.239649] RDX: ffffbd590e9f3bd0 RSI: 00000000fffffe01 RDI: ffff98b24d263380
[  406.239649] RBP: ffff98d2481f0000 R08: ffff98d219ec0680 R09: 0000000000000000
[  406.239649] R10: ffffbd590e9f3d88 R11: 0000000000008000 R12: 000000000000000d
[  406.239649] R13: ffff98b30c920dd0 R14: ffffffffc1fb5a80 R15: fffffffffffffff9
[  406.239649] FS:  0000000000000000(0000) GS:ffff98d12f880000(0000) knlGS:0000000000000000
[  406.239649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  406.239649] CR2: 0000000000000049 CR3: 000000217b5da005 CR4: 00000000001706e0
[  406.239649] Call Trace:
[  406.239649]  <TASK>
[  406.239649]  nfs4_get_vfs_file+0x325/0x410 [nfsd]
[  406.239649]  ? kmem_cache_alloc+0x172/0x2e0
[  406.239649]  ? security_prepare_creds+0x46/0xa0
[  406.239649]  nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
[  406.239649]  ? inode_get_bytes+0x38/0x40
[  406.239649]  ? nfsd_permission+0x97/0xf0 [nfsd]
[  406.239649]  ? fh_verify+0x1cc/0x6f0 [nfsd]
[  406.239649]  nfsd4_open+0x640/0xb30 [nfsd]
[  406.239649]  nfsd4_proc_compound+0x3bd/0x710 [nfsd]
[  406.239649]  nfsd_dispatch+0x143/0x270 [nfsd]
[  406.239649]  svc_process_common+0x3bf/0x5b0 [sunrpc]
[  406.239649]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
[  406.239649]  ? nfsd_svc+0x350/0x350 [nfsd]
[  406.239649]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  406.239649]  svc_process+0xb7/0xf0 [sunrpc]
[  406.239649]  nfsd+0xd5/0x190 [nfsd]
[  406.239649]  kthread+0xe3/0x110
[  406.239649]  ? kthread_complete_and_exit+0x20/0x20
[  406.239649]  ret_from_fork+0x1f/0x30
[  406.239649]  </TASK>
[  406.239649] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill ib_core sunrpc dm_multipath dm_mod intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel sb_edac snd_intel_dspcfg x86_pkg_temp_thermal snd_intel_sdw_acpi intel_powerclamp coretemp snd_hda_codec mei_wdt kvm_intel snd_hda_core dcdbas snd_hwdep iTCO_wdt iTCO_vendor_support dell_smm_hwmon snd_seq kvm btrfs(OE) snd_seq_device irqbypass snd_pcm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl blake2b_generic raid6_pq mei_me snd_timer intel_cstate i2c_i801 zstd_compress snd intel_uncore pcspkr lpc_ich mei i2c_smbus soundcore xfs amdgpu iommu_v2 gpu_sched
[  405.479001] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
[  406.345822]  drm_buddy radeon i2c_algo_bit drm_ttm_helper ttm drm_display_helper sd_mod drm_kms_helper t10_pi syscopyarea sysfillrect sr_mod cdrom sysimgblt sg fb_sys_fops ahci mpt3sas bnx2x libahci drm raid_class libata e1000e crc32c_intel mdio cec scsi_transport_sas wmi i2c_dev ipmi_devintf ipmi_msghandler
[  406.345822] Unloaded tainted modules:
[  405.479001] Code: e8 a4 a3 da c9 e8 af 24 e3 c9 4c 89 ea 48 c7 c7 80 5a fb c1 48 8d 74 24 58 e8 9b 63 21 ca 49 89 c7 4d 85 ff 0f 84 db 00 00 00 <41> 8b 4f 50 49 8d 7f 50 85 c9 0f 84 cb 00 00 00 8d 51 01 89 c8 f0
[  406.345822]  pcc_cpufreq():1 acpi_cpufreq():1
[  405.479001] RSP: 0018:ffffbd590e9bbb78 EFLAGS: 00010286
[  406.345822]  acpi_cpufreq():1
[  405.479001]
[  406.345822]  acpi_cpufreq():1
[  405.479001] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffff98b320271b80
[  406.345822]  pcc_cpufreq():1 acpi_cpufreq():1
[  405.479001] RDX: ffffbd590e9bbbd0 RSI: 00000000fffffe01 RDI: ffff98b24d264d40
[  406.345822]  pcc_cpufreq():1
[  405.479001] RBP: ffff98d2481ec000 R08: ffff98d2f30f3dd0 R09: 0000000000000000
[  406.345822]  acpi_cpufreq():1
[  405.479001] R10: ffffbd590e9bbd88 R11: 0000000000008000 R12: 000000000000000e
[  406.345822]  pcc_cpufreq():1 acpi_cpufreq():1
[  405.479001] R13: ffff98d2c0075208 R14: ffffffffc1fb5a80 R15: fffffffffffffff9
[  406.345822]  pcc_cpufreq():1 acpi_cpufreq():1
[  405.479001] FS:  0000000000000000(0000) GS:ffff98d12fb80000(0000) knlGS:0000000000000000
[  406.345822]  pcc_cpufreq():1 acpi_cpufreq():1
[  405.479001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  406.345822]  acpi_cpufreq():1 acpi_cpufreq():1
[  405.479001] CR2: 0000000000000049 CR3: 0000000124d10004 CR4: 00000000001706e0
[  406.345822]  acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1
[  406.345822]  pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[  406.345822] CR2: 0000000000000049
[  406.345822] ---[ end trace 0000000000000000 ]---
[  405.479004] BUG: kernel NULL pointer dereference, address: 0000000000000049
[  405.479004] #PF: supervisor read access in kernel mode
[  405.479004] #PF: error_code(0x0000) - not-present page
[  405.479004] PGD 15475e067 P4D 15475e067 PUD 15475f067 PMD 0
[  405.479004] Oops: 0000 [#8] PREEMPT SMP NOPTI
[  405.479004] CPU: 22 PID: 2043 Comm: nfsd Tainted: G      D    OE     5.19.0-3.1.el7.x86_64 #1
[  405.479004] Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
[  405.479004] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
[  405.479004] Code: e8 a4 a3 da c9 e8 af 24 e3 c9 4c 89 ea 48 c7 c7 80 5a fb c1 48 8d 74 24 58 e8 9b 63 21 ca 49 89 c7 4d 85 ff 0f 84 db 00 00 00 <41> 8b 4f 50 49 8d 7f 50 85 c9 0f 84 cb 00 00 00 8d 51 01 89 c8 f0
[  405.479004] RSP: 0018:ffffbd590e9b3b78 EFLAGS: 00010286
[  405.479004] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffff98b32028b390
[  405.479004] RDX: ffffbd590e9b3bd0 RSI: 00000000fffffe01 RDI: ffff98b24d2619c0
[  405.479004] RBP: ffff98d2481e8000 R08: ffff98d1dc176410 R09: 0000000000000000
[  405.479004] R10: ffffbd590e9b3d88 R11: 0000000000008000 R12: 000000000000000f
[  405.479004] R13: ffff98b30cab5a28 R14: ffffffffc1fb5a80 R15: fffffffffffffff9
[  405.479004] FS:  0000000000000000(0000) GS:ffff98d12fb00000(0000) knlGS:0000000000000000
[  405.479004] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  405.479004] CR2: 0000000000000049 CR3: 000000010e2aa004 CR4: 00000000001706e0
[  405.479004] Call Trace:
[  405.479004]  <TASK>
[  405.479004]  nfs4_get_vfs_file+0x325/0x410 [nfsd]
[  405.479004]  ? kmem_cache_alloc+0x172/0x2e0
[  405.479004]  ? security_prepare_creds+0x46/0xa0
[  405.479004]  nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
[  405.479004]  ? inode_get_bytes+0x38/0x40
[  405.479004]  ? nfsd_permission+0x97/0xf0 [nfsd]
[  405.479004]  ? fh_verify+0x1cc/0x6f0 [nfsd]
[  405.479004]  nfsd4_open+0x640/0xb30 [nfsd]
[  405.479004]  nfsd4_proc_compound+0x3bd/0x710 [nfsd]
[  405.479004]  nfsd_dispatch+0x143/0x270 [nfsd]
[  405.479004]  svc_process_common+0x3bf/0x5b0 [sunrpc]
[  405.479004]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
[  405.479004]  ? nfsd_svc+0x350/0x350 [nfsd]
[  405.479004]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  405.479004]  svc_process+0xb7/0xf0 [sunrpc]
[  405.479004]  nfsd+0xd5/0x190 [nfsd]
[  405.479004]  kthread+0xe3/0x110
[  405.479004]  ? kthread_complete_and_exit+0x20/0x20
[  405.479004]  ret_from_fork+0x1f/0x30
[  405.479004]  </TASK>
[  405.479004] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill ib_core sunrpc dm_multipath dm_mod intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel sb_edac snd_intel_dspcfg x86_pkg_temp_thermal snd_intel_sdw_acpi intel_powerclamp coretemp snd_hda_codec mei_wdt kvm_intel snd_hda_core dcdbas snd_hwdep iTCO_wdt iTCO_vendor_support dell_smm_hwmon snd_seq kvm btrfs(OE) snd_seq_device irqbypass snd_pcm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl blake2b_generic raid6_pq mei_me snd_timer intel_cstate i2c_i801 zstd_compress snd intel_uncore pcspkr lpc_ich mei i2c_smbus soundcore xfs amdgpu iommu_v2 gpu_sched drm_buddy radeon i2c_algo_bit drm_ttm_helper ttm drm_display_helper sd_mod drm_kms_helper t10_pi syscopyarea sysfillrect sr_mod cdrom sysimgblt sg fb_sys_fops ahci mpt3sas bnx2x libahci drm raid_class libata e1000e
[  405.479004]  crc32c_intel mdio cec scsi_transport_sas wmi i2c_dev ipmi_devintf ipmi_msghandler
[  405.479004] Unloaded tainted modules: pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1
[  405.479004]  acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
[  405.479004]  acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[  405.479004] CR2: 0000000000000049
[  405.479004] ---[ end trace 0000000000000000 ]---
[  405.479208] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
[  405.479208] Code: e8 a4 a3 da c9 e8 af 24 e3 c9 4c 89 ea 48 c7 c7 80 5a fb c1 48 8d 74 24 58 e8 9b 63 21 ca 49 89 c7 4d 85 ff 0f 84 db 00 00 00 <41> 8b 4f 50 49 8d 7f 50 85 c9 0f 84 cb 00 00 00 8d 51 01 89 c8 f0
[  405.479208] RSP: 0018:ffffbd590e9bbb78 EFLAGS: 00010286
[  405.479208] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffff98b320271b80
[  405.479208] RDX: ffffbd590e9bbbd0 RSI: 00000000fffffe01 RDI: ffff98b24d264d40
[  405.479208] RBP: ffff98d2481ec000 R08: ffff98d2f30f3dd0 R09: 0000000000000000
[  405.479208] R10: ffffbd590e9bbd88 R11: 0000000000008000 R12: 000000000000000e
[  405.479208] R13: ffff98d2c0075208 R14: ffffffffc1fb5a80 R15: fffffffffffffff9
[  405.479208] FS:  0000000000000000(0000) GS:ffff98d12f900000(0000) knlGS:0000000000000000
[  405.479208] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  405.479208] CR2: 0000000000000049 CR3: 000000010d290002 CR4: 00000000001706e0
[  405.479216] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
[  405.479216] Code: e8 a4 a3 da c9 e8 af 24 e3 c9 4c 89 ea 48 c7 c7 80 5a fb c1 48 8d 74 24 58 e8 9b 63 21 ca 49 89 c7 4d 85 ff 0f 84 db 00 00 00 <41> 8b 4f 50 49 8d 7f 50 85 c9 0f 84 cb 00 00 00 8d 51 01 89 c8 f0
[  405.479216] RSP: 0018:ffffbd590e9bbb78 EFLAGS: 00010286
[  405.479216] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffff98b320271b80
[  405.479216] RDX: ffffbd590e9bbbd0 RSI: 00000000fffffe01 RDI: ffff98b24d264d40
[  405.479216] RBP: ffff98d2481ec000 R08: ffff98d2f30f3dd0 R09: 0000000000000000
[  405.479216] R10: ffffbd590e9bbd88 R11: 0000000000008000 R12: 000000000000000e
[  405.479216] R13: ffff98d2c0075208 R14: ffffffffc1fb5a80 R15: fffffffffffffff9
[  405.479216] FS:  0000000000000000(0000) GS:ffff98e16fac0000(0000) knlGS:0000000000000000
[  405.479216] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  405.479216] CR2: 0000000000000049 CR3: 000000207e006005 CR4: 00000000001706e0
[  406.815885] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
[  406.815885] Code: e8 a4 a3 da c9 e8 af 24 e3 c9 4c 89 ea 48 c7 c7 80 5a fb c1 48 8d 74 24 58 e8 9b 63 21 ca 49 89 c7 4d 85 ff 0f 84 db 00 00 00 <41> 8b 4f 50 49 8d 7f 50 85 c9 0f 84 cb 00 00 00 8d 51 01 89 c8 f0
[  406.815885] RSP: 0018:ffffbd590e9bbb78 EFLAGS: 00010286
[  406.815885] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffff98b320271b80
[  406.815885] RDX: ffffbd590e9bbbd0 RSI: 00000000fffffe01 RDI: ffff98b24d264d40
[  406.815885] RBP: ffff98d2481ec000 R08: ffff98d2f30f3dd0 R09: 0000000000000000
[  406.815885] R10: ffffbd590e9bbd88 R11: 0000000000008000 R12: 000000000000000e
[  406.815885] R13: ffff98d2c0075208 R14: ffffffffc1fb5a80 R15: fffffffffffffff9
[  406.815885] FS:  0000000000000000(0000) GS:ffff98d12f880000(0000) knlGS:0000000000000000
[  406.815885] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  406.815885] CR2: 0000000000000049 CR3: 000000217b5da005 CR4: 00000000001706e0
[  405.479004] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
[  405.479004] Code: e8 a4 a3 da c9 e8 af 24 e3 c9 4c 89 ea 48 c7 c7 80 5a fb c1 48 8d 74 24 58 e8 9b 63 21 ca 49 89 c7 4d 85 ff 0f 84 db 00 00 00 <41> 8b 4f 50 49 8d 7f 50 85 c9 0f 84 cb 00 00 00 8d 51 01 89 c8 f0
[  405.479004] RSP: 0018:ffffbd590e9bbb78 EFLAGS: 00010286
[  405.479004] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffff98b320271b80
[  405.479004] RDX: ffffbd590e9bbbd0 RSI: 00000000fffffe01 RDI: ffff98b24d264d40
[  405.479004] RBP: ffff98d2481ec000 R08: ffff98d2f30f3dd0 R09: 0000000000000000
[  405.479004] R10: ffffbd590e9bbd88 R11: 0000000000008000 R12: 000000000000000e
[  405.479004] R13: ffff98d2c0075208 R14: ffffffffc1fb5a80 R15: fffffffffffffff9
[  405.479004] FS:  0000000000000000(0000) GS:ffff98d12fb00000(0000) knlGS:0000000000000000
[  405.479004] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  405.479004] CR2: 0000000000000049 CR3: 000000010e2aa004 CR4: 00000000001706e0
[  412.024021] Shutting down cpus with NMI
[  412.024021] Kernel Offset: 0xac00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  412.024021] ---[ end Kernel panic - not syncing: Fatal exception ]---



* Re: [PATCH RFC 00/30] Overhaul NFSD filecache
  2022-06-22 18:36 ` [PATCH RFC 00/30] Overhaul NFSD filecache Wang Yugui
@ 2022-06-22 19:04   ` Chuck Lever III
  2022-06-22 19:59     ` Chuck Lever III
  2022-06-23  0:21     ` Dave Chinner
  0 siblings, 2 replies; 51+ messages in thread
From: Chuck Lever III @ 2022-06-22 19:04 UTC (permalink / raw)
  To: Wang Yugui; +Cc: Linux NFS Mailing List, netdev, david, tgraf, Jeff Layton



> On Jun 22, 2022, at 2:36 PM, Wang Yugui <wangyugui@e16-tech.com> wrote:
> 
> Hi,
> 
> fstests generic/531 triggered a panic on kernel 5.19.0-rc3 with this
> patchset.

As I mention in the cover letter, I haven't tried running generic/531
yet -- no claim at all that this is finished work and that #386 has
been fixed at this point. I'm merely interested in comments on the
general approach.


> [  405.478056] BUG: kernel NULL pointer dereference, address: 0000000000000049

The "RIP: " tells the location of the crash. Notice that the call
trace here does not include that information. From your attachment:

[  405.518022] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]

To match that to a line of source code:

[cel@manet ~]$ cd src/linux/linux/
[cel@manet linux]$ scripts/faddr2line ../obj/manet/fs/nfsd/filecache.o nfsd_do_file_acquire+0x4e1
nfsd_do_file_acquire+0x4e1/0xfc0:
rht_bucket_insert at /home/cel/src/linux/linux/include/linux/rhashtable.h:303
(inlined by) __rhashtable_insert_fast at /home/cel/src/linux/linux/include/linux/rhashtable.h:718
(inlined by) rhashtable_lookup_get_insert_key at /home/cel/src/linux/linux/include/linux/rhashtable.h:982
(inlined by) nfsd_file_insert at /home/cel/src/linux/linux/fs/nfsd/filecache.c:1031
(inlined by) nfsd_do_file_acquire at /home/cel/src/linux/linux/fs/nfsd/filecache.c:1089
[cel@manet linux]$

This is only an example; I'm sure my compiled objects don't match yours.
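Here is a sketch of how to read that location string. It assumes only the `symbol+0xOFFSET/0xLENGTH` form printed after "RIP: 0010:" above. OFFSET is the distance of the faulting instruction from the start of the function, and LENGTH is the size of the function in that particular build. That is why the same +0x4e1 falls in a 0xb80-byte function in the reporter's build but in a 0xfc0-byte function in the build used for the faddr2line example. The helper parse_sym_offset() is hypothetical and is not part of scripts/faddr2line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: split "symbol+0xOFFSET/0xLENGTH [module]" into parts.
 * OFFSET: byte offset of the faulting instruction within the function.
 * LENGTH: size of the function in the build that produced the oops.
 * Returns 0 on success, -1 if the string is not in that form.
 */
static int parse_sym_offset(const char *loc, char *sym, size_t symlen,
			    unsigned long *offset, unsigned long *length)
{
	const char *plus = strchr(loc, '+');
	size_t n;

	if (!plus || plus == loc)
		return -1;
	n = (size_t)(plus - loc);
	if (n >= symlen)
		return -1;
	memcpy(sym, loc, n);
	sym[n] = '\0';

	/* %lx accepts an optional "0x" prefix, like strtoul(..., 16). */
	if (sscanf(plus + 1, "%lx/%lx", offset, length) != 2)
		return -1;
	return 0;
}
```

Comparing LENGTH between two builds is a quick way to tell whether an offset from someone else's oops can be trusted against your own object files.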

And, now that I've added observability, you should be able to do:

  # watch cat /proc/fs/nfsd/filecache

to see how many items are in the hash and LRU list while the test
is running.


> [  405.608016] Call Trace:
> [  405.608016]  <TASK>
> [  405.613020]  nfs4_get_vfs_file+0x325/0x410 [nfsd]
> [  405.618018]  nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
> [  405.623016]  ? inode_get_bytes+0x38/0x40
> [  405.623016]  ? nfsd_permission+0x97/0xf0 [nfsd]
> [  405.628022]  ? fh_verify+0x1cc/0x6f0 [nfsd]
> [  405.633025]  nfsd4_open+0x640/0xb30 [nfsd]
> [  405.638025]  nfsd4_proc_compound+0x3bd/0x710 [nfsd]
> [  405.643017]  nfsd_dispatch+0x143/0x270 [nfsd]
> [  405.648019]  svc_process_common+0x3bf/0x5b0 [sunrpc]
> 
> more detail in attachment file(531.dmesg)
> 
> local.config of fstests:
> 	export NFS_MOUNT_OPTIONS="-o rw,relatime,vers=4.2,nconnect=8"
> changes of generic/531
> 	max_allowable_files=$(( 1 * 1024 * 1024 / $nr_cpus / 2 ))

Changed from:

	max_allowable_files=$(( $(cat /proc/sys/fs/file-max) / $nr_cpus / 2 ))

For my own information, what's $nr_cpus in your test?

Aside from the max_allowable_files setting, can you tell how the
test determines when it should stop creating files? Is it looking
for a particular error code from open(2), for instance?

On my client:

[cel@morisot generic]$ cat /proc/sys/fs/file-max
9223372036854775807
[cel@morisot generic]$

I wonder if it's realistic to expect an NFSv4 server to support
that many open files. Is 9 quintillion files really something
I'm going to have to engineer for, or is this just a crazy
test?


> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2022/06/23
> 
>> This series overhauls the NFSD filecache, a cache of server-side
>> "struct file" objects recently used by NFS clients. The purposes of
>> this overhaul are an immediate improvement in cache scalability in
>> the number of open files, and preparation for further improvements.
>> 
>> There are three categories of patches in this series:
>> 
>> 1. Add observability of cache operation so we can see what we're
>> doing as changes are made to the code.
>> 
>> 2. Improve the scalability of filecache garbage collection,
>> addressing several bugs along the way.
>> 
>> 3. Improve the scalability of the filecache hash table by converting
>> it to use rhashtable.
>> 
>> The series as it stands survives typical test workloads. Running
>> stress-tests like generic/531 is the next step.
>> 
>> These patches are also available in the linux-nfs-bugzilla-386
>> branch of
>> 
>>  https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git 
>> 
>> ---
>> 
>> Chuck Lever (30):
>>      NFSD: Report filecache LRU size
>>      NFSD: Report count of calls to nfsd_file_acquire()
>>      NFSD: Report count of freed filecache items
>>      NFSD: Report average age of filecache items
>>      NFSD: Add nfsd_file_lru_dispose_list() helper
>>      NFSD: Refactor nfsd_file_gc()
>>      NFSD: Refactor nfsd_file_lru_scan()
>>      NFSD: Report the number of items evicted by the LRU walk
>>      NFSD: Record number of flush calls
>>      NFSD: Report filecache item construction failures
>>      NFSD: Zero counters when the filecache is re-initialized
>>      NFSD: Hook up the filecache stat file
>>      NFSD: WARN when freeing an item still linked via nf_lru
>>      NFSD: Trace filecache LRU activity
>>      NFSD: Leave open files out of the filecache LRU
>>      NFSD: Fix the filecache LRU shrinker
>>      NFSD: Never call nfsd_file_gc() in foreground paths
>>      NFSD: No longer record nf_hashval in the trace log
>>      NFSD: Remove lockdep assertion from unhash_and_release_locked()
>>      NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode
>>      NFSD: Refactor __nfsd_file_close_inode()
>>      NFSD: nfsd_file_hash_remove can compute hashval
>>      NFSD: Remove nfsd_file::nf_hashval
>>      NFSD: Remove stale comment from nfsd_file_acquire()
>>      NFSD: Clean up "open file" case in nfsd_file_acquire()
>>      NFSD: Document nfsd_file_cache_purge() API contract
>>      NFSD: Replace the "init once" mechanism
>>      NFSD: Set up an rhashtable for the filecache
>>      NFSD: Convert the filecache to use rhashtable
>>      NFSD: Clean up unusued code after rhashtable conversion
>> 
>> 
>> fs/nfsd/filecache.c | 677 +++++++++++++++++++++++++++-----------------
>> fs/nfsd/filecache.h |   6 +-
>> fs/nfsd/nfsctl.c    |  10 +
>> fs/nfsd/trace.h     | 117 ++++++--
>> 4 files changed, 522 insertions(+), 288 deletions(-)
>> 
>> --
>> Chuck Lever
> 
> <531.dmesg>

--
Chuck Lever





* Re: [PATCH RFC 00/30] Overhaul NFSD filecache
  2022-06-22 19:04   ` Chuck Lever III
@ 2022-06-22 19:59     ` Chuck Lever III
  2022-06-23  9:02       ` Wang Yugui
  2022-06-23  0:21     ` Dave Chinner
  1 sibling, 1 reply; 51+ messages in thread
From: Chuck Lever III @ 2022-06-22 19:59 UTC (permalink / raw)
  To: Wang Yugui; +Cc: Linux NFS Mailing List, netdev, david, tgraf, Jeff Layton



> On Jun 22, 2022, at 3:04 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> 
> 
> 
>> On Jun 22, 2022, at 2:36 PM, Wang Yugui <wangyugui@e16-tech.com> wrote:
>> 
>> Hi,
>> 
>> fstests generic/531 triggered a panic on kernel 5.19.0-rc3 with this
>> patchset.
> 
> As I mention in the cover letter, I haven't tried running generic/531
> yet -- no claim at all that this is finished work and that #386 has
> been fixed at this point. I'm merely interested in comments on the
> general approach.
> 
> 
>> [ 405.478056] BUG: kernel NULL pointer dereference, address: 0000000000000049
> 
> The "RIP: " tells the location of the crash. Notice that the call
> trace here does not include that information. From your attachment:
> 
> [ 405.518022] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
> 
> To match that to a line of source code:
> 
> [cel@manet ~]$ cd src/linux/linux/
> [cel@manet linux]$ scripts/faddr2line ../obj/manet/fs/nfsd/filecache.o nfsd_do_file_acquire+0x4e1
> nfsd_do_file_acquire+0x4e1/0xfc0:
> rht_bucket_insert at /home/cel/src/linux/linux/include/linux/rhashtable.h:303
> (inlined by) __rhashtable_insert_fast at /home/cel/src/linux/linux/include/linux/rhashtable.h:718
> (inlined by) rhashtable_lookup_get_insert_key at /home/cel/src/linux/linux/include/linux/rhashtable.h:982
> (inlined by) nfsd_file_insert at /home/cel/src/linux/linux/fs/nfsd/filecache.c:1031
> (inlined by) nfsd_do_file_acquire at /home/cel/src/linux/linux/fs/nfsd/filecache.c:1089
> [cel@manet linux]$
> 
> This is only an example; I'm sure my compiled objects don't match yours.
> 
> And, now that I've added observability, you should be able to do:
> 
> # watch cat /proc/fs/nfsd/filecache
> 
> to see how many items are in the hash and LRU list while the test
> is running.
> 
> 
>> [ 405.608016] Call Trace:
>> [ 405.608016] <TASK>
>> [ 405.613020] nfs4_get_vfs_file+0x325/0x410 [nfsd]
>> [ 405.618018] nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
>> [ 405.623016] ? inode_get_bytes+0x38/0x40
>> [ 405.623016] ? nfsd_permission+0x97/0xf0 [nfsd]
>> [ 405.628022] ? fh_verify+0x1cc/0x6f0 [nfsd]
>> [ 405.633025] nfsd4_open+0x640/0xb30 [nfsd]
>> [ 405.638025] nfsd4_proc_compound+0x3bd/0x710 [nfsd]
>> [ 405.643017] nfsd_dispatch+0x143/0x270 [nfsd]
>> [ 405.648019] svc_process_common+0x3bf/0x5b0 [sunrpc]

I was able to trigger something that looks very much like this crash.
If you remove this line from fs/nfsd/filecache.c:

	.max_size		= 131072, /* buckets */

things get a lot more stable for generic/531.

I'm looking into the issue now.
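One plausible reading of why dropping .max_size helps is offered here as an assumption to verify against lib/rhashtable.c. As I understand it, rhashtable turns max_size into a cap on stored elements (max_elems = 2 * max_size). Once nelems reaches that cap, the insert path returns ERR_PTR(-E2BIG) instead of growing the table. The toy model below uses hypothetical names and is not kernel code. It shows that with max_size = 131072 the cap would be 262,144 entries, which a generic/531 run with many client workers could plausibly reach.

```c
#include <assert.h>
#include <errno.h>

/*
 * Toy model (assumption, not kernel code) of the element cap that
 * rhashtable appears to derive from rhashtable_params.max_size.
 */
struct toy_rhashtable {
	unsigned int max_size;	/* buckets, as in .max_size */
	unsigned int max_elems;	/* derived cap on stored elements */
	unsigned int nelems;	/* elements currently stored */
};

static void toy_init(struct toy_rhashtable *ht, unsigned int max_size)
{
	ht->max_size = max_size;
	ht->max_elems = 1u << 31;	/* default cap when max_size is 0 */
	if (max_size && max_size < ht->max_elems / 2)
		ht->max_elems = max_size * 2;
	ht->nelems = 0;
}

/* Returns 0 on success, or -E2BIG once the table is "full". */
static int toy_insert(struct toy_rhashtable *ht)
{
	if (ht->nelems >= ht->max_elems)
		return -E2BIG;
	ht->nelems++;
	return 0;
}
```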


>> more detail in attachment file(531.dmesg)
>> 
>> local.config of fstests:
>> 	export NFS_MOUNT_OPTIONS="-o rw,relatime,vers=4.2,nconnect=8"
>> changes of generic/531
>> 	max_allowable_files=$(( 1 * 1024 * 1024 / $nr_cpus / 2 ))
> 
> Changed from:
> 
> 	max_allowable_files=$(( $(cat /proc/sys/fs/file-max) / $nr_cpus / 2 ))
> 
> For my own information, what's $nr_cpus in your test?
> 
> Aside from the max_allowable_files setting, can you tell how the
> test determines when it should stop creating files? Is it looking
> for a particular error code from open(2), for instance?
> 
> On my client:
> 
> [cel@morisot generic]$ cat /proc/sys/fs/file-max
> 9223372036854775807
> [cel@morisot generic]$
> 
> I wonder if it's realistic to expect an NFSv4 server to support
> that many open files. Is 9 quintillion files really something
> I'm going to have to engineer for, or is this just a crazy
> test?
> 
> 
>> Best Regards
>> Wang Yugui (wangyugui@e16-tech.com)
>> 2022/06/23
>> 
>>> This series overhauls the NFSD filecache, a cache of server-side
>>> "struct file" objects recently used by NFS clients. The purposes of
>>> this overhaul are an immediate improvement in cache scalability in
>>> the number of open files, and preparation for further improvements.
>>> 
>>> There are three categories of patches in this series:
>>> 
>>> 1. Add observability of cache operation so we can see what we're
>>> doing as changes are made to the code.
>>> 
>>> 2. Improve the scalability of filecache garbage collection,
>>> addressing several bugs along the way.
>>> 
>>> 3. Improve the scalability of the filecache hash table by converting
>>> it to use rhashtable.
>>> 
>>> The series as it stands survives typical test workloads. Running
>>> stress-tests like generic/531 is the next step.
>>> 
>>> These patches are also available in the linux-nfs-bugzilla-386
>>> branch of
>>> 
>>> https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git 
>>> 
>>> ---
>>> 
>>> Chuck Lever (30):
>>> NFSD: Report filecache LRU size
>>> NFSD: Report count of calls to nfsd_file_acquire()
>>> NFSD: Report count of freed filecache items
>>> NFSD: Report average age of filecache items
>>> NFSD: Add nfsd_file_lru_dispose_list() helper
>>> NFSD: Refactor nfsd_file_gc()
>>> NFSD: Refactor nfsd_file_lru_scan()
>>> NFSD: Report the number of items evicted by the LRU walk
>>> NFSD: Record number of flush calls
>>> NFSD: Report filecache item construction failures
>>> NFSD: Zero counters when the filecache is re-initialized
>>> NFSD: Hook up the filecache stat file
>>> NFSD: WARN when freeing an item still linked via nf_lru
>>> NFSD: Trace filecache LRU activity
>>> NFSD: Leave open files out of the filecache LRU
>>> NFSD: Fix the filecache LRU shrinker
>>> NFSD: Never call nfsd_file_gc() in foreground paths
>>> NFSD: No longer record nf_hashval in the trace log
>>> NFSD: Remove lockdep assertion from unhash_and_release_locked()
>>> NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode
>>> NFSD: Refactor __nfsd_file_close_inode()
>>> NFSD: nfsd_file_hash_remove can compute hashval
>>> NFSD: Remove nfsd_file::nf_hashval
>>> NFSD: Remove stale comment from nfsd_file_acquire()
>>> NFSD: Clean up "open file" case in nfsd_file_acquire()
>>> NFSD: Document nfsd_file_cache_purge() API contract
>>> NFSD: Replace the "init once" mechanism
>>> NFSD: Set up an rhashtable for the filecache
>>> NFSD: Convert the filecache to use rhashtable
>>> NFSD: Clean up unusued code after rhashtable conversion
>>> 
>>> 
>>> fs/nfsd/filecache.c | 677 +++++++++++++++++++++++++++-----------------
>>> fs/nfsd/filecache.h | 6 +-
>>> fs/nfsd/nfsctl.c | 10 +
>>> fs/nfsd/trace.h | 117 ++++++--
>>> 4 files changed, 522 insertions(+), 288 deletions(-)
>>> 
>>> --
>>> Chuck Lever
>> 
>> <531.dmesg>
> 
> --
> Chuck Lever

--
Chuck Lever





* Re: [PATCH RFC 00/30] Overhaul NFSD filecache
  2022-06-22 19:04   ` Chuck Lever III
  2022-06-22 19:59     ` Chuck Lever III
@ 2022-06-23  0:21     ` Dave Chinner
  2022-06-23  1:01       ` Chuck Lever III
  1 sibling, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2022-06-23  0:21 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Wang Yugui, Linux NFS Mailing List, netdev, tgraf, Jeff Layton

On Wed, Jun 22, 2022 at 07:04:39PM +0000, Chuck Lever III wrote:
> > more detail in attachment file(531.dmesg)
> > 
> > local.config of fstests:
> > 	export NFS_MOUNT_OPTIONS="-o rw,relatime,vers=4.2,nconnect=8"
> > changes of generic/531
> > 	max_allowable_files=$(( 1 * 1024 * 1024 / $nr_cpus / 2 ))
> 
> Changed from:
> 
> 	max_allowable_files=$(( $(cat /proc/sys/fs/file-max) / $nr_cpus / 2 ))
> 
> For my own information, what's $nr_cpus in your test?
> 
> Aside from the max_allowable_files setting, can you tell how the
> test determines when it should stop creating files? Is it looking
> for a particular error code from open(2), for instance?
> 
> On my client:
> 
> [cel@morisot generic]$ cat /proc/sys/fs/file-max
> 9223372036854775807
> [cel@morisot generic]$

$ echo $((2**63 - 1))
9223372036854775807

i.e. LLONG_MAX, or "no limit is set".

> I wonder if it's realistic to expect an NFSv4 server to support
> that many open files. Is 9 quintillion files really something
> I'm going to have to engineer for, or is this just a crazy
> test?

The test does not use the value directly - it's a max value for
clamping:

max_files=$((50000 * LOAD_FACTOR))
max_allowable_files=$(( $(cat /proc/sys/fs/file-max) / $nr_cpus / 2 ))
test $max_allowable_files -gt 0 && test $max_files -gt $max_allowable_files && \
        max_files=$max_allowable_files
ulimit -n $max_files

i.e. the result should be

max_files = min(50000, max_allowable_files)

So the test should only be allowing 50,000 open unlinked files to be
created before unmounting, which means there's lots of silly
renaming going on at the client and so the server is probably seeing
100,000 unique file handles across the test....
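Below is a C rendering of the clamping snippet quoted above, useful for checking the arithmetic by hand. Shell $(( )) arithmetic is 64-bit signed with truncating division, so long long evaluated left to right should match it. The clamp keeps the smaller of the two values whenever max_allowable_files is positive. The function name gen531_max_files() is made up for illustration.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical C version of the generic/531 limit computation.
 * An unlimited file-max reads back as LLONG_MAX (2^63 - 1).
 */
static long long gen531_max_files(long long load_factor, long long file_max,
				  long long nr_cpus)
{
	long long max_files = 50000 * load_factor;
	long long max_allowable_files = file_max / nr_cpus / 2;

	/* Clamp only when the ceiling is positive and smaller. */
	if (max_allowable_files > 0 && max_files > max_allowable_files)
		max_files = max_allowable_files;
	return max_files;
}
```

With Wang's edit (file-max replaced by 1 * 1024 * 1024), the clamp only takes effect when nr_cpus is 11 or more. At 16 CPUs, for example, the limit drops to 32,768.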

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH RFC 29/30] NFSD: Convert the filecache to use rhashtable
  2022-06-22 14:15 ` [PATCH RFC 29/30] NFSD: Convert the filecache to use rhashtable Chuck Lever
@ 2022-06-23  0:38   ` Dave Chinner
  2022-06-23  0:58     ` Chuck Lever III
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2022-06-23  0:38 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, netdev, tgraf, jlayton

On Wed, Jun 22, 2022 at 10:15:56AM -0400, Chuck Lever wrote:
> Enable the filecache hash table to start small, then grow with the
> workload. Smaller server deployments benefit because there should
> be lower memory utilization. Larger server deployments should see
> improved scaling with the number of open files.
> 
> I know this is a big and messy patch, but there's no good way to
> rip out and replace a data structure like this.
> 
> Suggested-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Pretty sure I mentioned converting to rhashtable as well when we
were talking about the pointer-chasing overhead of list and tree
based indexing of large caches.  :)

> +
> +/*
> + * Atomically insert a new nfsd_file item into nfsd_file_rhash_tbl.
> + *
> + * Return values:
> + *   %NULL: @new was inserted successfully
> + *   %A valid pointer: @new was not inserted, a matching item is returned
> + *   %ERR_PTR: an unexpected error occurred during insertion
> + */
> +static struct nfsd_file *nfsd_file_insert(struct nfsd_file *new)
> +{
> +	struct nfsd_file_lookup_key key = {
> +		.type	= NFSD_FILE_KEY_FULL,
> +		.inode	= new->nf_inode,
> +		.need	= new->nf_flags,
> +		.net	= new->nf_net,
> +		.cred	= current_cred(),
> +	};
> +	struct nfsd_file *nf;
> +
> +	nf = rhashtable_lookup_get_insert_key(&nfsd_file_rhash_tbl,
> +					      &key, &new->nf_rhash,
> +					      nfsd_file_rhash_params);
> +	if (!nf)
> +		return nf;

The insert can return an error (e.g. -ENOMEM), so you need to check
IS_ERR(nf) here as well.
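Here is a minimal userspace sketch of the three-way check described above. It assumes simplified stand-ins for the kernel's <linux/err.h> helpers ERR_PTR(), PTR_ERR() and IS_ERR(). classify_insert() and its enum are hypothetical. The point is only the ordering: test for NULL, then IS_ERR(), and only then treat the pointer as a matching nfsd_file.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace copies of the <linux/err.h> helpers (assumption). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* The three outcomes listed in the nfsd_file_insert() comment. */
enum insert_result { INSERTED, FOUND_EXISTING, INSERT_FAILED };

/* Hypothetical: classify a rhashtable_lookup_get_insert_key() result. */
static enum insert_result classify_insert(const void *nf)
{
	if (!nf)
		return INSERTED;	/* @new is now in the table */
	if (IS_ERR(nf))
		return INSERT_FAILED;	/* must not be dereferenced */
	return FOUND_EXISTING;		/* safe to take a reference */
}
```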

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH RFC 29/30] NFSD: Convert the filecache to use rhashtable
  2022-06-23  0:38   ` Dave Chinner
@ 2022-06-23  0:58     ` Chuck Lever III
  2022-06-23 17:27       ` Chuck Lever III
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever III @ 2022-06-23  0:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Linux NFS Mailing List, netdev, tgraf, Jeff Layton



> On Jun 22, 2022, at 8:38 PM, Dave Chinner <david@fromorbit.com> wrote:
> 
> On Wed, Jun 22, 2022 at 10:15:56AM -0400, Chuck Lever wrote:
>> Enable the filecache hash table to start small, then grow with the
>> workload. Smaller server deployments benefit because there should
>> be lower memory utilization. Larger server deployments should see
>> improved scaling with the number of open files.
>> 
>> I know this is a big and messy patch, but there's no good way to
>> rip out and replace a data structure like this.
>> 
>> Suggested-by: Jeff Layton <jlayton@kernel.org>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> 
> Pretty sure I mentioned converting to rhashtable as well when we
> were talking about the pointer-chasing overhead of list and tree
> based indexing of large caches.  :)

Jeff's suggestion was right in the source code :-) but fair enough.
The idea was also discussed when the filecache code was changed to
use kvzalloc recently.

I appreciate your review and your advice!


>> +
>> +/*
>> + * Atomically insert a new nfsd_file item into nfsd_file_rhash_tbl.
>> + *
>> + * Return values:
>> + *   %NULL: @new was inserted successfully
>> + *   %A valid pointer: @new was not inserted, a matching item is returned
>> + *   %ERR_PTR: an unexpected error occurred during insertion
>> + */
>> +static struct nfsd_file *nfsd_file_insert(struct nfsd_file *new)
>> +{
>> +	struct nfsd_file_lookup_key key = {
>> +		.type	= NFSD_FILE_KEY_FULL,
>> +		.inode	= new->nf_inode,
>> +		.need	= new->nf_flags,
>> +		.net	= new->nf_net,
>> +		.cred	= current_cred(),
>> +	};
>> +	struct nfsd_file *nf;
>> +
>> +	nf = rhashtable_lookup_get_insert_key(&nfsd_file_rhash_tbl,
>> +					      &key, &new->nf_rhash,
>> +					      nfsd_file_rhash_params);
>> +	if (!nf)
>> +		return nf;
> 
> The insert can return an error (e.g. -ENOMEM), so you need to check
> IS_ERR(nf) here as well.

That is likely the cause of the BUG that Wang just reported, as
that will send an ERR_PTR to nfsd_file_get(), which blows up when
it tries to dereference it.
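The register dump appears to support this. The Code: bytes at the fault, 41 8b 4f 50, decode on x86-64 to mov 0x50(%r15),%ecx. The dump shows R15: fffffffffffffff9, which is -7, i.e. -E2BIG on Linux. A 32-bit load from offset 0x50 of ERR_PTR(-E2BIG) wraps around to exactly the reported CR2 of 0x49. As far as I can tell, -E2BIG is the error rhashtable uses when a table has reached its configured maximum, which fits the .max_size observation earlier in the thread. The check below reproduces only that arithmetic. fault_address() is a hypothetical helper, and the 0x50 offset comes from the instruction bytes, not from the definition of struct nfsd_file.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: effective address of a load at base + offset.
 * Unsigned 64-bit arithmetic wraps modulo 2^64, as the CPU's does.
 */
static uint64_t fault_address(uint64_t base, uint64_t field_offset)
{
	return base + field_offset;
}
```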

I'll resend the series first thing tomorrow morning after some
more clean up and testing.


--
Chuck Lever





* Re: [PATCH RFC 00/30] Overhaul NFSD filecache
  2022-06-23  0:21     ` Dave Chinner
@ 2022-06-23  1:01       ` Chuck Lever III
  0 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever III @ 2022-06-23  1:01 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Wang Yugui, Linux NFS Mailing List, netdev, tgraf, Jeff Layton



> On Jun 22, 2022, at 8:21 PM, Dave Chinner <david@fromorbit.com> wrote:
> 
> On Wed, Jun 22, 2022 at 07:04:39PM +0000, Chuck Lever III wrote:
>>> more detail in attachment file(531.dmesg)
>>> 
>>> local.config of fstests:
>>> 	export NFS_MOUNT_OPTIONS="-o rw,relatime,vers=4.2,nconnect=8"
>>> changes of generic/531
>>> 	max_allowable_files=$(( 1 * 1024 * 1024 / $nr_cpus / 2 ))
>> 
>> Changed from:
>> 
>> 	max_allowable_files=$(( $(cat /proc/sys/fs/file-max) / $nr_cpus / 2 ))
>> 
>> For my own information, what's $nr_cpus in your test?
>> 
>> Aside from the max_allowable_files setting, can you tell how the
>> test determines when it should stop creating files? Is it looking
>> for a particular error code from open(2), for instance?
>> 
>> On my client:
>> 
>> [cel@morisot generic]$ cat /proc/sys/fs/file-max
>> 9223372036854775807
>> [cel@morisot generic]$
> 
> $ echo $((2**63 - 1))
> 9223372036854775807
> 
> i.e. LLONG_MAX, or "no limit is set".
> 
>> I wonder if it's realistic to expect an NFSv4 server to support
>> that many open files. Is 9 quintillion files really something
>> I'm going to have to engineer for, or is this just a crazy
>> test?
> 
> The test does not use the value directly - it's a max value for
> clamping:
> 
> max_files=$((50000 * LOAD_FACTOR))
> max_allowable_files=$(( $(cat /proc/sys/fs/file-max) / $nr_cpus / 2 ))
> test $max_allowable_files -gt 0 && test $max_files -gt $max_allowable_files && \
>        max_files=$max_allowable_files
> ulimit -n $max_files
> 
> i.e. the result should be
> 
> max_files = min(50000, max_allowable_files)
> 
> So the test should only be allowing 50,000 open unlinked files to be
> created before unmounting.

Looking at my testing, it's ~50,000 per worker thread, and there are
2 workers per physical core on the client. But thankfully, this is
much smaller than 9 quintillion.


> Which means there's lots of silly
> renaming going on at the client and so the server is probably seeing
> 100,000 unique file handles across the test....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

--
Chuck Lever





* Re: [PATCH RFC 00/30] Overhaul NFSD filecache
  2022-06-22 19:59     ` Chuck Lever III
@ 2022-06-23  9:02       ` Wang Yugui
  2022-06-23 16:44         ` Chuck Lever III
  0 siblings, 1 reply; 51+ messages in thread
From: Wang Yugui @ 2022-06-23  9:02 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List, netdev, david, tgraf, Jeff Layton

Hi,

> > On Jun 22, 2022, at 3:04 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> >> On Jun 22, 2022, at 2:36 PM, Wang Yugui <wangyugui@e16-tech.com> wrote:
> >> 
> >> Hi,
> >> 
> >> fstests generic/531 triggered a panic on kernel 5.19.0-rc3 with this
> >> patchset.
> > 
> > As I mention in the cover letter, I haven't tried running generic/531
> > yet -- no claim at all that this is finished work and that #386 has
> > been fixed at this point. I'm merely interested in comments on the
> > general approach.
> > 
> > 
> >> [ 405.478056] BUG: kernel NULL pointer dereference, address: 0000000000000049
> > 
> > The "RIP: " tells the location of the crash. Notice that the call
> > trace here does not include that information. From your attachment:
> > 
> > [ 405.518022] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
> > 
> > To match that to a line of source code:
> > 
> > [cel@manet ~]$ cd src/linux/linux/
> > [cel@manet linux]$ scripts/faddr2line ../obj/manet/fs/nfsd/filecache.o nfsd_do_file_acquire+0x4e1
> > nfsd_do_file_acquire+0x4e1/0xfc0:
> > rht_bucket_insert at /home/cel/src/linux/linux/include/linux/rhashtable.h:303
> > (inlined by) __rhashtable_insert_fast at /home/cel/src/linux/linux/include/linux/rhashtable.h:718
> > (inlined by) rhashtable_lookup_get_insert_key at /home/cel/src/linux/linux/include/linux/rhashtable.h:982
> > (inlined by) nfsd_file_insert at /home/cel/src/linux/linux/fs/nfsd/filecache.c:1031
> > (inlined by) nfsd_do_file_acquire at /home/cel/src/linux/linux/fs/nfsd/filecache.c:1089
> > [cel@manet linux]$
> > 
> > This is only an example; I'm sure my compiled objects don't match yours.
> > 
> > And, now that I've added observability, you should be able to do:
> > 
> > # watch cat /proc/fs/nfsd/filecache
> > 
> > to see how many items are in the hash and LRU list while the test
> > is running.
> > 
> > 
> >> [ 405.608016] Call Trace:
> >> [ 405.608016] <TASK>
> >> [ 405.613020] nfs4_get_vfs_file+0x325/0x410 [nfsd]
> >> [ 405.618018] nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
> >> [ 405.623016] ? inode_get_bytes+0x38/0x40
> >> [ 405.623016] ? nfsd_permission+0x97/0xf0 [nfsd]
> >> [ 405.628022] ? fh_verify+0x1cc/0x6f0 [nfsd]
> >> [ 405.633025] nfsd4_open+0x640/0xb30 [nfsd]
> >> [ 405.638025] nfsd4_proc_compound+0x3bd/0x710 [nfsd]
> >> [ 405.643017] nfsd_dispatch+0x143/0x270 [nfsd]
> >> [ 405.648019] svc_process_common+0x3bf/0x5b0 [sunrpc]
> 
> I was able to trigger something that looks very much like this crash.
> If you remove this line from fs/nfsd/filecache.c:
> 
> 	.max_size		= 131072, /* buckets */
> 
> things get a lot more stable for generic/531.
> 
> I'm looking into the issue now.

Yes. When '.max_size = 131072' is removed, fstests generic/531 passed.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/06/23



* Re: [PATCH RFC 00/30] Overhaul NFSD filecache
  2022-06-23  9:02       ` Wang Yugui
@ 2022-06-23 16:44         ` Chuck Lever III
  2022-06-23 17:51           ` Wang Yugui
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever III @ 2022-06-23 16:44 UTC (permalink / raw)
  To: Wang Yugui; +Cc: Linux NFS Mailing List, netdev, david, tgraf, Jeff Layton



> On Jun 23, 2022, at 5:02 AM, Wang Yugui <wangyugui@e16-tech.com> wrote:
> 
> Hi,
> 
>>> On Jun 22, 2022, at 3:04 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>> On Jun 22, 2022, at 2:36 PM, Wang Yugui <wangyugui@e16-tech.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> fstests generic/531 triggered a panic on kernel 5.19.0-rc3 with this
>>>> patchset.
>>> 
>>> As I mention in the cover letter, I haven't tried running generic/531
>>> yet -- no claim at all that this is finished work and that #386 has
>>> been fixed at this point. I'm merely interested in comments on the
>>> general approach.
>>> 
>>> 
>>>> [ 405.478056] BUG: kernel NULL pointer dereference, address: 0000000000000049
>>> 
>>> The "RIP: " tells the location of the crash. Notice that the call
>>> trace here does not include that information. From your attachment:
>>> 
>>> [ 405.518022] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
>>> 
>>> To match that to a line of source code:
>>> 
>>> [cel@manet ~]$ cd src/linux/linux/
>>> [cel@manet linux]$ scripts/faddr2line ../obj/manet/fs/nfsd/filecache.o nfsd_do_file_acquire+0x4e1
>>> nfsd_do_file_acquire+0x4e1/0xfc0:
>>> rht_bucket_insert at /home/cel/src/linux/linux/include/linux/rhashtable.h:303
>>> (inlined by) __rhashtable_insert_fast at /home/cel/src/linux/linux/include/linux/rhashtable.h:718
>>> (inlined by) rhashtable_lookup_get_insert_key at /home/cel/src/linux/linux/include/linux/rhashtable.h:982
>>> (inlined by) nfsd_file_insert at /home/cel/src/linux/linux/fs/nfsd/filecache.c:1031
>>> (inlined by) nfsd_do_file_acquire at /home/cel/src/linux/linux/fs/nfsd/filecache.c:1089
>>> [cel@manet linux]$
>>> 
>>> This is an example, I'm sure my compiled objects don't match yours.
>>> 
>>> And, now that I've added observability, you should be able to do:
>>> 
>>> # watch cat /proc/fs/nfsd/filecache
>>> 
>>> to see how many items are in the hash and LRU list while the test
>>> is running.
>>> 
>>> 
>>>> [ 405.608016] Call Trace:
>>>> [ 405.608016] <TASK>
>>>> [ 405.613020] nfs4_get_vfs_file+0x325/0x410 [nfsd]
>>>> [ 405.618018] nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
>>>> [ 405.623016] ? inode_get_bytes+0x38/0x40
>>>> [ 405.623016] ? nfsd_permission+0x97/0xf0 [nfsd]
>>>> [ 405.628022] ? fh_verify+0x1cc/0x6f0 [nfsd]
>>>> [ 405.633025] nfsd4_open+0x640/0xb30 [nfsd]
>>>> [ 405.638025] nfsd4_proc_compound+0x3bd/0x710 [nfsd]
>>>> [ 405.643017] nfsd_dispatch+0x143/0x270 [nfsd]
>>>> [ 405.648019] svc_process_common+0x3bf/0x5b0 [sunrpc]
>> 
>> I was able to trigger something that looks very much like this crash.
>> If you remove this line from fs/nfsd/filecache.c:
>> 
>> 	.max_size		= 131072, /* buckets */
>> 
>> things get a lot more stable for generic/531.
>> 
>> I'm looking into the issue now.
> 
> Yes. When '.max_size = 131072' is removed, fstests generic/531 passed.

Great! Are you comfortable with this general approach for bug #386?


--
Chuck Lever





* Re: [PATCH RFC 29/30] NFSD: Convert the filecache to use rhashtable
  2022-06-23  0:58     ` Chuck Lever III
@ 2022-06-23 17:27       ` Chuck Lever III
  2022-06-23 22:33         ` Dave Chinner
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever III @ 2022-06-23 17:27 UTC (permalink / raw)
  To: Linux NFS Mailing List, netdev, tgraf, Dave Chinner, Jeff Layton


> On Jun 22, 2022, at 8:58 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> 
>> On Jun 22, 2022, at 8:38 PM, Dave Chinner <david@fromorbit.com> wrote:
>> 
>> On Wed, Jun 22, 2022 at 10:15:56AM -0400, Chuck Lever wrote:
> 
>>> +
>>> +/*
>>> + * Atomically insert a new nfsd_file item into nfsd_file_rhash_tbl.
>>> + *
>>> + * Return values:
>>> + *   %NULL: @new was inserted successfully
>>> + *   %A valid pointer: @new was not inserted, a matching item is returned
>>> + *   %ERR_PTR: an unexpected error occurred during insertion
>>> + */
>>> +static struct nfsd_file *nfsd_file_insert(struct nfsd_file *new)
>>> +{
>>> +	struct nfsd_file_lookup_key key = {
>>> +		.type	= NFSD_FILE_KEY_FULL,
>>> +		.inode	= new->nf_inode,
>>> +		.need	= new->nf_flags,
>>> +		.net	= new->nf_net,
>>> +		.cred	= current_cred(),
>>> +	};
>>> +	struct nfsd_file *nf;
>>> +
>>> +	nf = rhashtable_lookup_get_insert_key(&nfsd_file_rhash_tbl,
>>> +					      &key, &new->nf_rhash,
>>> +					      nfsd_file_rhash_params);
>>> +	if (!nf)
>>> +		return nf;
>> 
>> The insert can return an error (e.g. -ENOMEM) so need to check
>> IS_ERR(nf) here as well.
> 
> That is likely the cause of the BUG that Wang just reported, as
> that will send a ERR_PTR to nfsd_file_get(), which blows up when
> it tries to dereference it.

Yep, that was it. I've fixed it, but some other doubts have surfaced
in the meantime.

Removing the .max_size cap also helps, and in the long run I now
feel that cap should be left off. But I would like to be certain that
nfsd_file_acquire()'s logic works when hard errors occur, so I have left
the cap in place for now. I found that the "failed to open newly created
file!" warning fires when insertion fails; that case needs to be handled
silently instead.
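Dave's review point generalizes: an insert-or-lookup call has three outcomes, and an error pointer is non-NULL, so testing only `if (!nf)` lets it through to the "found a match" path. Below is a minimal userspace sketch of the caller-side ordering, assuming local re-implementations of ERR_PTR(), PTR_ERR() and IS_ERR() modeled on the kernel's include/linux/err.h; struct item, fake_get_insert() and insert_item() are hypothetical stand-ins, not the filecache code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local userspace copies of the kernel's error-pointer helpers (assumption). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

enum fake_mode { FAKE_INSERTED, FAKE_FOUND, FAKE_ENOMEM };

struct item { unsigned long key; };

static struct item existing = { .key = 42 };

/*
 * Hypothetical stand-in for an insert-or-lookup call: NULL means @new was
 * inserted, a valid pointer is an existing match, ERR_PTR() is a failure.
 */
static struct item *fake_get_insert(struct item *new, enum fake_mode mode)
{
	(void)new;
	switch (mode) {
	case FAKE_INSERTED:
		return NULL;
	case FAKE_FOUND:
		return &existing;
	default:
		return ERR_PTR(-ENOMEM);
	}
}

/*
 * Returns 0 if @new was inserted, 1 if an existing item was found
 * (stored in *found), or a negative errno on failure.
 */
static int insert_item(struct item *new, enum fake_mode mode,
		       struct item **found)
{
	struct item *nf;

	assert(found != NULL);
	nf = fake_get_insert(new, mode);
	if (!nf)
		return 0;			/* @new is now in the table */
	if (IS_ERR(nf))
		return (int)PTR_ERR(nf);	/* never dereference this */
	*found = nf;				/* a real object: safe to use */
	return 1;
}
```

The order matters: the NULL test comes first, then IS_ERR(), and only then is the result treated as a real object.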

Also I just found Neil's nice rhashtable explainer:

   https://lwn.net/Articles/751374/

Where he writes that:

> Sometimes you might want a hash table to potentially contain multiple objects for any given key. In that case you can use "rhltables" — rhashtables with lists of objects.


I believe that is the case for the filecache. The hash value is
computed based on the inode pointer, and therefore there can be more
than one nfsd_file object for a particular inode (depending on who
is opening and for what access). So I think filecache needs to use
rhltable, not rhashtable. Any thoughts from rhashtable experts?
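Whether a table must hold several objects per key depends on what its compare step treats as the key, not only on which fields are hashed. Here is a userspace sketch, with hypothetical names throughout, of a unique-key insert in which every object hashes on the inode pointer alone: with an inode-only compare, a second open of the same inode is refused, while a compare that also checks the access mode stores both objects in the same bucket chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 16

struct entry {
	const void *inode;	/* the only field that is hashed */
	unsigned int may;	/* access mode: compared, never hashed */
	struct entry *next;	/* bucket chain */
};

typedef bool (*same_key_fn)(const struct entry *a, const struct entry *b);

/* Key identity = inode only: a second open of the inode is a duplicate. */
static bool same_inode(const struct entry *a, const struct entry *b)
{
	return a->inode == b->inode;
}

/* Key identity = inode + access mode: each open mode is its own key. */
static bool same_inode_and_may(const struct entry *a, const struct entry *b)
{
	return a->inode == b->inode && a->may == b->may;
}

static unsigned int bucket_of(const void *inode)
{
	return (unsigned int)(((uintptr_t)inode / 64) % NBUCKETS);
}

/*
 * Insert @new unless an entry with the same key (as judged by @same) is
 * already present; return NULL on success or the existing entry.
 */
static struct entry *insert_unique(struct entry **tbl, struct entry *new,
				   same_key_fn same)
{
	unsigned int b;

	assert(new != NULL && same != NULL);
	b = bucket_of(new->inode);
	for (struct entry *e = tbl[b]; e; e = e->next)
		if (same(e, new))
			return e;
	new->next = tbl[b];
	tbl[b] = new;
	return NULL;
}
```

In rhashtable terms, the compare role is played by the obj_cmpfn hook in struct rhashtable_params; this sketch models only the semantics, not rhashtable's internals.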


--
Chuck Lever





* Re: [PATCH RFC 00/30] Overhaul NFSD filecache
  2022-06-23 16:44         ` Chuck Lever III
@ 2022-06-23 17:51           ` Wang Yugui
  2022-06-24 15:30             ` Chuck Lever III
  0 siblings, 1 reply; 51+ messages in thread
From: Wang Yugui @ 2022-06-23 17:51 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List, netdev, david, tgraf, Jeff Layton

Hi,

> > On Jun 23, 2022, at 5:02 AM, Wang Yugui <wangyugui@e16-tech.com> wrote:
> > 
> > Hi,
> > 
> >>> On Jun 22, 2022, at 3:04 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>>> On Jun 22, 2022, at 2:36 PM, Wang Yugui <wangyugui@e16-tech.com> wrote:
> >>>> 
> >>>> Hi,
> >>>> 
> >>>> fstests generic/531 triggered a panic on kernel 5.19.0-rc3 with this
> >>>> patchset.
> >>> 
> >>> As I mention in the cover letter, I haven't tried running generic/531
> >>> yet -- no claim at all that this is finished work and that #386 has
> >>> been fixed at this point. I'm merely interested in comments on the
> >>> general approach.
> >>> 
> >>> 
> >>>> [ 405.478056] BUG: kernel NULL pointer dereference, address: 0000000000000049
> >>> 
> >>> The "RIP: " tells the location of the crash. Notice that the call
> >>> trace here does not include that information. From your attachment:
> >>> 
> >>> [ 405.518022] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
> >>> 
> >>> To match that to a line of source code:
> >>> 
> >>> [cel@manet ~]$ cd src/linux/linux/
> >>> [cel@manet linux]$ scripts/faddr2line ../obj/manet/fs/nfsd/filecache.o nfsd_do_file_acquire+0x4e1
> >>> nfsd_do_file_acquire+0x4e1/0xfc0:
> >>> rht_bucket_insert at /home/cel/src/linux/linux/include/linux/rhashtable.h:303
> >>> (inlined by) __rhashtable_insert_fast at /home/cel/src/linux/linux/include/linux/rhashtable.h:718
> >>> (inlined by) rhashtable_lookup_get_insert_key at /home/cel/src/linux/linux/include/linux/rhashtable.h:982
> >>> (inlined by) nfsd_file_insert at /home/cel/src/linux/linux/fs/nfsd/filecache.c:1031
> >>> (inlined by) nfsd_do_file_acquire at /home/cel/src/linux/linux/fs/nfsd/filecache.c:1089
> >>> [cel@manet linux]$
> >>> 
> >>> This is an example, I'm sure my compiled objects don't match yours.
> >>> 
> >>> And, now that I've added observability, you should be able to do:
> >>> 
> >>> # watch cat /proc/fs/nfsd/filecache
> >>> 
> >>> to see how many items are in the hash and LRU list while the test
> >>> is running.
> >>> 
> >>> 
> >>>> [ 405.608016] Call Trace:
> >>>> [ 405.608016] <TASK>
> >>>> [ 405.613020] nfs4_get_vfs_file+0x325/0x410 [nfsd]
> >>>> [ 405.618018] nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
> >>>> [ 405.623016] ? inode_get_bytes+0x38/0x40
> >>>> [ 405.623016] ? nfsd_permission+0x97/0xf0 [nfsd]
> >>>> [ 405.628022] ? fh_verify+0x1cc/0x6f0 [nfsd]
> >>>> [ 405.633025] nfsd4_open+0x640/0xb30 [nfsd]
> >>>> [ 405.638025] nfsd4_proc_compound+0x3bd/0x710 [nfsd]
> >>>> [ 405.643017] nfsd_dispatch+0x143/0x270 [nfsd]
> >>>> [ 405.648019] svc_process_common+0x3bf/0x5b0 [sunrpc]
> >> 
> >> I was able to trigger something that looks very much like this crash.
> >> If you remove this line from fs/nfsd/filecache.c:
> >> 
> >> 	.max_size		= 131072, /* buckets */
> >> 
> >> things get a lot more stable for generic/531.
> >> 
> >> I'm looking into the issue now.
> > 
> > Yes. When '.max_size = 131072' is removed, fstests generic/531 passed.
> 
> Great! Are you comfortable with this general approach for bug #386?

This looks like a good result for #386.

fstests generic/531 (file-max: 1M) performance results:
base (5.19.0-rc3, 12-bit hash, serialized nfsd_file_gc): 222s
this patchset (.min_size=4096): 59s
So, a good improvement for #386.

It also looks like a good (acceptable) result for #387:
the 'text busy' period (exec directly from the back-end of the nfs-server)
is about 4s.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/06/24




* Re: [PATCH RFC 00/30] Overhaul NFSD filecache
  2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
                   ` (30 preceding siblings ...)
  2022-06-22 18:36 ` [PATCH RFC 00/30] Overhaul NFSD filecache Wang Yugui
@ 2022-06-23 20:27 ` Frank van der Linden
  2022-06-28 17:57   ` Chuck Lever III
  31 siblings, 1 reply; 51+ messages in thread
From: Frank van der Linden @ 2022-06-23 20:27 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, netdev, david, tgraf, jlayton

On Wed, Jun 22, 2022 at 7:12 AM Chuck Lever <chuck.lever@oracle.com> wrote:
>
> This series overhauls the NFSD filecache, a cache of server-side
> "struct file" objects recently used by NFS clients. The purposes of
> this overhaul are an immediate improvement in cache scalability in
> the number of open files, and preparation for further improvements.
>
> There are three categories of patches in this series:
>
> 1. Add observability of cache operation so we can see what we're
> doing as changes are made to the code.
>
> 2. Improve the scalability of filecache garbage collection,
> addressing several bugs along the way.
>
> 3. Improve the scalability of the filecache hash table by converting
> it to use rhashtable.
>
> The series as it stands survives typical test workloads. Running
> stress-tests like generic/531 is the next step.
>
> These patches are also available in the linux-nfs-bugzilla-386
> branch of
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
>
> ---
>
> Chuck Lever (30):
>       NFSD: Report filecache LRU size
>       NFSD: Report count of calls to nfsd_file_acquire()
>       NFSD: Report count of freed filecache items
>       NFSD: Report average age of filecache items
>       NFSD: Add nfsd_file_lru_dispose_list() helper
>       NFSD: Refactor nfsd_file_gc()
>       NFSD: Refactor nfsd_file_lru_scan()
>       NFSD: Report the number of items evicted by the LRU walk
>       NFSD: Record number of flush calls
>       NFSD: Report filecache item construction failures
>       NFSD: Zero counters when the filecache is re-initialized
>       NFSD: Hook up the filecache stat file
>       NFSD: WARN when freeing an item still linked via nf_lru
>       NFSD: Trace filecache LRU activity
>       NFSD: Leave open files out of the filecache LRU
>       NFSD: Fix the filecache LRU shrinker
>       NFSD: Never call nfsd_file_gc() in foreground paths
>       NFSD: No longer record nf_hashval in the trace log
>       NFSD: Remove lockdep assertion from unhash_and_release_locked()
>       NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode
>       NFSD: Refactor __nfsd_file_close_inode()
>       NFSD: nfsd_file_hash_remove can compute hashval
>       NFSD: Remove nfsd_file::nf_hashval
>       NFSD: Remove stale comment from nfsd_file_acquire()
>       NFSD: Clean up "open file" case in nfsd_file_acquire()
>       NFSD: Document nfsd_file_cache_purge() API contract
>       NFSD: Replace the "init once" mechanism
>       NFSD: Set up an rhashtable for the filecache
>       NFSD: Convert the filecache to use rhashtable
>       NFSD: Clean up unusued code after rhashtable conversion
>
>
>  fs/nfsd/filecache.c | 677 +++++++++++++++++++++++++++-----------------
>  fs/nfsd/filecache.h |   6 +-
>  fs/nfsd/nfsctl.c    |  10 +
>  fs/nfsd/trace.h     | 117 ++++++--
>  4 files changed, 522 insertions(+), 288 deletions(-)
>
> --
> Chuck Lever
>

Yep, looks good so far, thanks for doing this. Somewhat similar to my (buggy)
attempt at fixing it that I sent at the time (don't put open files on the
LRU, and use rhashtable), but cleaner and, presumably, less buggy :)

Can't test it right now, but it seems like Wang already confirmed that it works.

- Frank


* Re: [PATCH RFC 29/30] NFSD: Convert the filecache to use rhashtable
  2022-06-23 17:27       ` Chuck Lever III
@ 2022-06-23 22:33         ` Dave Chinner
  2022-06-23 23:59           ` Chuck Lever III
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2022-06-23 22:33 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List, netdev, tgraf, Jeff Layton

On Thu, Jun 23, 2022 at 05:27:20PM +0000, Chuck Lever III wrote:
> Also I just found Neil's nice rhashtable explainer:
> 
>    https://lwn.net/Articles/751374/
> 
> Where he writes that:
> 
> > Sometimes you might want a hash table to potentially contain
> > multiple objects for any given key. In that case you can use
> > "rhltables" — rhashtables with lists of objects.
> 
> I believe that is the case for the filecache. The hash value is
> computed based on the inode pointer, and therefore there can be more
> than one nfsd_file object for a particular inode (depending on who
> is opening and for what access). So I think filecache needs to use
> rhltable, not rhashtable. Any thoughts from rhashtable experts?

Huh, I assumed the file cache was just hashing the whole key so that
every object in the rht has its own unique key and hash and there's
no need to handle multiple objects per key...

What are you trying to optimise by hashing only the inode *pointer*
in the nfsd_file object keyspace?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH RFC 28/30] NFSD: Set up an rhashtable for the filecache
  2022-06-22 14:15 ` [PATCH RFC 28/30] NFSD: Set up an rhashtable for the filecache Chuck Lever
@ 2022-06-23 22:56   ` Al Viro
  2022-06-23 23:51     ` Chuck Lever III
  0 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2022-06-23 22:56 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, netdev, david, tgraf, jlayton

On Wed, Jun 22, 2022 at 10:15:50AM -0400, Chuck Lever wrote:

> +static u32 nfsd_file_obj_hashfn(const void *data, u32 len, u32 seed)
> +{
> +	const struct nfsd_file *nf = data;
> +
> +	return jhash2((const u32 *)&nf->nf_inode,
> +		      sizeof_field(struct nfsd_file, nf_inode) / sizeof(u32),
> +		      seed);

Out of curiosity - what are you using to allocate those?  Because if
it's a slab, then middle bits of address (i.e. lower bits of
(unsigned long)data / L1_CACHE_BYTES) would better be random enough...


* Re: [PATCH RFC 28/30] NFSD: Set up an rhashtable for the filecache
  2022-06-23 22:56   ` Al Viro
@ 2022-06-23 23:51     ` Chuck Lever III
  2022-06-24  0:14       ` Chuck Lever III
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever III @ 2022-06-23 23:51 UTC (permalink / raw)
  To: Al Viro; +Cc: Linux NFS Mailing List, netdev, Dave Chinner, tgraf, Jeff Layton



> On Jun 23, 2022, at 6:56 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> On Wed, Jun 22, 2022 at 10:15:50AM -0400, Chuck Lever wrote:
> 
>> +static u32 nfsd_file_obj_hashfn(const void *data, u32 len, u32 seed)
>> +{
>> +	const struct nfsd_file *nf = data;
>> +
>> +	return jhash2((const u32 *)&nf->nf_inode,
>> +		      sizeof_field(struct nfsd_file, nf_inode) / sizeof(u32),
>> +		      seed);
> 
> Out of curiosity - what are you using to allocate those?  Because if
> it's a slab, then middle bits of address (i.e. lower bits of
> (unsigned long)data / L1_CACHE_BYTES) would better be random enough...

static struct nfsd_file *
nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may)
{
        static atomic_t nfsd_file_id;
        struct nfsd_file *nf;

        nf = kmem_cache_alloc(nfsd_file_slab, GFP_KERNEL);

Was wondering about that. pahole says struct nfsd_file is 112
bytes on my system.


--
Chuck Lever





* Re: [PATCH RFC 29/30] NFSD: Convert the filecache to use rhashtable
  2022-06-23 22:33         ` Dave Chinner
@ 2022-06-23 23:59           ` Chuck Lever III
  0 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever III @ 2022-06-23 23:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Linux NFS Mailing List, netdev, tgraf, Jeff Layton



> On Jun 23, 2022, at 6:33 PM, Dave Chinner <david@fromorbit.com> wrote:
> 
> On Thu, Jun 23, 2022 at 05:27:20PM +0000, Chuck Lever III wrote:
>> Also I just found Neil's nice rhashtable explainer:
>> 
>>   https://lwn.net/Articles/751374/
>> 
>> Where he writes that:
>> 
>>> Sometimes you might want a hash table to potentially contain
>>> multiple objects for any given key. In that case you can use
>>> "rhltables" — rhashtables with lists of objects.
>> 
>> I believe that is the case for the filecache. The hash value is
>> computed based on the inode pointer, and therefore there can be more
>> than one nfsd_file object for a particular inode (depending on who
>> is opening and for what access). So I think filecache needs to use
>> rhltable, not rhashtable. Any thoughts from rhashtable experts?
> 
> Huh, I assumed the file cache was just hashing the whole key so that
> every object in the rht has its own unique key and hash and there's
> no need to handle multiple objects per key...
> 
> What are you trying to optimise by hashing only the inode *pointer*
> in the nfsd_file object keyspace?

Well, this design is inherited from the current filecache
implementation.

It assumes that all nfsd_file objects that refer to the same
inode will always get chained into the same bucket. That way:

static void
__nfsd_file_close_inode(struct inode *inode, unsigned int hashval,
                        struct list_head *dispose)
{
        struct nfsd_file        *nf;
        struct hlist_node       *tmp;

        spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
        hlist_for_each_entry_safe(nf, tmp, &nfsd_file_hashtbl[hashval].nfb_head, nf_node) {
                if (inode == nf->nf_inode)
                        nfsd_file_unhash_and_release_locked(nf, dispose);
        }
        spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
}

nfsd_file_close_inode() can lock one hash bucket and just
walk that hash chain to find all the nfsd_file objects associated
with a particular in-core inode.
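The property described above can be modeled in userspace: when the hash depends only on the inode pointer, closing an inode needs to visit exactly one chain. This is a minimal sketch under stated assumptions: a fixed-size chained table, hypothetical names (struct obj, hash_inode(), close_inode()), and no locking, whereas the excerpt above holds the bucket's spinlock for the walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8

struct obj {
	const void *inode;	/* hash key: the inode pointer only */
	struct obj *next;	/* bucket chain, later reused for disposal */
};

static struct obj *buckets[NBUCKETS];

/* Every object for one inode lands in the same bucket. */
static unsigned int hash_inode(const void *inode)
{
	return (unsigned int)(((uintptr_t)inode >> 6) % NBUCKETS);
}

static void add_obj(struct obj *o)
{
	unsigned int h;

	assert(o != NULL && o->inode != NULL);
	h = hash_inode(o->inode);
	o->next = buckets[h];
	buckets[h] = o;
}

/*
 * Unlink every object for @inode by walking a single chain. Unlinked
 * objects are collected on @dispose (mirroring the dispose list argument
 * in the excerpt) rather than freed inside the walk. Returns the number
 * of objects removed. A real implementation would lock the bucket here.
 */
static int close_inode(const void *inode, struct obj **dispose)
{
	struct obj **pp = &buckets[hash_inode(inode)];
	int n = 0;

	while (*pp) {
		struct obj *o = *pp;

		if (o->inode == inode) {
			*pp = o->next;		/* unlink from the chain */
			o->next = *dispose;	/* push onto dispose list */
			*dispose = o;
			n++;
		} else {
			pp = &o->next;
		}
	}
	return n;
}
```

With a hash over the full key instead, objects for one inode would scatter across buckets and this walk would have to visit every chain.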

Actually I don't think there's any other reason to keep that
hashing design, but Jeff can confirm that.

So I guess we could use rhltable and keep the nfsd_file items
for the same inode on the same hash list? I'm not sure it's
worth the trouble: this part of filecache isn't really on the
hot path.


--
Chuck Lever





* Re: [PATCH RFC 28/30] NFSD: Set up an rhashtable for the filecache
  2022-06-23 23:51     ` Chuck Lever III
@ 2022-06-24  0:14       ` Chuck Lever III
  2022-06-24  0:29         ` Al Viro
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever III @ 2022-06-24  0:14 UTC (permalink / raw)
  To: Al Viro; +Cc: Linux NFS Mailing List, netdev, Dave Chinner, tgraf, Jeff Layton


> On Jun 23, 2022, at 7:51 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> 
>> On Jun 23, 2022, at 6:56 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> 
>> On Wed, Jun 22, 2022 at 10:15:50AM -0400, Chuck Lever wrote:
>> 
>>> +static u32 nfsd_file_obj_hashfn(const void *data, u32 len, u32 seed)
>>> +{
>>> +	const struct nfsd_file *nf = data;
>>> +
>>> +	return jhash2((const u32 *)&nf->nf_inode,
>>> +		      sizeof_field(struct nfsd_file, nf_inode) / sizeof(u32),
>>> +		      seed);
>> 
>> Out of curiosity - what are you using to allocate those?  Because if
>> it's a slab, then middle bits of address (i.e. lower bits of
>> (unsigned long)data / L1_CACHE_BYTES) would better be random enough...
> 
> static struct nfsd_file *
> nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may)
> {
>         static atomic_t nfsd_file_id;
>         struct nfsd_file *nf;
> 
>         nf = kmem_cache_alloc(nfsd_file_slab, GFP_KERNEL);
> 
> Was wondering about that. pahole says struct nfsd_file is 112
> bytes on my system.

Oops. nfsd_file_obj_hashfn() is supposed to be generating the
hash value based on the address stored in the nf_inode field.
So the value being hashed is an inode pointer, and inodes are
allocated via kmem_cache_alloc() by default.


--
Chuck Lever





* Re: [PATCH RFC 28/30] NFSD: Set up an rhashtable for the filecache
  2022-06-24  0:14       ` Chuck Lever III
@ 2022-06-24  0:29         ` Al Viro
  0 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2022-06-24  0:29 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Linux NFS Mailing List, netdev, Dave Chinner, tgraf, Jeff Layton

On Fri, Jun 24, 2022 at 12:14:53AM +0000, Chuck Lever III wrote:
> 
> > On Jun 23, 2022, at 7:51 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> > 
> >> On Jun 23, 2022, at 6:56 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >> 
> >> On Wed, Jun 22, 2022 at 10:15:50AM -0400, Chuck Lever wrote:
> >> 
> >>> +static u32 nfsd_file_obj_hashfn(const void *data, u32 len, u32 seed)
> >>> +{
> >>> +	const struct nfsd_file *nf = data;
> >>> +
> >>> +	return jhash2((const u32 *)&nf->nf_inode,
> >>> +		      sizeof_field(struct nfsd_file, nf_inode) / sizeof(u32),
> >>> +		      seed);
> >> 
> >> Out of curiosity - what are you using to allocate those?  Because if
> >> it's a slab, then middle bits of address (i.e. lower bits of
> >> (unsigned long)data / L1_CACHE_BYTES) would better be random enough...
> > 
> > static struct nfsd_file *
> > nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may)
> > {
> >         static atomic_t nfsd_file_id;
> >         struct nfsd_file *nf;
> > 
> >         nf = kmem_cache_alloc(nfsd_file_slab, GFP_KERNEL);
> > 
> > Was wondering about that. pahole says struct nfsd_file is 112
> > bytes on my system.
> 
> Oops. nfsd_file_obj_hashfn() is supposed to be generating the
> hash value based on the address stored in the nf_inode field.
> So it's an inode pointer, alloced via kmem_cache_alloc by default.

inode pointers are definitely "divide by L1_CACHE_BYTES and take lower
bits" fodder...
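Al's suggestion amounts to using the pointer's own bits instead of a general-purpose mixing hash such as jhash2(). Here is a sketch of that idea, assuming a 64-byte cache line (the kernel's L1_CACHE_BYTES is per-architecture) and a power-of-two table. ptr_bucket() is a hypothetical helper, and whether real inode addresses spread well this way is the open question in the thread, not something this sketch shows.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache-line size; stands in for the kernel's L1_CACHE_BYTES. */
#define CACHE_LINE_BYTES 64u

/*
 * Bucket index for a pointer: divide out the offset within a cache line
 * (low bits that carry little entropy for aligned allocations), then keep
 * the low @bits bits of what remains. The table has 1 << @bits buckets.
 */
static unsigned int ptr_bucket(const void *p, unsigned int bits)
{
	assert(bits > 0 && bits < 32);
	return (unsigned int)(((uintptr_t)p / CACHE_LINE_BYTES) &
			      ((1u << bits) - 1));
}
```

If used with rhashtable, such a function would sit in the hashfn/obj_hashfn hooks of struct rhashtable_params; note that it ignores the seed rhashtable passes in, which is a trade-off worth weighing.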


* Re: [PATCH RFC 00/30] Overhaul NFSD filecache
  2022-06-23 17:51           ` Wang Yugui
@ 2022-06-24 15:30             ` Chuck Lever III
  0 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever III @ 2022-06-24 15:30 UTC (permalink / raw)
  To: Wang Yugui; +Cc: Linux NFS Mailing List, netdev, david, tgraf, Jeff Layton



> On Jun 23, 2022, at 1:51 PM, Wang Yugui <wangyugui@e16-tech.com> wrote:
> 
> Hi,
> 
>>> On Jun 23, 2022, at 5:02 AM, Wang Yugui <wangyugui@e16-tech.com> wrote:
>>> 
>>> Hi,
>>> 
>>>>> On Jun 22, 2022, at 3:04 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>>> On Jun 22, 2022, at 2:36 PM, Wang Yugui <wangyugui@e16-tech.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> fstests generic/531 triggered a panic on kernel 5.19.0-rc3 with this
>>>>>> patchset.
>>>>> 
>>>>> As I mention in the cover letter, I haven't tried running generic/531
>>>>> yet -- no claim at all that this is finished work and that #386 has
>>>>> been fixed at this point. I'm merely interested in comments on the
>>>>> general approach.
>>>>> 
>>>>> 
>>>>>> [ 405.478056] BUG: kernel NULL pointer dereference, address: 0000000000000049
>>>>> 
>>>>> The "RIP: " tells the location of the crash. Notice that the call
>>>>> trace here does not include that information. From your attachment:
>>>>> 
>>>>> [ 405.518022] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
>>>>> 
>>>>> To match that to a line of source code:
>>>>> 
>>>>> [cel@manet ~]$ cd src/linux/linux/
>>>>> [cel@manet linux]$ scripts/faddr2line ../obj/manet/fs/nfsd/filecache.o nfsd_do_file_acquire+0x4e1
>>>>> nfsd_do_file_acquire+0x4e1/0xfc0:
>>>>> rht_bucket_insert at /home/cel/src/linux/linux/include/linux/rhashtable.h:303
>>>>> (inlined by) __rhashtable_insert_fast at /home/cel/src/linux/linux/include/linux/rhashtable.h:718
>>>>> (inlined by) rhashtable_lookup_get_insert_key at /home/cel/src/linux/linux/include/linux/rhashtable.h:982
>>>>> (inlined by) nfsd_file_insert at /home/cel/src/linux/linux/fs/nfsd/filecache.c:1031
>>>>> (inlined by) nfsd_do_file_acquire at /home/cel/src/linux/linux/fs/nfsd/filecache.c:1089
>>>>> [cel@manet linux]$
>>>>> 
>>>>> This is an example, I'm sure my compiled objects don't match yours.
>>>>> 
>>>>> And, now that I've added observability, you should be able to do:
>>>>> 
>>>>> # watch cat /proc/fs/nfsd/filecache
>>>>> 
>>>>> to see how many items are in the hash and LRU list while the test
>>>>> is running.
>>>>> 
>>>>> 
>>>>>> [ 405.608016] Call Trace:
>>>>>> [ 405.608016] <TASK>
>>>>>> [ 405.613020] nfs4_get_vfs_file+0x325/0x410 [nfsd]
>>>>>> [ 405.618018] nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
>>>>>> [ 405.623016] ? inode_get_bytes+0x38/0x40
>>>>>> [ 405.623016] ? nfsd_permission+0x97/0xf0 [nfsd]
>>>>>> [ 405.628022] ? fh_verify+0x1cc/0x6f0 [nfsd]
>>>>>> [ 405.633025] nfsd4_open+0x640/0xb30 [nfsd]
>>>>>> [ 405.638025] nfsd4_proc_compound+0x3bd/0x710 [nfsd]
>>>>>> [ 405.643017] nfsd_dispatch+0x143/0x270 [nfsd]
>>>>>> [ 405.648019] svc_process_common+0x3bf/0x5b0 [sunrpc]
>>>> 
>>>> I was able to trigger something that looks very much like this crash.
>>>> If you remove this line from fs/nfsd/filecache.c:
>>>> 
>>>> 	.max_size		= 131072, /* buckets */
>>>> 
>>>> things get a lot more stable for generic/531.
>>>> 
>>>> I'm looking into the issue now.
>>> 
>>> Yes. When '.max_size = 131072' is removed, fstests generic/531 passed.
>> 
>> Great! Are you comfortable with this general approach for bug #386?
> 
> It seems a good result for #386.
> 
> fstests generic/531(file-max: 1M) performance result:
> base(5.19.0-rc3, 12 bits hash, serialized nfsd_file_gc): 222s
> this patchset(.min_size=4096): 59s
> so, a good improvement for #386.
> 
> It seems a good(acceptable) result for #387 too.
> the period of 'text busy(exec directly from the back-end of nfs-server)'
> is about 4s.

I was surprised to learn that NFSv4 doesn't close those files outright;
they remain in the filecache for a while. I think we can do better
there, but I haven't looked closely at that yet.

I see that files are left open on the server by crashed NFSv4 clients too.
That will result in the server becoming unusable after significant
uptime, which I've seen on occasion (because I crash my clients a lot)
but never looked into until now.


--
Chuck Lever





* Re: [PATCH RFC 00/30] Overhaul NFSD filecache
  2022-06-23 20:27 ` Frank van der Linden
@ 2022-06-28 17:57   ` Chuck Lever III
  0 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever III @ 2022-06-28 17:57 UTC (permalink / raw)
  To: Frank van der Linden
  Cc: Linux NFS Mailing List, netdev, Dave Chinner, tgraf, Jeff Layton



> On Jun 23, 2022, at 4:27 PM, Frank van der Linden <fvdl@google.com> wrote:
> 
> On Wed, Jun 22, 2022 at 7:12 AM Chuck Lever <chuck.lever@oracle.com> wrote:
>> 
>> This series overhauls the NFSD filecache, a cache of server-side
>> "struct file" objects recently used by NFS clients. The purposes of
>> this overhaul are an immediate improvement in cache scalability in
>> the number of open files, and preparation for further improvements.
>> 
>> There are three categories of patches in this series:
>> 
>> 1. Add observability of cache operation so we can see what we're
>> doing as changes are made to the code.
>> 
>> 2. Improve the scalability of filecache garbage collection,
>> addressing several bugs along the way.
>> 
>> 3. Improve the scalability of the filecache hash table by converting
>> it to use rhashtable.
>> 
>> The series as it stands survives typical test workloads. Running
>> stress-tests like generic/531 is the next step.
>> 
>> These patches are also available in the linux-nfs-bugzilla-386
>> branch of
>> 
>>  https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
>> 
>> ---
>> 
>> Chuck Lever (30):
>>      NFSD: Report filecache LRU size
>>      NFSD: Report count of calls to nfsd_file_acquire()
>>      NFSD: Report count of freed filecache items
>>      NFSD: Report average age of filecache items
>>      NFSD: Add nfsd_file_lru_dispose_list() helper
>>      NFSD: Refactor nfsd_file_gc()
>>      NFSD: Refactor nfsd_file_lru_scan()
>>      NFSD: Report the number of items evicted by the LRU walk
>>      NFSD: Record number of flush calls
>>      NFSD: Report filecache item construction failures
>>      NFSD: Zero counters when the filecache is re-initialized
>>      NFSD: Hook up the filecache stat file
>>      NFSD: WARN when freeing an item still linked via nf_lru
>>      NFSD: Trace filecache LRU activity
>>      NFSD: Leave open files out of the filecache LRU
>>      NFSD: Fix the filecache LRU shrinker
>>      NFSD: Never call nfsd_file_gc() in foreground paths
>>      NFSD: No longer record nf_hashval in the trace log
>>      NFSD: Remove lockdep assertion from unhash_and_release_locked()
>>      NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode
>>      NFSD: Refactor __nfsd_file_close_inode()
>>      NFSD: nfsd_file_hash_remove can compute hashval
>>      NFSD: Remove nfsd_file::nf_hashval
>>      NFSD: Remove stale comment from nfsd_file_acquire()
>>      NFSD: Clean up "open file" case in nfsd_file_acquire()
>>      NFSD: Document nfsd_file_cache_purge() API contract
>>      NFSD: Replace the "init once" mechanism
>>      NFSD: Set up an rhashtable for the filecache
>>      NFSD: Convert the filecache to use rhashtable
>>      NFSD: Clean up unusued code after rhashtable conversion
>> 
>> 
>> fs/nfsd/filecache.c | 677 +++++++++++++++++++++++++++-----------------
>> fs/nfsd/filecache.h |   6 +-
>> fs/nfsd/nfsctl.c    |  10 +
>> fs/nfsd/trace.h     | 117 ++++++--
>> 4 files changed, 522 insertions(+), 288 deletions(-)
>> 
>> --
>> Chuck Lever
>> 
> 
> Yep, looks good so far, thanks for doing this. Somewhat similar to my (buggy)
> attempt at fixing it that I sent at the time (don't put open files on the
> LRU, and use rhashtable), but cleaner and, presumably, less buggy :)
> 
> Can't test it right now, but it seems like Wang already confirmed that it works.

Frank, thanks to you and Wang for reporting this issue, and sorry for
taking so long to address it.


--
Chuck Lever





end of thread

Thread overview: 51+ messages
2022-06-22 14:12 [PATCH RFC 00/30] Overhaul NFSD filecache Chuck Lever
2022-06-22 14:12 ` [PATCH RFC 01/30] NFSD: Report filecache LRU size Chuck Lever
2022-06-22 14:12 ` [PATCH RFC 02/30] NFSD: Report count of calls to nfsd_file_acquire() Chuck Lever
2022-06-22 14:13 ` [PATCH RFC 03/30] NFSD: Report count of freed filecache items Chuck Lever
2022-06-22 14:13 ` [PATCH RFC 04/30] NFSD: Report average age of " Chuck Lever
2022-06-22 14:13 ` [PATCH RFC 05/30] NFSD: Add nfsd_file_lru_dispose_list() helper Chuck Lever
2022-06-22 14:13 ` [PATCH RFC 06/30] NFSD: Refactor nfsd_file_gc() Chuck Lever
2022-06-22 14:13 ` [PATCH RFC 07/30] NFSD: Refactor nfsd_file_lru_scan() Chuck Lever
2022-06-22 14:13 ` [PATCH RFC 08/30] NFSD: Report the number of items evicted by the LRU walk Chuck Lever
2022-06-22 14:13 ` [PATCH RFC 09/30] NFSD: Record number of flush calls Chuck Lever
2022-06-22 14:13 ` [PATCH RFC 10/30] NFSD: Report filecache item construction failures Chuck Lever
2022-06-22 14:13 ` [PATCH RFC 11/30] NFSD: Zero counters when the filecache is re-initialized Chuck Lever
2022-06-22 14:14 ` [PATCH RFC 12/30] NFSD: Hook up the filecache stat file Chuck Lever
2022-06-22 14:14 ` [PATCH RFC 13/30] NFSD: WARN when freeing an item still linked via nf_lru Chuck Lever
2022-06-22 14:14 ` [PATCH RFC 14/30] NFSD: Trace filecache LRU activity Chuck Lever
2022-06-22 14:14 ` [PATCH RFC 15/30] NFSD: Leave open files out of the filecache LRU Chuck Lever
2022-06-22 14:14 ` [PATCH RFC 16/30] NFSD: Fix the filecache LRU shrinker Chuck Lever
2022-06-22 14:14 ` [PATCH RFC 17/30] NFSD: Never call nfsd_file_gc() in foreground paths Chuck Lever
2022-06-22 14:14 ` [PATCH RFC 18/30] NFSD: No longer record nf_hashval in the trace log Chuck Lever
2022-06-22 14:14 ` [PATCH RFC 19/30] NFSD: Remove lockdep assertion from unhash_and_release_locked() Chuck Lever
2022-06-22 14:14 ` [PATCH RFC 20/30] NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode Chuck Lever
2022-06-22 14:15 ` [PATCH RFC 21/30] NFSD: Refactor __nfsd_file_close_inode() Chuck Lever
2022-06-22 14:15 ` [PATCH RFC 22/30] NFSD: nfsd_file_hash_remove can compute hashval Chuck Lever
2022-06-22 14:15 ` [PATCH RFC 23/30] NFSD: Remove nfsd_file::nf_hashval Chuck Lever
2022-06-22 14:15 ` [PATCH RFC 24/30] NFSD: Remove stale comment from nfsd_file_acquire() Chuck Lever
2022-06-22 14:15 ` [PATCH RFC 25/30] NFSD: Clean up "open file" case in nfsd_file_acquire() Chuck Lever
2022-06-22 14:15 ` [PATCH RFC 26/30] NFSD: Document nfsd_file_cache_purge() API contract Chuck Lever
2022-06-22 14:15 ` [PATCH RFC 27/30] NFSD: Replace the "init once" mechanism Chuck Lever
2022-06-22 14:15 ` [PATCH RFC 28/30] NFSD: Set up an rhashtable for the filecache Chuck Lever
2022-06-23 22:56   ` Al Viro
2022-06-23 23:51     ` Chuck Lever III
2022-06-24  0:14       ` Chuck Lever III
2022-06-24  0:29         ` Al Viro
2022-06-22 14:15 ` [PATCH RFC 29/30] NFSD: Convert the filecache to use rhashtable Chuck Lever
2022-06-23  0:38   ` Dave Chinner
2022-06-23  0:58     ` Chuck Lever III
2022-06-23 17:27       ` Chuck Lever III
2022-06-23 22:33         ` Dave Chinner
2022-06-23 23:59           ` Chuck Lever III
2022-06-22 14:16 ` [PATCH RFC 30/30] NFSD: Clean up unusued code after rhashtable conversion Chuck Lever
2022-06-22 18:36 ` [PATCH RFC 00/30] Overhaul NFSD filecache Wang Yugui
2022-06-22 19:04   ` Chuck Lever III
2022-06-22 19:59     ` Chuck Lever III
2022-06-23  9:02       ` Wang Yugui
2022-06-23 16:44         ` Chuck Lever III
2022-06-23 17:51           ` Wang Yugui
2022-06-24 15:30             ` Chuck Lever III
2022-06-23  0:21     ` Dave Chinner
2022-06-23  1:01       ` Chuck Lever III
2022-06-23 20:27 ` Frank van der Linden
2022-06-28 17:57   ` Chuck Lever III