From: Jeff Layton <jlayton@redhat.com>
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"david@fromorbit.com" <david@fromorbit.com>,
	"tgraf@suug.ch" <tgraf@suug.ch>
Subject: Re: [PATCH v3 15/32] NFSD: Leave open files out of the filecache LRU
Date: Mon, 11 Jul 2022 07:39:07 -0400
Message-ID: <33688832c658a15d35af6321030f8ebd98cfea8a.camel@redhat.com>
In-Reply-To: <BC4E1A0C-1404-4C50-ADD0-3999DEE01066@oracle.com>

On Sat, 2022-07-09 at 20:45 +0000, Chuck Lever III wrote:
> 
> > On Jul 8, 2022, at 3:29 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > 
> > On Fri, 2022-07-08 at 14:25 -0400, Chuck Lever wrote:
> > > There have been reports of problems when running fstests generic/531
> > > against Linux NFS servers with NFSv4. The NFS server that hosts the
> > > test's SCRATCH_DEV suffers from CPU soft lock-ups during the test.
> > > Analysis shows that:
> > > 
> > > fs/nfsd/filecache.c
> > > 482         ret = list_lru_walk(&nfsd_file_lru,
> > > 483                             nfsd_file_lru_cb,
> > > 484                             &head, LONG_MAX);
> > > 
> > > causes nfsd_file_gc() to walk the entire length of the filecache LRU
> > > list every time it is called (which is quite frequently). The walk
> > > holds a spinlock the entire time that prevents other nfsd threads
> > > from accessing the filecache.
> > > 
> > > What's more, for NFSv4 workloads, none of the items that are visited
> > > during this walk may be evicted, since they are all files that are
> > > held OPEN by NFS clients.
> > > 
> > > Address this by ensuring that open files are not kept on the LRU
> > > list.
> > > 
> > > Reported-by: Frank van der Linden <fllinden@amazon.com>
> > > Reported-by: Wang Yugui <wangyugui@e16-tech.com>
> > > Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=386
> > > Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > ---
> > > fs/nfsd/filecache.c | 24 +++++++++++++++++++-----
> > > fs/nfsd/trace.h | 2 ++
> > > 2 files changed, 21 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > index 37373b012276..6e9e186334ab 100644
> > > --- a/fs/nfsd/filecache.c
> > > +++ b/fs/nfsd/filecache.c
> > > @@ -269,6 +269,7 @@ nfsd_file_flush(struct nfsd_file *nf)
> > > 
> > > static void nfsd_file_lru_add(struct nfsd_file *nf)
> > > {
> > > +	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
> > > 	if (list_lru_add(&nfsd_file_lru, &nf->nf_lru))
> > > 		trace_nfsd_file_lru_add(nf);
> > > }
> > > @@ -298,7 +299,6 @@ nfsd_file_unhash(struct nfsd_file *nf)
> > > {
> > > 	if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> > > 		nfsd_file_do_unhash(nf);
> > > -		nfsd_file_lru_remove(nf);
> > > 		return true;
> > > 	}
> > > 	return false;
> > > @@ -319,6 +319,7 @@ nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *disp
> > > 	if (refcount_dec_not_one(&nf->nf_ref))
> > > 		return true;
> > > 
> > > +	nfsd_file_lru_remove(nf);
> > > 	list_add(&nf->nf_lru, dispose);
> > > 	return true;
> > > }
> > > @@ -330,6 +331,7 @@ nfsd_file_put_noref(struct nfsd_file *nf)
> > > 
> > > 	if (refcount_dec_and_test(&nf->nf_ref)) {
> > > 		WARN_ON(test_bit(NFSD_FILE_HASHED, &nf->nf_flags));
> > > +		nfsd_file_lru_remove(nf);
> > > 		nfsd_file_free(nf);
> > > 	}
> > > }
> > > @@ -339,7 +341,7 @@ nfsd_file_put(struct nfsd_file *nf)
> > > {
> > > 	might_sleep();
> > > 
> > > -	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
> > > +	nfsd_file_lru_add(nf);
> > 
> > Do you really want to add this on every put? I would have thought you'd
> > only want to do this on a 2->1 nf_ref transition.
> 
> My measurements indicate that 2->1 is the common case, so checking
> that this is /not/ a 2->1 transition doesn't confer much if any
> benefit.
> 
> Under load, I don't see any contention on the LRU locks, which is
> where I'd expect to see a problem if this design were not efficient.
> 
> 

Fair enough. I guess the idea is to throw it onto the LRU and the
scanner will just (eventually) take it off again without reaping it.

You can add my Reviewed-by: to this one as well.


> > > 	if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
> > > 		nfsd_file_flush(nf);
> > > 		nfsd_file_put_noref(nf);
> > > @@ -439,8 +441,18 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
> > > 	}
> > > }
> > > 
> > > -/*
> > > +/**
> > > + * nfsd_file_lru_cb - Examine an entry on the LRU list
> > > + * @item: LRU entry to examine
> > > + * @lru: controlling LRU
> > > + * @lock: LRU list lock (unused)
> > > + * @arg: dispose list
> > > + *
> > > * Note this can deadlock with nfsd_file_cache_purge.
> > > + *
> > > + * Return values:
> > > + * %LRU_REMOVED: @item was removed from the LRU
> > > + * %LRU_SKIP: @item cannot be evicted
> > > */
> > > static enum lru_status
> > > nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
> > > @@ -462,8 +474,9 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
> > > 	 * That order is deliberate to ensure that we can do this locklessly.
> > > 	 */
> > > 	if (refcount_read(&nf->nf_ref) > 1) {
> > > +		list_lru_isolate(lru, &nf->nf_lru);
> > > 		trace_nfsd_file_gc_in_use(nf);
> > > -		return LRU_SKIP;
> > > +		return LRU_REMOVED;
> > 
> > Interesting. So you wait until the LRU scanner runs to remove these
> > entries? I expected to see you do this in nfsd_file_get, but this does
> > seem likely to be more efficient.
> > 
> > > 	}
> > > 
> > > 	/*
> > > @@ -1020,6 +1033,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > 		goto retry;
> > > 	}
> > > 
> > > +	nfsd_file_lru_remove(nf);
> > > 	this_cpu_inc(nfsd_file_cache_hits);
> > > 
> > > 	if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) {
> > > @@ -1055,7 +1069,6 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > 	refcount_inc(&nf->nf_ref);
> > > 	__set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
> > > 	__set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
> > > -	nfsd_file_lru_add(nf);
> > > 	hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head);
> > > 	++nfsd_file_hashtbl[hashval].nfb_count;
> > > 	nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount,
> > > @@ -1080,6 +1093,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > 	 */
> > > 	if (status != nfs_ok || inode->i_nlink == 0) {
> > > 		bool do_free;
> > > +		nfsd_file_lru_remove(nf);
> > > 		spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
> > > 		do_free = nfsd_file_unhash(nf);
> > > 		spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
> > > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> > > index 1cc1133371eb..54082b868b72 100644
> > > --- a/fs/nfsd/trace.h
> > > +++ b/fs/nfsd/trace.h
> > > @@ -929,7 +929,9 @@ DEFINE_EVENT(nfsd_file_gc_class, name,					\
> > > 	TP_ARGS(nf))
> > > 
> > > DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add);
> > > +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add_disposed);
> > > DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del);
> > > +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del_disposed);
> > > DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_in_use);
> > > DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_writeback);
> > > DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_referenced);
> > > 
> > > 
> > 
> > -- 
> > Jeff Layton <jlayton@redhat.com>
> 
> --
> Chuck Lever
> 
> 
> 

-- 
Jeff Layton <jlayton@redhat.com>


Thread overview: 42+ messages
2022-07-08 18:23 [PATCH v3 00/32] Overhaul NFSD filecache Chuck Lever
2022-07-08 18:23 ` [PATCH v3 01/32] NFSD: Demote a WARN to a pr_warn() Chuck Lever
2022-07-08 18:23 ` [PATCH v3 02/32] NFSD: Report filecache LRU size Chuck Lever
2022-07-08 18:23 ` [PATCH v3 03/32] NFSD: Report count of calls to nfsd_file_acquire() Chuck Lever
2022-07-08 18:24 ` [PATCH v3 04/32] NFSD: Report count of freed filecache items Chuck Lever
2022-07-08 18:24 ` [PATCH v3 05/32] NFSD: Report average age of " Chuck Lever
2022-07-08 18:24 ` [PATCH v3 06/32] NFSD: Add nfsd_file_lru_dispose_list() helper Chuck Lever
2022-07-08 18:24 ` [PATCH v3 07/32] NFSD: Refactor nfsd_file_gc() Chuck Lever
2022-07-08 18:24 ` [PATCH v3 08/32] NFSD: Refactor nfsd_file_lru_scan() Chuck Lever
2022-07-08 18:24 ` [PATCH v3 09/32] NFSD: Report the number of items evicted by the LRU walk Chuck Lever
2022-07-08 18:24 ` [PATCH v3 10/32] NFSD: Record number of flush calls Chuck Lever
2022-07-08 18:24 ` [PATCH v3 11/32] NFSD: Zero counters when the filecache is re-initialized Chuck Lever
2022-07-08 18:24 ` [PATCH v3 12/32] NFSD: Hook up the filecache stat file Chuck Lever
2022-07-08 19:16   ` Jeff Layton
2022-07-08 18:25 ` [PATCH v3 13/32] NFSD: WARN when freeing an item still linked via nf_lru Chuck Lever
2022-07-08 18:25 ` [PATCH v3 14/32] NFSD: Trace filecache LRU activity Chuck Lever
2022-07-08 18:25 ` [PATCH v3 15/32] NFSD: Leave open files out of the filecache LRU Chuck Lever
2022-07-08 19:29   ` Jeff Layton
2022-07-09 20:45     ` Chuck Lever III
2022-07-11 11:39       ` Jeff Layton [this message]
2022-07-08 18:25 ` [PATCH v3 16/32] NFSD: Fix the filecache LRU shrinker Chuck Lever
2022-07-08 19:37   ` Jeff Layton
2022-07-08 18:25 ` [PATCH v3 17/32] NFSD: Never call nfsd_file_gc() in foreground paths Chuck Lever
2022-07-08 19:43   ` Jeff Layton
2022-07-08 19:45     ` Chuck Lever III
2022-07-08 18:25 ` [PATCH v3 18/32] NFSD: No longer record nf_hashval in the trace log Chuck Lever
2022-07-08 18:25 ` [PATCH v3 19/32] NFSD: Remove lockdep assertion from unhash_and_release_locked() Chuck Lever
2022-07-08 18:25 ` [PATCH v3 20/32] NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode Chuck Lever
2022-07-08 18:25 ` [PATCH v3 21/32] NFSD: Refactor __nfsd_file_close_inode() Chuck Lever
2022-07-08 18:26 ` [PATCH v3 22/32] NFSD: nfsd_file_hash_remove can compute hashval Chuck Lever
2022-07-08 18:26 ` [PATCH v3 23/32] NFSD: Remove nfsd_file::nf_hashval Chuck Lever
2022-07-08 18:26 ` [PATCH v3 24/32] NFSD: Replace the "init once" mechanism Chuck Lever
2022-07-08 18:26 ` [PATCH v3 25/32] NFSD: Set up an rhashtable for the filecache Chuck Lever
2022-07-08 18:26 ` [PATCH v3 26/32] NFSD: Convert the filecache to use rhashtable Chuck Lever
2022-07-08 18:26 ` [PATCH v3 27/32] NFSD: Clean up unused code after rhashtable conversion Chuck Lever
2022-07-08 18:26 ` [PATCH v3 28/32] NFSD: Separate tracepoints for acquire and create Chuck Lever
2022-07-08 18:26 ` [PATCH v3 29/32] NFSD: Move nfsd_file_trace_alloc() tracepoint Chuck Lever
2022-07-08 18:26 ` [PATCH v3 30/32] NFSD: Update the nfsd_file_fsnotify_handle_event() tracepoint Chuck Lever
2022-07-08 18:27 ` [PATCH v3 31/32] NFSD: NFSv4 CLOSE should release an nfsd_file immediately Chuck Lever
2022-07-08 18:27 ` [PATCH v3 32/32] NFSD: Ensure nf_inode is never dereferenced Chuck Lever
2022-07-08 20:27 ` [PATCH v3 00/32] Overhaul NFSD filecache Jeff Layton
2022-07-08 20:32   ` Chuck Lever III
