From: Frank van der Linden <fllinden@amazon.com>
To: Bruce Fields <bfields@fieldses.org>,
	Trond Myklebust <trond.myklebust@hammerspace.com>,
	Chuck Lever <chuck.lever@oracle.com>
Cc: <linux-nfs@vger.kernel.org>
Subject: nfsd filecache issues with v4
Date: Mon, 8 Jun 2020 19:21:22 +0000
Message-ID: <20200608192122.GA19171@dev-dsk-fllinden-2c-c1893d73.us-west-2.amazon.com>

We recently noticed that, with 5.4+ kernels, the generic/531 test takes
a very long time to finish for v4, especially when run on larger systems.

Case in point: a 72 VCPU, 144G EC2 instance as a client will make the test
last about 20 hours.

So, I had a look to see what was going on. First of all, the test generates
a lot of files: it starts 2 * NCPU processes, and each process creates up
to 50000 files. So that's 144 processes in this case, 50000 files each. It
does this by setting the file ulimit to 50000 and then just opening files,
keeping them open, until it hits the limit.

So that's roughly 7.2 million new/open files - that's a lot, but the
problem can be triggered with far fewer than that as well.
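
As a quick check of the arithmetic above (a throwaway sketch; the variable
names are only for illustration, and the per-process count is the ulimit,
so the real number of opens per process is slightly lower):

```python
# Open-file count generated by generic/531 on the client described above.
ncpu = 72                 # VCPUs on the EC2 client
procs = 2 * ncpu          # the test starts 2 * NCPU processes
files_per_proc = 50000    # each process opens files up to its ulimit

total_open = procs * files_per_proc
print(procs, total_open)
```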

Looking at what the server was doing, I noticed a lot of lock contention
on nfsd_file_lru. Then I noticed that nfsd_filecache_count kept
going up, reflecting the number of files held open by the client processes,
eventually reaching, for example, that 7 million number.

So here's what happens: for NFSv4, files that are associated with an
open stateid can stick around for a long time, as long as there's no
CLOSE done on them. That's what's happening here. Also, since those files
have a refcount of >= 2 (one for the hash table, one for being pointed to
by the state), they are never eligible for removal from the file cache.
Worse, since the code calls nfsd_file_gc inline if the upper bound is crossed
(8192), every single operation that calls nfsd_file_acquire will end up
walking the entire LRU, trying to free files, and failing every time.
Walking a list with millions of files every single time isn't great.
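
To make the cost concrete, here is a rough userspace model of that
behavior. It is not the kernel code: ToyFileCache, acquire_for_open and
walked_after are made-up names, and the only things taken from the
description above are the refcount of 2 per open file, the 8192 bound, and
the full LRU walk on every acquire once the bound is crossed.

```python
from collections import OrderedDict

LRU_THRESHOLD = 8192  # inline-GC upper bound quoted above


class ToyFileCache:
    """Toy model of the cache; not the kernel's data structures."""

    def __init__(self, threshold=LRU_THRESHOLD):
        self.threshold = threshold
        self.lru = OrderedDict()   # key -> refcount
        self.entries_walked = 0    # total LRU entries visited by gc()

    def gc(self):
        # Walk the whole LRU. Only entries holding nothing but the cache's
        # own reference (refcount == 1) can be freed.
        for key in list(self.lru):
            self.entries_walked += 1
            if self.lru[key] == 1:
                del self.lru[key]

    def acquire_for_open(self, key):
        # One reference for the hash table, one for the open stateid.
        self.lru[key] = 2
        if len(self.lru) >= self.threshold:
            self.gc()  # inline, on every acquire past the bound


def walked_after(opens, threshold):
    """Total entries walked after `opens` files are opened and kept open."""
    cache = ToyFileCache(threshold)
    for i in range(opens):
        cache.acquire_for_open(i)
    return cache.entries_walked
```

With a small threshold of 10, walked_after(100, 10) is 5005 and
walked_after(200, 10) is 20055: doubling the number of open files roughly
quadruples the walking work, and nothing is ever freed.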

There are several ways to fix this behavior, for example:

* Always allow v4 cached file structures to be purged from the cache.
  They will stick around, since they still have a reference, but
  at least they won't slow down cache handling to a crawl.

* Don't add v4 files to the cache to begin with.

* Since the only advantage of the file cache for v4 is the caching
  of files linked to special stateids (as far as I can tell), only
  cache files associated with special stateids.

* Don't bother with v4 files at all, and revert the changes that
  made v4 use the file cache.
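
As an illustration of the first option only, here is one possible reading
of it as a toy userspace model: the GC pass takes every entry it visits
out of the cache, and entries that are still referenced simply live on
outside it. The class and field names are hypothetical, and this is a
sketch of the idea, not a proposed patch.

```python
class ToyPurgingFileCache:
    """Toy model: gc() purges entries even if they are still referenced."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.lru = {}             # key -> refcount, entries still in the cache
        self.pinned = {}          # key -> refcount, purged but still referenced
        self.entries_walked = 0   # total LRU entries visited by gc()

    def gc(self):
        for key in list(self.lru):
            self.entries_walked += 1
            refcount = self.lru.pop(key)   # always leaves the cache
            if refcount > 1:
                # The cache drops its own reference; the open stateid's
                # reference keeps the file alive until CLOSE.
                self.pinned[key] = refcount - 1

    def acquire_for_open(self, key):
        # One reference for the hash table, one for the open stateid.
        self.lru[key] = 2
        if len(self.lru) >= self.threshold:
            self.gc()


cache = ToyPurgingFileCache(threshold=10)
for i in range(100):
    cache.acquire_for_open(i)
print(cache.entries_walked, len(cache.lru), len(cache.pinned))
```

In this model each entry is walked once, so 100 opens with a threshold of
10 visit 100 entries in total instead of growing quadratically. The files
still consume resources, as noted in the first bullet.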

In general, the resource control for files OPENed by the client is
probably an issue. Even if you fix the cache, what if there are
N clients that open millions of files and keep them open? Maybe
there should be a fallback to start using temporary open files
if a client goes beyond a reasonable limit and threatens to eat
all resources.
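
A minimal sketch of what such a fallback policy could look like, purely
as an illustration. The limit value, the class, and the "persistent" /
"temporary" labels are all assumptions made for this example; nothing like
this exists in nfsd as far as this message describes.

```python
class ToyOpenPolicy:
    """Toy per-client policy: keep files open only up to a limit."""

    def __init__(self, per_client_limit):
        self.per_client_limit = per_client_limit  # hypothetical tunable
        self.held = {}                            # client -> files kept open

    def on_open(self, client):
        held = self.held.get(client, 0)
        if held < self.per_client_limit:
            self.held[client] = held + 1
            return "persistent"   # keep the file open with the stateid
        return "temporary"        # open per I/O request, close right after

    def on_close(self, client, mode):
        # Only persistent opens were counted against the client.
        if mode == "persistent":
            self.held[client] -= 1


policy = ToyOpenPolicy(per_client_limit=3)
modes = [policy.on_open("client-a") for _ in range(5)]
print(modes)
```

Here the first three opens from a client are held open and the rest fall
back to temporary opens, which bounds the resources a single client can
pin at the cost of extra open/close work per I/O.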

Thoughts?

- Frank
