nfsd filecache issues with v4

* nfsd filecache issues with v4
@ 2020-06-08 19:21 Frank van der Linden
  2020-06-25 17:10 ` Bruce Fields
  0 siblings, 1 reply; 5+ messages in thread
From: Frank van der Linden @ 2020-06-08 19:21 UTC (permalink / raw)
  To: Bruce Fields, Trond Myklebust, Chuck Lever; +Cc: linux-nfs

We recently noticed that, with 5.4+ kernels, the generic/531 test takes
a very long time to finish for v4, especially when run on larger systems.

Case in point: a 72 VCPU, 144G EC2 instance as a client will make the test
last about 20 hours.

So, I had a look to see what was going on. First of all, the test generates
a lot of files - what it does is generate 50000 files per process, where
it starts 2 * NCPU processes. So that's 144 processes in this case, 50000
files each. Also, it does it by setting the file ulimit to 50000, and then
just opening files, keeping them open, until it hits the limit.

So that's 7 million new/open files - that's a lot, but the problem can
be triggered with far fewer than that as well.

Looking at what the server was doing, I noticed a lot of lock contention
for nfsd_file_lru. Then I noticed that that nfsd_filecache_count kept
going up, reflecting the number of open files by the client processes,
eventually reaching, for example, that 7 million number.

So here's what happens: for NFSv4, files that are associated with an
open stateid can stick around for a long time, as long as there's no
CLOSE done on them. That's what's happening here. Also, since those files
have a refcount of >= 2 (one for the hash table, one for being pointed to
by the state), they are never eligible for removal from the file cache.
Worse, since the code call nfs_file_gc inline if the upper bound is crossed
(8192), every single operation that calls nfsd_file_acquire will end up
walking the entire LRU, trying to free files, and failing every time.
Walking a list with millions of files every single time isn't great.

There are some ways to fix this behavior like:

* Always allow v4 cached file structured to be purged from the cache.
  They will stick around, since they still have a reference, but
  at least they won't slow down cache handling to a crawl.

* Don't add v4 files to the cache to begin with.

* Since the only advantage of the file cache for v4 is the caching
  of files linked to special stateids (as far as I can tell), only
  cache files associated with special state ids.

* Don't bother with v4 files at all, and revert the changes that
  made v4 use the file cache.

In general, the resource control for files OPENed by the client is
probably an issue. Even if you fix the cache, what if there are
N clients that open millions of files and keep them open? Maybe
there should be a fallback to start using temporary open files
if a client goes beyond a reasonable limit and threatens to eat
all resources.

Thoughts?

- Frank

^ permalink raw reply	[flat|nested] 5+ messages in thread