On Thu, Sep 17, 2020 at 03:09:31PM -0400, bfields wrote:
>
> On Thu, Sep 17, 2020 at 05:01:11PM +0100, Daire Byrne wrote:
> >
> > ----- On 15 Sep, 2020, at 18:21, bfields bfields@fieldses.org wrote:
> >
> > >> 4) With an NFSv4 re-export, lots of open/close requests (hundreds per
> > >> second) quickly eat up the CPU on the re-export server and perf top
> > >> shows we are mostly in native_queued_spin_lock_slowpath.
> > >
> > > Any statistics on who's calling that function?
> >
> > I've always struggled to reproduce this with a simple open/close
> > simulation, so I suspect some other operations need to be mixed in
> > too. But I have one production workload that I know has lots of opens
> > & closes (buggy software) included in amongst the usual reads, writes
> > etc.
> >
> > With just 40 clients mounting the reexport server (v5.7.6) using
> > NFSv4.2, we see the CPU of the nfsd threads increase rapidly and by
> > the time we have 100 clients, we have maxed out the 32 cores of the
> > server with most of that in native_queued_spin_lock_slowpath.
>
> That sounds a lot like what Frank Van der Linden reported:
>
> https://lore.kernel.org/linux-nfs/20200608192122.GA19171@dev-dsk-fllinden-2c-c1893d73.us-west-2.amazon.com/
>
> It looks like a bug in the filehandle caching code.
>
> --b.

Yes, that does look like the same one.

I still think that not caching v4 files at all may be the best way to go
here, since the intent of the filecache code was to speed up v2/v3 I/O,
where you end up doing a lot of opens/closes, but it doesn't make as
much sense for v4.

However, short of that, I tested a local patch a few months back that I
never posted here, so I'll do so now. It just makes v4 opens into 'long
term' opens, which do not get put on the LRU, since that doesn't make
sense (they are in the hash table, so they are still cached). Also, the
file caching code seems to walk the LRU a little too often, but that's
another issue - and this change keeps the LRU short, so it's not a big
deal.
I don't particularly love this patch, but it does keep the LRU short,
and it did significantly speed up my testcase (by about 50%). So, maybe
you can give it a try.

I'll also attach a second patch that converts the hash table to an
rhashtable, which automatically grows and shrinks in size with usage.
That patch also helped, but not by nearly as much (I think it yielded
another 10%).

- Frank