* directory caching & negative file lookups?
From: Daire Byrne @ 2022-09-01 13:32 UTC
To: linux-nfs

Hi,

So I have a bit of a newbie question (apologies) that came to me while
debugging some code that was spamming our NFS servers with lookups for
nonexistent files.

If we can cache directory entries (readdir) and even all their
attributes (readdirplus) for some specified period of time (actimeo,
nocto) on a client, then why can't we use that data to serve negative
lookups for files in that directory too (if we so choose)?

There are probably very good reasons you always need to do a
(negative) file lookup, like being able to read files recently created
on another client (despite your local cache for that directory), but
I'm just curious what the official reasons are. If we could choose to
serve negative lookups using the directory entries cache for a
read-only or unchanging filesystem, would that still be bad? We
already choose to use nocto for some workloads...

In our case we see these kinds of heavy negative lookup workloads for
network installed software (100 entries in PYTHONPATH is bad) and in
buggy software (randomly generated filename lookups are really bad!).
Of course, this overhead gets really bad as you add latency between
the client and server.

Daire
* Re: directory caching & negative file lookups?
From: Trond Myklebust @ 2022-09-01 13:55 UTC
To: linux-nfs, daire

On Thu, 2022-09-01 at 14:32 +0100, Daire Byrne wrote:
> If we can cache directory entries (readdir) and even all their
> attributes (readdirplus) for some specified period of time (actimeo,
> nocto) on a client, then why can't we use that data to serve negative
> lookups for files in that directory too (if we so choose)?

man 5 nfs

Look for the section on the 'lookupcache=mode' mount option.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
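For readers who want to check which lookup cache mode their NFS mounts use, the following is a minimal sketch rather than a definitive tool. It assumes input in the /proc/mounts layout (device, mount point, filesystem type, options) and assumes that a mount with no lookupcache option is running with the default mode, which nfs(5) documents as "all". The server names and paths in SAMPLE are hypothetical.

```python
def nfs_lookupcache_modes(mounts_text):
    """Map each NFS mount point to its lookupcache mode.

    Assumes /proc/mounts-style lines: device mountpoint fstype options ...
    An absent lookupcache option is reported as "all", the default
    documented in nfs(5).
    """
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        # Turn "a=1,b,c=2" into {"a": "1", "b": "", "c": "2"}.
        options = dict(opt.partition("=")[::2] for opt in fields[3].split(","))
        modes[fields[1]] = options.get("lookupcache", "all")
    return modes


# Hypothetical sample data; on a real client read /proc/mounts instead.
SAMPLE = """\
server:/sw /mnt/sw nfs rw,vers=3,nocto,actimeo=3600 0 0
server:/scratch /mnt/scratch nfs4 rw,lookupcache=pos 0 0
/dev/sda1 / ext4 rw 0 0
"""

if __name__ == "__main__":
    for mountpoint, mode in nfs_lookupcache_modes(SAMPLE).items():
        print(mountpoint, mode)
```

Note that lookupcache only controls whether results of lookups that were already sent to the server are cached; it does not make the client answer a first-time lookup from a directory listing.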
* Re: directory caching & negative file lookups?
From: Daire Byrne @ 2022-09-01 15:27 UTC
To: Trond Myklebust; +Cc: linux-nfs

On Thu, 1 Sept 2022 at 14:55, Trond Myklebust <trondmy@hammerspace.com> wrote:
> man 5 nfs
>
> Look for the section on the 'lookupcache=mode' mount option.

So I get that the client caches negative lookups once we've made them
(the default lookupcache=all), but what I'm wondering is if we have
already cached the entire directory contents before the (negative)
lookup, can we not reply that it doesn't exist using that information
without having to go across the wire at all (even the first time)?

Or is there no concept of "cached directory contents"? I thought that
maybe readdir/readdirplus knew about the "full contents" of a
directory?

My thinking was that if we did a readdir/readdirplus first, we could
then do lookups for any random non-existent filename without having to
send anything across the wire. Like I said, a newbie question with
limited understanding of the actual internals :)

Daire
* Re: directory caching & negative file lookups?
From: Trond Myklebust @ 2022-09-01 15:43 UTC
To: daire; +Cc: linux-nfs

On Thu, 2022-09-01 at 16:27 +0100, Daire Byrne wrote:
> Or is there no concept of "cached directory contents"? I thought that
> maybe readdir/readdirplus knew about the "full contents" of a
> directory?
>
> My thinking was that if we did a readdir/readdirplus first, we could
> then do lookups for any random non-existent filename without having
> to send anything across the wire.

There is no concept of a 'fully cached directory'. The VFS and the
memory management code are free to kick out any unused cached entries
from the dcache at any time and for any reason. So the absence of an
entry is not the same as a negative entry.

Furthermore, certain features like case insensitive filesystems on
servers make it hard for the NFS client to know whether or not a
specific name will or won't match an entry returned by readdir. In
those circumstances, even if you think you have cached the entire
directory, you are not guaranteed to know whether the lookup will fail
or succeed.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
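The distinction between "no entry" and "negative entry" can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel's dcache: ToyDcache, Answer and their methods are hypothetical names, and eviction is triggered by hand to stand in for memory pressure.

```python
from enum import Enum


class Answer(Enum):
    POSITIVE = "cached: exists"
    NEGATIVE = "cached: does not exist"
    UNKNOWN = "not cached: must ask the server"


class ToyDcache:
    """Toy model only; it does not mirror the kernel's data structures."""

    def __init__(self):
        self._entries = {}  # name -> True (positive) or False (negative)

    def remember(self, name, exists):
        # Record the result of a lookup that the server already answered.
        self._entries[name] = exists

    def evict(self, name):
        # Models the VFS dropping an unused entry at an arbitrary time.
        self._entries.pop(name, None)

    def lookup(self, name):
        if name not in self._entries:
            return Answer.UNKNOWN
        return Answer.POSITIVE if self._entries[name] else Answer.NEGATIVE


if __name__ == "__main__":
    cache = ToyDcache()
    cache.remember("a.py", True)
    cache.evict("a.py")
    # The file still exists on the server, yet the cache has no entry for it.
    print(cache.lookup("a.py").value)
```

After the eviction the cache holds nothing for "a.py", so treating a missing entry as ENOENT would wrongly hide a file that exists.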
* Re: directory caching & negative file lookups?
From: Daire Byrne @ 2022-09-01 15:49 UTC
To: Trond Myklebust; +Cc: linux-nfs

Yeah, got it now. That all makes sense. Thanks!

Apologies for the noise. Now I just have to go and fix a bunch of our
users' code so I can forget about negative lookups again...

Daire
* Re: directory caching & negative file lookups?
From: Daire Byrne @ 2024-04-05 14:47 UTC
To: Trond Myklebust; +Cc: linux-nfs

Apologies for dragging up an old thread, but I've had to tackle
wayward negative lookup storms again and I have obviously half
forgotten what I learned in this thread last time (even after
re-reading it!).

Can I just ask if I understand correctly that there was an intention
a long time ago to be able to serve negative dentries from a
"complete" READDIRPLUS result?

https://www.cs.helsinki.fi/linux/linux-kernel/2002-30/0108.html

So if we did a readdirplus on a directory and then immediately fired
random nonexistent lookups at the directory, they could be served from
the readdirplus result? i.e. if the name is not in the readdir result,
return ENOENT without needing to ask the server?

But that is not the case today because you can't track the
"completeness" of a READDIRPLUS result for a directory over time (in
page cache)? Or is it all due to needing to deal with case insensitive
filesystems (which I would think affects positive lookups too)?

I did try to decipher the v6.6 fs/nfs/dir.c READDIR bits but I quickly
got lost...

Cheers,

Daire
* Re: directory caching & negative file lookups?
From: Trond Myklebust @ 2024-04-05 15:03 UTC
To: daire; +Cc: linux-nfs

On Fri, 2024-04-05 at 15:47 +0100, Daire Byrne wrote:
> So if we did a readdirplus on a directory then immediately fired
> random non existent lookups at the directory, it could be served from
> the readdirplus result? i.e. not in readdir result, then return
> ENOENT without needing to ask server?
>
> But that is not the case today because you can't track the
> "completeness" of a READDIRPLUS result for a directory over time (in
> page cache)? Or is it all due to needing to deal with case
> insensitive filesystems (which I would think effects positive lookups
> too)?

If the question is whether the client trusts that a READDIR call to the
server returns all the names that can be successfully looked up, then
the answer is "no".

It's not even a question of case sensitivity. There are plenty of
servers out there that will allow you to look up names that won't ever
appear in the results of a READDIR (or READDIRPLUS) call. Having a
hidden ".snapshot" directory is, for instance, a popular way to present
snapshots.

So no, we're not ever going to implement any negative dentry cache
scheme that relies on READDIR/READDIRPLUS.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
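The hidden-name problem described above can be sketched with a toy server. Everything here is hypothetical (ToyServer, wrong_negatives, and the sample names other than ".snapshot", which the thread mentions); it only illustrates why a listing cannot be used to prove that a name does not exist.

```python
class ToyServer:
    """Hypothetical server: some names resolve but are never listed."""

    def __init__(self, listed, hidden):
        self._listed = set(listed)
        self._hidden = set(hidden)

    def readdir(self):
        # Hidden names are deliberately left out of the listing.
        return sorted(self._listed)

    def lookup(self, name):
        return name in self._listed or name in self._hidden


def wrong_negatives(server, probes):
    """Names a READDIR-derived negative cache would wrongly reject."""
    listing = set(server.readdir())
    return [n for n in probes if n not in listing and server.lookup(n)]


if __name__ == "__main__":
    server = ToyServer(listed=["bin", "lib"], hidden=[".snapshot"])
    print(wrong_negatives(server, [".snapshot", "nope", "bin"]))
```

A client that answered ENOENT for every name missing from the listing would make ".snapshot" unreachable, even though the server would have resolved it.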
* Re: directory caching & negative file lookups?
From: Daire Byrne @ 2024-04-12 9:11 UTC
To: Trond Myklebust; +Cc: linux-nfs

Thanks for the clarity Trond - I promise not to forget this time and
ask the same question again in 2 years!

It just keeps coming up here at DNEG due to accessing software over
NFS and crazy PYTHONPATH usage by some of our developers. In some
cases, there are 57,000 negative lookups but only 5,000 positive
lookups (and opens)!

Getting devs to optimise their code is my cross to bear I guess.

But this is also a well known and common problem for large batch farms
and there are some novel workarounds out there:

https://guix.gnu.org/en/blog/2021/taming-the-stat-storm-with-a-loader-cache
https://computing.llnl.gov/projects/spindle
https://cernvm.cern.ch/fs/

Coupled with our propensity for high latency (~100ms) NFS via
re-export servers (for "cloud rendering"), these inefficient path
lookups quickly become a killer - the application takes longer to look
up non-existent files and open files than it does to execute to
completion. We use aggressive caching (actimeo=3600,nocto,vers=3) and
"preload" metadata ops (ls -l, open) on a regular basis to try and
keep things in (re-export) client cache, which certainly helps. It's
hard to keep known (expensive) metadata worksets in memory.

I've also been looking at using an overlay and hand crafting whiteout
files in the upper layers to essentially block known negative lookups
from hitting the lower NFS share - again, only useful and correct for
read-only software shares.

I wonder if Jeff Layton's directory delegations will help for
(read-only) metadata heavy lookups over the WAN?

Daire
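To put the figures above in perspective, here is a rough back-of-the-envelope sketch. It assumes the worst case in which every lookup is uncached, costs exactly one ~100ms round trip, and is issued serially; real clients cache repeated misses and may overlap requests, so treat the result as an upper bound. The function name and its parameters are hypothetical.

```python
def serial_lookup_cost(lookups, rtt_ms):
    """Seconds spent waiting if every lookup costs one serial round trip."""
    return lookups * rtt_ms / 1000


# Figures quoted in the thread: 57,000 negative and 5,000 positive
# lookups over a link with roughly 100 ms of latency.
NEGATIVE_SECONDS = serial_lookup_cost(57_000, 100)
POSITIVE_SECONDS = serial_lookup_cost(5_000, 100)

if __name__ == "__main__":
    print(f"negative lookups: {NEGATIVE_SECONDS / 60:.0f} minutes")
    print(f"positive lookups: {POSITIVE_SECONDS / 60:.1f} minutes")
```

Under those assumptions the negative lookups alone account for 5,700 seconds (95 minutes) of waiting, against 500 seconds for the lookups that actually find something.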
* Re: directory caching & negative file lookups?
From: Jeff Layton @ 2024-04-12 10:21 UTC
To: Daire Byrne, Trond Myklebust; +Cc: linux-nfs

On Fri, 2024-04-12 at 10:11 +0100, Daire Byrne wrote:
> I wonder if Jeff Layton's directory delegations will help for
> (read-only) metadata heavy lookups over the WAN?

Probably not. In order to optimize away lookups of negative dentries
that aren't in cache, you need to know all of the positive dentries in
the directory. As Trond pointed out earlier in the discussion, NFS
doesn't have a concept of directory "completeness", so we can't
reasonably do this.

FWIW, CephFS does have such a concept and can satisfy readdir requests
and negative lookups out of the cache when it has complete directory
info.

--
Jeff Layton <jlayton@kernel.org>
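The idea of directory "completeness" can be sketched as a single flag on a cached directory. This is an illustrative toy, not CephFS's implementation: ToyCompleteDir and its return strings are hypothetical, and the model assumes a protocol in which a full listing is authoritative, which the thread says NFS does not guarantee.

```python
class ToyCompleteDir:
    """Toy directory cache with a 'complete' flag (illustrative only)."""

    def __init__(self):
        self.names = set()
        self.complete = False

    def fill_from_full_listing(self, names):
        # Only valid if the listing is guaranteed to contain every name.
        self.names = set(names)
        self.complete = True

    def evict(self, name):
        self.names.discard(name)
        # The cache no longer covers the whole directory.
        self.complete = False

    def lookup(self, name):
        if name in self.names:
            return "hit"
        return "enoent-local" if self.complete else "ask-server"


if __name__ == "__main__":
    cached = ToyCompleteDir()
    cached.fill_from_full_listing(["a", "b"])
    print(cached.lookup("zzz"))  # answered locally while complete
    cached.evict("a")
    print(cached.lookup("zzz"))  # completeness lost, must ask the server
```

The flag has to be cleared on any eviction, which is why a local negative answer is only safe for as long as the client can prove it still holds every entry.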
* Re: directory caching & negative file lookups?
From: Daire Byrne @ 2024-04-12 11:43 UTC
To: Jeff Layton; +Cc: Trond Myklebust, linux-nfs

On Fri, 12 Apr 2024 at 11:21, Jeff Layton <jlayton@kernel.org> wrote:
> Probably not. In order to optimize away lookups of negative dentries
> that aren't in cache, you need to know all of the positive dentries in
> the directory. As Trond pointed out earlier in the discussion, NFS
> doesn't have a concept of directory "completeness", so we can't
> reasonably do this.
>
> FWIW, CephFS does have such a concept and can satisfy readdir requests
> and negative lookups out of the cache when it has complete directory
> info.

Out of interest, do directory delegations help with positive lookups
or repeat opens? They may be less numerous in our badly behaved
workloads, but they are still nice to optimise for latency.

Can you disable "cto" for example if you have a directory delegation
and repeatedly open the same file for reading without a network hop?

I also noticed that "nocto" can completely stop any subsequent network
hops for opens (with a long actimeo) for NFSv3, but on NFSv4 it only
cuts a single GETATTR before still doing an OPEN DH over the network
each time.

I'm probably wandering off into "disconnected clients" and AFS style
territory now...

Daire
* Re: directory caching & negative file lookups?
  2024-04-12 11:43 ` Daire Byrne
@ 2024-04-12 14:13 ` Jeff Layton
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff Layton @ 2024-04-12 14:13 UTC (permalink / raw)
  To: Daire Byrne; +Cc: Trond Myklebust, linux-nfs

On Fri, 2024-04-12 at 12:43 +0100, Daire Byrne wrote:
> On Fri, 12 Apr 2024 at 11:21, Jeff Layton <jlayton@kernel.org> wrote:
> >
> > On Fri, 2024-04-12 at 10:11 +0100, Daire Byrne wrote:
> > > Thanks for the clarity Trond - I promise not to forget this time and
> > > ask the same question again in 2 years!
> > >
> > > It just keeps coming up here at DNEG due to accessing software over
> > > NFS and crazy PYTHONPATH usage by some of our developers. In some
> > > cases, there are 57,000 negative lookups but only 5000 positive
> > > lookups (and opens)!
> > >
> > > Getting devs to optimise their code is my cross to bear I guess.
> > >
> > > But this is also a well known and common problem for large batch farms
> > > and there are some novel workarounds out there:
> > >
> > > https://guix.gnu.org/en/blog/2021/taming-the-stat-storm-with-a-loader-cache
> > > https://computing.llnl.gov/projects/spindle
> > > https://cernvm.cern.ch/fs/
> > >
> > > Coupled with our propensity for high latency (~100ms) NFS via
> > > re-export servers (for "cloud rendering"), these inefficient path
> > > lookups quickly become a killer - the application takes longer to
> > > look up non-existent files and open files, than it does to execute to
> > > completion. We use aggressive caching (actimeo=3600,nocto,vers=3) and
> > > "preload" metadata ops (ls -l, open) on a regular basis to try and
> > > keep things in (re-export) client cache which certainly helps. It's
> > > hard to keep known (expensive) metadata worksets in memory.
> > >
> > > I've also been looking at using an overlay and hand crafting whiteout
> > > files in the upper layers to essentially block known negative lookups
> > > from hitting the lower NFS share - again, only useful and correct for
> > > read-only software shares.
> > >
> > > I wonder if Jeff Layton's directory delegations will help for
> > > (read-only) metadata heavy lookups over the WAN?
> > >
> >
> > Probably not. In order to optimize away lookups of negative dentries
> > that aren't in cache, you need to know all of the positive dentries in
> > the directory. As Trond pointed out earlier in the discussion, NFS
> > doesn't have a concept of directory "completeness", so we can't
> > reasonably do this.
> >
> > FWIW, CephFS does have such a concept and can satisfy readdir requests
> > and negative lookups out of the cache when it has complete directory
> > info.
>
> Out of interest, do directory delegations help with positive lookups
> or repeat opens? They may be less numerous in our badly behaved
> workloads, but they are still nice to optimise for latency.
>
> Can you disable "cto" for example if you have a directory delegation
> and repeatedly open the same file for reading without a network hop?

Maybe? Dir delegations don't really help with CTO, since that's all
about the file itself, not its parent directory. It might help avoid
having to revalidate the parent directory for the lookup however.

FWIW, basic, recallable directory delegations with no notifications are
pretty useless in my testing. You optimize away a few GETATTRs on the
parent directories, but those are pretty infrequent anyway -- 1 every
60s or so on directories that aren't changing much by default. That's
close to "why bother" territory, but maybe there is a case to be made
for that on high-latency links (like you mention).

Mixing in notifications may change things though:

Consider 2 clients that are both working with files in the same
directory and both hold directory delegations. client1 creates a file
or another directory in the dir. Server then pushes out a notification
to client2. client2 goes to look up the new dentry later, and finds
that it's already in cache.

That's a potential optimization, but it's pretty specific to workloads
where multiple clients are operating on the same files in a directory
that is frequently changing.

>
> I also noticed that "nocto" can completely stop any subsequent network
> hops for opens (with a long actimeo) for NFSv3, but on NFSv4 it only
> cuts a single GETATTR before still doing an OPEN DH over the network
> each time.
>

File delegations can allow you to do an open w/o having to cross the
network. If I hold the right sort of deleg on a file, I should be able
to open it without talking to the server. Dir delegations could help
optimize away some round trips for the lookups leading up to the open
however.

> I'm probably wandering off into "disconnected clients" and AFS style
> territory now...
>
> >
> > > On Fri, 5 Apr 2024 at 16:03, Trond Myklebust <trondmy@hammerspace.com> wrote:
> > > >
> > > > On Fri, 2024-04-05 at 15:47 +0100, Daire Byrne wrote:
> > > > > Apologies for dragging up an old thread, but I've had to tackle
> > > > > wayward negative lookup storms again and I have obviously half
> > > > > forgotten what I learned in this thread last time (even after
> > > > > re-reading it!).
> > > > >
> > > > > Can I just ask if I understand correctly and that there was an
> > > > > intention a long time ago to be able to serve negative dentries from
> > > > > a "complete" READDIRPLUS result?
> > > > >
> > > > > https://www.cs.helsinki.fi/linux/linux-kernel/2002-30/0108.html
> > > > >
> > > > > So if we did a readdirplus on a directory then immediately fired
> > > > > random non existent lookups at the directory, it could be served from
> > > > > the readdirplus result? i.e. not in readdir result, then return ENOENT
> > > > > without needing to ask server?
> > > > >
> > > > > But that is not the case today because you can't track the
> > > > > "completeness" of a READDIRPLUS result for a directory over time (in
> > > > > page cache)? Or is it all due to needing to deal with case
> > > > > insensitive filesystems (which I would think affects positive
> > > > > lookups too)?
> > > > >
> > > > > I did try to decipher the v6.6 fs/nfs/dir.c READDIR bits but I
> > > > > quickly got lost...
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Daire
> > > >
> > > > If the question is whether the client trusts that a READDIR call to the
> > > > server returns all the names that can be successfully looked up, then
> > > > the answer is "no".
> > > > It's not even a question of case sensitivity. There are plenty of
> > > > servers out there that will allow you to look up names that won't ever
> > > > appear in the results of a READDIR (or READDIRPLUS) call. Having a
> > > > hidden ".snapshot" directory is, for instance, a popular way to present
> > > > snapshots.
> > > >
> > > > So no, we're not ever going to implement any negative dentry cache
> > > > scheme that relies on READDIR/READDIRPLUS.
> > > > --
> > > > Trond Myklebust
> > > > Linux NFS client maintainer, Hammerspace
> > > > trond.myklebust@hammerspace.com
> > > >
> > > >
> > >
> >
> > --
> > Jeff Layton <jlayton@kernel.org>

--
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply [flat|nested] 11+ messages in thread
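For anyone skimming the thread, the "directory completeness" argument can be sketched as a toy model in Python. This is only an illustration of the reasoning above, not how the Linux NFS client or CephFS is implemented; ToyServer, CachingClient, trust_complete and the file names are all made up for the example. It shows that trusting a cached listing as complete saves round trips on negative lookups, but gives the wrong answer as soon as the server resolves a name (such as a hidden ".snapshot") that never appears in READDIR results.

```python
# Toy model only: the classes and names here are hypothetical,
# not kernel, NFS or CephFS APIs.

class ToyServer:
    """A server whose lookup may resolve names that readdir never lists."""

    def __init__(self, listed, hidden=()):
        self.listed = set(listed)   # names returned by readdir()
        self.hidden = set(hidden)   # resolvable but unlisted, e.g. ".snapshot"
        self.lookups = 0            # number of lookup round trips

    def readdir(self):
        return set(self.listed)

    def lookup(self, name):
        self.lookups += 1
        return name in self.listed or name in self.hidden


class CachingClient:
    """Answers negative lookups locally only if the listing is trusted as complete."""

    def __init__(self, server, trust_complete):
        self.server = server
        self.trust_complete = trust_complete
        self.entries = server.readdir()  # cached directory listing

    def exists(self, name):
        if name in self.entries:
            return True                  # positive hit from cache
        if self.trust_complete:
            return False                 # ENOENT without asking the server
        return self.server.lookup(name)  # listing may be incomplete, so ask


server = ToyServer(listed={"a.py", "b.py"}, hidden={".snapshot"})
trusting = CachingClient(server, trust_complete=True)
careful = CachingClient(server, trust_complete=False)

print(trusting.exists("missing.py"), server.lookups)  # no round trip needed
print(trusting.exists(".snapshot"))                   # wrong: the name is hidden
print(careful.exists(".snapshot"), server.lookups)    # right, at the cost of a round trip
```

The trust_complete=True client corresponds to a filesystem that can vouch for a complete directory view; the trust_complete=False client is what a client must do when the protocol gives no such guarantee.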
Thread overview: 11+ messages
2022-09-01 13:32 directory caching & negative file lookups? Daire Byrne
2022-09-01 13:55 ` Trond Myklebust
2022-09-01 15:27 ` Daire Byrne
2022-09-01 15:43 ` Trond Myklebust
2022-09-01 15:49 ` Daire Byrne
2024-04-05 14:47 ` Daire Byrne
2024-04-05 15:03 ` Trond Myklebust
2024-04-12 9:11 ` Daire Byrne
2024-04-12 10:21 ` Jeff Layton
2024-04-12 11:43 ` Daire Byrne
2024-04-12 14:13 ` Jeff Layton