* [LSF/MM TOPIC] Better handling of negative dentries
@ 2022-03-15 19:55 Matthew Wilcox
  2022-03-15 20:56 ` Roman Gushchin
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2022-03-15 19:55 UTC (permalink / raw)
  To: Stephen Brennan, lsf-pc, linux-fsdevel, linux-mm
  Cc: Gautham Ananthakrishna, khlebnikov

The number of negative dentries is effectively constrained only by memory
size.  Systems which do not experience significant memory pressure for
an extended period can build up millions of negative dentries which
clog the dcache.  That can have different symptoms, such as inotify
taking a long time [1], high memory usage [2] and even just poor lookup
performance [3].  We've also seen problems with cgroups being pinned
by negative dentries, though I think we now reparent those dentries to
their parent cgroup instead.

We don't have a really good solution yet, and maybe some focused
brainstorming on the problem would lead to something that actually works.

(Apologies to Stephen; I should have thought to send this before the
invitations to LSFMM went out).

[1] https://lore.kernel.org/linux-fsdevel/20220209231406.187668-1-stephen.s.brennan@oracle.com/
[2] https://lore.kernel.org/linux-fsdevel/1611235185-1685-1-git-send-email-gautham.ananthakrishna@oracle.com/
[3] https://lore.kernel.org/linux-fsdevel/158893941613.200862.4094521350329937435.stgit@buzz/



* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-15 19:55 [LSF/MM TOPIC] Better handling of negative dentries Matthew Wilcox
@ 2022-03-15 20:56 ` Roman Gushchin
  2022-03-16  2:07   ` Gao Xiang
  2022-03-22 20:41   ` Matthew Wilcox
  0 siblings, 2 replies; 18+ messages in thread
From: Roman Gushchin @ 2022-03-15 20:56 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Stephen Brennan, lsf-pc, linux-fsdevel, linux-mm,
	Gautham Ananthakrishna, khlebnikov


> On Mar 15, 2022, at 12:56 PM, Matthew Wilcox <willy@infradead.org> wrote:
> 
> The number of negative dentries is effectively constrained only by memory
> size.  Systems which do not experience significant memory pressure for
> an extended period can build up millions of negative dentries which
> clog the dcache.  That can have different symptoms, such as inotify
> taking a long time [1], high memory usage [2] and even just poor lookup
> performance [3].  We've also seen problems with cgroups being pinned
> by negative dentries, though I think we now reparent those dentries to
> their parent cgroup instead.

Yes, it should be fixed already.

> 
> We don't have a really good solution yet, and maybe some focused
> brainstorming on the problem would lead to something that actually works.

I’d be happy to join this discussion. And in my opinion it’s going beyond negative dentries: there are other types of objects which tend to grow beyond any reasonable limits if there is no memory pressure.
A perfect example when it happens is when a machine is almost idle for some period of time. Periodically running processes creating various kernel objects (mostly vfs cache) which over time are filling significant portions of the total memory. And when the need for memory arises, we realize that the memory is heavily fragmented and it’s costly to reclaim it back.

Thanks!


* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-15 20:56 ` Roman Gushchin
@ 2022-03-16  2:07   ` Gao Xiang
  2022-03-16  2:52     ` Dave Chinner
  2022-03-22 20:41   ` Matthew Wilcox
  1 sibling, 1 reply; 18+ messages in thread
From: Gao Xiang @ 2022-03-16  2:07 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Matthew Wilcox, Stephen Brennan, lsf-pc, linux-fsdevel, linux-mm,
	Gautham Ananthakrishna, khlebnikov

On Tue, Mar 15, 2022 at 01:56:18PM -0700, Roman Gushchin wrote:
> 
> > On Mar 15, 2022, at 12:56 PM, Matthew Wilcox <willy@infradead.org> wrote:
> > 
> > The number of negative dentries is effectively constrained only by memory
> > size.  Systems which do not experience significant memory pressure for
> > an extended period can build up millions of negative dentries which
> > clog the dcache.  That can have different symptoms, such as inotify
> > taking a long time [1], high memory usage [2] and even just poor lookup
> > performance [3].  We've also seen problems with cgroups being pinned
> > by negative dentries, though I think we now reparent those dentries to
> > their parent cgroup instead.
> 
> Yes, it should be fixed already.
> 
> > 
> > We don't have a really good solution yet, and maybe some focused
> > brainstorming on the problem would lead to something that actually works.
> 
> I’d be happy to join this discussion. And in my opinion it’s going beyond negative dentries: there are other types of objects which tend to grow beyond any reasonable limits if there is no memory pressure.

+1, we once hit a similar issue as well, and agree that it is not limited
to negative dentries but to excessive build-up of LRU-ed dentries and inodes
in general.

Limiting the total number might help servers avoid shrinker spikes.

Thanks,
Gao Xiang

> A perfect example when it happens is when a machine is almost idle for some period of time. Periodically running processes creating various kernel objects (mostly vfs cache) which over time are filling significant portions of the total memory. And when the need for memory arises, we realize that the memory is heavily fragmented and it’s costly to reclaim it back.
> 
> Thanks!


* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-16  2:07   ` Gao Xiang
@ 2022-03-16  2:52     ` Dave Chinner
  2022-03-16  3:08       ` Gao Xiang
  2022-03-22 15:08       ` Matthew Wilcox
  0 siblings, 2 replies; 18+ messages in thread
From: Dave Chinner @ 2022-03-16  2:52 UTC (permalink / raw)
  To: Roman Gushchin, Matthew Wilcox, Stephen Brennan, lsf-pc,
	linux-fsdevel, linux-mm, Gautham Ananthakrishna, khlebnikov

On Wed, Mar 16, 2022 at 10:07:19AM +0800, Gao Xiang wrote:
> On Tue, Mar 15, 2022 at 01:56:18PM -0700, Roman Gushchin wrote:
> > 
> > > On Mar 15, 2022, at 12:56 PM, Matthew Wilcox <willy@infradead.org> wrote:
> > > 
> > > The number of negative dentries is effectively constrained only by memory
> > > size.  Systems which do not experience significant memory pressure for
> > > an extended period can build up millions of negative dentries which
> > > clog the dcache.  That can have different symptoms, such as inotify
> > > taking a long time [1], high memory usage [2] and even just poor lookup
> > > performance [3].  We've also seen problems with cgroups being pinned
> > > by negative dentries, though I think we now reparent those dentries to
> > > their parent cgroup instead.
> > 
> > Yes, it should be fixed already.
> > 
> > > 
> > > We don't have a really good solution yet, and maybe some focused
> > > brainstorming on the problem would lead to something that actually works.
> > 
> > I’d be happy to join this discussion. And in my opinion it’s going beyond negative dentries: there are other types of objects which tend to grow beyond any reasonable limits if there is no memory pressure.
> 
> +1, we once hit a similar issue as well, and agree that it is not limited
> to negative dentries but to excessive build-up of LRU-ed dentries and inodes
> in general.

Yup, any discussion solely about managing buildup of negative
dentries doesn't acknowledge that it is just a symptom of larger
problems that need to be addressed.

> Limiting the total number might help servers avoid shrinker spikes.

No, we don't want to set hard limits on object counts - that's just
asking for systems that need frequent hand tuning and are impossible
to get right under changing workloads. Caches need to auto size
according to workload's working set to find a steady state balance,
not be bound by arbitrary limits.

But even cache sizing isn't the problem here - it's just another
symptom.

> > A perfect example when it happens is when a machine is almost
> > idle for some period of time. Periodically running processes
> > creating various kernel objects (mostly vfs cache) which over
> > time are filling significant portions of the total memory. And
> > when the need for memory arises, we realize that the memory is
> > heavily fragmented and it’s costly to reclaim it back.

Yup, the underlying issue here is that memory reclaim does nothing
to manage long term build-up of single use cached objects when
*there is no memory pressure*. There's plenty of idle time and spare
resources to manage caches sanely, but we don't. e.g. there is no
periodic rotation of caches that could lead to detection and reclaim
of single use objects (say over a period of minutes) and hence
prevent them from filling up all of memory unnecessarily and
creating transient memory reclaim and allocation latency spikes when
memory finally fills up.
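
As a very rough sketch of that kind of periodic rotation (everything here
is illustrative: the work item, the one-minute interval and the
age_cache_lrus() helper are assumptions, not existing infrastructure):

#include <linux/workqueue.h>
#include <linux/jiffies.h>
#include <linux/init.h>

static struct delayed_work cache_age_work;

static void age_cache_lrus(void)
{
	/* Hypothetical: walk each registered list_lru, dispose of entries
	 * whose referenced bit hasn't been set since the previous pass,
	 * and clear the bit on the survivors. */
}

static void cache_age_fn(struct work_struct *work)
{
	age_cache_lrus();
	/* low frequency - aging wants minutes, not milliseconds */
	schedule_delayed_work(&cache_age_work, 60 * HZ);
}

static int __init cache_aging_init(void)
{
	INIT_DELAYED_WORK(&cache_age_work, cache_age_fn);
	schedule_delayed_work(&cache_age_work, 60 * HZ);
	return 0;
}
late_initcall(cache_aging_init);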

IOWs, negative dentries getting out of hand and shrinker spikes are
both a symptom of the same problem: while memory allocation is free,
memory reclaim does nothing to manage cache aging. Hence we only
find out we've got a badly aged cache when we finally realise
it has filled all of memory, and then we have heaps of work to do
before memory can be made available for allocation again....

And then if you're going to talk memory reclaim, the elephant in the
room is the lack of integration between shrinkers and the main
reclaim infrastructure.  There's no priority determination, there's
no progress feedback, there's no mechanism to allow shrinkers to
throttle reclaim rather than have the reclaim infrastructure wind up
priority and OOM kill when a shrinker cannot make progress quickly,
etc. Then there's direct reclaim hammering shrinkers with unbound
concurrency so individual shrinkers have no chance of determining
how much memory pressure there really is by themselves, not to
mention the lock contention problems that unbound reclaim
concurrency on things like LRU lists can cause. And, of course,
memcg based reclaim is still only tacked onto the side of the
shrinker infrastructure...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-16  2:52     ` Dave Chinner
@ 2022-03-16  3:08       ` Gao Xiang
  2022-03-22 15:08       ` Matthew Wilcox
  1 sibling, 0 replies; 18+ messages in thread
From: Gao Xiang @ 2022-03-16  3:08 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Roman Gushchin, Matthew Wilcox, Stephen Brennan, lsf-pc,
	linux-fsdevel, linux-mm, Gautham Ananthakrishna, khlebnikov

On Wed, Mar 16, 2022 at 01:52:23PM +1100, Dave Chinner wrote:
> On Wed, Mar 16, 2022 at 10:07:19AM +0800, Gao Xiang wrote:
> > On Tue, Mar 15, 2022 at 01:56:18PM -0700, Roman Gushchin wrote:
> > > 
> > > > On Mar 15, 2022, at 12:56 PM, Matthew Wilcox <willy@infradead.org> wrote:
> > > > 
> > > > The number of negative dentries is effectively constrained only by memory
> > > > size.  Systems which do not experience significant memory pressure for
> > > > an extended period can build up millions of negative dentries which
> > > > clog the dcache.  That can have different symptoms, such as inotify
> > > > taking a long time [1], high memory usage [2] and even just poor lookup
> > > > performance [3].  We've also seen problems with cgroups being pinned
> > > > by negative dentries, though I think we now reparent those dentries to
> > > > their parent cgroup instead.
> > > 
> > > Yes, it should be fixed already.
> > > 
> > > > 
> > > > We don't have a really good solution yet, and maybe some focused
> > > > brainstorming on the problem would lead to something that actually works.
> > > 
> > > I’d be happy to join this discussion. And in my opinion it’s going beyond negative dentries: there are other types of objects which tend to grow beyond any reasonable limits if there is no memory pressure.
> > 
> > +1, we once hit a similar issue as well, and agree that it is not limited
> > to negative dentries but to excessive build-up of LRU-ed dentries and inodes
> > in general.
> 
> Yup, any discussion solely about managing buildup of negative
> dentries doesn't acknowledge that it is just a symptom of larger
> problems that need to be addressed.
> 
> > Limiting the total number might help servers avoid shrinker spikes.
> 
> No, we don't want to set hard limits on object counts - that's just
> asking for systems that need frequent hand tuning and are impossible
> to get right under changing workloads. Caches need to auto size
> according to workload's working set to find a steady state balance,
> not be bound by arbitrary limits.
> 
> But even cache sizing isn't the problem here - it's just another
> symptom.
> 
> > > A perfect example when it happens is when a machine is almost
> > > idle for some period of time. Periodically running processes
> > > creating various kernel objects (mostly vfs cache) which over
> > > time are filling significant portions of the total memory. And
> > > when the need for memory arises, we realize that the memory is
> > > heavily fragmented and it’s costly to reclaim it back.
> 
> Yup, the underlying issue here is that memory reclaim does nothing
> to manage long term build-up of single use cached objects when
> *there is no memory pressure*. There's plenty of idle time and spare
> resources to manage caches sanely, but we don't. e.g. there is no
> periodic rotation of caches that could lead to detection and reclaim
> of single use objects (say over a period of minutes) and hence
> prevent them from filling up all of memory unnecessarily and
> creating transient memory reclaim and allocation latency spikes when
> memory finally fills up.
> 
> IOWs, negative dentries getting out of hand and shrinker spikes are
> both a symptom of the same problem: while memory allocation is free,
> memory reclaim does nothing to manage cache aging. Hence we only
> find out we've got a badly aged cache when we finally realise
> it has filled all of memory, and then we have heaps of work to do
> before memory can be made available for allocation again....
> 
> And then if you're going to talk memory reclaim, the elephant in the
> room is the lack of integration between shrinkers and the main
> reclaim infrastructure.  There's no priority determination, there's
> no progress feedback, there's no mechanism to allow shrinkers to
> throttle reclaim rather than have the reclaim infrastructure wind up
> priority and OOM kill when a shrinker cannot make progress quickly,
> etc. Then there's direct reclaim hammering shrinkers with unbound
> concurrency so individual shrinkers have no chance of determining
> how much memory pressure there really is by themselves, not to
> mention the lock contention problems that unbound reclaim
> concurrency on things like LRU lists can cause. And, of course,
> memcg based reclaim is still only tacked onto the side of the
> shrinker infrastructure...

Yeah, it's really a generic problem involving cached objects and shrinkers.
Some intelligent detection and feedback loop (even without memory
pressure) would be much better than hardcoded numbers. Actually, I
remember this topic has been raised several times before; hoping for
some progress this time.

Thanks,
Gao Xiang

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-16  2:52     ` Dave Chinner
  2022-03-16  3:08       ` Gao Xiang
@ 2022-03-22 15:08       ` Matthew Wilcox
  2022-03-22 19:19         ` James Bottomley
  2022-03-22 22:21         ` Dave Chinner
  1 sibling, 2 replies; 18+ messages in thread
From: Matthew Wilcox @ 2022-03-22 15:08 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Roman Gushchin, Stephen Brennan, lsf-pc, linux-fsdevel, linux-mm,
	Gautham Ananthakrishna, khlebnikov

On Wed, Mar 16, 2022 at 01:52:23PM +1100, Dave Chinner wrote:
> On Wed, Mar 16, 2022 at 10:07:19AM +0800, Gao Xiang wrote:
> > On Tue, Mar 15, 2022 at 01:56:18PM -0700, Roman Gushchin wrote:
> > > 
> > > > On Mar 15, 2022, at 12:56 PM, Matthew Wilcox <willy@infradead.org> wrote:
> > > > 
> > > > The number of negative dentries is effectively constrained only by memory
> > > > size.  Systems which do not experience significant memory pressure for
> > > > an extended period can build up millions of negative dentries which
> > > > clog the dcache.  That can have different symptoms, such as inotify
> > > > taking a long time [1], high memory usage [2] and even just poor lookup
> > > > performance [3].  We've also seen problems with cgroups being pinned
> > > > by negative dentries, though I think we now reparent those dentries to
> > > > their parent cgroup instead.
> > > 
> > > Yes, it should be fixed already.
> > > 
> > > > 
> > > > We don't have a really good solution yet, and maybe some focused
> > > > brainstorming on the problem would lead to something that actually works.
> > > 
> > > I’d be happy to join this discussion. And in my opinion it’s going beyond negative dentries: there are other types of objects which tend to grow beyond any reasonable limits if there is no memory pressure.
> > 
> > +1, we once hit a similar issue as well, and agree that it is not limited
> > to negative dentries but to excessive build-up of LRU-ed dentries and inodes
> > in general.
> 
> Yup, any discussion solely about managing buildup of negative
> dentries doesn't acknowledge that it is just a symptom of larger
> problems that need to be addressed.

Yes, but let's not make this _so_ broad a discussion that it becomes
unsolvable.  Rather, let's look for a solution to this particular problem
that can be adopted by other caches that share a similar problem.

For example, we might be seduced into saying "this is a slab problem"
because all the instances we have here allocate from slab.  But slab
doesn't have enough information to solve the problem.  Maybe the working
set of the current workload really needs 6 million dentries to perform
optimally.  Maybe it needs 600.  Slab can't know that.  Maybe slab can
play a role here, but the only component which can know the appropriate
size for a cache is the cache itself.

I think the logic needs to be in d_alloc().  Before it calls __d_alloc(),
it should check ... something ... to see if it should try to shrink
the LRU list.  The devil is in what that something should be.  I'm no
expert on the dcache; do we just want to call prune_dcache_sb() for
every 1/1000 time?  Rely on DCACHE_REFERENCED to make sure that we're
not over-pruning the list?  If so, what do we set nr_to_scan to?  1000 so
that we try to keep the dentry list the same size?  1500 so that it
actually tries to shrink?
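
To make that concrete, the shape of the check might be something like the
sketch below (purely illustrative: the per-sb allocation counter, the
1-in-1000 trigger and the nr_to_scan value are placeholders, not numbers
I've measured):

/* Would be called from d_alloc() before __d_alloc().  sb->s_dentry_allocs
 * is an assumed per-superblock counter, not an existing field. */
static void d_prune_on_alloc(struct super_block *sb)
{
	struct shrink_control sc = {
		.gfp_mask	= GFP_KERNEL,
		.nr_to_scan	= 1500,		/* > 1000: actually shrink */
	};

	if (atomic_long_inc_return(&sb->s_dentry_allocs) % 1000)
		return;

	/* prune_dcache_sb() rotates DCACHE_REFERENCED dentries instead of
	 * freeing them, so recently used entries survive a pass. */
	prune_dcache_sb(sb, &sc);
}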

I don't feel like I know enough to go further here.  But it feels better
than what we currently do -- calling all the shrinkers from deep in
the page allocator.


* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-22 15:08       ` Matthew Wilcox
@ 2022-03-22 19:19         ` James Bottomley
  2022-03-22 20:17           ` Colin Walters
  2022-03-22 22:21         ` Dave Chinner
  1 sibling, 1 reply; 18+ messages in thread
From: James Bottomley @ 2022-03-22 19:19 UTC (permalink / raw)
  To: Matthew Wilcox, Dave Chinner
  Cc: Roman Gushchin, Stephen Brennan, lsf-pc, linux-fsdevel, linux-mm,
	Gautham Ananthakrishna, khlebnikov

On Tue, 2022-03-22 at 15:08 +0000, Matthew Wilcox wrote:
> On Wed, Mar 16, 2022 at 01:52:23PM +1100, Dave Chinner wrote:
> > On Wed, Mar 16, 2022 at 10:07:19AM +0800, Gao Xiang wrote:
> > > On Tue, Mar 15, 2022 at 01:56:18PM -0700, Roman Gushchin wrote:
> > > > > On Mar 15, 2022, at 12:56 PM, Matthew Wilcox <
> > > > > willy@infradead.org> wrote:
> > > > > 
> > > > > The number of negative dentries is effectively constrained
> > > > > only by memory
> > > > > size.  Systems which do not experience significant memory
> > > > > pressure for
> > > > > an extended period can build up millions of negative dentries
> > > > > which
> > > > > clog the dcache.  That can have different symptoms, such as
> > > > > inotify
> > > > > taking a long time [1], high memory usage [2] and even just
> > > > > poor lookup
> > > > > performance [3].  We've also seen problems with cgroups being
> > > > > pinned
> > > > > by negative dentries, though I think we now reparent those
> > > > > dentries to
> > > > > their parent cgroup instead.
> > > > 
> > > > Yes, it should be fixed already.
> > > > 
> > > > > We don't have a really good solution yet, and maybe some
> > > > > focused
> > > > > brainstorming on the problem would lead to something that
> > > > > actually works.
> > > > 
> > > > I’d be happy to join this discussion. And in my opinion it’s
> > > > going beyond negative dentries: there are other types of
> > > > objects which tend to grow beyond any reasonable limits if
> > > > there is no memory pressure.
> > > 
> > > +1, we once hit a similar issue as well, and agree that it is not
> > > limited to negative dentries but to excessive build-up of LRU-ed
> > > dentries and inodes in general.
> > 
> > Yup, any discussion solely about managing buildup of negative
> > dentries doesn't acknowledge that it is just a symptom of larger
> > problems that need to be addressed.
> 
> Yes, but let's not make this _so_ broad a discussion that it becomes
> unsolvable.  Rather, let's look for a solution to this particular
> problem that can be adopted by other caches that share a similar
> problem.

Well, firstly what is the exact problem?  People maliciously looking up
nonexistent files and causing the negative dentries to balloon, or
simply managing dentries for a well-behaved workload (and what are good
examples of that)?

> For example, we might be seduced into saying "this is a slab problem"
> because all the instances we have here allocate from slab.  But slab
> doesn't have enough information to solve the problem.  Maybe the
> working set of the current workload really needs 6 million dentries
> to perform optimally.  Maybe it needs 600.  Slab can't know
> that.  Maybe slab can play a role here, but the only component which
> can know the appropriate size for a cache is the cache itself.

Could we do something depending on the age of the oldest dentry on the
lru list?  We don't currently keep a last accessed time, but we could.

> I think the logic needs to be in d_alloc().  Before it calls
> __d_alloc(), it should check ... something ... to see if it should
> try to shrink the LRU list.

We could also do it in kill_dentry() or retain_dentry() ... we're past
all the fast paths when they start running.  They also have a view on
the liveness of the dentry.  Chances are if you get a negative dentry
back in kill_dentry, it isn't going to be instantiated, so its only
use is to cache -ENOENT and we could perform more aggressive shrinking
heuristics.
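
As a sketch of that signal (locking elided, and
negative_dentry_limit_exceeded() is a stand-in for whatever policy we
end up with, not an existing function):

/* Consulted from retain_dentry()/dentry_kill() on the final dput(). */
static bool should_discard_negative(struct dentry *dentry)
{
	if (!d_is_negative(dentry))
		return false;
	/* it has served at least one cached -ENOENT lookup since it was
	 * put on the LRU, so it may be worth keeping */
	if (dentry->d_flags & DCACHE_REFERENCED)
		return false;
	/* hypothetical policy hook */
	return negative_dentry_limit_exceeded(dentry->d_parent);
}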

>   The devil is in what that something
> should be.  I'm no expert on the dcache; do we just want to call
> prune_dcache_sb() for every 1/1000 time?  Rely on DCACHE_REFERENCED
> to make sure that we're not over-pruning the list?  If so, what do we
> set nr_to_scan to?  1000 so that we try to keep the dentry list the
> same size?  1500 so that it actually tries to shrink?

Fundamentally, a negative dentry that doesn't get promoted to a
positive one better be looked up pretty often for us to keep it in the
cache, I think?  So last accessed time seems to be a good indicator.

We also have to scan the whole list for negative dentries, perhaps they
might get on better with their own lru list (although that complicates
the instantiation case and perhaps we can afford to spend the time
walking backwards over the lru list in alloc/kill anyway, so there's no
need for a separate list).

> I don't feel like I know enough to go further here.  But it feels
> better than what we currently do -- calling all the shrinkers from
> deep in the page allocator.

Well, yes, but that's not a high bar.

James




* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-22 19:19         ` James Bottomley
@ 2022-03-22 20:17           ` Colin Walters
  2022-03-22 20:27             ` James Bottomley
  2022-03-22 20:37             ` Matthew Wilcox
  0 siblings, 2 replies; 18+ messages in thread
From: Colin Walters @ 2022-03-22 20:17 UTC (permalink / raw)
  To: James Bottomley, Matthew Wilcox, Dave Chinner
  Cc: Roman Gushchin, Stephen Brennan, lsf-pc, linux-fsdevel, linux-mm,
	Gautham Ananthakrishna, khlebnikov



On Tue, Mar 22, 2022, at 3:19 PM, James Bottomley wrote:
> 
> Well, firstly what is the exact problem?  People maliciously looking up
> nonexistent files

Maybe most people have seen it, but for those who haven't:
https://bugzilla.redhat.com/show_bug.cgi?id=1571183
was definitely one of those things that just makes one recoil in horror.

TL;DR NSS used to have code that tried to detect "is this a network filesystem"
by timing `stat()` calls to nonexistent paths, and this massively bloated
the negative dentry cache and caused all sorts of performance problems.
It was particularly confusing because this would just happen as a side effect of e.g. executing `curl https://somewebsite`.

That code wasn't *intentionally* malicious but...


* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-22 20:17           ` Colin Walters
@ 2022-03-22 20:27             ` James Bottomley
  2022-03-22 20:37             ` Matthew Wilcox
  1 sibling, 0 replies; 18+ messages in thread
From: James Bottomley @ 2022-03-22 20:27 UTC (permalink / raw)
  To: Colin Walters, Matthew Wilcox, Dave Chinner
  Cc: Roman Gushchin, Stephen Brennan, lsf-pc, linux-fsdevel, linux-mm,
	Gautham Ananthakrishna, khlebnikov

On Tue, 2022-03-22 at 16:17 -0400, Colin Walters wrote:
> 
> On Tue, Mar 22, 2022, at 3:19 PM, James Bottomley wrote:
> > Well, firstly what is the exact problem?  People maliciously
> > looking up nonexistent files
> 
> Maybe most people have seen it, but for those who haven't:
> https://bugzilla.redhat.com/show_bug.cgi?id=1571183
> was definitely one of those things that just makes one recoil in
> horror.
> 
> TL;DR NSS used to have code that tried to detect "is this a network
> filesystem" by timing `stat()` calls to nonexistent paths, and this
> massively bloated the negative dentry cache and caused all sorts of
> performance problems.
> It was particularly confusing because this would just happen as a
> side effect of e.g. executing `curl https://somewebsite`.
> 
> That code wasn't *intentionally* malicious but...

Right, understood.  That's why I think keeping track of negative
dentries coming back to kill_dentry/retain_dentry is a good way of
detecting something like this, so we can get a signal for "some process
bloating the negative dentry cache" and act on it.  My suspicion is
that if we see the signal we can agressively remove old negative
dentries in this same routine, thus penalizing the problem process for
cleaning the caches.  However, it's a very narrow solution somewhat
decoupled from making the dentry cache a better memory management
citizen.

James




* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-22 20:17           ` Colin Walters
  2022-03-22 20:27             ` James Bottomley
@ 2022-03-22 20:37             ` Matthew Wilcox
  2022-03-22 21:08               ` Stephen Brennan
  1 sibling, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2022-03-22 20:37 UTC (permalink / raw)
  To: Colin Walters
  Cc: James Bottomley, Dave Chinner, Roman Gushchin, Stephen Brennan,
	lsf-pc, linux-fsdevel, linux-mm, Gautham Ananthakrishna,
	khlebnikov

On Tue, Mar 22, 2022 at 04:17:16PM -0400, Colin Walters wrote:
> 
> 
> On Tue, Mar 22, 2022, at 3:19 PM, James Bottomley wrote:
> > 
> > Well, firstly what is the exact problem?  People maliciously looking up
> > nonexistent files
> 
> Maybe most people have seen it, but for those who haven't:
> https://bugzilla.redhat.com/show_bug.cgi?id=1571183
> was definitely one of those things that just makes one recoil in horror.
> 
> TL;DR NSS used to have code that tried to detect "is this a network filesystem"
> by timing `stat()` calls to nonexistent paths, and this massively bloated
> the negative dentry cache and caused all sorts of performance problems.
> It was particularly confusing because this would just happen as a side effect of e.g. executing `curl https://somewebsite`.
> 
> That code wasn't *intentionally* malicious but...

Oh, the situation where we encountered the problem was systemd.
Definitely not malicious, and not even stupid (as the NSS example above).
I forget exactly which thing it was, but on some fairly common event
(user login?), it looked up a file in a PATH of some type, failed
to find it in the first two directories, then created it in a third.
At logout, it deleted the file.  Now there are three negative dentries.
Repeat a few million times (each time looking for a different file)
with no memory pressure and you have a thoroughly soggy machine that
is faster to reboot than to reclaim dentries.



* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-15 20:56 ` Roman Gushchin
  2022-03-16  2:07   ` Gao Xiang
@ 2022-03-22 20:41   ` Matthew Wilcox
  2022-03-22 21:19     ` Roman Gushchin
  1 sibling, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2022-03-22 20:41 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Stephen Brennan, lsf-pc, linux-fsdevel, linux-mm,
	Gautham Ananthakrishna, khlebnikov

On Tue, Mar 15, 2022 at 01:56:18PM -0700, Roman Gushchin wrote:
> I’d be happy to join this discussion. And in my opinion it’s going
> beyond negative dentries: there are other types of objects which tend
> to grow beyond any reasonable limits if there is no memory pressure.
>
> A perfect example when it happens is when a machine is almost idle
> for some period of time. Periodically running processes creating
> various kernel objects (mostly vfs cache) which over time are filling
> significant portions of the total memory. And when the need for memory
> arises, we realize that the memory is heavily fragmented and it’s
> costly to reclaim it back.

When you say "vfs cache", do you mean page cache, inode cache, or
something else?


* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-22 20:37             ` Matthew Wilcox
@ 2022-03-22 21:08               ` Stephen Brennan
  2022-03-29 15:24                 ` James Bottomley
  0 siblings, 1 reply; 18+ messages in thread
From: Stephen Brennan @ 2022-03-22 21:08 UTC (permalink / raw)
  To: Matthew Wilcox, Colin Walters
  Cc: James Bottomley, Dave Chinner, Roman Gushchin, lsf-pc,
	linux-fsdevel, linux-mm, Gautham Ananthakrishna, khlebnikov

On 3/22/22 13:37, Matthew Wilcox wrote:
> On Tue, Mar 22, 2022 at 04:17:16PM -0400, Colin Walters wrote:
>>
>>
>> On Tue, Mar 22, 2022, at 3:19 PM, James Bottomley wrote:
>>>
>>> Well, firstly what is the exact problem?  People maliciously looking up
>>> nonexistent files
>>
>> Maybe most people have seen it, but for those who haven't:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1571183
>> was definitely one of those things that just makes one recoil in horror.
>>
>> TL;DR NSS used to have code that tried to detect "is this a network filesystem"
>> by timing `stat()` calls to nonexistent paths, and this massively bloated
>> the negative dentry cache and caused all sorts of performance problems.
>> It was particularly confusing because this would just happen as a side effect of e.g. executing `curl https://somewebsite`.
>>
>> That code wasn't *intentionally* malicious but...

That's... unpleasant.

> 
> Oh, the situation where we encountered the problem was systemd.
> Definitely not malicious, and not even stupid (as the NSS example above).
> I forget exactly which thing it was, but on some fairly common event
> (user login?), it looked up a file in a PATH of some type, failed
> to find it in the first two directories, then created it in a third.
> At logout, it deleted the file.  Now there are three negative dentries.

More or less this, although I'm not sure it even created and deleted the
files... it just wanted to check for them in all sorts of places. The
file paths were something like this:

/{etc,usr/lib}/systemd/system/session-XXXXXXXX.scope.{wants,d,requires}

> Repeat a few million times (each time looking for a different file)
> with no memory pressure and you have a thoroughly soggy machine that
> is faster to reboot than to reclaim dentries.

The speed of reclaiming memory wasn't the straw which broke this
server's back, it ended up being that some operations might iterate over
the entire list of children of a dentry, holding a spinlock, causing
soft lockups. Thus, patches like [1] which are attempting to treat the
symptom, not the cause.

It seems to me that the idea of doing something based on last access
time, or number of accesses, would be a great step. But given a
prioritized list of dentries to target, and even a reasonable call site
like kill_dentry(), the hardest part still seems to be determining the
working set of dentries, or at least determining what is a reasonable
number of negative dentries to keep around.

If we're looking at issues like [1], then the amount needs to be on a
per-directory basis, and maybe roughly based on CPU speed. For other
priorities or failure modes, then the policy would need to be completely
different. Ideally a solution could work for almost all scenarios, but
failing that, maybe it is worth allowing policy to be set by
administrators via sysctl or even a BPF?

Thanks,
Stephen

[1]:
https://lore.kernel.org/linux-fsdevel/20220209231406.187668-1-stephen.s.brennan@oracle.com/


* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-22 20:41   ` Matthew Wilcox
@ 2022-03-22 21:19     ` Roman Gushchin
  2022-03-22 22:29       ` Dave Chinner
  0 siblings, 1 reply; 18+ messages in thread
From: Roman Gushchin @ 2022-03-22 21:19 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Stephen Brennan, lsf-pc, linux-fsdevel, linux-mm,
	Gautham Ananthakrishna, khlebnikov

On Tue, Mar 22, 2022 at 08:41:56PM +0000, Matthew Wilcox wrote:
> On Tue, Mar 15, 2022 at 01:56:18PM -0700, Roman Gushchin wrote:
> > I’d be happy to join this discussion. And in my opinion it’s going
> > beyond negative dentries: there are other types of objects which tend
> > to grow beyond any reasonable limits if there is no memory pressure.
> >
> > A perfect example when it happens is when a machine is almost idle
> > for some period of time. Periodically running processes creating
> > various kernel objects (mostly vfs cache) which over time are filling
> > significant portions of the total memory. And when the need for memory
> > arises, we realize that the memory is heavily fragmented and it’s
> > costly to reclaim it back.
> 
> When you say "vfs cache", do you mean page cache, inode cache, or
> something else?

Mostly inodes and dentries, but also in theory some fs-specific objects
(e.g. xfs implements nr_cached_objects/free_cached_objects callbacks).

Also dentries, for example, can have attached kmalloc'ed areas if the
length of the file's name is larger than x. And probably there are more
examples of indirectly pinned objects.


* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-22 15:08       ` Matthew Wilcox
  2022-03-22 19:19         ` James Bottomley
@ 2022-03-22 22:21         ` Dave Chinner
  1 sibling, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2022-03-22 22:21 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Roman Gushchin, Stephen Brennan, lsf-pc, linux-fsdevel, linux-mm,
	Gautham Ananthakrishna, khlebnikov

On Tue, Mar 22, 2022 at 03:08:33PM +0000, Matthew Wilcox wrote:
> On Wed, Mar 16, 2022 at 01:52:23PM +1100, Dave Chinner wrote:
> > On Wed, Mar 16, 2022 at 10:07:19AM +0800, Gao Xiang wrote:
> > > On Tue, Mar 15, 2022 at 01:56:18PM -0700, Roman Gushchin wrote:
> > > > 
> > > > > On Mar 15, 2022, at 12:56 PM, Matthew Wilcox <willy@infradead.org> wrote:
> > > > > 
> > > > > The number of negative dentries is effectively constrained only by memory
> > > > > size.  Systems which do not experience significant memory pressure for
> > > > > an extended period can build up millions of negative dentries which
> > > > > clog the dcache.  That can have different symptoms, such as inotify
> > > > > taking a long time [1], high memory usage [2] and even just poor lookup
> > > > > performance [3].  We've also seen problems with cgroups being pinned
> > > > > by negative dentries, though I think we now reparent those dentries to
> > > > > their parent cgroup instead.
> > > > 
> > > > Yes, it should be fixed already.
> > > > 
> > > > > 
> > > > > We don't have a really good solution yet, and maybe some focused
> > > > > brainstorming on the problem would lead to something that actually works.
> > > > 
> > > > I’d be happy to join this discussion. And in my opinion it’s going beyond negative dentries: there are other types of objects which tend to grow beyond any reasonable limits if there is no memory pressure.
> > > 
> > > +1, we once hit a similar issue as well, and agree that it is not limited
> > > to negative dentries but to excessive build-up of LRU-ed dentries and inodes
> > > in general.
> > 
> > Yup, any discussion solely about managing buildup of negative
> > dentries doesn't acknowledge that it is just a symptom of larger
> > problems that need to be addressed.
> 
> Yes, but let's not make this _so_ broad a discussion that it becomes
> unsolvable.  Rather, let's look for a solution to this particular problem
> that can be adopted by other caches that share a similar problem.
> 
> For example, we might be seduced into saying "this is a slab problem"
> because all the instances we have here allocate from slab.  But slab
> doesn't have enough information to solve the problem.  Maybe the working
> set of the current workload really needs 6 million dentries to perform
> optimally.  Maybe it needs 600.  Slab can't know that.  Maybe slab can
> play a role here, but the only component which can know the appropriate
> size for a cache is the cache itself.

Yes, that's correct, but it also misses the fact that we need to
replicate this for inodes, various filesystem caches that have LRUs
and shrinkers because they can grow very large very quickly (e.g.
the xfs buffer and dquot caches, the ext4 extent cache shrinker, nfs
has several shrinkable caches, etc)

> I think the logic needs to be in d_alloc().  Before it calls __d_alloc(),
> it should check ... something ... to see if it should try to shrink
> the LRU list.

And therein lies the issue - the dcache, inode cache, the xfs buffer
cache, the xfs dquot cache, the gfs2 dquot cache, and the nfsv4 xattr
cache all use list_lru for LRU reclaim of objects via a shrinker.

See the commonality in functionality and requirements all these
caches already have? They all allocate from a slab, all use the same
LRU infratructure to age the cache, and all have shrinkers that use
list_lru_shrink_walk() to do the shrinker iteration of the LRU.
Using list-lru infrastructure is effectively a requirement for any
shrinkable cache that wants to be memcg aware, too.
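
For reference, the shape they all share is roughly the following
(simplified; example_lru, example_isolate() and example_dispose() are
stand-ins for the per-cache pieces, not real symbols):

static struct list_lru example_lru;

static unsigned long example_count(struct shrinker *shrink,
				   struct shrink_control *sc)
{
	return list_lru_shrink_count(&example_lru, sc);
}

static unsigned long example_scan(struct shrinker *shrink,
				  struct shrink_control *sc)
{
	LIST_HEAD(dispose);
	unsigned long freed;

	/* example_isolate() moves unreferenced objects onto @dispose */
	freed = list_lru_shrink_walk(&example_lru, sc, example_isolate,
				     &dispose);
	example_dispose(&dispose);
	return freed;
}

static struct shrinker example_shrinker = {
	.count_objects	= example_count,
	.scan_objects	= example_scan,
	.seeks		= DEFAULT_SEEKS,
	.flags		= SHRINKER_MEMCG_AWARE,
};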

What they all need is a common method of hooking list_lru management
to allocation and/or periodic scanning mechanisms. Fix the problem
for the dcache via list-lru, and all the other caches that need the
same feedback/control mechanisms get them as well. We kill at least
6 birds with one stone, and provide the infrastructure for new
caches to manage these same problems without having to write their
own infrastructure to do it.

IOWs, you're still missing the bigger picture here by continuing to
focus on negative dentries and the dcache rather than the common
infrastructure that shrinkable caches use.

> The devil is in what that something should be.  I'm no
> expert on the dcache; do we just want to call prune_dcache_sb() for
> every 1/1000 time?  Rely on DCACHE_REFERENCED to make sure that we're
> not over-pruning the list?  If so, what do we set nr_to_scan to?  1000 so
> that we try to keep the dentry list the same size?  1500 so that it
> actually tries to shrink?

The numbers themselves are largely irrelevant and should be set by
the subsystem after careful analysis. What we need first is the
common infrastructure to do the periodic/demand driven list
maintenance, then individual subsystems can set the limits and
triggers according to their individual needs.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-22 21:19     ` Roman Gushchin
@ 2022-03-22 22:29       ` Dave Chinner
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2022-03-22 22:29 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Matthew Wilcox, Stephen Brennan, lsf-pc, linux-fsdevel, linux-mm,
	Gautham Ananthakrishna, khlebnikov

On Tue, Mar 22, 2022 at 02:19:30PM -0700, Roman Gushchin wrote:
> On Tue, Mar 22, 2022 at 08:41:56PM +0000, Matthew Wilcox wrote:
> > On Tue, Mar 15, 2022 at 01:56:18PM -0700, Roman Gushchin wrote:
> > > I’d be happy to join this discussion. And in my opinion it’s going
> > > beyond negative dentries: there are other types of objects which tend
> > > to grow beyond any reasonable limits if there is no memory pressure.
> > >
> > > A perfect example when it happens is when a machine is almost idle
> > > for some period of time. Periodically running processes creating
> > > various kernel objects (mostly vfs cache) which over time are filling
> > > significant portions of the total memory. And when the need for memory
> > > arises, we realize that the memory is heavily fragmented and it’s
> > > costly to reclaim it back.
> > 
> > When you say "vfs cache", do you mean page cache, inode cache, or
> > something else?
> 
> Mostly inodes and dentries, but also in theory some fs-specific objects
> (e.g. xfs implements nr_cached_objects/free_cached_objects callbacks).

Those aren't independent shrinkers - they are part of the superblock
shrinker.  XFS just uses these for background management of the VFS
inode cache footprint - vfs inodes live a bit longer than
->destroy_inode in XFS - so these callouts from the superblock
shrinker are really just part of the existing VFS inode cache
management.

> Also dentries, for example, can have attached kmalloc'ed areas if the
> length of the file's name is larger than x. And probably there are more
> examples of indirectly pinned objects.

The xfs buffer cache has slab allocated handles that can pin up to
64kB of pages each. The buffer cache can quickly grow to hold tens
of gigabytes of memory when you have filesystems with hundreds of
gigabytes of metadata in them (which are not that uncommon). It's
also not uncommon for the XFS buffer cache to consume more memory
than the VFS dentry and inode caches combined.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-22 21:08               ` Stephen Brennan
@ 2022-03-29 15:24                 ` James Bottomley
  2022-03-31 19:27                   ` Stephen Brennan
  0 siblings, 1 reply; 18+ messages in thread
From: James Bottomley @ 2022-03-29 15:24 UTC (permalink / raw)
  To: Stephen Brennan, Matthew Wilcox, Colin Walters
  Cc: Dave Chinner, Roman Gushchin, lsf-pc, linux-fsdevel, linux-mm,
	Gautham Ananthakrishna, khlebnikov

On Tue, 2022-03-22 at 14:08 -0700, Stephen Brennan wrote:
> On 3/22/22 13:37, Matthew Wilcox wrote:
> > On Tue, Mar 22, 2022 at 04:17:16PM -0400, Colin Walters wrote:
> > > 
> > > On Tue, Mar 22, 2022, at 3:19 PM, James Bottomley wrote:
> > > > Well, firstly what is the exact problem?  People maliciously
> > > > looking up nonexistent files
> > > 
> > > Maybe most people have seen it, but for those who haven't:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1571183
> > > was definitely one of those things that just makes one recoil in
> > > horror.
> > > 
> > > TL;DR NSS used to have code that tried to detect "is this a
> > > network filesystem" by timing `stat()` calls to nonexistent
> > > paths, and this massively bloated the negative dentry cache and
> > > caused all sorts of performance problems.
> > > It was particularly confusing because this would just happen as a
> > > side effect of e.g. executing `curl https://somewebsite`.
> > > 
> > > That code wasn't *intentionally* malicious but...
> 
> That's... unpleasant.
> 
> > Oh, the situation where we encountered the problem was systemd.
> > Definitely not malicious, and not even stupid (as the NSS example
> > above). I forget exactly which thing it was, but on some fairly
> > common event (user login?), it looked up a file in a PATH of some
> > type, failed to find it in the first two directories, then created
> > it in a third. At logout, it deleted the file.  Now there are three
> > negative dentries.
> 
> More or less this, although I'm not sure it even created and deleted
> the files... it just wanted to check for them in all sorts of places.
> The file paths were something like this:
> 
> /{etc,usr/lib}/systemd/system/session-
> XXXXXXXX.scope.{wants,d,requires}
> 
> > Repeat a few million times (each time looking for a different file)
> > with no memory pressure and you have a thoroughly soggy machine
> > that is faster to reboot than to reclaim dentries.
> 
> The speed of reclaiming memory wasn't the straw which broke this
> server's back, it ended up being that some operations might iterate
> over the entire list of children of a dentry, holding a spinlock,
> causing soft lockups. Thus, patches like [1] which are attempting to
> treat the symptom, not the cause.
> 
> It seems to me that the idea of doing something based on last access
> time, or number of accesses, would be a great step. But given a
> prioritized list of dentries to target, and even a reasonable call
> site like kill_dentry(), the hardest part still seems to be
> determining the working set of dentries, or at least determining what
> is a reasonable number of negative dentries to keep around.
> 
> If we're looking at issues like [1], then the amount needs to be on a
> per-directory basis, and maybe roughly based on CPU speed. For other
> priorities or failure modes, then the policy would need to be
> completely different. Ideally a solution could work for almost all
> scenarios, but failing that, maybe it is worth allowing policy to be
> set by administrators via sysctl or even a BPF?

Looking at [1], you're really trying to contain the parent's child list
from exploding with negative dentries.  Looking through the patch, it
still strikes me that dentry_kill/retain_dentry is still a better
place, because if a negative dentry comes back there, it's unlikely to
become positive (well, fstat followed by create would be the counter
example, but it would partly be the app's fault for not doing
open(O_CREAT)).

If we have the signal for reuse of negative dentry from the cache,
which would be a fast lookup, we know a newly created negative dentry
already had a slow lookup, so we can do more processing without
necessarily slowing down the workload too much.  In particular, we
could just iterate over the parent's children of this negative dentry
and start pruning if there are too many (too many being a relative
term, but I think something like 2x-10x the max positive entries
wouldn't be such a bad heuristic).  Assuming we don't allow the
parent's list to contain too many negative dentries, we might not need
the sweep negative logic because the list wouldn't be allowed to grow
overly large.  I think a second heuristic would be to prune at least two
negative dentries from the end of the sb lru list if they've never been
used for a lookup and were created more than a specified time ago (the
problem is picking that time, but I bet a minute wouldn't be unreasonable).
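
Expressed as code, the first heuristic is roughly the check below.  A
sketch only: the ratio is arbitrary within that 2x-10x range, and the
d_neg_children/d_pos_children counters are assumed to be maintained at
d_alloc()/dput() time precisely so we never have to walk the child list
under d_lock:

#define NEG_DENTRY_RATIO	5	/* somewhere in the 2x-10x range */

static bool too_many_negative(struct dentry *parent)
{
	/* hypothetical per-dentry counters, not existing fields */
	unsigned long pos = READ_ONCE(parent->d_pos_children);
	unsigned long neg = READ_ONCE(parent->d_neg_children);

	/* max() keeps directories with no positive entries from being
	 * capped at zero negative dentries */
	return neg > NEG_DENTRY_RATIO * max(pos, 1UL);
}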

Obviously, while I think it would work for some of the negative dentry
induced issues, the above is very heuristic in tuning and won't help
with any of the other object issues in filesystems.  But on the other
hand, negative dentries are special in that if they're never used to
cache an -ENOENT and they never go positive, they're just a waste of
space.

James




* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-29 15:24                 ` James Bottomley
@ 2022-03-31 19:27                   ` Stephen Brennan
  2022-04-01 15:45                     ` James Bottomley
  0 siblings, 1 reply; 18+ messages in thread
From: Stephen Brennan @ 2022-03-31 19:27 UTC (permalink / raw)
  To: James Bottomley, Matthew Wilcox, Colin Walters
  Cc: Dave Chinner, Roman Gushchin, lsf-pc, linux-fsdevel, linux-mm,
	Gautham Ananthakrishna, khlebnikov

James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> On Tue, 2022-03-22 at 14:08 -0700, Stephen Brennan wrote:
[snip]
>> If we're looking at issues like [1], then the amount needs to be on a
>> per-directory basis, and maybe roughly based on CPU speed. For other
>> priorities or failure modes, then the policy would need to be
>> completely different. Ideally a solution could work for almost all
>> scenarios, but failing that, maybe it is worth allowing policy to be
>> set by administrators via sysctl or even a BPF?
>
> Looking at [1], you're really trying to contain the parent's child list
> from exploding with negative dentries.  Looking through the patch, it
> still strikes me that dentry_kill/retain_dentry is still a better
> place, because if a negative dentry comes back there, it's unlikely to
> become positive (well, fstat followed by create would be the counter
> example, but it would partly be the app's fault for not doing
> open(O_CREAT)).

I actually like the idea of doing the pruning during d_alloc().
Basically, if you're creating dentries, you should also be working on
the cache management for them.

> If we have the signal for reuse of negative dentry from the cache,
> which would be a fast lookup, we know a newly created negative dentry
> already had a slow lookup, so we can do more processing without
> necessarily slowing down the workload too much.  In particular, we
> could just iterate over the parent's children of this negative dentry
> and start pruning if there are too many (too many being a relative
> term, but I think something like 2x-10x the max positive entries
> wouldn't be such a bad heuristic).

I agree that, on a per-directory basis, 2-10x feels right, though maybe
there needs to be some leeway for empty directories?

Per-directory pruning also makes sense from a concurrency standpoint:
the LRU locks could become a source of contention.

> Assuming we don't allow the
> parent's list to contain too many negative dentries, we might not need
> the sweep negative logic because the list wouldn't be allowed to grow
> overly large.

Seconded, I have no desire to actually try to get that sweep negative
logic merged if we can do a better job handling the dentries in the
first place.

> I think a second heuristic would be to prune at least two
> negative dentries from the end of the sb lru list if they've never been
> used for a lookup and were created more than a specified time ago (the
> problem is picking that time, but I bet a minute wouldn't be unreasonable).
>
> Obviously, while I think it would work for some of the negative dentry
> induced issues, the above is very heuristic in tuning and won't help
> with any of the other object issues in filesystems.  But on the other
> hand, negative dentries are special in that if they're never used to
> cache an -ENOENT and they never go positive, they're just a waste of
> space.

I took a preliminary stab at some of these ideas in this series:

https://lore.kernel.org/linux-fsdevel/20220331190827.48241-1-stephen.s.brennan@oracle.com/

It doesn't handle pruning from the sb LRU list, nor does it have a good
way to decide which dentry to kill. But it's pretty stable and simple,
and I value that part :) . It still needs some work but I'd welcome
feedback from folks interested in this discussion.

Stephen

>
> James


* Re: [LSF/MM TOPIC] Better handling of negative dentries
  2022-03-31 19:27                   ` Stephen Brennan
@ 2022-04-01 15:45                     ` James Bottomley
  0 siblings, 0 replies; 18+ messages in thread
From: James Bottomley @ 2022-04-01 15:45 UTC (permalink / raw)
  To: Stephen Brennan, Matthew Wilcox, Colin Walters
  Cc: Dave Chinner, Roman Gushchin, lsf-pc, linux-fsdevel, linux-mm,
	Gautham Ananthakrishna, khlebnikov

On Thu, 2022-03-31 at 12:27 -0700, Stephen Brennan wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> > On Tue, 2022-03-22 at 14:08 -0700, Stephen Brennan wrote:
> [snip]
> > > If we're looking at issues like [1], then the amount needs to be
> > > on a per-directory basis, and maybe roughly based on CPU speed.
> > > For other priorities or failure modes, then the policy would need
> > > to be completely different. Ideally a solution could work for
> > > almost all scenarios, but failing that, maybe it is worth
> > > allowing policy to be set by administrators via sysctl or even a
> > > BPF?
> > 
> > Looking at [1], you're really trying to contain the parent's child
> > list from exploding with negative dentries.  Looking through the
> > patch, it still strikes me that dentry_kill/retain_dentry is still
> > a better place, because if a negative dentry comes back there, it's
> > unlikely to become positive (well, fstat followed by create would
> > be the counter example, but it would partly be the app's fault for
> > not doing open(O_CREAT)).
> 
> I actually like the idea of doing the pruning during d_alloc().
> Basically, if you're creating dentries, you should also be working on
> the cache management for them.

Agreed, but all of the profligate negative dentry creators do 

lookup ... dput

The final dput causes a dentry_kill() (if there are no other
references), so they still get to work on cache management, plus you
get a better signal for "I just created a negative dentry".

I'm not saying either is right: doing it in d_alloc shares the work
among all things that create dentries which may produce better
throughput.  Doing it in dentry_kill allows you to shovel more work on
to the negative dentry creators which can cause a greater penalty for
creating them.

James



