From: Dave Chinner <david@fromorbit.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>,
	Hillf Danton <hdanton@sina.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	MM <linux-mm@kvack.org>, Mel Gorman <mgorman@techsingularity.net>,
	Yu Zhao <yuzhao@google.com>, David Hildenbrand <david@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] mm/vmscan: add periodic slab shrinker
Date: Wed, 6 Apr 2022 09:54:26 +1000
Message-ID: <20220405235426.GY1609613@dread.disaster.area>
In-Reply-To: <YkyyCcdJq69tO6ba@casper.infradead.org>

On Tue, Apr 05, 2022 at 10:18:01PM +0100, Matthew Wilcox wrote:
> On Tue, Apr 05, 2022 at 10:22:14AM -0700, Stephen Brennan wrote:
> > I can't speak for every slab cache, but I've been coming to the same
> > conclusion myself regarding the dentry cache. I think that the rate of
> > stepping through the LRU should be tied to the rate of allocations.
> > Truly in-use objects shouldn't be harmed by this, as they should get
> > referenced and rotated to the beginning of the LRU. But the one-offs
> > which are bloating the cache will be found and removed.
> 
> I agree with all this.

Same here.
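
To make that concrete: hooking a single LRU step into d_alloc()
only needs machinery fs/dcache.c already has for its shrinker.
A sketch, not tested; it would have to live in fs/dcache.c since
dentry_lru_isolate() and shrink_dentry_list() are static helpers
used by prune_dcache_sb():

static void d_alloc_scan_one(struct super_block *sb)
{
	LIST_HEAD(dispose);

	/*
	 * One step per allocation: a referenced dentry gets
	 * rotated, an unreferenced one is isolated for disposal.
	 */
	list_lru_walk(&sb->s_dentry_lru, dentry_lru_isolate,
		      &dispose, 1);
	shrink_dentry_list(&dispose);
}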

> > I've implemented a version of this patch which just takes one step
> > through the LRU on each d_alloc. It's quite interesting to watch it
> > work. You can create 5 million negative dentries in directory /A via
> > stat(), and then create 5 million negative dentries in directory /B. The
> > total dentry slab size reaches 5 million but never goes past it, since
> > the old negative dentries from /A aren't really in use, and they get
> > pruned at the same rate as negative dentries from /B get created. On the
> > other hand, if you *continue* to stat() on the dentries of /A while you
> > create negative dentries in /B, then the cache grows to 10 million,
> > since the /A dentries are actually in use.
> > 
> > Maybe a solution could involve some generic list_lru machinery that can
> > let you do these sorts of incremental scans? Maybe batching them up so
> > instead of doing one every allocation, you do them every 100 or 1000?
> > It would still be up to the individual user to put this to good use in
> > the object allocation path.
> 
> I feel like we need to allow the list to both shrink and grow, depending
> on how useful the entries in it are.  So one counter per LRU, incremented
> on every add.  When that counter gets to 100, reset it to 0 and scan
> 110 entries.  Maybe 0 of them can be reclaimed; maybe 110 of them can be.
> But the list can shrink over time instead of being a "one in, one out"
> scenario.

Yes, this is pretty much what I've been saying we should be using
the list-lru for since .... Well, let's just say it was one of the
things I wanted to be able to do when I first created the list-lru
infrastructure.
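
In its naive form that's just a per-lru add counter driving a
batched walk of the list. Something like this (illustrative only,
names invented, built on the existing list_lru_walk() API):

#define LRU_ADD_BATCH	100	/* adds between scans */
#define LRU_SCAN_NR	110	/* entries scanned per batch */

/* Hypothetical hook, called on every list_lru add. */
static void list_lru_account_add(struct list_lru *lru,
				 atomic_t *nr_added,
				 list_lru_walk_cb isolate,
				 void *cb_arg)
{
	if (atomic_inc_return(nr_added) < LRU_ADD_BATCH)
		return;

	atomic_set(nr_added, 0);
	/* Scan 110 per 100 adds: maybe 0 are reclaimable, maybe all 110. */
	list_lru_walk(lru, isolate, cb_arg, LRU_SCAN_NR);
}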

But it is much more complex than this. One of the issues with purely
list-lru add-time accounting is that we cannot make reclaim
decisions from list-add context because the list-add can occur in
reclaim context, e.g. dentry reclaim will drop the last reference
to an inode, which then gets inserted into the inode list-lru in
reclaim context.  The very next thing the superblock shrinker does
is scan the inode cache list-lru and remove a pre-calculated number
of objects from the list. Hence in reclaim context we can be both
increasing and decreasing the size of the list-lru by significant
percentages in a very short time window. This means it will be quite
challenging to make clear decisions based purely on lru list add
operations.

Hence I think we want to connect more than just the unused entries
to periodic reclaim - we really need to know both the number of free
objects on the list-lru as well as the total number of objects
allocated that could be on the list_lru. This would give us some
comparative measure of free objects vs active referenced objects
and that would allow better decisions to be made.
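
In sketch form that's just two counters and a ratio (names
invented; nr_allocated would have to be maintained at object
alloc/free time):

static bool cache_mostly_idle(struct list_lru *lru,
			      atomic_long_t *nr_allocated)
{
	unsigned long free = list_lru_count(lru);
	unsigned long total = atomic_long_read(nr_allocated);

	/* e.g. push reclaim harder once half the objects sit unused. */
	return total && free > total / 2;
}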

As it is, we've recently made a necessary connection between
allocation and the list-lru via kmem_cache_alloc_lru(). This was
done as part of the list-lru/memcg rework patchset I referenced
earlier in the thread.

This means that slab objects that are kept on list_lrus for
caching are now supplied with the list_lru at allocation time. We
already use this API for inode caches (via alloc_inode_sb()) and
the dentry cache (via __d_alloc()), so we
already have the infrastructure in place to do per-allocation
list-lru accounting for inodes and dentries, not just "per list-lru
add/remove" accounting.
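
i.e. the allocation site already has the list_lru in hand. In
__d_alloc() the allocation now looks like this, so bumping a
per-lru allocation counter here would be a one-liner:

	dentry = kmem_cache_alloc_lru(dentry_cache, &sb->s_dentry_lru,
				      GFP_KERNEL);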

Extending that to other slab-based list-lru users should be pretty
easy, and in doing so would remove another difference between memcg
and non-memcg aware list-lrus....

> Clearly 110 is a magic number, but intuitively, attempting to shrink
> by 10% feels reasonable.  Need to strike a balance between shrinking
> quickly enough and giving the cache time to figure out which entries
> are actually useful.

Testing will teach us where the thresholds need to be pretty
quickly. :)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
