From: Roman Gushchin <guro@fb.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: <akpm@linux-foundation.org>, <bigeasy@linutronix.de>,
	<cl@linux.com>, <hannes@cmpxchg.org>, <iamjoonsoo.kim@lge.com>,
	<jannh@google.com>, <linux-kernel@vger.kernel.org>,
	<linux-mm@kvack.org>, <mhocko@kernel.org>, <minchan@kernel.org>,
	<penberg@kernel.org>, <rientjes@google.com>,
	<shakeelb@google.com>, <surenb@google.com>, <tglx@linutronix.de>
Subject: Re: [RFC 2/2] mm, slub: add shrinker to reclaim cached slabs
Date: Thu, 21 Jan 2021 16:48:47 -0800
Message-ID: <20210122004847.GA25567@carbon.dhcp.thefacebook.com>
In-Reply-To: <20210121172154.27580-2-vbabka@suse.cz>

On Thu, Jan 21, 2021 at 06:21:54PM +0100, Vlastimil Babka wrote:
> For performance reasons, SLUB doesn't keep all slabs on shared lists and
> doesn't always free slabs immediately after all objects are freed. Namely:
> 
> - for each cache and cpu, there might be a "CPU slab" page, partially or fully
>   free
> - with SLUB_CPU_PARTIAL enabled (default y), there might be a number of "percpu
>   partial slabs" for each cache and cpu, also partially or fully free
> - for each cache and NUMA node, there are slabs on a per-node partial list;
>   up to 10 of those may be kept even when empty (structures sketched below)
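> 
> For orientation, here is a condensed sketch of the structures involved (field
> names as in SLUB around v5.9-v5.11; abridged, not the exact definitions):
> 
> 	struct kmem_cache_cpu {
> 		void **freelist;	/* next free object in the CPU slab */
> 		unsigned long tid;	/* transaction id for the lockless fastpath */
> 		struct page *page;	/* the "CPU slab" being allocated from */
> 	#ifdef CONFIG_SLUB_CPU_PARTIAL
> 		struct page *partial;	/* percpu partial slabs, singly linked */
> 	#endif
> 	};
> 
> 	struct kmem_cache_node {
> 		spinlock_t list_lock;
> 		unsigned long nr_partial;	/* length of the per-node partial list */
> 		struct list_head partial;	/* up to s->min_partial (at most 10)
> 						 * empty slabs may linger here */
> 		/* ... */
> 	};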
> 
> As Jann reports [1], the number of percpu partial slabs should be limited by
> the number of free objects (up to 30), but due to imprecise accounting, this
> can deteriorate so that there are up to 30 free slabs. He notes:
> 
> > Even on an old-ish Android phone (Pixel 2), with normal-ish usage, I
> > see something like 1.5MiB of pages with zero inuse objects stuck in
> > percpu lists.
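> 
> The imprecision, roughly: put_cpu_partial() of that era counts a page's free
> objects only at the moment the page is added to the percpu partial list
> (paraphrased, not the exact code):
> 
> 	pages++;
> 	pobjects += page->objects - page->inuse;	/* free objects *right now* */
> 	/* pobjects is not updated on later frees, so a page added with a
> 	 * single free object can become completely free while still being
> 	 * accounted as one; the drain threshold (~30 objects) thus allows
> 	 * up to ~30 entirely free pages to accumulate per cpu */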
> 
> My observations match Jann's, and we've seen e.g. cases with 10 free slabs per
> CPU. We can also confirm Jann's theory that on kernels before the kmemcg slab
> rewrite (merged in v5.9), this issue is amplified: there is a separate set of
> kmem caches, each with its own CPU slabs, per-cpu partial lists and per-node
> partial lists, for every memcg and every cache that deals with kmemcg-accounted
> objects.
> 
> The cached free slabs can therefore become a waste of memory, increasing
> memory pressure, causing more reclaim of actually used LRU pages, and even
> causing OOMs (global, or memcg-level on older kernels).
> 
> SLUB provides __kmem_cache_shrink(), which can flush all the abovementioned
> slabs, but it is currently called only in rare situations or from a sysfs
> handler. The standard way to cooperate with reclaim is to provide a shrinker,
> so this patch adds such a shrinker to call __kmem_cache_shrink()
> systematically.
> 
> The shrinker design is however atypical. The usual design assumes that a
> shrinker can easily count how many objects can be reclaimed, and then reclaim
> a given number of them. For SLUB, determining the number of the various cached
> slabs would be a lot of work, and precisely controlling how many to shrink
> would be impractical. Instead, the shrinker is driven by reclaim priority: at
> the lowest priority it shrinks a single kmem cache, and at the highest it
> shrinks all of them. To do that effectively, there's a new list,
> caches_to_shrink, from whose head caches are taken and to whose tail they are
> then moved. The existing slab_caches list is unaffected, so that e.g.
> /proc/slabinfo order is not disrupted.
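> 
> A minimal sketch of the scan callback this implies (assuming the sc->priority
> field from patch 1/2; the shrink_list field and the "freed" bookkeeping are
> illustrative, not necessarily what the patch does):
> 
> 	static unsigned long kmem_caches_shrink_scan(struct shrinker *shrinker,
> 						     struct shrink_control *sc)
> 	{
> 		struct kmem_cache *s, *first = NULL;
> 		/* sc->priority runs from DEF_PRIORITY (12, lowest pressure)
> 		 * down to 0: shrink one cache at the bottom, all at the top */
> 		long nr = 1 << (DEF_PRIORITY - sc->priority);
> 		unsigned long freed = 0;
> 
> 		/* if reclaimers race, only one proceeds */
> 		if (!mutex_trylock(&slab_mutex))
> 			return SHRINK_STOP;
> 
> 		while (nr--) {
> 			s = list_first_entry_or_null(&caches_to_shrink,
> 						     struct kmem_cache, shrink_list);
> 			if (!s || s == first)
> 				break;	/* empty list, or one full rotation done */
> 			if (!first)
> 				first = s;
> 			__kmem_cache_shrink(s);
> 			/* rotate: just-shrunk caches go to the tail */
> 			list_move_tail(&s->shrink_list, &caches_to_shrink);
> 			freed++;	/* caches shrunk; a real version would
> 					 * count freed pages/objects instead */
> 		}
> 
> 		mutex_unlock(&slab_mutex);
> 		return freed;
> 	}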
> 
> This approach should not cause excessive shrinking and IPI storms:
> 
> - If there are multiple reclaimers in parallel, only one can proceed, thanks to
>   mutex_trylock(&slab_mutex). After unlocking, caches that were just shrunk
>   are at the tail of the list.
> - In flush_all(), we actually check whether there's anything to flush on a CPU
>   (has_cpu_slab()) before sending an IPI; see the sketch after this list.
> - CPU slab deactivation became more efficient with "mm, slub: splice cpu and
>   page freelists in deactivate_slab()"
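> 
> The flush_all() path in question, as in mm/slub.c of that era (a sketch from
> memory; see the tree for the exact code):
> 
> 	static bool has_cpu_slab(int cpu, void *info)
> 	{
> 		struct kmem_cache *s = info;
> 		struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
> 
> 		/* anything cached on this CPU at all? */
> 		return c->page || slub_percpu_partial(c);
> 	}
> 
> 	static void flush_all(struct kmem_cache *s)
> 	{
> 		/* IPI only those CPUs where has_cpu_slab() returns true */
> 		on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1);
> 	}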
> 
> The result is that SLUB's per-cpu and per-node caches are trimmed of free
> pages, and partially used pages have a higher chance of being either reused or
> freed. The trimming effort is controlled by reclaim activity and thus memory
> pressure. Before an OOM, a reclaim attempt at the highest priority ensures
> that all caches are shrunk. And since this is now a proper slab shrinker, the
> shrinking is also invoked as part of the drop_caches sysctl operation.

Hi Vlastimil!

This makes a lot of sense; however, it looks a bit like overkill to me (on 5.9+).
Isn't limiting the number of pages (instead of the number of objects) sufficient on 5.9+?

If not, maybe we can limit the shrinking to the pre-OOM condition?
Do we really need to trigger it constantly?

Thanks!



Thread overview: 15+ messages
2021-01-11 23:12 SLUB: percpu partial object count is highly inaccurate, causing some memory wastage and maybe also worse tail latencies? Jann Horn
2021-01-12  0:27 ` Roman Gushchin
2021-01-12 16:35 ` Christoph Lameter
2021-01-14  9:27   ` Vlastimil Babka
2021-01-18 11:03     ` Michal Hocko
2021-01-18 15:46       ` Christoph Lameter
2021-01-18 16:07         ` Michal Hocko
2021-01-13 19:14 ` Vlastimil Babka
2021-01-13 22:37   ` Jann Horn
2021-01-14  9:04     ` Christoph Lameter
2021-01-21 17:21 ` Vlastimil Babka
2021-01-21 17:21   ` [RFC 1/2] mm, vmscan: add priority field to struct shrink_control Vlastimil Babka
2021-01-21 17:21     ` [RFC 2/2] mm, slub: add shrinker to reclaim cached slabs Vlastimil Babka
2021-01-22  0:48       ` Roman Gushchin [this message]
2021-01-26 12:06         ` Vlastimil Babka
