From: Roman Gushchin <guro@fb.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: <akpm@linux-foundation.org>, <bigeasy@linutronix.de>,
	<cl@linux.com>, <hannes@cmpxchg.org>, <iamjoonsoo.kim@lge.com>,
	<jannh@google.com>, <linux-kernel@vger.kernel.org>,
	<linux-mm@kvack.org>, <mhocko@kernel.org>, <minchan@kernel.org>,
	<penberg@kernel.org>, <rientjes@google.com>,
	<shakeelb@google.com>, <surenb@google.com>, <tglx@linutronix.de>
Subject: Re: [RFC 2/2] mm, slub: add shrinker to reclaim cached slabs
Date: Thu, 21 Jan 2021 16:48:47 -0800
Message-ID: <20210122004847.GA25567@carbon.dhcp.thefacebook.com>
In-Reply-To: <20210121172154.27580-2-vbabka@suse.cz>

On Thu, Jan 21, 2021 at 06:21:54PM +0100, Vlastimil Babka wrote:
> For performance reasons, SLUB doesn't keep all slabs on shared lists and
> doesn't always free slabs immediately after all objects are freed. Namely:
> 
> - for each cache and cpu, there might be a "CPU slab" page, partially or fully
>   free
> - with SLUB_CPU_PARTIAL enabled (default y), there might be a number of "percpu
>   partial slabs" for each cache and cpu, also partially or fully free
> - for each cache and NUMA node, there are slabs on the per-node partial list;
>   up to 10 of those may be empty
> 
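For reference, the three caching locations above live roughly in the
following structures (a simplified sketch of v5.11-era SLUB; the exact
fields vary by kernel version and config):

  struct kmem_cache_cpu {
          void **freelist;        /* fast-path freelist of the CPU slab */
          unsigned long tid;      /* for the cmpxchg-based fast path */
          struct page *page;      /* the per-cpu "CPU slab" */
  #ifdef CONFIG_SLUB_CPU_PARTIAL
          struct page *partial;   /* singly linked percpu partial slabs */
  #endif
  };

  struct kmem_cache_node {
          spinlock_t list_lock;
          unsigned long nr_partial;   /* length of the per-node partial list */
          struct list_head partial;   /* per-node partial slabs; up to
                                         s->min_partial (max 10) may be empty */
  };
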
> As Jann reports [1], the number of percpu partial slabs should be limited by
> the number of free objects (up to 30), but due to imprecise accounting, this
> can deteriorate so that there are up to 30 free slabs. He notes:
> 
> > Even on an old-ish Android phone (Pixel 2), with normal-ish usage, I
> > see something like 1.5MiB of pages with zero inuse objects stuck in
> > percpu lists.
> 
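The imprecision comes from how put_cpu_partial() maintains its cached
counters: the free-object count of a page is sampled once, when the page is
added, and objects freed to it later are never accounted until the list is
drained. Paraphrasing the v5.11-era logic (simplified, not verbatim; the
real code is a cmpxchg loop):

  static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
  {
          struct page *oldpage = this_cpu_read(s->cpu_slab->partial);
          int pages = 0, pobjects = 0;

          if (oldpage) {
                  pobjects = oldpage->pobjects;  /* sampled at add time */
                  pages = oldpage->pages;
                  if (drain && pobjects > slub_cpu_partial(s)) {
                          /* "full": move the whole set to the node partial list */
                          unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
                          oldpage = NULL;
                          pages = pobjects = 0;
                  }
          }

          /*
           * Counted once, now. A page typically arrives here nearly full,
           * contributing very little to pobjects, and frees to it later are
           * not added in, so up to slub_cpu_partial(s) (~30) pages can sit
           * here even after they become completely free.
           */
          pobjects += page->objects - page->inuse;

          page->pages = ++pages;
          page->pobjects = pobjects;
          page->next = oldpage;
          this_cpu_write(s->cpu_slab->partial, page);
  }
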
> My observations match Jann's, and we've seen e.g. cases with 10 free slabs per
> cpu. We can also confirm Jann's theory that on kernels before the kmemcg
> rewrite (in v5.9), this issue is amplified, as there is a separate set of kmem
> caches (with CPU slabs, per-cpu partial and per-node partial lists) for each
> memcg and each cache that deals with kmemcg-accounted objects.
> 
> The cached free slabs can therefore become a memory waste, increasing memory
> pressure, causing more reclaim of actually used LRU pages, and even leading
> to OOM (global, or memcg on older kernels).
> 
> SLUB provides __kmem_cache_shrink(), which can flush all the abovementioned
> slabs, but it is currently called only in rare situations or from a sysfs
> handler. The standard way to cooperate with reclaim is to provide a shrinker,
> so this patch adds such a shrinker to call __kmem_cache_shrink()
> systematically.
> 
> The shrinker design is, however, atypical. The usual design assumes that a
> shrinker can easily count how many objects can be reclaimed, and then reclaims
> a given number of objects. For SLUB, determining the number of the various
> cached slabs would be a lot of work, and controlling how many to shrink
> precisely would be impractical. Instead, the shrinker is based on reclaim
> priority: at the lowest priority it shrinks a single kmem cache, while at the
> highest it shrinks all of them. To do that effectively, there's a new list,
> caches_to_shrink, from whose head caches are taken and then moved to the
> tail. The existing slab_caches list is unaffected, so that e.g. /proc/slabinfo
> order is not disrupted.
> 
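As a minimal sketch of what such a priority-driven shrinker can look like
(illustrative only, not the patch's actual code: it assumes the sc->priority
field from [RFC 1/2] and makes up the shrink_nr_caches() helper and the
kmem_cache->shrink_list member):

  static unsigned long slab_caches_count(struct shrinker *shrinker,
                                         struct shrink_control *sc)
  {
          /* counting the cached slabs precisely would be a lot of work;
           * just report that there is always something reclaimable */
          return 1;
  }

  static unsigned long slab_caches_scan(struct shrinker *shrinker,
                                        struct shrink_control *sc)
  {
          /* hypothetical: 1 cache at the lowest priority, all at the highest */
          int nr = shrink_nr_caches(sc->priority);
          struct kmem_cache *s;

          if (!mutex_trylock(&slab_mutex))
                  return SHRINK_STOP;     /* another reclaimer is already at it */

          while (nr--) {
                  s = list_first_entry(&caches_to_shrink,
                                       struct kmem_cache, shrink_list);
                  __kmem_cache_shrink(s);
                  /* rotate, so the next reclaimer picks other caches first */
                  list_move_tail(&s->shrink_list, &caches_to_shrink);
          }

          mutex_unlock(&slab_mutex);
          return 0;
  }

  static struct shrinker slab_caches_shrinker = {
          .count_objects = slab_caches_count,
          .scan_objects  = slab_caches_scan,
          .seeks         = DEFAULT_SEEKS,
  };
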
> This approach should not cause excessive shrinking and IPI storms:
> 
> - If there are multiple reclaimers in parallel, only one can proceed, thanks
>   to mutex_trylock(&slab_mutex). After unlocking, caches that were just
>   shrunk are at the tail of the list.
> - In flush_all(), we actually check whether there's anything to flush on a
>   CPU (has_cpu_slab()) before sending an IPI.
> - CPU slab deactivation became more efficient with "mm, slub: splice cpu and
>   page freelists in deactivate_slab()".
> 
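For context, the IPI avoidance looks roughly like this in v5.11-era SLUB
(lightly simplified):

  static bool has_cpu_slab(int cpu, void *info)
  {
          struct kmem_cache *s = info;
          struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);

          /* anything cached on this CPU: a CPU slab or percpu partial slabs? */
          return c->page || slub_percpu_partial(c);
  }

  static void flush_all(struct kmem_cache *s)
  {
          /* IPI only the CPUs for which has_cpu_slab() returned true */
          on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1);
  }
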
> The result is that SLUB's per-cpu and per-node caches are trimmed of free
> pages, and partially used pages have a higher chance of being either reused
> or freed. The trimming effort is controlled by reclaim activity and thus by
> memory pressure. Before an OOM, a reclaim attempt at the highest priority
> ensures that all caches are shrunk. And since this is now a proper slab
> shrinker, the shrinking is also invoked as part of the drop_caches sysctl
> operation.

Hi Vlastimil!

This makes a lot of sense; however, it looks a bit like overkill to me (on
5.9+). Isn't limiting the number of pages (instead of the number of objects)
sufficient on 5.9+?

If not, maybe we can limit the shrinking to the pre-OOM condition?
Do we really need to trigger it constantly?
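
(Regarding limiting the number of pages: I mean something like the following
hypothetical tweak to the drain condition in put_cpu_partial(), where
slub_cpu_partial_pages() would be a new, made-up per-cache limit, not an
existing API:)

  /* hypothetical: drain based on the page count, which cannot go stale */
  if (drain && pages >= slub_cpu_partial_pages(s)) {
          unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
          oldpage = NULL;
          pages = pobjects = 0;
  }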

Thanks!

