From: Roman Gushchin <guro@fb.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Muchun Song <songmuchun@bytedance.com>, <willy@infradead.org>,
<akpm@linux-foundation.org>, <hannes@cmpxchg.org>,
<mhocko@kernel.org>, <vdavydov.dev@gmail.com>,
<shakeelb@google.com>, <shy828301@gmail.com>, <alexs@kernel.org>,
<alexander.h.duyck@linux.intel.com>, <richard.weiyang@gmail.com>,
<linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<linux-mm@kvack.org>
Subject: Re: [PATCH 0/9] Shrink the list lru size on memory cgroup removal
Date: Thu, 29 Apr 2021 18:39:40 -0700 [thread overview]
Message-ID: <YItf3GIUs2skeuyi@carbon.dhcp.thefacebook.com> (raw)
In-Reply-To: <20210430004903.GF1872259@dread.disaster.area>
On Fri, Apr 30, 2021 at 10:49:03AM +1000, Dave Chinner wrote:
> On Wed, Apr 28, 2021 at 05:49:40PM +0800, Muchun Song wrote:
> > In our server, we found a suspected memory leak problem. The kmalloc-32
> > consumes more than 6GB of memory. Other kmem_caches consume less than 2GB
> > memory.
> >
> > After our in-depth analysis, the memory consumption of kmalloc-32 slab
> > cache is the cause of list_lru_one allocation.
> >
> > crash> p memcg_nr_cache_ids
> > memcg_nr_cache_ids = $2 = 24574
> >
> > memcg_nr_cache_ids is very large and memory consumption of each list_lru
> > can be calculated with the following formula.
> >
> > num_numa_node * memcg_nr_cache_ids * 32 (kmalloc-32)
> >
> > There are 4 numa nodes in our system, so each list_lru consumes ~3MB.
> >
> > crash> list super_blocks | wc -l
> > 952
>
> The more I see people trying to work around this, the more I think
> that the way memcgs have been grafted into the list_lru is back to
> front.
>
> We currently allocate scope for every memcg to be able to tracked on
> every not on every superblock instantiated in the system, regardless
> of whether that superblock is even accessible to that memcg.
>
> These huge memcg counts come from container hosts where memcgs are
> confined to just a small subset of the total number of superblocks
> that instantiated at any given point in time.
>
> IOWs, for these systems with huge container counts, list_lru does
> not need the capability of tracking every memcg on every superblock.
>
> What it comes down to is that the list_lru is only needed for a
> given memcg if that memcg is instatiating and freeing objects on a
> given list_lru.
>
> Which makes me think we should be moving more towards "add the memcg
> to the list_lru at the first insert" model rather than "instantiate
> all at memcg init time just in case". The model we originally came
> up with for supprting memcgs is really starting to show it's limits,
> and we should address those limitations rahter than hack more
> complexity into the system that does nothing to remove the
> limitations that are causing the problems in the first place.
I totally agree.
It looks like the initial implementation of the whole kernel memory accounting
and memcg-aware shrinkers was based on the idea that the number of memory
cgroups is relatively small and stable. With systemd creating a separate cgroup
for everything including short-living processes it simple not true anymore.
Thanks!
next prev parent reply other threads:[~2021-04-30 1:40 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-28 9:49 [PATCH 0/9] Shrink the list lru size on memory cgroup removal Muchun Song
2021-04-28 9:49 ` [PATCH 1/9] mm: list_lru: fix list_lru_count_one() return value Muchun Song
2021-04-28 9:49 ` [PATCH 2/9] mm: memcontrol: remove kmemcg_id reparenting Muchun Song
2021-04-28 9:49 ` [PATCH 3/9] mm: list_lru: rename memcg_drain_all_list_lrus to memcg_reparent_list_lrus Muchun Song
2021-04-28 9:49 ` [PATCH 4/9] mm: memcontrol: remove the kmem states Muchun Song
2021-04-28 9:49 ` [PATCH 5/9] mm: memcontrol: move memcg_online_kmem() to mem_cgroup_css_online() Muchun Song
2021-04-28 9:49 ` [PATCH 6/9] mm: list_lru: support for shrinking list lru Muchun Song
2021-04-28 9:49 ` [PATCH 7/9] ida: introduce ida_max() to return the maximum allocated ID Muchun Song
2021-04-29 6:47 ` Christoph Hellwig
2021-04-29 7:36 ` [External] " Muchun Song
2021-04-28 9:49 ` [PATCH 8/9] mm: memcontrol: shrink the list lru size Muchun Song
2021-04-28 9:49 ` [PATCH 9/9] mm: memcontrol: rename memcg_{get,put}_cache_ids to memcg_list_lru_resize_{lock,unlock} Muchun Song
2021-04-28 23:32 ` [PATCH 0/9] Shrink the list lru size on memory cgroup removal Shakeel Butt
2021-04-29 3:05 ` [External] " Muchun Song
2021-04-30 0:49 ` Dave Chinner
2021-04-30 1:39 ` Roman Gushchin [this message]
2021-04-30 3:27 ` Dave Chinner
2021-04-30 8:32 ` [External] " Muchun Song
2021-05-01 3:10 ` Roman Gushchin
2021-05-01 3:27 ` Matthew Wilcox
2021-05-02 23:58 ` Dave Chinner
2021-05-03 6:33 ` Muchun Song
2021-05-05 1:13 ` Dave Chinner
2021-05-07 5:45 ` Muchun Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YItf3GIUs2skeuyi@carbon.dhcp.thefacebook.com \
--to=guro@fb.com \
--cc=akpm@linux-foundation.org \
--cc=alexander.h.duyck@linux.intel.com \
--cc=alexs@kernel.org \
--cc=david@fromorbit.com \
--cc=hannes@cmpxchg.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=richard.weiyang@gmail.com \
--cc=shakeelb@google.com \
--cc=shy828301@gmail.com \
--cc=songmuchun@bytedance.com \
--cc=vdavydov.dev@gmail.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).