From: Muchun Song <songmuchun@bytedance.com>
To: willy@infradead.org, akpm@linux-foundation.org,
	hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com,
	shakeelb@google.com, guro@fb.com, shy828301@gmail.com,
	alexs@kernel.org, alexander.h.duyck@linux.intel.com,
	richard.weiyang@gmail.com
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH 8/9] mm: memcontrol: shrink the list lru size
Date: Wed, 28 Apr 2021 17:49:48 +0800
Message-ID: <20210428094949.43579-9-songmuchun@bytedance.com>
In-Reply-To: <20210428094949.43579-1-songmuchun@bytedance.com>

On our server, we found a suspected memory leak: the kmalloc-32 slab
cache consumes more than 6GB of memory, while the other kmem_caches
consume less than 2GB.

In-depth analysis showed that the kmalloc-32 consumption comes from
list_lru_one allocations.

  crash> p memcg_nr_cache_ids
  memcg_nr_cache_ids = $2 = 24574

memcg_nr_cache_ids is very large, and each list_lru allocates one
32-byte list_lru_one object per NUMA node per memcg cache id, so the
memory consumption of each list_lru can be calculated with the
following formula:

  num_numa_node * memcg_nr_cache_ids * 32 (bytes per kmalloc-32 object)

There are 4 NUMA nodes in our system, so each list_lru consumes ~3MB.
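
Plugging the numbers from this machine into the formula:

  4 nodes * 24574 ids * 32 bytes = 3,145,472 bytes ~= 3MB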

  crash> list super_blocks | wc -l
  952

Every mount registers 2 list lrus, one for inodes and one for dentries.
There are 952 super_blocks, so the total memory is 952 * 2 * 3MB
(~5.6GB). But fewer than 500 memory cgroups exist on the machine. Since
memcg_nr_cache_ids grows as "size = 2 * (id + 1)", a value of 24574
means a cache id of at least 12286 was once allocated, i.e. more than
12286 containers have been deployed on this machine at some point (I do
not know why there were so many; it may be a user bug, or the user may
really have wanted that). Although fewer than 500 containers are
running now, memcg_nr_cache_ids has never been reduced to a suitable
value, which wastes a lot of memory. Currently the only way to reduce
memcg_nr_cache_ids is to reboot the server, which is not what we want.
So this patch dynamically adjusts memcg_nr_cache_ids to keep memory
consumption healthy. With it, we can restore a healthy environment
even after users have created tens of thousands of memory cgroups.

In this patch, I also adjust the growth formula of memcg_nr_cache_ids
in memcg_alloc_cache_id() from "size = 2 * (id + 1)" to "size = 2 * id",
because it makes things simpler when shrinking the list lru size.
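
For illustration, below is a minimal userspace sketch (not part of the
patch) of the shrink heuristic added to memcg_free_cache_id(). The
fls_emul() helper is a hypothetical stand-in for the kernel's __fls(),
MEMCG_CACHES_MIN_SIZE is assumed to be 4, and the sample numbers are
only illustrative:

  #include <stdio.h>

  #define MEMCG_CACHES_MIN_SIZE 4	/* assumed value */

  /* Index of the highest set bit, standing in for the kernel's __fls(). */
  static int fls_emul(unsigned int x)
  {
  	int pos = -1;

  	while (x) {
  		pos++;
  		x >>= 1;
  	}
  	return pos;
  }

  /* Smallest size that fits @id, rounded up as in the patch below. */
  static int nearest_fit_id(int id)
  {
  	if (id < MEMCG_CACHES_MIN_SIZE)
  		return MEMCG_CACHES_MIN_SIZE;
  	return 1 << (fls_emul(id) + 1);
  }

  int main(void)
  {
  	int nr_cache_ids = 24576;	/* current array size */
  	int max_id = 100;		/* maximum id still allocated */

  	/* Shrink only when the max id drops below a third of the array. */
  	if (max_id < nr_cache_ids / 3)
  		printf("shrink %d -> %d\n", nr_cache_ids,
  		       nearest_fit_id(max_id));
  	return 0;
  }

With these numbers it prints "shrink 24576 -> 128"; on the 4-node
machine above, that would take the per-list_lru overhead from ~3MB
down to 4 * 128 * 32 bytes = 16KB.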

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/memcontrol.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1610d501e7b5..f8cdd87cf693 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -362,6 +362,8 @@ static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
 static DEFINE_IDA(memcg_cache_ida);
 int memcg_nr_cache_ids;
 
+static int kmemcg_max_id;
+
 /* Protects memcg_nr_cache_ids */
 static DECLARE_RWSEM(memcg_cache_ids_sem);
 
@@ -2856,8 +2858,11 @@ static int memcg_alloc_cache_id(void)
 	if (id < 0)
 		return id;
 
-	if (id < memcg_nr_cache_ids)
+	if (id < memcg_nr_cache_ids) {
+		if (id > kmemcg_max_id)
+			kmemcg_max_id = id;
 		return id;
+	}
 
 	/*
 	 * There's no space for the new id in memcg_caches arrays,
@@ -2865,15 +2870,17 @@ static int memcg_alloc_cache_id(void)
 	 */
 	down_write(&memcg_cache_ids_sem);
 
-	size = 2 * (id + 1);
+	size = 2 * id;
 	if (size < MEMCG_CACHES_MIN_SIZE)
 		size = MEMCG_CACHES_MIN_SIZE;
 	else if (size > MEMCG_CACHES_MAX_SIZE)
 		size = MEMCG_CACHES_MAX_SIZE;
 
 	err = memcg_update_all_list_lrus(size);
-	if (!err)
+	if (!err) {
 		memcg_nr_cache_ids = size;
+		kmemcg_max_id = id;
+	}
 
 	up_write(&memcg_cache_ids_sem);
 
@@ -2884,9 +2891,48 @@ static int memcg_alloc_cache_id(void)
 	return id;
 }
 
+static inline int nearest_fit_id(int id)
+{
+	if (unlikely(id < MEMCG_CACHES_MIN_SIZE))
+		return MEMCG_CACHES_MIN_SIZE;
+
+	return 1 << (__fls(id) + 1);
+}
+
+/*
+ * memcg_alloc_cache_id() and memcg_free_cache_id() are serialized by
+ * cgroup_mutex, so there is no race on kmemcg_max_id.
+ */
 static void memcg_free_cache_id(int id)
 {
 	ida_simple_remove(&memcg_cache_ida, id);
+
+	if (kmemcg_max_id == id) {
+		/*
+		 * In order to avoid @memcg_nr_cache_ids bouncing between
+		 * @memcg_nr_cache_ids / 2 and @memcg_nr_cache_ids, we only
+		 * shrink the list lru size when @kmemcg_max_id is smaller
+		 * than @memcg_nr_cache_ids / 3.
+		 */
+		int size = memcg_nr_cache_ids / 3;
+
+		kmemcg_max_id = ida_max(&memcg_cache_ida);
+		if (kmemcg_max_id < size) {
+			/*
+			 * Find the first value greater than
+			 * @kmemcg_max_id which can fit our needs, and
+			 * shrink the list lru to this size.
+			 */
+			size = nearest_fit_id(kmemcg_max_id);
+
+			down_write(&memcg_cache_ids_sem);
+			if (size != memcg_nr_cache_ids) {
+				memcg_update_all_list_lrus(size);
+				memcg_nr_cache_ids = size;
+			}
+			up_write(&memcg_cache_ids_sem);
+		}
+	}
 }
 
 /*
-- 
2.11.0

