All of lore.kernel.org
 help / color / mirror / Atom feed
From: Roman Gushchin <guro@fb.com>
To: Muchun Song <songmuchun@bytedance.com>
Cc: <willy@infradead.org>, <akpm@linux-foundation.org>,
	<hannes@cmpxchg.org>, <mhocko@kernel.org>,
	<vdavydov.dev@gmail.com>, <shakeelb@google.com>,
	<shy828301@gmail.com>, <alexs@kernel.org>,
	<richard.weiyang@gmail.com>, <david@fromorbit.com>,
	<trond.myklebust@hammerspace.com>, <anna.schumaker@netapp.com>,
	<jaegeuk@kernel.org>, <chao@kernel.org>,
	<kari.argillander@gmail.com>, <linux-fsdevel@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	<linux-nfs@vger.kernel.org>, <zhengqi.arch@bytedance.com>,
	<duanxiongchun@bytedance.com>, <fam.zheng@bytedance.com>,
	<smuchun@gmail.com>
Subject: Re: [PATCH v5 01/16] mm: list_lru: optimize memory consumption of arrays of per cgroup lists
Date: Thu, 6 Jan 2022 16:05:30 -0800	[thread overview]
Message-ID: <YdeDym9IUghnagrK@carbon.dhcp.thefacebook.com> (raw)
In-Reply-To: <20211220085649.8196-2-songmuchun@bytedance.com>

On Mon, Dec 20, 2021 at 04:56:34PM +0800, Muchun Song wrote:
> The list_lru uses an array (list_lru_memcg->lru) to store pointers
> which point to the list_lru_one. And the array is per memcg per node.
> Therefore, the size of the arrays will be 10K * number_of_node * 8 (
> a pointer size on 64 bits system) when we run 10k containers in the
> system. The memory consumption of the arrays becomes significant. The
> more numa node, the more memory it consumes.
> 
> I have done a simple test, which creates 10K memcg and mount point
> each in a two-node system. The memory consumption of the list_lru
> will be 24464MB. After converting the array from per memcg per node
> to per memcg, the memory consumption is going to be 21957MB. It is
> reduces by 2.5GB. In our AMD servers with 8 numa nodes in those
> sysuem, the memory consumption could be more significant. The savings
> come from the list_lru_one heads, that it also simplifies the
> alloc/dealloc path.
> 
> The new scheme looks like the following.
> 
>   +----------+   mlrus   +----------------+   mlru   +----------------------+
>   | list_lru +---------->| list_lru_memcg +--------->|  list_lru_per_memcg  |
>   +----------+           +----------------+          +----------------------+
>                                                      |  list_lru_per_memcg  |
>                                                      +----------------------+
>                                                      |          ...         |
>                           +--------------+   node    +----------------------+
>                           | list_lru_one |<----------+  list_lru_per_memcg  |
>                           +--------------+           +----------------------+
>                           | list_lru_one |
>                           +--------------+
>                           |      ...     |
>                           +--------------+
>                           | list_lru_one |
>                           +--------------+
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

As much as I like the code changes (there is indeed a significant simplification!),
I don't like the commit message and title, because I wasn't able to understand
what the patch is doing and some parts look simply questionable. Overall it
sounds like you reduce the number of list_lru_one structures, which is not true.

How about something like this?

--
mm: list_lru: transpose the array of per-node per-memcg lru lists

The current scheme of maintaining per-node per-memcg lru lists looks like:
  struct list_lru {
    struct list_lru_node *node;           (for each node)
      struct list_lru_memcg *memcg_lrus;
        struct list_lru_one *lru[];       (for each memcg)
  }

By effectively transposing the two-dimension array of list_lru_one's structures
(per-node per-memcg => per-memcg per-node) it's possible to save some memory
and simplify alloc/dealloc paths. The new scheme looks like:
  struct list_lru {
    struct list_lru_memcg *mlrus;
      struct list_lru_per_memcg *mlru[];  (for each memcg)
        struct list_lru_one node[0];      (for each node)
  }

Memory savings are coming from having fewer list_lru_memcg structures, which
contain an extra struct rcu_head to handle the destruction process.
--

But what worries me is that memory savings numbers you posted don't do up.
In theory we can save
16 (size of struct rcu_head) * 10000 (number of cgroups) * 2 (number of numa nodes) = 320k
per slab cache. Did you have a ton of mount points? Otherwise I don't understand
where these 2.5Gb are coming from.

Thanks!

  reply	other threads:[~2022-01-07  0:06 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-20  8:56 [PATCH v5 00/16] Optimize list lru memory consumption Muchun Song
2021-12-20  8:56 ` [PATCH v5 01/16] mm: list_lru: optimize memory consumption of arrays of per cgroup lists Muchun Song
2022-01-07  0:05   ` Roman Gushchin [this message]
2022-01-09  4:49     ` Muchun Song
2022-01-10 18:42       ` Roman Gushchin
2022-01-11  3:19         ` Muchun Song
2021-12-20  8:56 ` [PATCH v5 02/16] mm: introduce kmem_cache_alloc_lru Muchun Song
2022-01-07  3:04   ` Roman Gushchin
2022-01-09  6:21     ` Muchun Song
2022-01-10 18:47       ` Roman Gushchin
2022-01-11 15:41         ` Vlastimil Babka
2022-01-11 17:54           ` Roman Gushchin
2021-12-20  8:56 ` [PATCH v5 03/16] fs: introduce alloc_inode_sb() to allocate filesystems specific inode Muchun Song
2022-01-11 18:55   ` Roman Gushchin
2022-01-12  2:54     ` Muchun Song
2021-12-20  8:56 ` [PATCH v5 04/16] fs: allocate inode by using alloc_inode_sb() Muchun Song
2022-01-11 18:58   ` Roman Gushchin
2022-01-12  2:55     ` Muchun Song
2021-12-20  8:56 ` [PATCH v5 05/16] f2fs: " Muchun Song
2022-01-11 19:03   ` Roman Gushchin
2021-12-20  8:56 ` [PATCH v5 06/16] nfs42: use a specific kmem_cache to allocate nfs4_xattr_entry Muchun Song
2021-12-20  8:56 ` [PATCH v5 07/16] mm: dcache: use kmem_cache_alloc_lru() to allocate dentry Muchun Song
2022-01-11 19:05   ` Roman Gushchin
2021-12-20  8:56 ` [PATCH v5 08/16] xarray: use kmem_cache_alloc_lru to allocate xa_node Muchun Song
2022-01-11 19:14   ` Roman Gushchin
2021-12-20  8:56 ` [PATCH v5 09/16] mm: memcontrol: move memcg_online_kmem() to mem_cgroup_css_online() Muchun Song
2022-01-11 19:17   ` Roman Gushchin
2021-12-20  8:56 ` [PATCH v5 10/16] mm: list_lru: allocate list_lru_one only when needed Muchun Song
2022-01-06 11:00   ` Michal Koutný
2022-01-12 13:22     ` Muchun Song
2022-01-13 13:32       ` Michal Koutný
2022-01-18 12:05         ` Muchun Song
2022-01-19  9:33           ` Michal Koutný
2022-01-21  5:28             ` Muchun Song
2022-01-11 20:00   ` Roman Gushchin
2022-01-12  4:48     ` Muchun Song
2021-12-20  8:56 ` [PATCH v5 11/16] mm: list_lru: rename memcg_drain_all_list_lrus to memcg_reparent_list_lrus Muchun Song
2021-12-20  8:56 ` [PATCH v5 12/16] mm: list_lru: replace linear array with xarray Muchun Song
2021-12-20  8:56 ` [PATCH v5 13/16] mm: memcontrol: reuse memory cgroup ID for kmem ID Muchun Song
2021-12-20  9:27   ` Mika Penttilä
2022-01-05 17:03   ` Michal Koutný
2022-01-06  3:34     ` Muchun Song
2021-12-20  8:56 ` [PATCH v5 14/16] mm: memcontrol: fix cannot alloc the maximum memcg ID Muchun Song
2021-12-20  8:56 ` [PATCH v5 15/16] mm: list_lru: rename list_lru_per_memcg to list_lru_memcg Muchun Song
2021-12-20  8:56 ` [PATCH v5 16/16] mm: memcontrol: rename memcg_cache_id to memcg_kmem_id Muchun Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YdeDym9IUghnagrK@carbon.dhcp.thefacebook.com \
    --to=guro@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexs@kernel.org \
    --cc=anna.schumaker@netapp.com \
    --cc=chao@kernel.org \
    --cc=david@fromorbit.com \
    --cc=duanxiongchun@bytedance.com \
    --cc=fam.zheng@bytedance.com \
    --cc=hannes@cmpxchg.org \
    --cc=jaegeuk@kernel.org \
    --cc=kari.argillander@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=richard.weiyang@gmail.com \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    --cc=smuchun@gmail.com \
    --cc=songmuchun@bytedance.com \
    --cc=trond.myklebust@hammerspace.com \
    --cc=vdavydov.dev@gmail.com \
    --cc=willy@infradead.org \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.