From: Muchun Song <songmuchun@bytedance.com>
To: Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Shakeel Butt <shakeelb@google.com>, Roman Gushchin <guro@fb.com>,
	Yang Shi <shy828301@gmail.com>, Alex Shi <alexs@kernel.org>,
	Wei Yang <richard.weiyang@gmail.com>,
	Dave Chinner <david@fromorbit.com>,
	trond.myklebust@hammerspace.com, anna.schumaker@netapp.com
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	linux-nfs@vger.kernel.org, zhengqi.arch@bytedance.com,
	Xiongchun duan <duanxiongchun@bytedance.com>,
	fam.zheng@bytedance.com
Subject: Re: [PATCH v2 00/21] Optimize list lru memory consumption
Date: Wed, 16 Jun 2021 18:53:09 +0800
Message-ID: <CAMZfGtUREhSDqurY5E=e-otSvN3LvZSrFX8WvP6zt3kaNgpS8g@mail.gmail.com>
In-Reply-To: <20210527062148.9361-1-songmuchun@bytedance.com>

I'm very sorry to bother everyone. Ping guys. Any comments on this series?

On Thu, May 27, 2021 at 2:24 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On our server, we found a suspected memory leak. The kmalloc-32 slab cache
> consumes more than 6GB of memory, while the other kmem_caches consume less
> than 2GB.
>
> After in-depth analysis, we found that the memory consumption of the
> kmalloc-32 slab cache comes from list_lru_one allocations.
>
>   crash> p memcg_nr_cache_ids
>   memcg_nr_cache_ids = $2 = 24574
>
> memcg_nr_cache_ids is very large, and the memory consumption of each
> list_lru can be calculated with the following formula:
>
>   num_numa_node * memcg_nr_cache_ids * 32 (kmalloc-32)
>
> There are 4 NUMA nodes in our system, so each list_lru consumes ~3MB.
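The formula above can be checked with a short C sketch. Like the formula, it counts only the kmalloc-32 objects and ignores pointer arrays and slab metadata. list_lru_one_bytes() is a hypothetical helper for illustration, not a kernel API.

```c
#include <assert.h>
#include <stdio.h>

/* Object size of the kmalloc-32 slab cache that serves list_lru_one. */
#define KMALLOC_32_OBJ_SIZE 32UL

/*
 * Bytes of list_lru_one objects held by ONE list_lru, per the formula
 *   num_numa_node * memcg_nr_cache_ids * 32 (kmalloc-32)
 * Pointer arrays and slab metadata are deliberately not counted.
 */
unsigned long list_lru_one_bytes(unsigned long nr_nodes,
				 unsigned long nr_cache_ids)
{
	assert(nr_nodes > 0);
	return nr_nodes * nr_cache_ids * KMALLOC_32_OBJ_SIZE;
}

/* Print the figure for the machine described above: 4 nodes, 24574 ids. */
void print_reported_case(void)
{
	unsigned long bytes = list_lru_one_bytes(4, 24574);

	printf("per list_lru: %lu bytes (%.2f MiB)\n",
	       bytes, bytes / (1024.0 * 1024.0));
}
```

For the reported machine, list_lru_one_bytes(4, 24574) is 4 * 24574 * 32 = 3145472 bytes. That is just under 3 MiB, which matches the ~3MB figure.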
>
>   crash> list super_blocks | wc -l
>   952
>
> Every mount registers 2 list_lrus, one for inodes and another for
> dentries. There are 952 super_blocks, so the total memory is
> 952 * 2 * 3MB (~5.6GB). But the number of memory cgroups currently present
> is less than 500, so I guess more than 12286 memory cgroups have been
> created on this machine at some point (I do not know why there are so many
> cgroups; it may be a user bug, or the user may really want to do that).
> memcg_nr_cache_ids has not been reduced to a suitable value since then,
> which wastes a lot of memory. If we want to reduce memcg_nr_cache_ids, we
> have to reboot the server, which is not what we want.
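The totals in this paragraph can be reproduced the same way. The sketch assumes that memcg_nr_cache_ids grows to 2 * (id + 1) when a newly allocated kmem id does not fit, and that it is never shrunk. This is our reading of memcg_alloc_cache_id() in kernels of that era, ignoring its minimum and maximum clamps. highest_kmem_id() and total_list_lru_bytes() are hypothetical helpers.

```c
#include <assert.h>

/*
 * Assumption: when a new kmem id does not fit, memcg_nr_cache_ids is grown
 * to 2 * (id + 1) and never shrunk (min/max clamps ignored). Inverting the
 * rule gives the highest id that must have been allocated at some point.
 */
long highest_kmem_id(long nr_cache_ids)
{
	assert(nr_cache_ids >= 2 && nr_cache_ids % 2 == 0);
	return nr_cache_ids / 2 - 1;
}

/* Each super_block registers two memcg-aware list_lrus: inodes and dentries. */
unsigned long long total_list_lru_bytes(unsigned long long nr_super_blocks,
					unsigned long long per_lru_bytes)
{
	return nr_super_blocks * 2 * per_lru_bytes;
}
```

With 952 super_blocks and 3145472 bytes per list_lru, the total is 5988978688 bytes (about 5.6 GiB). highest_kmem_id(24574) is 12286, which is consistent with the estimate above.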
>
> I previously posted a patchset [1] to reduce memcg_nr_cache_ids, but it
> did not fundamentally solve the problem.
>
> We currently allocate room for every memcg to be tracked on every
> superblock instantiated in the system, regardless of whether that superblock
> is even accessible to that memcg.
>
> These huge memcg counts come from container hosts where memcgs are confined
> to just a small subset of the total number of superblocks that are
> instantiated at any given point in time.
>
> For these systems with huge container counts, list_lru does not need the
> capability of tracking every memcg on every superblock.
>
> What it comes down to is that the list_lru is only needed for a given memcg
> if that memcg is instantiating and freeing objects on a given list_lru.
>
> As Dave said, "Which makes me think we should be moving more towards 'add the
> memcg to the list_lru at the first insert' model rather than 'instantiate
> all at memcg init time just in case'."
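The "add the memcg to the list_lru at the first insert" model can be sketched in userspace C. This is a hypothetical, simplified model: lru_model, lru_one_model and the lru_model_*() functions are illustrative names, not the kernel's struct list_lru or its API. Only pointer slots are reserved up front, and a per-memcg list is allocated the first time that memcg inserts an object. The dense pointer array here still costs one pointer per id, which is the kind of array patch 17 replaces with an xarray.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-memcg list; stands in for list_lru_one. */
struct lru_one_model {
	long nr_items;
};

struct lru_model {
	struct lru_one_model **per_memcg; /* indexed by memcg id, NULL = unused */
	int nr_ids;
	int nr_allocated;                 /* per-memcg lists actually created */
};

int lru_model_init(struct lru_model *lru, int nr_ids)
{
	/* Reserve only pointer slots; calloc() zeroes them (NULL in practice). */
	lru->per_memcg = calloc(nr_ids, sizeof(*lru->per_memcg));
	lru->nr_ids = nr_ids;
	lru->nr_allocated = 0;
	return lru->per_memcg ? 0 : -1;
}

/* Allocate the memcg's list lazily, on its first insert. */
int lru_model_add(struct lru_model *lru, int memcg_id)
{
	struct lru_one_model *l;

	assert(memcg_id >= 0 && memcg_id < lru->nr_ids);
	l = lru->per_memcg[memcg_id];
	if (!l) {
		l = calloc(1, sizeof(*l));
		if (!l)
			return -1;
		lru->per_memcg[memcg_id] = l;
		lru->nr_allocated++;
	}
	l->nr_items++;
	return 0;
}

void lru_model_destroy(struct lru_model *lru)
{
	for (int i = 0; i < lru->nr_ids; i++)
		free(lru->per_memcg[i]);
	free(lru->per_memcg);
}
```

Suppose this model is initialised with 24574 ids and only two memcgs ever insert objects. Then only two per-memcg lists are allocated, instead of one for every id as in the "instantiate all at memcg init time" model.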
>
> This patchset aims to optimize list_lru memory consumption in several
> ways.
>
> Patches 1-6 are code simplifications.
> Patch 7 converts the array from per-memcg per-node to per-memcg.
> Patch 8 is a code simplification.
> Patches 9-15 make list_lru allocation dynamic.
> Patch 16 is a code cleanup.
> Patch 17 uses an xarray to optimize the size of the per-memcg pointer array.
> Patches 18-21 are code simplifications.
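Why patch 7 helps more as the number of nodes grows can be shown with a rough cost model. The layouts below are assumptions, not the kernel's actual structures:

- Before the patch, each (memcg, node) pair costs one pointer slot plus a separately allocated list_lru_one served from kmalloc-32.
- After the patch, each memcg costs one pointer slot plus a single allocation holding one list_lru_one per node.

Slab rounding of the combined allocation is ignored, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the kernel's list_lru_one: a list head plus a counter. */
struct lru_one_layout {
	struct lru_one_layout *next, *prev; /* stands in for struct list_head */
	long nr_items;
};

/* kmalloc-32 object size; a 24-byte list_lru_one lands there on 64-bit. */
#define KMALLOC_32_OBJ_SIZE 32UL

static_assert(sizeof(struct lru_one_layout) <= KMALLOC_32_OBJ_SIZE,
	      "list_lru_one fits in a kmalloc-32 object");

/* Assumed pre-patch-7 layout: a pointer slot and a kmalloc-32 object per node. */
size_t per_memcg_bytes_before(size_t nr_nodes)
{
	return nr_nodes * (sizeof(void *) + KMALLOC_32_OBJ_SIZE);
}

/*
 * Assumed post-patch-7 layout: one pointer slot per memcg, pointing at one
 * allocation that holds a list_lru_one for every node (rounding ignored).
 */
size_t per_memcg_bytes_after(size_t nr_nodes)
{
	return sizeof(void *) + nr_nodes * sizeof(struct lru_one_layout);
}
```

On a 64-bit machine this model gives 80 vs 56 bytes per memcg with 2 nodes, and 160 vs 104 bytes with 4 nodes. The saving grows with the node count, in line with the note under the table.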
>
> I did a simple test to show the optimization: I created 10k memory cgroups
> and mounted 10k filesystems on the system, then used the free command to
> show how much memory the system consumes after this operation (there are
> 2 NUMA nodes in the system).
>
>         +-----------------------+------------------------+
>         |      condition        |   memory consumption   |
>         +-----------------------+------------------------+
>         | without this patchset |        24464 MB        |
>         +-----------------------+------------------------+
>         |     after patch 7     |        21957 MB        | <--------+
>         +-----------------------+------------------------+          |
>         |     after patch 15    |         6895 MB        |          |
>         +-----------------------+------------------------+          |
>         |     after patch 17    |         4367 MB        |          |
>         +-----------------------+------------------------+          |
>                                                                     |
>         The more NUMA nodes, the more obvious the effect------------+
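To put the table in relative terms, a tiny C helper can convert each row into the percentage of memory saved against the unpatched kernel. saved_percent() is a hypothetical name, and the numbers come only from the table above (2-node test machine).

```c
#include <assert.h>

/* Memory consumption without this patchset, in MB (from the table above). */
#define BASELINE_MB 24464.0

/* Percentage of memory saved compared with the unpatched kernel. */
double saved_percent(double after_mb)
{
	assert(after_mb >= 0.0 && after_mb <= BASELINE_MB);
	return 100.0 * (BASELINE_MB - after_mb) / BASELINE_MB;
}
```

This gives roughly 10.2% after patch 7, 71.8% after patch 15 and 82.1% after patch 17.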
>
> BTW, there was a recent discussion [2] on the same issue.
>
> [1] https://lore.kernel.org/linux-fsdevel/20210428094949.43579-1-songmuchun@bytedance.com/
> [2] https://lore.kernel.org/linux-fsdevel/20210405054848.GA1077931@in.ibm.com/
>
>   1. Update Documentation/filesystems/porting.rst as suggested by Dave.
>   2. Add a comment above alloc_inode_sb() as suggested by Dave.
>   3. Rework the commit logs of some patches.
>   4. Add patches 18-21.
>
>   Thanks Dave.
>
> Muchun Song (21):
>   mm: list_lru: fix list_lru_count_one() return value
>   mm: memcontrol: remove kmemcg_id reparenting
>   mm: memcontrol: remove the kmem states
>   mm: memcontrol: do it in mem_cgroup_css_online to make the kmem online
>   mm: list_lru: remove lru node locking from memcg_update_list_lru_node
>   mm: list_lru: only add the memcg aware lrus to the list_lrus
>   mm: list_lru: optimize the array of per memcg lists memory consumption
>   mm: list_lru: remove memcg_aware field from struct list_lru
>   mm: introduce kmem_cache_alloc_lru
>   fs: introduce alloc_inode_sb() to allocate filesystems specific inode
>   mm: dcache: use kmem_cache_alloc_lru() to allocate dentry
>   xarray: use kmem_cache_alloc_lru to allocate xa_node
>   mm: workingset: use xas_set_lru() to pass shadow_nodes
>   nfs42: use a specific kmem_cache to allocate nfs4_xattr_entry
>   mm: list_lru: allocate list_lru_one only when needed
>   mm: list_lru: rename memcg_drain_all_list_lrus to
>     memcg_reparent_list_lrus
>   mm: list_lru: replace linear array with xarray
>   mm: memcontrol: reuse memory cgroup ID for kmem ID
>   mm: memcontrol: fix cannot alloc the maximum memcg ID
>   mm: list_lru: rename list_lru_per_memcg to list_lru_memcg
>   mm: memcontrol: rename memcg_cache_id to memcg_kmem_id
>
>  Documentation/filesystems/porting.rst |   5 +
>  drivers/dax/super.c                   |   2 +-
>  fs/9p/vfs_inode.c                     |   2 +-
>  fs/adfs/super.c                       |   2 +-
>  fs/affs/super.c                       |   2 +-
>  fs/afs/super.c                        |   2 +-
>  fs/befs/linuxvfs.c                    |   2 +-
>  fs/bfs/inode.c                        |   2 +-
>  fs/block_dev.c                        |   2 +-
>  fs/btrfs/inode.c                      |   2 +-
>  fs/ceph/inode.c                       |   2 +-
>  fs/cifs/cifsfs.c                      |   2 +-
>  fs/coda/inode.c                       |   2 +-
>  fs/dcache.c                           |   3 +-
>  fs/ecryptfs/super.c                   |   2 +-
>  fs/efs/super.c                        |   2 +-
>  fs/erofs/super.c                      |   2 +-
>  fs/exfat/super.c                      |   2 +-
>  fs/ext2/super.c                       |   2 +-
>  fs/ext4/super.c                       |   2 +-
>  fs/f2fs/super.c                       |   2 +-
>  fs/fat/inode.c                        |   2 +-
>  fs/freevxfs/vxfs_super.c              |   2 +-
>  fs/fuse/inode.c                       |   2 +-
>  fs/gfs2/super.c                       |   2 +-
>  fs/hfs/super.c                        |   2 +-
>  fs/hfsplus/super.c                    |   2 +-
>  fs/hostfs/hostfs_kern.c               |   2 +-
>  fs/hpfs/super.c                       |   2 +-
>  fs/hugetlbfs/inode.c                  |   2 +-
>  fs/inode.c                            |   2 +-
>  fs/isofs/inode.c                      |   2 +-
>  fs/jffs2/super.c                      |   2 +-
>  fs/jfs/super.c                        |   2 +-
>  fs/minix/inode.c                      |   2 +-
>  fs/nfs/inode.c                        |   2 +-
>  fs/nfs/nfs42xattr.c                   |  95 ++++----
>  fs/nilfs2/super.c                     |   2 +-
>  fs/ntfs/inode.c                       |   2 +-
>  fs/ocfs2/dlmfs/dlmfs.c                |   2 +-
>  fs/ocfs2/super.c                      |   2 +-
>  fs/openpromfs/inode.c                 |   2 +-
>  fs/orangefs/super.c                   |   2 +-
>  fs/overlayfs/super.c                  |   2 +-
>  fs/proc/inode.c                       |   2 +-
>  fs/qnx4/inode.c                       |   2 +-
>  fs/qnx6/inode.c                       |   2 +-
>  fs/reiserfs/super.c                   |   2 +-
>  fs/romfs/super.c                      |   2 +-
>  fs/squashfs/super.c                   |   2 +-
>  fs/sysv/inode.c                       |   2 +-
>  fs/ubifs/super.c                      |   2 +-
>  fs/udf/super.c                        |   2 +-
>  fs/ufs/super.c                        |   2 +-
>  fs/vboxsf/super.c                     |   2 +-
>  fs/xfs/xfs_icache.c                   |   3 +-
>  fs/zonefs/super.c                     |   2 +-
>  include/linux/fs.h                    |  11 +
>  include/linux/list_lru.h              |  18 +-
>  include/linux/memcontrol.h            |  48 ++--
>  include/linux/slab.h                  |   4 +
>  include/linux/swap.h                  |   5 +-
>  include/linux/xarray.h                |   9 +-
>  ipc/mqueue.c                          |   2 +-
>  lib/xarray.c                          |  10 +-
>  mm/list_lru.c                         | 447 +++++++++++++++++-----------------
>  mm/memcontrol.c                       | 185 ++------------
>  mm/shmem.c                            |   2 +-
>  mm/slab.c                             |  39 ++-
>  mm/slab.h                             |  17 +-
>  mm/slub.c                             |  42 ++--
>  mm/workingset.c                       |   2 +-
>  net/socket.c                          |   2 +-
>  net/sunrpc/rpc_pipe.c                 |   2 +-
>  74 files changed, 480 insertions(+), 577 deletions(-)
>
> --
> 2.11.0
>
