From: Vladimir Davydov <vdavydov@tarantool.org>
To: Tejun Heo <tj@kernel.org>
Cc: cl@linux.com, penberg@kernel.org, rientjes@google.com,
	iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, jsvana@fb.com,
	hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 6/9] slab: don't put memcg caches on slab_caches list
Date: Sat, 14 Jan 2017 16:39:18 +0300
Message-ID: <20170114133918.GE2668@esperanza>
In-Reply-To: <20170114055449.11044-7-tj@kernel.org>

On Sat, Jan 14, 2017 at 12:54:46AM -0500, Tejun Heo wrote:
> With kmem cgroup support enabled, kmem_caches can be created and
> destroyed frequently and a great number of near empty kmem_caches can
> accumulate if there are a lot of transient cgroups and the system is
> not under memory pressure.  When memory reclaim starts under such
> conditions, it can lead to consecutive deactivation and destruction of
> many kmem_caches, easily hundreds of thousands on moderately large
> systems, exposing scalability issues in the current slab management
> code.  This is one of the patches to address the issue.
> 
> slab_caches currently lists all caches including root and memcg ones.
> This is the only data structure which lists the root caches and
> iterating root caches can only be done by walking the list while
> skipping over memcg caches.  As there can be a huge number of memcg
> caches, this can become very expensive.
> 
> This also can make /proc/slabinfo behave very badly.  seq_file
> processes reads in 4k chunks and seeks to the previous Nth position on
> slab_caches list to resume after each chunk.  With a lot of memcg
> cache churn on the list, reading /proc/slabinfo can become very slow
> and its content often ends up with duplicate and/or missing entries.
> 
> As the previous patch made it unnecessary to walk slab_caches to
> iterate memcg-specific caches, there is no reason to keep memcg caches
> on the list.  This patch makes slab_caches include only the root
> caches.  As this makes slab_cache->list unused for memcg caches,
> ->memcg_params.children_node is removed and ->list is used instead.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Jay Vana <jsvana@fb.com>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  include/linux/slab.h |  3 ---
>  mm/slab.h            |  3 +--
>  mm/slab_common.c     | 58 +++++++++++++++++++++++++---------------------------
>  3 files changed, 29 insertions(+), 35 deletions(-)

IIRC the slab_caches list is also used on cpu/mem online/offline, so you
have to patch those places to ensure that memcg caches get updated too.
Other than that the patch looks good to me.
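
To illustrate the concern, here is a minimal userspace sketch, not kernel
code. The struct cache type, the root_caches list, and all helper names are
hypothetical stand-ins for kmem_cache, slab_caches, and the per-root children
list. It only shows that, once memcg caches are off the global list, a walker
that visits just the root list no longer reaches them unless it also descends
into each root's children.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; this is NOT the kernel's struct kmem_cache. */
struct cache {
	const char *name;
	struct cache *next;     /* next root on the global list, or next sibling */
	struct cache *children; /* per-root list of memcg caches; NULL for a memcg cache */
	int updated;            /* set when a hotplug-style walker visits the cache */
};

/* Stands in for slab_caches once it holds root caches only (assumption). */
static struct cache *root_caches;

static void add_root(struct cache *c)
{
	c->children = NULL;
	c->updated = 0;
	c->next = root_caches;
	root_caches = c;
}

static void add_child(struct cache *root, struct cache *c)
{
	assert(root != NULL);
	c->children = NULL;
	c->updated = 0;
	c->next = root->children;
	root->children = c;
}

/* Walks only the global list, so memcg children are never touched. */
static int update_roots_only(void)
{
	int visited = 0;

	for (struct cache *r = root_caches; r; r = r->next) {
		r->updated = 1;
		visited++;
	}
	return visited;
}

/* Walks each root and then its children, so every cache is touched. */
static int update_all(void)
{
	int visited = 0;

	for (struct cache *r = root_caches; r; r = r->next) {
		r->updated = 1;
		visited++;
		for (struct cache *c = r->children; c; c = c->next) {
			c->updated = 1;
			visited++;
		}
	}
	return visited;
}
```

With two roots and two children under one of them, update_roots_only()
visits two caches and leaves both children untouched, while update_all()
visits all four. The online/offline paths would need something like the
second form.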


