From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Sat, 14 Jan 2017 16:57:27 +0300
From: Vladimir Davydov <vdavydov@tarantool.org>
To: Tejun Heo <tj@kernel.org>
Cc: cl@linux.com, penberg@kernel.org, rientjes@google.com,
	iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, jsvana@fb.com,
	hannes@cmpxchg.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 8/9] slab: remove synchronous synchronize_sched() from
	memcg cache deactivation path
Message-ID: <20170114135727.GG2668@esperanza>
References: <20170114055449.11044-1-tj@kernel.org>
	<20170114055449.11044-9-tj@kernel.org>
In-Reply-To: <20170114055449.11044-9-tj@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Jan 14, 2017 at 12:54:48AM -0500, Tejun Heo wrote:
> With kmem cgroup support enabled, kmem_caches can be created and
> destroyed frequently, and a great number of near-empty kmem_caches can
> accumulate if there are a lot of transient cgroups and the system is
> not under memory pressure. When memory reclaim starts under such
> conditions, it can lead to consecutive deactivation and destruction of
> many kmem_caches, easily hundreds of thousands on moderately large
> systems, exposing scalability issues in the current slab management
> code. This is one of the patches to address the issue.
>
> slub uses synchronize_sched() to deactivate a memcg cache.
> synchronize_sched() is an expensive and slow operation and doesn't
> scale when a huge number of caches are destroyed back-to-back. While
> there used to be a simple batching mechanism, the batching was too
> restricted to be helpful.
>
> This patch implements slab_deactivate_memcg_cache_rcu_sched(), which
> slub can use to schedule a sched RCU callback instead of performing
> synchronize_sched() synchronously while holding cgroup_mutex. While
> this adds online cpus, mems and slab_mutex operations, performing
> those operations back-to-back from the same kworker, which is what is
> going to happen when there are many caches to deactivate, isn't
> expensive at all, and this gets rid of the scalability problem
> completely.
>
> Signed-off-by: Tejun Heo
> Reported-by: Jay Vana
> Cc: Vladimir Davydov
> Cc: Christoph Lameter
> Cc: Pekka Enberg
> Cc: David Rientjes
> Cc: Joonsoo Kim
> Cc: Andrew Morton

I don't think there's much point in having the infrastructure for this
in slab_common.c, as only SLUB needs it, but it isn't a show-stopper.

Acked-by: Vladimir Davydov
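
For reference, the deferred path the commit message describes can be
sketched roughly as follows. This is a minimal illustrative sketch, not
the verbatim patch: the memcg_params.deact_fn / deact_rcu_head /
deact_work field names, and the use of schedule_work() rather than a
dedicated workqueue, are assumptions made here for illustration.

#include <linux/cpu.h>			/* get_online_cpus() */
#include <linux/memory_hotplug.h>	/* get_online_mems() */
#include <linux/rcupdate.h>		/* call_rcu_sched() */
#include <linux/workqueue.h>
#include "slab.h"			/* slab_mutex, struct kmem_cache */

/* Runs from a kworker, where taking blocking locks is fine. */
static void kmemcg_deactivate_workfn(struct work_struct *work)
{
	struct kmem_cache *s = container_of(work, struct kmem_cache,
					    memcg_params.deact_work);

	get_online_cpus();
	get_online_mems();
	mutex_lock(&slab_mutex);

	s->memcg_params.deact_fn(s);	/* allocator-specific deactivation */

	mutex_unlock(&slab_mutex);
	put_online_mems();
	put_online_cpus();
}

/*
 * Sched RCU callback; blocking locks can't be taken in this context,
 * so bounce to a work item.  The work_struct can share storage with
 * the rcu_head since the two are never in flight at the same time.
 */
static void kmemcg_deactivate_rcufn(struct rcu_head *head)
{
	struct kmem_cache *s = container_of(head, struct kmem_cache,
					    memcg_params.deact_rcu_head);

	INIT_WORK(&s->memcg_params.deact_work, kmemcg_deactivate_workfn);
	schedule_work(&s->memcg_params.deact_work);
}

/*
 * Returns immediately instead of blocking in synchronize_sched();
 * deact_fn runs after a sched RCU grace period has elapsed.
 */
void slab_deactivate_memcg_cache_rcu_sched(struct kmem_cache *s,
					   void (*deact_fn)(struct kmem_cache *))
{
	s->memcg_params.deact_fn = deact_fn;
	call_rcu_sched(&s->memcg_params.deact_rcu_head,
		       kmemcg_deactivate_rcufn);
}

With something like this in place, slub's deactivation path would pass
its grace-period-dependent work as deact_fn instead of waiting for the
grace period itself while cgroup_mutex is held.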