From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 24 Feb 2021 12:01:12 -0800
From: Andrew Morton
To: akpm@linux-foundation.org, cai@redhat.com, cl@linux.com,
 david@redhat.com, iamjoonsoo.kim@lge.com, linux-mm@kvack.org,
 mhocko@kernel.org, mm-commits@vger.kernel.org, penberg@kernel.org,
 rientjes@google.com, torvalds@linux-foundation.org, vbabka@suse.cz,
 vdavydov.dev@gmail.com
Subject: [patch 019/173] mm, slab, slub: stop taking memory hotplug lock
Message-ID:
<20210224200112.lvAMjFt4C%akpm@linux-foundation.org>
In-Reply-To: <20210224115824.1e289a6895087f10c41dd8d6@linux-foundation.org>
User-Agent: s-nail v14.8.16
Precedence: bulk
Reply-To: linux-kernel@vger.kernel.org
List-ID:
X-Mailing-List: mm-commits@vger.kernel.org

From: Vlastimil Babka
Subject: mm, slab, slub: stop taking memory hotplug lock

Since commit 03afc0e25f7f ("slab: get_online_mems for
kmem_cache_{create,destroy,shrink}") we are taking memory hotplug lock for
SLAB and SLUB when creating, destroying or shrinking a cache.  It is quite
a heavy lock and it's best to avoid it if possible, as we had several
issues with lockdep complaining about ordering in the past, see e.g.
e4f8e513c3d3 ("mm/slub: fix a deadlock in show_slab_objects()").

The problem scenario in 03afc0e25f7f (solved by the memory hotplug lock)
can be summarized as follows: while there's slab_mutex synchronizing new
kmem cache creation and SLUB's MEM_GOING_ONLINE callback
slab_mem_going_online_callback(), we may miss creation of kmem_cache_node
for the hotplugged node in the new kmem cache, because the hotplug
callback doesn't yet see the new cache, and cache creation in
init_kmem_cache_nodes() only inits kmem_cache_node for nodes in the
N_NORMAL_MEMORY nodemask, which however may not yet include the new node,
as that happens only later after the MEM_GOING_ONLINE callback.

Instead of using get/put_online_mems(), the problem can be solved by SLUB
maintaining its own nodemask of nodes for which it has allocated the
per-node kmem_cache_node structures.  This nodemask would generally mirror
the N_NORMAL_MEMORY nodemask, but would be updated only under SLUB's
control in its memory hotplug callbacks under the slab_mutex.  This patch
adds such nodemask and its handling.

Commit 03afc0e25f7f mentions "issues like [the one above]", but there
don't appear to be further issues.
All the paths (shared for SLAB and SLUB) taking the memory hotplug locks
are also taking the slab_mutex, except kmem_cache_shrink() where
03afc0e25f7f replaced slab_mutex with get/put_online_mems().

We however cannot simply restore slab_mutex in kmem_cache_shrink(), as
SLUB can enter the function from a write to sysfs 'shrink' file, thus
holding kernfs lock, and in kmem_cache_create() the kernfs lock is nested
within slab_mutex.  But on closer inspection we don't actually need to
protect kmem_cache_shrink() from hotplug callbacks: While SLUB's
__kmem_cache_shrink() does for_each_kmem_cache_node(), missing a new node
added in parallel hotplug is not fatal, and parallel hotremove does not
free kmem_cache_node's anymore after the previous patch, so use-after-free
cannot happen.  The per-node shrinking itself is protected by
n->list_lock.  Same is true for SLAB, and SLOB is a no-op.

SLAB also doesn't need the memory hotplug locking, which it only gained by
03afc0e25f7f through the shared paths in slab_common.c.  Its memory
hotplug callbacks are also protected by slab_mutex against races with
these paths.  The problem of SLUB relying on N_NORMAL_MEMORY doesn't apply
to SLAB, as its setup_kmem_cache_nodes relies on N_ONLINE, and the new
node is already set there during the MEM_GOING_ONLINE callback, so no
special care is needed for SLAB.

As such, this patch removes all get/put_online_mems() usage by the slab
subsystem.
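
The ordering problem described above (a cache created after the
MEM_GOING_ONLINE callback but before N_NORMAL_MEMORY is updated) can be
modelled in user space.  The following is only a single-threaded toy sketch,
not kernel code: every toy_* name, the bitmask representation of nodemasks
and the node/cache limits are assumptions made for illustration, and the
steps that slab_mutex serializes in the kernel simply run in program order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_NODES  4	/* hypothetical node count for this model */
#define TOY_MAX_CACHES 4	/* hypothetical cache list capacity */

/* Toy stand-in for struct kmem_cache; not the kernel's type. */
struct toy_cache {
	bool node_ready[TOY_MAX_NODES];	/* models s->node[nid] != NULL */
};

static unsigned int toy_normal_memory;	/* models N_NORMAL_MEMORY */
static unsigned int toy_slab_nodes;	/* models SLUB's private slab_nodes */
static struct toy_cache *toy_caches[TOY_MAX_CACHES];
static size_t toy_nr_caches;

/* Models init_kmem_cache_nodes(): set up only the nodes present in mask. */
static void toy_cache_create(struct toy_cache *s, unsigned int mask)
{
	assert(toy_nr_caches < TOY_MAX_CACHES);
	for (int nid = 0; nid < TOY_MAX_NODES; nid++)
		s->node_ready[nid] = (mask >> nid) & 1u;
	toy_caches[toy_nr_caches++] = s;
}

/*
 * Models the MEM_GOING_ONLINE callback: add the node to every cache that
 * exists right now, then record the node in the private mask.
 */
static void toy_going_online(int nid)
{
	for (size_t i = 0; i < toy_nr_caches; i++)
		toy_caches[i]->node_ready[nid] = true;
	toy_slab_nodes |= 1u << nid;
}

/*
 * Boot with node 0, hotplug node 1, and create a cache in the window
 * between the callback and the N_NORMAL_MEMORY update.  Returns whether
 * that late cache ended up with per-node state for node 1.
 */
bool toy_late_cache_covers_new_node(bool use_slab_nodes)
{
	static struct toy_cache boot, late;

	toy_nr_caches = 0;
	toy_normal_memory = 1u << 0;
	toy_slab_nodes = 1u << 0;
	toy_cache_create(&boot, toy_slab_nodes);

	toy_going_online(1);
	toy_cache_create(&late, use_slab_nodes ? toy_slab_nodes
					       : toy_normal_memory);
	toy_normal_memory |= 1u << 1;	/* mask is updated only after the callback */

	return late.node_ready[1];
}
```

In this model, toy_late_cache_covers_new_node(false) stands for cache
creation that iterates N_NORMAL_MEMORY and misses node 1, while passing
true stands for iterating the private mask, which already contains node 1
because the callback set it.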

Link: https://lkml.kernel.org/r/20210113131634.3671-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Cc: Christoph Lameter
Cc: David Hildenbrand
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Michal Hocko
Cc: Pekka Enberg
Cc: Qian Cai
Cc: Vladimir Davydov
Signed-off-by: Andrew Morton
---

 mm/slab_common.c |    8 ++------
 mm/slub.c        |   28 +++++++++++++++++++++++++---
 2 files changed, 27 insertions(+), 9 deletions(-)

--- a/mm/slab_common.c~mm-slab-slub-stop-taking-memory-hotplug-lock
+++ a/mm/slab_common.c
@@ -310,7 +310,6 @@ kmem_cache_create_usercopy(const char *n
 	int err;
 
 	get_online_cpus();
-	get_online_mems();
 
 	mutex_lock(&slab_mutex);
 
@@ -360,7 +359,6 @@ kmem_cache_create_usercopy(const char *n
 out_unlock:
 	mutex_unlock(&slab_mutex);
 
-	put_online_mems();
 	put_online_cpus();
 
 	if (err) {
@@ -487,7 +485,6 @@ void kmem_cache_destroy(struct kmem_cach
 		return;
 
 	get_online_cpus();
-	get_online_mems();
 
 	mutex_lock(&slab_mutex);
 
@@ -504,7 +501,6 @@ void kmem_cache_destroy(struct kmem_cach
 out_unlock:
 	mutex_unlock(&slab_mutex);
 
-	put_online_mems();
 	put_online_cpus();
 }
 EXPORT_SYMBOL(kmem_cache_destroy);
@@ -523,10 +519,10 @@ int kmem_cache_shrink(struct kmem_cache
 	int ret;
 
 	get_online_cpus();
-	get_online_mems();
+
 	kasan_cache_shrink(cachep);
 	ret = __kmem_cache_shrink(cachep);
-	put_online_mems();
+
 	put_online_cpus();
 	return ret;
 }
--- a/mm/slub.c~mm-slab-slub-stop-taking-memory-hotplug-lock
+++ a/mm/slub.c
@@ -235,6 +235,14 @@ static inline void stat(const struct kme
 #endif
 }
 
+/*
+ * Tracks for which NUMA nodes we have kmem_cache_nodes allocated.
+ * Corresponds to node_state[N_NORMAL_MEMORY], but can temporarily
+ * differ during memory hotplug/hotremove operations.
+ * Protected by slab_mutex.
+ */
+static nodemask_t slab_nodes;
+
 /********************************************************************
  * 			Core slab cache functions
  *******************************************************************/
@@ -2678,7 +2686,7 @@ static void *___slab_alloc(struct kmem_c
 		 * ignore the node constraint
 		 */
 		if (unlikely(node != NUMA_NO_NODE &&
-			     !node_state(node, N_NORMAL_MEMORY)))
+			     !node_isset(node, slab_nodes)))
 			node = NUMA_NO_NODE;
 		goto new_slab;
 	}
@@ -2689,7 +2697,7 @@ redo:
 		 * same as above but node_match() being false already
 		 * implies node != NUMA_NO_NODE
 		 */
-		if (!node_state(node, N_NORMAL_MEMORY)) {
+		if (!node_isset(node, slab_nodes)) {
 			node = NUMA_NO_NODE;
 			goto redo;
 		} else {
@@ -3592,7 +3600,7 @@ static int init_kmem_cache_nodes(struct
 {
 	int node;
 
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	for_each_node_mask(node, slab_nodes) {
 		struct kmem_cache_node *n;
 
 		if (slab_state == DOWN) {
@@ -4286,6 +4294,7 @@ static void slab_mem_offline_callback(vo
 		return;
 
 	mutex_lock(&slab_mutex);
+	node_clear(offline_node, slab_nodes);
 	/*
 	 * We no longer free kmem_cache_node structures here, as it would be
 	 * racy with all get_node() users, and infeasible to protect them with
@@ -4335,6 +4344,11 @@ static int slab_mem_going_online_callbac
 		init_kmem_cache_node(n);
 		s->node[nid] = n;
 	}
+	/*
+	 * Any cache created after this point will also have kmem_cache_node
+	 * initialized for the new node.
+	 */
+	node_set(nid, slab_nodes);
 out:
 	mutex_unlock(&slab_mutex);
 	return ret;
@@ -4415,6 +4429,7 @@ void __init kmem_cache_init(void)
 {
 	static __initdata struct kmem_cache boot_kmem_cache,
 		boot_kmem_cache_node;
+	int node;
 
 	if (debug_guardpage_minorder())
 		slub_max_order = 0;
@@ -4422,6 +4437,13 @@ void __init kmem_cache_init(void)
 	kmem_cache_node = &boot_kmem_cache_node;
 	kmem_cache = &boot_kmem_cache;
 
+	/*
+	 * Initialize the nodemask for which we will allocate per node
+	 * structures. Here we don't need taking slab_mutex yet.
+	 */
+	for_each_node_state(node, N_NORMAL_MEMORY)
+		node_set(node, slab_nodes);
+
 	create_boot_cache(kmem_cache_node, "kmem_cache_node",
 		sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN, 0, 0);
 
_