From: Vlastimil Babka
To: Andrew Morton, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Vladimir Davydov, Qian Cai, David Hildenbrand, Michal Hocko, Vlastimil Babka
Subject: [PATCH 1/3] mm, slub: stop freeing kmem_cache_node structures on node offline
Date: Wed, 13 Jan 2021 14:16:32 +0100
Message-Id: <20210113131634.3671-2-vbabka@suse.cz>
In-Reply-To: <20210113131634.3671-1-vbabka@suse.cz>
References: <20210113131634.3671-1-vbabka@suse.cz>

Commit e4f8e513c3d3 ("mm/slub: fix a deadlock in show_slab_objects()")
fixed a problematic locking order by removing the memory hotplug lock
get/put_online_mems() from show_slab_objects(). During the discussion, it
was argued [1] that this is OK, because existing slabs on the node would
prevent a hotremove from proceeding.

That's true, but per-node kmem_cache_node structures are not necessarily
allocated on the same node and may exist even without actual slab pages on
the same node. Any path that uses get_node() directly or via
for_each_kmem_cache_node() (such as show_slab_objects()) can race with
freeing of kmem_cache_node even with the !NULL check, resulting in
use-after-free.
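
The check-then-use window described above can be modelled in userspace. The
following is only a sketch under stated assumptions: struct node_info,
struct cache, the static pool and all model_*/reader_* helpers are
hypothetical stand-ins, not kernel code, and "freeing" is represented by a
flag so that the example never touches released memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical stand-in for struct kmem_cache_node. */
struct node_info {
	bool live;		/* false once the model "frees" the object */
	unsigned long nr_slabs;
};

/* Hypothetical stand-in for struct kmem_cache and its node[] array. */
struct cache {
	struct node_info *node[MAX_NODES];
};

/* Static backing storage, so a "freed" object can be inspected safely. */
static struct node_info pool[MAX_NODES];

/* Models node online: publish a live per-node structure. */
static void model_online(struct cache *s, int nid)
{
	pool[nid].live = true;
	pool[nid].nr_slabs = 0;
	s->node[nid] = &pool[nid];
}

/* Models the pre-patch offline callback: clear the slot, then free. */
static void model_offline_and_free(struct cache *s, int nid)
{
	struct node_info *n = s->node[nid];

	if (n) {
		s->node[nid] = NULL;
		n->live = false;	/* stands in for kmem_cache_free() */
	}
}

/* Step 1 of a lockless reader: load the pointer, as get_node() would. */
static struct node_info *reader_lookup(struct cache *s, int nid)
{
	return s->node[nid];
}

/* Step 2: would dereferencing the previously loaded pointer be safe? */
static bool reader_use_is_safe(const struct node_info *n)
{
	return n != NULL && n->live;
}
```

Calling reader_lookup(), then model_offline_and_free(), then
reader_use_is_safe() on the saved pointer reproduces the interleaving: the
!NULL check has already passed, yet the object is gone by the time it is
used.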

To that end, commit e4f8e513c3d3 argues in a comment that:

 * We don't really need mem_hotplug_lock (to hold off
 * slab_mem_going_offline_callback) here because slab's memory hot
 * unplug code doesn't destroy the kmem_cache->node[] data.

While it's true that slab_mem_going_offline_callback() doesn't free the
kmem_cache_node, the later callback slab_mem_offline_callback() actually
does, so the race and use-after-free do exist, not just in
show_slab_objects() after commit e4f8e513c3d3, but also in many other
places that are not under slab_mutex. Adding slab_mutex locking or other
synchronization to SLUB paths such as get_any_partial() would be bad for
performance and error-prone.

The easiest solution is therefore to make the abovementioned comment true
and stop freeing the kmem_cache_node structures, accepting some wasted
memory in the full memory node removal scenario. Analogously, we also don't
free the hotremoved pgdat, as mentioned in [1], nor the similar per-node
structures in SLAB. Importantly, this approach will not block the
hotremove: such nodes generally have to be movable for hotremove to succeed
in the first place, so the GFP_KERNEL-allocated kmem_cache_node will come
from elsewhere.
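
The resulting lifecycle can be sketched in userspace as follows. This is a
minimal model under stated assumptions, not the kernel implementation:
struct node_info, struct cache, nr_allocs and the model_* helpers are
hypothetical names, calloc() stands in for the GFP_KERNEL allocation, and
only a single cache is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_NODES 4

/* Hypothetical stand-in for struct kmem_cache_node. */
struct node_info {
	unsigned long nr_partial;
};

/* Hypothetical stand-in for struct kmem_cache and its node[] array. */
struct cache {
	struct node_info *node[MAX_NODES];
};

/* Allocation counter, so callers can see whether re-onlining allocates. */
static int nr_allocs;

/* Models the going-online step for one cache: allocate only if missing. */
static int model_going_online(struct cache *s, int nid)
{
	struct node_info *n;

	/* The structure may survive from an earlier online/offline cycle. */
	if (s->node[nid])
		return 0;

	n = calloc(1, sizeof(*n));
	if (!n)
		return -1;

	nr_allocs++;
	s->node[nid] = n;
	return 0;
}

/* Models the offline step after the change: the structure is kept. */
static void model_offline(struct cache *s, int nid)
{
	(void)s;
	(void)nid;
}
```

In this model, a pointer loaded before model_offline() stays valid
afterwards, and a second model_going_online() for the same node reuses the
existing structure rather than allocating a new one. The cost is that the
structure is never reclaimed, which is the wasted memory the text accepts.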

[1] https://lore.kernel.org/linux-mm/20190924151147.GB23050@dhcp22.suse.cz/

Signed-off-by: Vlastimil Babka
---
 mm/slub.c | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index daf5ca1755d5..0d01a893cb64 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4286,8 +4286,6 @@ static int slab_mem_going_offline_callback(void *arg)
 
 static void slab_mem_offline_callback(void *arg)
 {
-	struct kmem_cache_node *n;
-	struct kmem_cache *s;
 	struct memory_notify *marg = arg;
 	int offline_node;
 
@@ -4301,21 +4299,11 @@ static void slab_mem_offline_callback(void *arg)
 		return;
 
 	mutex_lock(&slab_mutex);
-	list_for_each_entry(s, &slab_caches, list) {
-		n = get_node(s, offline_node);
-		if (n) {
-			/*
-			 * if n->nr_slabs > 0, slabs still exist on the node
-			 * that is going down. We were unable to free them,
-			 * and offline_pages() function shouldn't call this
-			 * callback. So, we must fail.
-			 */
-			BUG_ON(slabs_node(s, offline_node));
-
-			s->node[offline_node] = NULL;
-			kmem_cache_free(kmem_cache_node, n);
-		}
-	}
+	/*
+	 * We no longer free kmem_cache_node structures here, as it would be
+	 * racy with all get_node() users, and infeasible to protect them with
+	 * slab_mutex.
+	 */
 	mutex_unlock(&slab_mutex);
 }
 
@@ -4341,6 +4329,12 @@ static int slab_mem_going_online_callback(void *arg)
 	 */
 	mutex_lock(&slab_mutex);
 	list_for_each_entry(s, &slab_caches, list) {
+		/*
+		 * The structure may already exist if the node was previously
+		 * onlined and offlined.
+		 */
+		if (get_node(s, nid))
+			continue;
 		/*
 		 * XXX: kmem_cache_alloc_node will fallback to other nodes
 		 * since memory is not yet available from the node that
-- 
2.29.2