From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vlastimil Babka <vbabka@suse.cz>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
Cc: Mike Galbraith, Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman, Jesper Dangaard Brouer, Jann Horn, Vlastimil Babka
Subject: [PATCH v3 33/35] mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg
Date: Thu, 29 Jul 2021 15:21:30 +0200
Message-Id: <20210729132132.19691-34-vbabka@suse.cz>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20210729132132.19691-1-vbabka@suse.cz>
References: <20210729132132.19691-1-vbabka@suse.cz>
MIME-Version: 1.0

Jann Horn reported [1] the following theoretically possible race:

  task A: put_cpu_partial() calls preempt_disable()
  task A: oldpage = this_cpu_read(s->cpu_slab->partial)
  interrupt: kfree() reaches unfreeze_partials() and discards the page
  task B (on another CPU): reallocates page as page cache
  task A: reads page->pages and page->pobjects, which are actually halves
          of the pointer page->lru.prev
  task B (on another CPU): frees page
  interrupt: allocates page as SLUB page and places it on the percpu
             partial list
  task A: this_cpu_cmpxchg() succeeds

  which would cause page->pages and page->pobjects to end up containing
  halves of pointers that would then influence when put_cpu_partial()
  happens and show up in root-only sysfs files. Maybe that's acceptable,
  I don't know. But there should probably at least be a comment for now
  to point out that we're reading union fields of a page that might be
  in a completely different state.

Additionally, the this_cpu_cmpxchg() approach in put_cpu_partial() is only
safe against s->cpu_slab->partial manipulation in ___slab_alloc() if the
latter disables irqs, otherwise a __slab_free() in an irq handler could
call put_cpu_partial() in the middle of ___slab_alloc() manipulating
->partial and corrupt it. This becomes an issue on RT after a local_lock
is introduced in a later patch. The fix means taking the local_lock also
in put_cpu_partial() on RT.

After debugging this issue, Mike Galbraith suggested [2] that to avoid
different locking schemes on RT and !RT, we can just protect
put_cpu_partial() with disabled irqs (to be converted to
local_lock_irqsave() later) everywhere. This should be acceptable as it's
not a fast path, and moving the actual partial unfreezing outside of the
irq-disabled section makes it short, and with the retry loop gone the
code can also be simplified. In addition, the race reported by Jann
should no longer be possible.
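Schematically, the reworked flow looks roughly like this (a simplified
sketch only, not the literal hunks below: the pages/pobjects bookkeeping
and the CPU_PARTIAL_DRAIN stat update are omitted, and the "partial array
full" check is written against oldpage directly for brevity):

  static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
  {
          struct page *oldpage, *page_to_unfreeze = NULL;
          unsigned long flags;

          local_irq_save(flags);          /* later: local_lock_irqsave() */

          oldpage = this_cpu_read(s->cpu_slab->partial);
          if (oldpage && drain && oldpage->pobjects > slub_cpu_partial(s)) {
                  /* partial array full: detach it, unfreeze it later */
                  page_to_unfreeze = oldpage;
                  oldpage = NULL;
          }

          /* link the new page at the head of the (possibly emptied) list */
          page->next = oldpage;
          this_cpu_write(s->cpu_slab->partial, page);

          local_irq_restore(flags);

          /* the expensive part happens outside the irq-disabled section */
          if (page_to_unfreeze)
                  __unfreeze_partials(s, page_to_unfreeze);
  }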
[1] https://lore.kernel.org/lkml/CAG48ez1mvUuXwg0YPH5ANzhQLpbphqk-ZS+jbRz+H66fvm4FcA@mail.gmail.com/
[2] https://lore.kernel.org/linux-rt-users/e3470ab357b48bccfbd1f5133b982178a7d2befb.camel@gmx.de/

Reported-by: Jann Horn
Suggested-by: Mike Galbraith
Signed-off-by: Vlastimil Babka
---
 mm/slub.c | 81 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 44 insertions(+), 37 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 4f7218797603..0fd60d9ca27e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2002,7 +2002,12 @@ static inline void *acquire_slab(struct kmem_cache *s,
 	return freelist;
 }
 
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
+#else
+static inline void put_cpu_partial(struct kmem_cache *s, struct page *page,
+				   int drain) { }
+#endif
 static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
 
 /*
@@ -2436,14 +2441,6 @@ static void unfreeze_partials_cpu(struct kmem_cache *s,
 	__unfreeze_partials(s, partial_page);
 }
 
-#else	/* CONFIG_SLUB_CPU_PARTIAL */
-
-static inline void unfreeze_partials(struct kmem_cache *s) { }
-static inline void unfreeze_partials_cpu(struct kmem_cache *s,
-					 struct kmem_cache_cpu *c) { }
-
-#endif	/* CONFIG_SLUB_CPU_PARTIAL */
-
 /*
  * Put a page that was just frozen (in __slab_free|get_partial_node) into a
  * partial page slot if available.
@@ -2453,46 +2450,56 @@ static inline void unfreeze_partials_cpu(struct kmem_cache *s,
  */
 static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 {
-#ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct page *oldpage;
-	int pages;
-	int pobjects;
+	struct page *page_to_unfreeze = NULL;
+	unsigned long flags;
+	int pages = 0;
+	int pobjects = 0;
 
-	preempt_disable();
-	do {
-		pages = 0;
-		pobjects = 0;
-		oldpage = this_cpu_read(s->cpu_slab->partial);
+	local_irq_save(flags);
+
+	oldpage = this_cpu_read(s->cpu_slab->partial);
 
-		if (oldpage) {
+	if (oldpage) {
+		if (drain && pobjects > slub_cpu_partial(s)) {
+			/*
+			 * Partial array is full. Move the existing set to the
+			 * per node partial list. Postpone the actual unfreezing
+			 * outside of the critical section.
+			 */
+			page_to_unfreeze = oldpage;
+			oldpage = NULL;
+		} else {
 			pobjects = oldpage->pobjects;
 			pages = oldpage->pages;
-			if (drain && pobjects > slub_cpu_partial(s)) {
-				/*
-				 * partial array is full. Move the existing
-				 * set to the per node partial list.
-				 */
-				unfreeze_partials(s);
-				oldpage = NULL;
-				pobjects = 0;
-				pages = 0;
-				stat(s, CPU_PARTIAL_DRAIN);
-			}
 		}
+	}
 
-		pages++;
-		pobjects += page->objects - page->inuse;
+	pages++;
+	pobjects += page->objects - page->inuse;
 
-		page->pages = pages;
-		page->pobjects = pobjects;
-		page->next = oldpage;
+	page->pages = pages;
+	page->pobjects = pobjects;
+	page->next = oldpage;
 
-	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
-						!= oldpage);
-	preempt_enable();
-#endif	/* CONFIG_SLUB_CPU_PARTIAL */
+	this_cpu_write(s->cpu_slab->partial, page);
+
+	local_irq_restore(flags);
+
+	if (page_to_unfreeze) {
+		__unfreeze_partials(s, page_to_unfreeze);
+		stat(s, CPU_PARTIAL_DRAIN);
+	}
 }
 
+#else	/* CONFIG_SLUB_CPU_PARTIAL */
+
+static inline void unfreeze_partials(struct kmem_cache *s) { }
+static inline void unfreeze_partials_cpu(struct kmem_cache *s,
+					 struct kmem_cache_cpu *c) { }
+
+#endif	/* CONFIG_SLUB_CPU_PARTIAL */
+
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c,
 			      bool lock)
 {
-- 
2.32.0