Date: Tue, 10 Aug 2021 13:31:23 -0700
From: "Paul E. McKenney"
To: Mike Galbraith
Cc: Vlastimil Babka, Qian Cai, Andrew Morton, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman, Jesper Dangaard Brouer, Jann Horn
Subject: Re: [PATCH v4 29/35] mm: slub: Move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context
Message-ID: <20210810203123.GB190765@paulmck-ThinkPad-P17-Gen-1>
Reply-To: paulmck@kernel.org
References: <20210805152000.12817-1-vbabka@suse.cz> <20210805152000.12817-30-vbabka@suse.cz> <0b36128c-3e12-77df-85fe-a153a714569b@quicinc.com> <2eb3cf340716c40f03a0a342ab40219b3d1de195.camel@gmx.de>
In-Reply-To: <2eb3cf340716c40f03a0a342ab40219b3d1de195.camel@gmx.de>
On Tue, Aug 10, 2021 at 01:47:42PM +0200, Mike Galbraith wrote:
> On Tue, 2021-08-10 at 11:03 +0200, Vlastimil Babka wrote:
> > On 8/9/21 3:41 PM, Qian Cai wrote:
> > > >
> > > > +static DEFINE_MUTEX(flush_lock);
> > > > +static DEFINE_PER_CPU(struct slub_flush_work, slub_flush);
> > > > +
> > > >  static void flush_all(struct kmem_cache *s)
> > > >  {
> > > > -	on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1);
> > > > +	struct slub_flush_work *sfw;
> > > > +	unsigned int cpu;
> > > > +
> > > > +	mutex_lock(&flush_lock);
> > >
> > > Vlastimil, taking the lock here could trigger a warning during memory offline/online due to the locking order:
> > >
> > > slab_mutex -> flush_lock
> > >
> > > [   91.374541] WARNING: possible circular locking dependency detected
> > > [   91.381411] 5.14.0-rc5-next-20210809+ #84 Not tainted
> > > [   91.387149] ------------------------------------------------------
> > > [   91.394016] lsbug/1523 is trying to acquire lock:
> > > [   91.399406] ffff800018e76530 (flush_lock){+.+.}-{3:3}, at: flush_all+0x50/0x1c8
> > > [   91.407425]
> > >                but task is already holding lock:
> > > [   91.414638] ffff800018e48468 (slab_mutex){+.+.}-{3:3}, at: slab_memory_callback+0x44/0x280
> > > [   91.423603]
> > >                which lock already depends on the new lock.
> > >
> >
> > OK, managed to reproduce in qemu and this fixes it for me on top of
> > next-20210809.
> > Could you test as well, as your testing might be more
> > comprehensive? I will format it as a fixup for the proper patch in the
> > series then.
>
> As it appeared it should, moving cpu_hotplug_lock outside slab_mutex in
> kmem_cache_destroy() on top of that silenced the cpu offline gripe.

And this one got rid of the remainder of the deadlock, but gets me the
splat shown at the end of this message.  So some sort of middle ground
may be needed.

(Same reproducer as in my previous reply to Vlastimil.)

							Thanx, Paul

> ---
>  mm/slab_common.c |    2 ++
>  mm/slub.c        |    2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -502,6 +502,7 @@ void kmem_cache_destroy(struct kmem_cach
>  	if (unlikely(!s))
>  		return;
>
> +	cpus_read_lock();
>  	mutex_lock(&slab_mutex);
>
>  	s->refcount--;
> @@ -516,6 +517,7 @@ void kmem_cache_destroy(struct kmem_cach
>  	}
>  out_unlock:
>  	mutex_unlock(&slab_mutex);
> +	cpus_read_unlock();
>  }
>  EXPORT_SYMBOL(kmem_cache_destroy);
>
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4234,7 +4234,7 @@ int __kmem_cache_shutdown(struct kmem_ca
>  	int node;
>  	struct kmem_cache_node *n;
>
> -	flush_all(s);
> +	flush_all_cpus_locked(s);
>  	/* Attempt to free all objects */
>  	for_each_kmem_cache_node(s, node, n) {
>  		free_partial(s, n);

[  602.539109] ------------[ cut here ]------------
[  602.539804] WARNING: CPU: 3 PID: 88 at kernel/cpu.c:335 lockdep_assert_cpus_held+0x29/0x30
[  602.540940] Modules linked in:
[  602.541377] CPU: 3 PID: 88 Comm: torture_shutdow Not tainted 5.14.0-rc5-next-20210809+ #3299
[  602.542536] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module_el8.5.0+746+bbd5d70c 04/01/2014
[  602.543786] RIP: 0010:lockdep_assert_cpus_held+0x29/0x30
[  602.544524] Code: 00 83 3d 4d f1 a4 01 01 76 0a 8b 05 4d 23 a5 01 85 c0 75 01 c3 be ff ff ff ff 48 c7 c7 b0 86 66 a3 e8 9b 05 c9 00 85 c0 75 ea <0f> 0b c3 0f 1f 40 00 41 57 41 89 ff 41 56 4d 89 c6 41 55 49 89 cd
[  602.547051] RSP: 0000:ffffb382802efdb8 EFLAGS: 00010246
[  602.547783] RAX: 0000000000000000 RBX: ffffa23301a44000 RCX: 0000000000000001
[  602.548764] RDX: 0000000000000001 RSI: ffffffffa335f5c0 RDI: ffffffffa33adbbf
[  602.549747] RBP: ffffa23301a44000 R08: ffffa23302810000 R09: 974cf0ba5c48ad3c
[  602.550727] R10: ffffb382802efe78 R11: 0000000000000001 R12: ffffa23301a44000
[  602.551709] R13: 00000000000249c0 R14: 00000000ffffffff R15: 0000000fffffffe0
[  602.552694] FS:  0000000000000000(0000) GS:ffffa2331f580000(0000) knlGS:0000000000000000
[  602.553805] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  602.554606] CR2: 0000000000000000 CR3: 0000000017222000 CR4: 00000000000006e0
[  602.555601] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  602.556590] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  602.557585] Call Trace:
[  602.557927]  flush_all_cpus_locked+0x29/0x140
[  602.558535]  __kmem_cache_shutdown+0x26/0x200
[  602.559145]  ? lock_is_held_type+0xd6/0x130
[  602.559739]  ? torture_onoff+0x260/0x260
[  602.560284]  kmem_cache_destroy+0x38/0x110
[  602.560859]  rcu_torture_cleanup.cold.36+0x192/0x421
[  602.561539]  ? wait_woken+0x60/0x60
[  602.562035]  ? torture_onoff+0x260/0x260
[  602.562591]  torture_shutdown+0xdd/0x1c0
[  602.563131]  kthread+0x132/0x160
[  602.563592]  ? set_kthread_struct+0x40/0x40
[  602.564172]  ret_from_fork+0x22/0x30
[  602.564696] irq event stamp: 1307
[  602.565161] hardirqs last enabled at (1315): [] __up_console_sem+0x4d/0x50
[  602.566321] hardirqs last disabled at (1324): [] __up_console_sem+0x32/0x50
[  602.567479] softirqs last enabled at (1304): [] __do_softirq+0x311/0x473
[  602.568616] softirqs last disabled at (1299): [] irq_exit_rcu+0xe8/0xf0
[  602.569735] ---[ end trace 26fd643e1df331c9 ]---