All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: paulmck@kernel.org, Mike Galbraith <efault@gmx.de>
Cc: Qian Cai <quic_qiancai@quicinc.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>,
	David Rientjes <rientjes@google.com>,
	Pekka Enberg <penberg@kernel.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Mel Gorman <mgorman@techsingularity.net>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Jann Horn <jannh@google.com>
Subject: Re: [PATCH v4 29/35] mm: slub: Move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context
Date: Wed, 11 Aug 2021 00:36:00 +0200	[thread overview]
Message-ID: <a8308403-ae87-28ce-1aa3-66e1a436b144@suse.cz> (raw)
In-Reply-To: <20210810203123.GB190765@paulmck-ThinkPad-P17-Gen-1>

On 8/10/2021 10:31 PM, Paul E. McKenney wrote:
> On Tue, Aug 10, 2021 at 01:47:42PM +0200, Mike Galbraith wrote:
>> On Tue, 2021-08-10 at 11:03 +0200, Vlastimil Babka wrote:
>>> On 8/9/21 3:41 PM, Qian Cai wrote:
>>>>>  
>>>>> +static DEFINE_MUTEX(flush_lock);
>>>>> +static DEFINE_PER_CPU(struct slub_flush_work, slub_flush);
>>>>> +
>>>>>  static void flush_all(struct kmem_cache *s)
>>>>>  {
>>>>> -       on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1);
>>>>> +       struct slub_flush_work *sfw;
>>>>> +       unsigned int cpu;
>>>>> +
>>>>> +       mutex_lock(&flush_lock);
>>>>
>>>> Vlastimil, taking the lock here could trigger a warning during memory offline/online due to the locking order:
>>>>
>>>> slab_mutex -> flush_lock
>>>>
>>>> [   91.374541] WARNING: possible circular locking dependency detected
>>>> [   91.381411] 5.14.0-rc5-next-20210809+ #84 Not tainted
>>>> [   91.387149] ------------------------------------------------------
>>>> [   91.394016] lsbug/1523 is trying to acquire lock:
>>>> [   91.399406] ffff800018e76530 (flush_lock){+.+.}-{3:3}, at: flush_all+0x50/0x1c8
>>>> [   91.407425]
>>>>                but task is already holding lock:
>>>> [   91.414638] ffff800018e48468 (slab_mutex){+.+.}-{3:3}, at: slab_memory_callback+0x44/0x280
>>>> [   91.423603]
>>>>                which lock already depends on the new lock.
>>>>
>>>
>>> OK, managed to reproduce in qemu and this fixes it for me on top of
>>> next-20210809. Could you test as well, as your testing might be more
>>> comprehensive? I will format is as a fixup for the proper patch in the series then.
>>
>> As it appeared it should, moving cpu_hotplug_lock outside slab_mutex in
>> kmem_cache_destroy() on top of that silenced the cpu offline gripe.
> 
> And this one got rid of the remainder of the deadlock, but gets me the
> splat shown at the end of this message.  So some sort of middle ground
> may be needed.
> 
> (Same reproducer as in my previous reply to Vlastimil.)
> 
> 							Thanx, Paul
> 
>> ---
>>  mm/slab_common.c |    2 ++
>>  mm/slub.c        |    2 +-
>>  2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -502,6 +502,7 @@ void kmem_cache_destroy(struct kmem_cach
>>  	if (unlikely(!s))
>>  		return;
>>
>> +	cpus_read_lock();
>>  	mutex_lock(&slab_mutex);
>>
>>  	s->refcount--;
>> @@ -516,6 +517,7 @@ void kmem_cache_destroy(struct kmem_cach
>>  	}
>>  out_unlock:
>>  	mutex_unlock(&slab_mutex);
>> +	cpus_read_unlock();
>>  }
>>  EXPORT_SYMBOL(kmem_cache_destroy);
>>
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -4234,7 +4234,7 @@ int __kmem_cache_shutdown(struct kmem_ca
>>  	int node;
>>  	struct kmem_cache_node *n;
>>
>> -	flush_all(s);
>> +	flush_all_cpus_locked(s);
>>  	/* Attempt to free all objects */
>>  	for_each_kmem_cache_node(s, node, n) {
>>  		free_partial(s, n);
> 
> [  602.539109] ------------[ cut here ]------------
> [  602.539804] WARNING: CPU: 3 PID: 88 at kernel/cpu.c:335 lockdep_assert_cpus_held+0x29/0x30

So this says the assert failed and we don't have the cpus_read_lock(), right, but...

> [  602.540940] Modules linked in:
> [  602.541377] CPU: 3 PID: 88 Comm: torture_shutdow Not tainted 5.14.0-rc5-next-20210809+ #3299
> [  602.542536] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module_el8.5.0+746+bbd5d70c 04/01/2014
> [  602.543786] RIP: 0010:lockdep_assert_cpus_held+0x29/0x30
> [  602.544524] Code: 00 83 3d 4d f1 a4 01 01 76 0a 8b 05 4d 23 a5 01 85 c0 75 01 c3 be ff ff ff ff 48 c7 c7 b0 86 66 a3 e8 9b 05 c9 00 85 c0 75 ea <0f> 0b c3 0f 1f 40 00 41 57 41 89 ff 41 56 4d 89 c6 41 55 49 89 cd
> [  602.547051] RSP: 0000:ffffb382802efdb8 EFLAGS: 00010246
> [  602.547783] RAX: 0000000000000000 RBX: ffffa23301a44000 RCX: 0000000000000001
> [  602.548764] RDX: 0000000000000001 RSI: ffffffffa335f5c0 RDI: ffffffffa33adbbf[  602.549747] RBP: ffffa23301a44000 R08: ffffa23302810000 R09: 974cf0ba5c48ad3c
> [  602.550727] R10: ffffb382802efe78 R11: 0000000000000001 R12: ffffa23301a44000[  602.551709] R13: 00000000000249c0 R14: 00000000ffffffff R15: 0000000fffffffe0
> [  602.552694] FS:  0000000000000000(0000) GS:ffffa2331f580000(0000) knlGS:0000000000000000
> [  602.553805] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  602.554606] CR2: 0000000000000000 CR3: 0000000017222000 CR4: 00000000000006e0
> [  602.555601] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  602.556590] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  602.557585] Call Trace:
> [  602.557927]  flush_all_cpus_locked+0x29/0x140
> [  602.558535]  __kmem_cache_shutdown+0x26/0x200
> [  602.559145]  ? lock_is_held_type+0xd6/0x130
> [  602.559739]  ? torture_onoff+0x260/0x260
> [  602.560284]  kmem_cache_destroy+0x38/0x110

It should have been taken here. I don't understand. It's as if only the
mm/slub.c was patched by Mike's patch, but mm/slab_common.c not?

> [  602.560859]  rcu_torture_cleanup.cold.36+0x192/0x421
> [  602.561539]  ? wait_woken+0x60/0x60
> [  602.562035]  ? torture_onoff+0x260/0x260
> [  602.562591]  torture_shutdown+0xdd/0x1c0
> [  602.563131]  kthread+0x132/0x160
> [  602.563592]  ? set_kthread_struct+0x40/0x40
> [  602.564172]  ret_from_fork+0x22/0x30
> [  602.564696] irq event stamp: 1307
> [  602.565161] hardirqs last  enabled at (1315): [<ffffffffa1eddced>] __up_console_sem+0x4d/0x50
> [  602.566321] hardirqs last disabled at (1324): [<ffffffffa1eddcd2>] __up_console_sem+0x32/0x50
> [  602.567479] softirqs last  enabled at (1304): [<ffffffffa2e00311>] __do_softirq+0x311/0x473
> [  602.568616] softirqs last disabled at (1299): [<ffffffffa1e72eb8>] irq_exit_rcu+0xe8/0xf0
> [  602.569735] ---[ end trace 26fd643e1df331c9 ]---
> 


  reply	other threads:[~2021-08-10 22:36 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-05 15:19 [PATCH v4 00/35] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 01/35] mm, slub: don't call flush_all() from slab_debug_trace_open() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 02/35] mm, slub: allocate private object map for debugfs listings Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 03/35] mm, slub: allocate private object map for validate_slab_cache() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 04/35] mm, slub: don't disable irq for debug_check_no_locks_freed() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 05/35] mm, slub: remove redundant unfreeze_partials() from put_cpu_partial() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 06/35] mm, slub: unify cmpxchg_double_slab() and __cmpxchg_double_slab() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 07/35] mm, slub: extract get_partial() from new_slab_objects() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 08/35] mm, slub: dissolve new_slab_objects() into ___slab_alloc() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 09/35] mm, slub: return slab page from get_partial() and set c->page afterwards Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 10/35] mm, slub: restructure new page checks in ___slab_alloc() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 11/35] mm, slub: simplify kmem_cache_cpu and tid setup Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 12/35] mm, slub: move disabling/enabling irqs to ___slab_alloc() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 13/35] mm, slub: do initial checks in ___slab_alloc() with irqs enabled Vlastimil Babka
2021-08-15 10:14   ` Vlastimil Babka
2021-08-15 10:22     ` Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 14/35] mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 15/35] mm, slub: restore irqs around calling new_slab() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 16/35] mm, slub: validate slab from partial list or page allocator before making it cpu slab Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 17/35] mm, slub: check new pages with restored irqs Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 18/35] mm, slub: stop disabling irqs around get_partial() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 19/35] mm, slub: move reset of c->page and freelist out of deactivate_slab() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 20/35] mm, slub: make locking in deactivate_slab() irq-safe Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 21/35] mm, slub: call deactivate_slab() without disabling irqs Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 22/35] mm, slub: move irq control into unfreeze_partials() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 23/35] mm, slub: discard slabs in unfreeze_partials() without irqs disabled Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 24/35] mm, slub: detach whole partial list at once in unfreeze_partials() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 25/35] mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 26/35] mm, slub: only disable irq with spin_lock in __unfreeze_partials() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 27/35] mm, slub: don't disable irqs in slub_cpu_dead() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 28/35] mm, slab: make flush_slab() possible to call with irqs enabled Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 29/35] mm: slub: Move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context Vlastimil Babka
2021-08-09 13:41   ` Qian Cai
2021-08-09 18:44     ` Mike Galbraith
2021-08-09 18:44       ` Mike Galbraith
2021-08-09 20:08       ` Vlastimil Babka
2021-08-09 22:13         ` Qian Cai
2021-08-10  1:07         ` Mike Galbraith
2021-08-10  1:07           ` Mike Galbraith
2021-08-10  9:03     ` Vlastimil Babka
2021-08-10 11:47       ` Mike Galbraith
2021-08-10 11:47         ` Mike Galbraith
2021-08-10 20:31         ` Paul E. McKenney
2021-08-10 22:36           ` Vlastimil Babka [this message]
2021-08-10 23:53             ` Paul E. McKenney
2021-08-11 14:17               ` Paul E. McKenney
2021-08-10 20:25       ` Paul E. McKenney
2021-08-10 14:33     ` Vlastimil Babka
2021-08-11  1:42       ` Qian Cai
2021-08-11  8:55       ` Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 30/35] mm: slub: Make object_map_lock a raw_spinlock_t Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 31/35] mm, slub: optionally save/restore irqs in slab_[un]lock()/ Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 32/35] mm, slub: make slab_lock() disable irqs with PREEMPT_RT Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 33/35] mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 34/35] mm, slub: use migrate_disable() on PREEMPT_RT Vlastimil Babka
2021-08-05 15:20 ` [PATCH v4 35/35] mm, slub: convert kmem_cpu_slab protection to local_lock Vlastimil Babka
2021-08-15 12:27   ` Sven Eckelmann
2021-08-17  8:37     ` Vlastimil Babka
2021-08-17  9:12       ` Sebastian Andrzej Siewior
2021-08-17  9:17         ` Vlastimil Babka
2021-08-17  9:31           ` Sebastian Andrzej Siewior
2021-08-17  9:31         ` Vlastimil Babka
2021-08-17  9:34           ` Sebastian Andrzej Siewior
2021-08-17  9:13     ` Vlastimil Babka
2021-08-17 10:14   ` Vlastimil Babka
2021-08-17 19:53     ` Andrew Morton
2021-08-18 11:52       ` Vlastimil Babka
2021-08-23 20:36         ` Thomas Gleixner
2021-08-17 15:39   ` Sebastian Andrzej Siewior
2021-08-17 15:41     ` Vlastimil Babka
2021-08-17 15:49       ` Sebastian Andrzej Siewior
2021-08-17 15:56   ` Vlastimil Babka
2021-08-05 16:42 ` [PATCH v4 00/35] SLUB: reduce irq disabled scope and make it RT compatible Sebastian Andrzej Siewior
2021-08-06  5:14   ` Mike Galbraith
2021-08-06  5:14     ` Mike Galbraith
2021-08-06  7:45     ` Vlastimil Babka
2021-08-10 14:36 ` Vlastimil Babka
2021-08-15 10:18   ` Vlastimil Babka
2021-08-17 10:23     ` Vlastimil Babka
2021-08-17 15:59       ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a8308403-ae87-28ce-1aa3-66e1a436b144@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=brouer@redhat.com \
    --cc=cl@linux.com \
    --cc=efault@gmx.de \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=paulmck@kernel.org \
    --cc=penberg@kernel.org \
    --cc=quic_qiancai@quicinc.com \
    --cc=rientjes@google.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.