* [PATCH] mm/slub: fix a deadlock in shuffle_freelist()
@ 2019-09-13 16:27 Qian Cai
2019-09-16 9:03 ` Sebastian Andrzej Siewior
2019-09-25 9:31 ` Peter Zijlstra
0 siblings, 2 replies; 23+ messages in thread
From: Qian Cai @ 2019-09-13 16:27 UTC (permalink / raw)
To: akpm
Cc: bigeasy, tglx, thgarnie, peterz, tytso, cl, penberg, rientjes,
mingo, will, linux-mm, linux-kernel, keescook, Qian Cai
The commit b7d5dc21072c ("random: add a spinlock_t to struct
batched_entropy") insists on acquiring "batched_entropy_u32.lock" in
get_random_u32(), which introduced the lock chain,

"&rq->lock --> batched_entropy_u32.lock"

even after crng init. As a result, it could lead to the deadlock below.
Fix it by using get_random_bytes() in shuffle_freelist(), which does not
need to take the batched_entropy locks.
WARNING: possible circular locking dependency detected
5.3.0-rc7-mm1+ #3 Tainted: G L
------------------------------------------------------
make/7937 is trying to acquire lock:
ffff900012f225f8 (random_write_wait.lock){....}, at:
__wake_up_common_lock+0xa8/0x11c
but task is already holding lock:
ffff0096b9429c00 (batched_entropy_u32.lock){-.-.}, at:
get_random_u32+0x6c/0x1dc
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (batched_entropy_u32.lock){-.-.}:
lock_acquire+0x31c/0x360
_raw_spin_lock_irqsave+0x7c/0x9c
get_random_u32+0x6c/0x1dc
new_slab+0x234/0x6c0
___slab_alloc+0x3c8/0x650
kmem_cache_alloc+0x4b0/0x590
__debug_object_init+0x778/0x8b4
debug_object_init+0x40/0x50
debug_init+0x30/0x29c
hrtimer_init+0x30/0x50
init_dl_task_timer+0x24/0x44
__sched_fork+0xc0/0x168
init_idle+0x78/0x26c
fork_idle+0x12c/0x178
idle_threads_init+0x108/0x178
smp_init+0x20/0x1bc
kernel_init_freeable+0x198/0x26c
kernel_init+0x18/0x334
ret_from_fork+0x10/0x18
-> #2 (&rq->lock){-.-.}:
lock_acquire+0x31c/0x360
_raw_spin_lock+0x64/0x80
task_fork_fair+0x5c/0x1b0
sched_fork+0x15c/0x2dc
copy_process+0x9e0/0x244c
_do_fork+0xb8/0x644
kernel_thread+0xc4/0xf4
rest_init+0x30/0x238
arch_call_rest_init+0x10/0x18
start_kernel+0x424/0x52c
-> #1 (&p->pi_lock){-.-.}:
lock_acquire+0x31c/0x360
_raw_spin_lock_irqsave+0x7c/0x9c
try_to_wake_up+0x74/0x8d0
default_wake_function+0x38/0x48
pollwake+0x118/0x158
__wake_up_common+0x130/0x1c4
__wake_up_common_lock+0xc8/0x11c
__wake_up+0x3c/0x4c
account+0x390/0x3e0
extract_entropy+0x2cc/0x37c
_xfer_secondary_pool+0x35c/0x3c4
push_to_pool+0x54/0x308
process_one_work+0x4f4/0x950
worker_thread+0x390/0x4bc
kthread+0x1cc/0x1e8
ret_from_fork+0x10/0x18
-> #0 (random_write_wait.lock){....}:
validate_chain+0xd10/0x2bcc
__lock_acquire+0x7f4/0xb8c
lock_acquire+0x31c/0x360
_raw_spin_lock_irqsave+0x7c/0x9c
__wake_up_common_lock+0xa8/0x11c
__wake_up+0x3c/0x4c
account+0x390/0x3e0
extract_entropy+0x2cc/0x37c
crng_reseed+0x60/0x2f8
_extract_crng+0xd8/0x164
crng_reseed+0x7c/0x2f8
_extract_crng+0xd8/0x164
get_random_u32+0xec/0x1dc
new_slab+0x234/0x6c0
___slab_alloc+0x3c8/0x650
kmem_cache_alloc+0x4b0/0x590
getname_flags+0x44/0x1c8
user_path_at_empty+0x3c/0x68
vfs_statx+0xa4/0x134
__arm64_sys_newfstatat+0x94/0x120
el0_svc_handler+0x170/0x240
el0_svc+0x8/0xc
other info that might help us debug this:
Chain exists of:
random_write_wait.lock --> &rq->lock --> batched_entropy_u32.lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(batched_entropy_u32.lock);
lock(&rq->lock);
lock(batched_entropy_u32.lock);
lock(random_write_wait.lock);
*** DEADLOCK ***
1 lock held by make/7937:
#0: ffff0096b9429c00 (batched_entropy_u32.lock){-.-.}, at:
get_random_u32+0x6c/0x1dc
stack backtrace:
CPU: 220 PID: 7937 Comm: make Tainted: G L 5.3.0-rc7-mm1+
Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS
L50_5.13_1.11 06/18/2019
Call trace:
dump_backtrace+0x0/0x248
show_stack+0x20/0x2c
dump_stack+0xd0/0x140
print_circular_bug+0x368/0x380
check_noncircular+0x248/0x250
validate_chain+0xd10/0x2bcc
__lock_acquire+0x7f4/0xb8c
lock_acquire+0x31c/0x360
_raw_spin_lock_irqsave+0x7c/0x9c
__wake_up_common_lock+0xa8/0x11c
__wake_up+0x3c/0x4c
account+0x390/0x3e0
extract_entropy+0x2cc/0x37c
crng_reseed+0x60/0x2f8
_extract_crng+0xd8/0x164
crng_reseed+0x7c/0x2f8
_extract_crng+0xd8/0x164
get_random_u32+0xec/0x1dc
new_slab+0x234/0x6c0
___slab_alloc+0x3c8/0x650
kmem_cache_alloc+0x4b0/0x590
getname_flags+0x44/0x1c8
user_path_at_empty+0x3c/0x68
vfs_statx+0xa4/0x134
__arm64_sys_newfstatat+0x94/0x120
el0_svc_handler+0x170/0x240
el0_svc+0x8/0xc
Signed-off-by: Qian Cai <cai@lca.pw>
---
mm/slub.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/slub.c b/mm/slub.c
index 8834563cdb4b..96cdd36f9380 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1598,8 +1598,15 @@ static bool shuffle_freelist(struct kmem_cache *s, struct page *page)
if (page->objects < 2 || !s->random_seq)
return false;
+ /*
+ * Don't use get_random_int() here as it might deadlock due to
+ * "&rq->lock --> batched_entropy_u32.lock" chain.
+ */
+ if (!arch_get_random_int((int *)&pos))
+ get_random_bytes(&pos, sizeof(int));
+
freelist_count = oo_objects(s->oo);
- pos = get_random_int() % freelist_count;
+ pos %= freelist_count;
page_limit = page->objects * s->size;
start = fixup_red_left(s, page_address(page));
--
1.8.3.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
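To make the reported inversion concrete, here is a minimal two-lock sketch
of the ABBA pattern behind the splat above. The lock names and functions are
illustrative stand-ins, not the kernel's actual code paths, and the real
report runs through a longer chain ending at random_write_wait.lock, but two
locks are enough to show the cycle lockdep is warning about:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(lock_a);	/* stands in for &rq->lock */
static DEFINE_SPINLOCK(lock_b);	/* stands in for batched_entropy_u32.lock */

static void cpu0_path(void)
{
	spin_lock(&lock_a);	/* a scheduler path holding rq->lock ... */
	spin_lock(&lock_b);	/* ... then draws batched entropy */
	spin_unlock(&lock_b);
	spin_unlock(&lock_a);
}

static void cpu1_path(void)
{
	spin_lock(&lock_b);	/* an entropy batch is being refilled ... */
	spin_lock(&lock_a);	/* ... while a wakeup needs rq->lock */
	spin_unlock(&lock_a);
	spin_unlock(&lock_b);
}

If both paths run concurrently, each CPU can end up holding one lock while
spinning on the other, and neither makes progress. Lockdep reports the cycle
as soon as it has observed both lock orders, without the deadlock having to
actually occur.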
* Re: [PATCH] mm/slub: fix a deadlock in shuffle_freelist()
2019-09-13 16:27 [PATCH] mm/slub: fix a deadlock in shuffle_freelist() Qian Cai
@ 2019-09-16 9:03 ` Sebastian Andrzej Siewior
2019-09-16 14:01 ` Qian Cai
2019-09-25 9:31 ` Peter Zijlstra
1 sibling, 1 reply; 23+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-09-16 9:03 UTC (permalink / raw)
To: Qian Cai, peterz, mingo
Cc: akpm, tglx, thgarnie, tytso, cl, penberg, rientjes, will,
linux-mm, linux-kernel, keescook
On 2019-09-13 12:27:44 [-0400], Qian Cai wrote:
…
> Chain exists of:
> random_write_wait.lock --> &rq->lock --> batched_entropy_u32.lock
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(batched_entropy_u32.lock);
> lock(&rq->lock);
> lock(batched_entropy_u32.lock);
> lock(random_write_wait.lock);
would this deadlock still occur if lockdep knew that
batched_entropy_u32.lock on CPU0 could be acquired at the same time
as CPU1 acquired its batched_entropy_u32.lock?
Sebastian
^ permalink raw reply [flat|nested] 23+ messages in thread
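One conceivable way to "teach" lockdep that trick is to give every CPU's
instance its own lock class key, so that a dependency recorded against CPU1's
lock is not attributed to CPU0's. The sketch below is purely illustrative of
that idea (hypothetical names; the kernel does not do this for the
batched_entropy locks, and per-instance keys inflate lockdep's tracking
state):

#include <linux/lockdep.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>

static DEFINE_PER_CPU(spinlock_t, demo_lock);
static struct lock_class_key demo_lock_keys[NR_CPUS];

static void demo_split_lock_classes(void)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		spinlock_t *lock = per_cpu_ptr(&demo_lock, cpu);

		spin_lock_init(lock);	/* default: one class shared by all CPUs */
		lockdep_set_class(lock, &demo_lock_keys[cpu]);	/* now per-CPU */
	}
}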
* Re: [PATCH] mm/slub: fix a deadlock in shuffle_freelist()
2019-09-16 9:03 ` Sebastian Andrzej Siewior
@ 2019-09-16 14:01 ` Qian Cai
2019-09-16 19:51 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 23+ messages in thread
From: Qian Cai @ 2019-09-16 14:01 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, peterz, mingo
Cc: akpm, tglx, thgarnie, tytso, cl, penberg, rientjes, will,
linux-mm, linux-kernel, keescook
On Mon, 2019-09-16 at 11:03 +0200, Sebastian Andrzej Siewior wrote:
> On 2019-09-13 12:27:44 [-0400], Qian Cai wrote:
> …
> > Chain exists of:
> > random_write_wait.lock --> &rq->lock --> batched_entropy_u32.lock
> >
> > Possible unsafe locking scenario:
> >
> > CPU0 CPU1
> > ---- ----
> > lock(batched_entropy_u32.lock);
> > lock(&rq->lock);
> > lock(batched_entropy_u32.lock);
> > lock(random_write_wait.lock);
>
> would this deadlock still occur if lockdep knew that
> batched_entropy_u32.lock on CPU0 could be acquired at the same time
> as CPU1 acquired its batched_entropy_u32.lock?
I suppose that might fix it too, if lockdep can be taught that trick, but it
would be better to have a patch to test, if you have something in mind.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] mm/slub: fix a deadlock in shuffle_freelist()
2019-09-16 14:01 ` Qian Cai
@ 2019-09-16 19:51 ` Sebastian Andrzej Siewior
2019-09-16 21:31 ` Qian Cai
0 siblings, 1 reply; 23+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-09-16 19:51 UTC (permalink / raw)
To: Qian Cai
Cc: peterz, mingo, akpm, tglx, thgarnie, tytso, cl, penberg,
rientjes, will, linux-mm, linux-kernel, keescook
On 2019-09-16 10:01:27 [-0400], Qian Cai wrote:
> On Mon, 2019-09-16 at 11:03 +0200, Sebastian Andrzej Siewior wrote:
> > On 2019-09-13 12:27:44 [-0400], Qian Cai wrote:
> > …
> > > Chain exists of:
> > > random_write_wait.lock --> &rq->lock --> batched_entropy_u32.lock
> > >
> > > Possible unsafe locking scenario:
> > >
> > > CPU0 CPU1
> > > ---- ----
> > > lock(batched_entropy_u32.lock);
> > > lock(&rq->lock);
> > > lock(batched_entropy_u32.lock);
> > > lock(random_write_wait.lock);
> >
> > would this deadlock still occur if lockdep knew that
> > batched_entropy_u32.lock on CPU0 could be acquired at the same time
> > as CPU1 acquired its batched_entropy_u32.lock?
>
> I suppose that might fix it too, if lockdep can be taught that trick, but it
> would be better to have a patch to test, if you have something in mind.
get_random_bytes() is heavier than get_random_int(), so I would prefer to
avoid using it to fix what looks like a false positive report from
lockdep.
But no, I don't have a patch sitting around. A lock in per-CPU memory
could only lead to the scenario mentioned above if the lock were obtained
cross-CPU, and it just isn't in this case. So I don't think it is that
simple.
Sebastian
^ permalink raw reply [flat|nested] 23+ messages in thread
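For background on the false-positive concern: a statically initialized
per-CPU spinlock has a single lockdep class key shared by every CPU's
instance, so lockdep cannot distinguish CPU0's copy from CPU1's. A rough
sketch of the shape involved, with hypothetical names loosely modeled on the
batched_entropy structures:

#include <linux/percpu.h>
#include <linux/spinlock.h>

struct batched_entropy_like {
	u64 entropy[8];
	unsigned int position;
	spinlock_t batch_lock;	/* one instance per CPU ... */
};

/* ... but a single lockdep class covering all of the instances. */
static DEFINE_PER_CPU(struct batched_entropy_like, demo_batch) = {
	.batch_lock = __SPIN_LOCK_UNLOCKED(demo_batch.batch_lock),
};

static u64 demo_get_random_u64(void)
{
	struct batched_entropy_like *batch;
	unsigned long flags;
	u64 ret;

	batch = raw_cpu_ptr(&demo_batch);	/* only the local instance */
	spin_lock_irqsave(&batch->batch_lock, flags);
	ret = batch->entropy[batch->position++ & 7];
	spin_unlock_irqrestore(&batch->batch_lock, flags);
	return ret;
}

To lockdep, the spin_lock_irqsave() above is the same lock regardless of
which CPU executes it, which is how an order recorded on one CPU can combine
with an order recorded on another into the reported cycle.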
* Re: [PATCH] mm/slub: fix a deadlock in shuffle_freelist()
2019-09-16 19:51 ` Sebastian Andrzej Siewior
@ 2019-09-16 21:31 ` Qian Cai
2019-09-17 7:16 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 23+ messages in thread
From: Qian Cai @ 2019-09-16 21:31 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: peterz, mingo, akpm, tglx, thgarnie, tytso, cl, penberg,
rientjes, will, linux-mm, linux-kernel, keescook
On Mon, 2019-09-16 at 21:51 +0200, Sebastian Andrzej Siewior wrote:
> On 2019-09-16 10:01:27 [-0400], Qian Cai wrote:
> > On Mon, 2019-09-16 at 11:03 +0200, Sebastian Andrzej Siewior wrote:
> > > On 2019-09-13 12:27:44 [-0400], Qian Cai wrote:
> > > …
> > > > Chain exists of:
> > > > random_write_wait.lock --> &rq->lock --> batched_entropy_u32.lock
> > > >
> > > > Possible unsafe locking scenario:
> > > >
> > > > CPU0 CPU1
> > > > ---- ----
> > > > lock(batched_entropy_u32.lock);
> > > > lock(&rq->lock);
> > > > lock(batched_entropy_u32.lock);
> > > > lock(random_write_wait.lock);
> > >
> > > would this deadlock still occur if lockdep knew that
> > > batched_entropy_u32.lock on CPU0 could be acquired at the same time
> > > as CPU1 acquired its batched_entropy_u32.lock?
> >
> > I suppose that might fix it too, if lockdep can be taught that trick, but it
> > would be better to have a patch to test, if you have something in mind.
>
> get_random_bytes() is heavier than get_random_int(), so I would prefer to
> avoid using it to fix what looks like a false positive report from
> lockdep.
> But no, I don't have a patch sitting around. A lock in per-CPU memory
> could only lead to the scenario mentioned above if the lock were obtained
> cross-CPU, and it just isn't in this case. So I don't think it is that
> simple.
get_random_u64() is also busted.
[ 752.925079] WARNING: possible circular locking dependency detected
[ 752.931951] 5.3.0-rc8-next-20190915+ #2 Tainted: G L
[ 752.938906] ------------------------------------------------------
[ 752.945774] ls/9665 is trying to acquire lock:
[ 752.950905] ffff90001311fef8 (random_write_wait.lock){..-.}, at:
__wake_up_common_lock+0xa8/0x11c
[ 752.960481]
but task is already holding lock:
[ 752.967698] ffff008abc7b9c00 (batched_entropy_u64.lock){....}, at:
get_random_u64+0x6c/0x1dc
[ 752.976835]
which lock already depends on the new lock.
[ 752.987089]
the existing dependency chain (in reverse order) is:
[ 752.995953]
-> #4 (batched_entropy_u64.lock){....}:
[ 753.003702] lock_acquire+0x320/0x364
[ 753.008577] _raw_spin_lock_irqsave+0x7c/0x9c
[ 753.014145] get_random_u64+0x6c/0x1dc
[ 753.019109] add_to_free_area_random+0x54/0x1c8
[ 753.024851] free_one_page+0x86c/0xc28
[ 753.029818] __free_pages_ok+0x69c/0xdac
[ 753.034960] __free_pages+0xbc/0xf8
[ 753.039663] __free_pages_core+0x2ac/0x3c0
[ 753.044973] memblock_free_pages+0xe0/0xf8
[ 753.050281] __free_pages_memory+0xcc/0xfc
[ 753.055588] __free_memory_core+0x70/0x78
[ 753.060809] free_low_memory_core_early+0x148/0x18c
[ 753.066897] memblock_free_all+0x18/0x54
[ 753.072033] mem_init+0x9c/0x160
[ 753.076472] mm_init+0x14/0x38
[ 753.080737] start_kernel+0x19c/0x52c
[ 753.085607]
-> #3 (&(&zone->lock)->rlock){..-.}:
[ 753.093092] lock_acquire+0x320/0x364
[ 753.097964] _raw_spin_lock+0x64/0x80
[ 753.102839] rmqueue_bulk+0x50/0x15a0
[ 753.107712] get_page_from_freelist+0x2260/0x29dc
[ 753.113627] __alloc_pages_nodemask+0x36c/0x1ce0
[ 753.119457] alloc_page_interleave+0x34/0x17c
[ 753.125023] alloc_pages_current+0x80/0xe0
[ 753.130334] allocate_slab+0xfc/0x1d80
[ 753.135296] ___slab_alloc+0x5d4/0xa70
[ 753.140257] kmem_cache_alloc+0x588/0x66c
[ 753.145480] __debug_object_init+0x9d8/0xbac
[ 753.150962] debug_object_init+0x40/0x50
[ 753.156098] hrtimer_init+0x38/0x2b4
[ 753.160885] init_dl_task_timer+0x24/0x44
[ 753.166108] __sched_fork+0xc0/0x168
[ 753.170894] init_idle+0x80/0x3d8
[ 753.175420] idle_thread_get+0x60/0x8c
[ 753.180385] _cpu_up+0x10c/0x348
[ 753.184824] do_cpu_up+0x114/0x170
[ 753.189437] cpu_up+0x20/0x2c
[ 753.193615] smp_init+0xf8/0x1bc
[ 753.198054] kernel_init_freeable+0x198/0x26c
[ 753.203622] kernel_init+0x18/0x334
[ 753.208323] ret_from_fork+0x10/0x18
[ 753.213107]
-> #2 (&rq->lock){-.-.}:
[ 753.219550] lock_acquire+0x320/0x364
[ 753.224423] _raw_spin_lock+0x64/0x80
[ 753.229299] task_fork_fair+0x64/0x22c
[ 753.234261] sched_fork+0x24c/0x3d8
[ 753.238962] copy_process+0xa60/0x29b0
[ 753.243921] _do_fork+0xb8/0xa64
[ 753.248360] kernel_thread+0xc4/0xf4
[ 753.253147] rest_init+0x30/0x320
[ 753.257673] arch_call_rest_init+0x10/0x18
[ 753.262980] start_kernel+0x424/0x52c
[ 753.267849]
-> #1 (&p->pi_lock){-.-.}:
[ 753.274467] lock_acquire+0x320/0x364
[ 753.279342] _raw_spin_lock_irqsave+0x7c/0x9c
[ 753.284910] try_to_wake_up+0x74/0x128c
[ 753.289959] default_wake_function+0x38/0x48
[ 753.295440] pollwake+0x118/0x158
[ 753.299967] __wake_up_common+0x16c/0x240
[ 753.305187] __wake_up_common_lock+0xc8/0x11c
[ 753.310754] __wake_up+0x3c/0x4c
[ 753.315193] account+0x390/0x3e0
[ 753.319632] extract_entropy+0x2cc/0x37c
[ 753.324766] _xfer_secondary_pool+0x35c/0x3c4
[ 753.330333] push_to_pool+0x54/0x308
[ 753.335119] process_one_work+0x558/0xb1c
[ 753.340339] worker_thread+0x494/0x650
[ 753.345300] kthread+0x1cc/0x1e8
[ 753.349739] ret_from_fork+0x10/0x18
[ 753.354522]
-> #0 (random_write_wait.lock){..-.}:
[ 753.362093] validate_chain+0xfcc/0x2fd4
[ 753.367227] __lock_acquire+0x868/0xc2c
[ 753.372274] lock_acquire+0x320/0x364
[ 753.377147] _raw_spin_lock_irqsave+0x7c/0x9c
[ 753.382715] __wake_up_common_lock+0xa8/0x11c
[ 753.388282] __wake_up+0x3c/0x4c
[ 753.392720] account+0x390/0x3e0
[ 753.397159] extract_entropy+0x2cc/0x37c
[ 753.402292] crng_reseed+0x60/0x350
[ 753.406991] _extract_crng+0xd8/0x164
[ 753.411864] crng_reseed+0x7c/0x350
[ 753.416563] _extract_crng+0xd8/0x164
[ 753.421436] get_random_u64+0xec/0x1dc
[ 753.426396] arch_mmap_rnd+0x18/0x78
[ 753.431187] load_elf_binary+0x6d0/0x1730
[ 753.436411] search_binary_handler+0x10c/0x35c
[ 753.442067] __do_execve_file+0xb58/0xf7c
[ 753.447287] __arm64_sys_execve+0x6c/0xa4
[ 753.452509] el0_svc_handler+0x170/0x240
[ 753.457643] el0_svc+0x8/0xc
[ 753.461732]
other info that might help us debug this:
[ 753.471812] Chain exists of:
random_write_wait.lock --> &(&zone->lock)->rlock -->
batched_entropy_u64.lock
[ 753.486588] Possible unsafe locking scenario:
[ 753.493890] CPU0 CPU1
[ 753.499108] ---- ----
[ 753.504324] lock(batched_entropy_u64.lock);
[ 753.509372] lock(&(&zone->lock)->rlock);
[ 753.516675] lock(batched_entropy_u64.lock);
[ 753.524238] lock(random_write_wait.lock);
[ 753.529113]
*** DEADLOCK ***
[ 753.537111] 1 lock held by ls/9665:
[ 753.541287] #0: ffff008abc7b9c00 (batched_entropy_u64.lock){....}, at:
get_random_u64+0x6c/0x1dc
[ 753.550858]
stack backtrace:
[ 753.556602] CPU: 121 PID: 9665 Comm: ls Tainted: G L 5.3.0-rc8-next-20190915+ #2
[ 753.565987] Hardware name: HPE Apollo 70 /C01_APACHE_MB ,
BIOS L50_5.13_1.11 06/18/2019
[ 753.576414] Call trace:
[ 753.579553] dump_backtrace+0x0/0x264
[ 753.583905] show_stack+0x20/0x2c
[ 753.587911] dump_stack+0xd0/0x140
[ 753.592003] print_circular_bug+0x368/0x380
[ 753.596876] check_noncircular+0x28c/0x294
[ 753.601664] validate_chain+0xfcc/0x2fd4
[ 753.606276] __lock_acquire+0x868/0xc2c
[ 753.610802] lock_acquire+0x320/0x364
[ 753.615154] _raw_spin_lock_irqsave+0x7c/0x9c
[ 753.620202] __wake_up_common_lock+0xa8/0x11c
[ 753.625248] __wake_up+0x3c/0x4c
[ 753.629171] account+0x390/0x3e0
[ 753.633095] extract_entropy+0x2cc/0x37c
[ 753.637708] crng_reseed+0x60/0x350
[ 753.641887] _extract_crng+0xd8/0x164
[ 753.646238] crng_reseed+0x7c/0x350
[ 753.650417] _extract_crng+0xd8/0x164
[ 753.654768] get_random_u64+0xec/0x1dc
[ 753.659208] arch_mmap_rnd+0x18/0x78
[ 753.663474] load_elf_binary+0x6d0/0x1730
[ 753.668173] search_binary_handler+0x10c/0x35c
[ 753.673308] __do_execve_file+0xb58/0xf7c
[ 753.678007] __arm64_sys_execve+0x6c/0xa4
[ 753.682707] el0_svc_handler+0x170/0x240
[ 753.687319] el0_svc+0x8/0xc
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] mm/slub: fix a deadlock in shuffle_freelist()
2019-09-16 21:31 ` Qian Cai
@ 2019-09-17 7:16 ` Sebastian Andrzej Siewior
2019-09-18 19:59 ` Qian Cai
0 siblings, 1 reply; 23+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-09-17 7:16 UTC (permalink / raw)
To: Qian Cai
Cc: peterz, mingo, akpm, tglx, thgarnie, tytso, cl, penberg,
rientjes, will, linux-mm, linux-kernel, keescook
On 2019-09-16 17:31:34 [-0400], Qian Cai wrote:
…
> get_random_u64() is also busted.
…
> [ 753.486588] Possible unsafe locking scenario:
>
> [ 753.493890] CPU0 CPU1
> [ 753.499108] ---- ----
> [ 753.504324] lock(batched_entropy_u64.lock);
> [ 753.509372] lock(&(&zone->lock)->rlock);
> [ 753.516675] lock(batched_entropy_u64.lock);
> [ 753.524238] lock(random_write_wait.lock);
> [ 753.529113]
> *** DEADLOCK ***
This is the same scenario as the previous one in regard to the
batched_entropy_….lock.
Sebastian
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] mm/slub: fix a deadlock in shuffle_freelist()
2019-09-17 7:16 ` Sebastian Andrzej Siewior
@ 2019-09-18 19:59 ` Qian Cai
0 siblings, 0 replies; 23+ messages in thread
From: Qian Cai @ 2019-09-18 19:59 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: peterz, mingo, akpm, tglx, thgarnie, tytso, cl, penberg,
rientjes, will, linux-mm, linux-kernel, keescook
On Tue, 2019-09-17 at 09:16 +0200, Sebastian Andrzej Siewior wrote:
> On 2019-09-16 17:31:34 [-0400], Qian Cai wrote:
> …
> > get_random_u64() is also busted.
>
> …
> > [ 753.486588] Possible unsafe locking scenario:
> >
> > [ 753.493890] CPU0 CPU1
> > [ 753.499108] ---- ----
> > [ 753.504324] lock(batched_entropy_u64.lock);
> > [ 753.509372] lock(&(&zone->lock)->rlock);
> > [ 753.516675] lock(batched_entropy_u64.lock);
> > [ 753.524238] lock(random_write_wait.lock);
> > [ 753.529113]
> > *** DEADLOCK ***
>
> This is the same scenario as the previous one in regard to the
> batched_entropy_….lock.
Commit 383776fa7527 ("locking/lockdep: Handle statically initialized percpu
locks properly") significantly increased the chance of false positives for
percpu locks, especially on large systems like those in these examples,
since it makes all instances of a percpu lock the same class. Once a false
positive happens, lockdep becomes useless.
In reality, each percpu lock is a different lock, as we have seen in these
examples where each CPU only takes its local one. The only thing to worry
about is a path where another CPU could take a non-local percpu lock. For
example, invalidate_batched_entropy(), which is a for_each_possible_cpu()
loop.
Is there any other place where another CPU could take a non-local percpu
lock outside of a for_each_possible_cpu() or similar iterator?
Even before the above commit, if the system ran long enough, lockdep could
still catch a deadlock involving those percpu lock iterators, since it
registers each percpu lock usage.
Overall, it sounds to me like the side effects of commit 383776fa7527
outweigh the benefits, and it should be reverted. Am I missing anything?
^ permalink raw reply [flat|nested] 23+ messages in thread
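The cross-CPU exception mentioned above looks roughly like the sketch below,
loosely modeled on invalidate_batched_entropy() and reusing the hypothetical
demo_batch structure from the earlier sketch. It is the one pattern in which
a CPU really does take other CPUs' per-CPU locks:

static void demo_invalidate_batches(void)
{
	unsigned long flags;
	int cpu;

	/* Walk every CPU's instance, not just the local one. */
	for_each_possible_cpu(cpu) {
		struct batched_entropy_like *batch =
			per_cpu_ptr(&demo_batch, cpu);

		spin_lock_irqsave(&batch->batch_lock, flags);
		batch->position = 0;	/* force a refill on next use */
		spin_unlock_irqrestore(&batch->batch_lock, flags);
	}
}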
* Re: [PATCH] mm/slub: fix a deadlock in shuffle_freelist()
2019-09-13 16:27 [PATCH] mm/slub: fix a deadlock in shuffle_freelist() Qian Cai
2019-09-16 9:03 ` Sebastian Andrzej Siewior
@ 2019-09-25 9:31 ` Peter Zijlstra
2019-09-25 15:18 ` Qian Cai
1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2019-09-25 9:31 UTC (permalink / raw)
To: Qian Cai
Cc: akpm, bigeasy, tglx, thgarnie, tytso, cl, penberg, rientjes,
mingo, will, linux-mm, linux-kernel, keescook
On Fri, Sep 13, 2019 at 12:27:44PM -0400, Qian Cai wrote:
> The commit b7d5dc21072c ("random: add a spinlock_t to struct
> batched_entropy") insists on acquiring "batched_entropy_u32.lock" in
> get_random_u32(), which introduced the lock chain,
>
> "&rq->lock --> batched_entropy_u32.lock"
>
> even after crng init. As a result, it could lead to the deadlock below.
> Fix it by using get_random_bytes() in shuffle_freelist(), which does not
> need to take the batched_entropy locks.
>
> WARNING: possible circular locking dependency detected
> 5.3.0-rc7-mm1+ #3 Tainted: G L
> ------------------------------------------------------
> make/7937 is trying to acquire lock:
> ffff900012f225f8 (random_write_wait.lock){....}, at:
> __wake_up_common_lock+0xa8/0x11c
>
> but task is already holding lock:
> ffff0096b9429c00 (batched_entropy_u32.lock){-.-.}, at:
> get_random_u32+0x6c/0x1dc
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (batched_entropy_u32.lock){-.-.}:
> lock_acquire+0x31c/0x360
> _raw_spin_lock_irqsave+0x7c/0x9c
> get_random_u32+0x6c/0x1dc
> new_slab+0x234/0x6c0
> ___slab_alloc+0x3c8/0x650
> kmem_cache_alloc+0x4b0/0x590
> __debug_object_init+0x778/0x8b4
> debug_object_init+0x40/0x50
> debug_init+0x30/0x29c
> hrtimer_init+0x30/0x50
> init_dl_task_timer+0x24/0x44
> __sched_fork+0xc0/0x168
> init_idle+0x78/0x26c
> fork_idle+0x12c/0x178
> idle_threads_init+0x108/0x178
> smp_init+0x20/0x1bc
> kernel_init_freeable+0x198/0x26c
> kernel_init+0x18/0x334
> ret_from_fork+0x10/0x18
>
> -> #2 (&rq->lock){-.-.}:
This relation is silly..
I suspect the below 'works'...
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 63900ca029e0..ec1d72f18b34 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6027,10 +6027,11 @@ void init_idle(struct task_struct *idle, int cpu)
struct rq *rq = cpu_rq(cpu);
unsigned long flags;
+ __sched_fork(0, idle);
+
raw_spin_lock_irqsave(&idle->pi_lock, flags);
raw_spin_lock(&rq->lock);
- __sched_fork(0, idle);
idle->state = TASK_RUNNING;
idle->se.exec_start = sched_clock();
idle->flags |= PF_IDLE;
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH] mm/slub: fix a deadlock in shuffle_freelist()
2019-09-25 9:31 ` Peter Zijlstra
@ 2019-09-25 15:18 ` Qian Cai
2019-09-25 16:45 ` Peter Zijlstra
0 siblings, 1 reply; 23+ messages in thread
From: Qian Cai @ 2019-09-25 15:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: akpm, bigeasy, tglx, thgarnie, tytso, cl, penberg, rientjes,
mingo, will, linux-mm, linux-kernel, keescook
On Wed, 2019-09-25 at 11:31 +0200, Peter Zijlstra wrote:
> On Fri, Sep 13, 2019 at 12:27:44PM -0400, Qian Cai wrote:
> > The commit b7d5dc21072c ("random: add a spinlock_t to struct
> > batched_entropy") insists on acquiring "batched_entropy_u32.lock" in
> > get_random_u32(), which introduced the lock chain,
> >
> > "&rq->lock --> batched_entropy_u32.lock"
> >
> > even after crng init. As a result, it could lead to the deadlock below.
> > Fix it by using get_random_bytes() in shuffle_freelist(), which does not
> > need to take the batched_entropy locks.
> >
> > WARNING: possible circular locking dependency detected
> > 5.3.0-rc7-mm1+ #3 Tainted: G L
> > ------------------------------------------------------
> > make/7937 is trying to acquire lock:
> > ffff900012f225f8 (random_write_wait.lock){....}, at:
> > __wake_up_common_lock+0xa8/0x11c
> >
> > but task is already holding lock:
> > ffff0096b9429c00 (batched_entropy_u32.lock){-.-.}, at:
> > get_random_u32+0x6c/0x1dc
> >
> > which lock already depends on the new lock.
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #3 (batched_entropy_u32.lock){-.-.}:
> > lock_acquire+0x31c/0x360
> > _raw_spin_lock_irqsave+0x7c/0x9c
> > get_random_u32+0x6c/0x1dc
> > new_slab+0x234/0x6c0
> > ___slab_alloc+0x3c8/0x650
> > kmem_cache_alloc+0x4b0/0x590
> > __debug_object_init+0x778/0x8b4
> > debug_object_init+0x40/0x50
> > debug_init+0x30/0x29c
> > hrtimer_init+0x30/0x50
> > init_dl_task_timer+0x24/0x44
> > __sched_fork+0xc0/0x168
> > init_idle+0x78/0x26c
> > fork_idle+0x12c/0x178
> > idle_threads_init+0x108/0x178
> > smp_init+0x20/0x1bc
> > kernel_init_freeable+0x198/0x26c
> > kernel_init+0x18/0x334
> > ret_from_fork+0x10/0x18
> >
> > -> #2 (&rq->lock){-.-.}:
>
> This relation is silly..
>
> I suspect the below 'works'...
Unfortunately, the relation is still there,
copy_process()->rt_mutex_init_task()->"&p->pi_lock"
[24438.676716][ T2] -> #2 (&rq->lock){-.-.}:
[24438.676727][ T2] __lock_acquire+0x5b4/0xbf0
[24438.676736][ T2] lock_acquire+0x130/0x360
[24438.676754][ T2] _raw_spin_lock+0x54/0x80
[24438.676771][ T2] task_fork_fair+0x60/0x190
[24438.676788][ T2] sched_fork+0x128/0x270
[24438.676806][ T2] copy_process+0x7a4/0x1bf0
[24438.676823][ T2] _do_fork+0xac/0xac0
[24438.676841][ T2] kernel_thread+0x70/0xa0
[24438.676858][ T2] rest_init+0x4c/0x42c
[24438.676884][ T2] start_kernel+0x778/0x7c0
[24438.676902][ T2] start_here_common+0x1c/0x334
Whole thing,
[24438.675704][ T2] WARNING: possible circular locking dependency detected
[24438.675714][ T2] 5.3.0-next-20190924 #2 Not tainted
[24438.675722][ T2] ------------------------------------------------------
[24438.675731][ T2] kthreadd/2 is trying to acquire lock:
[24438.675740][ T2] c0000000010a7450 (random_write_wait.lock){..-.}, at:
__wake_up_common_lock+0x88/0x110
[24438.675768][ T2]
[24438.675768][ T2] but task is already holding lock:
[24438.675778][ T2] c000001ffd2f06e0 (batched_entropy_u64.lock){-...}, at:
get_random_u64+0x60/0x100
[24438.675803][ T2]
[24438.675803][ T2] which lock already depends on the new lock.
[24438.675803][ T2]
[24438.675816][ T2]
[24438.675816][ T2] the existing dependency chain (in reverse order) is:
[24438.675836][ T2]
[24438.675836][ T2] -> #4 (batched_entropy_u64.lock){-...}:
[24438.675860][ T2] __lock_acquire+0x5b4/0xbf0
[24438.675878][ T2] lock_acquire+0x130/0x360
[24438.675906][ T2] _raw_spin_lock_irqsave+0x70/0xa0
[24438.675923][ T2] get_random_u64+0x60/0x100
[24438.675944][ T2] add_to_free_area_random+0x164/0x1b0
[24438.675962][ T2] free_one_page+0xb24/0xcf0
[24438.675980][ T2] __free_pages_ok+0x448/0xbf0
[24438.675999][ T2] deferred_init_maxorder+0x404/0x4a4
[24438.676018][ T2] deferred_grow_zone+0x158/0x1f0
[24438.676035][ T2] get_page_from_freelist+0x1dc8/0x1e10
[24438.676063][ T2] __alloc_pages_nodemask+0x1d8/0x1940
[24438.676083][ T2] allocate_slab+0x130/0x2740
[24438.676091][ T2] new_slab+0xa8/0xe0
[24438.676101][ T2] kmem_cache_open+0x254/0x660
[24438.676119][ T2] __kmem_cache_create+0x44/0x2a0
[24438.676136][ T2] create_boot_cache+0xcc/0x110
[24438.676154][ T2] kmem_cache_init+0x90/0x1f0
[24438.676173][ T2] start_kernel+0x3b8/0x7c0
[24438.676191][ T2] start_here_common+0x1c/0x334
[24438.676208][ T2]
[24438.676208][ T2] -> #3 (&(&zone->lock)->rlock){-.-.}:
[24438.676221][ T2] __lock_acquire+0x5b4/0xbf0
[24438.676247][ T2] lock_acquire+0x130/0x360
[24438.676264][ T2] _raw_spin_lock+0x54/0x80
[24438.676282][ T2] rmqueue_bulk.constprop.23+0x64/0xf20
[24438.676300][ T2] get_page_from_freelist+0x718/0x1e10
[24438.676319][ T2] __alloc_pages_nodemask+0x1d8/0x1940
[24438.676339][ T2] alloc_page_interleave+0x34/0x170
[24438.676356][ T2] allocate_slab+0xd1c/0x2740
[24438.676374][ T2] new_slab+0xa8/0xe0
[24438.676391][ T2] ___slab_alloc+0x580/0xef0
[24438.676408][ T2] __slab_alloc+0x64/0xd0
[24438.676426][ T2] kmem_cache_alloc+0x5c4/0x6c0
[24438.676444][ T2] fill_pool+0x280/0x540
[24438.676461][ T2] __debug_object_init+0x60/0x6b0
[24438.676479][ T2] hrtimer_init+0x5c/0x310
[24438.676497][ T2] init_dl_task_timer+0x34/0x60
[24438.676516][ T2] __sched_fork+0x8c/0x110
[24438.676535][ T2] init_idle+0xb4/0x3c0
[24438.676553][ T2] idle_thread_get+0x78/0x120
[24438.676572][ T2] bringup_cpu+0x30/0x230
[24438.676590][ T2] cpuhp_invoke_callback+0x190/0x1580
[24438.676618][ T2] do_cpu_up+0x248/0x460
[24438.676636][ T2] smp_init+0x118/0x1c0
[24438.676662][ T2] kernel_init_freeable+0x3f8/0x8dc
[24438.676681][ T2] kernel_init+0x2c/0x154
[24438.676699][ T2] ret_from_kernel_thread+0x5c/0x74
[24438.676716][ T2]
[24438.676716][ T2] -> #2 (&rq->lock){-.-.}:
[24438.676727][ T2] __lock_acquire+0x5b4/0xbf0
[24438.676736][ T2] lock_acquire+0x130/0x360
[24438.676754][ T2] _raw_spin_lock+0x54/0x80
[24438.676771][ T2] task_fork_fair+0x60/0x190
[24438.676788][ T2] sched_fork+0x128/0x270
[24438.676806][ T2] copy_process+0x7a4/0x1bf0
[24438.676823][ T2] _do_fork+0xac/0xac0
[24438.676841][ T2] kernel_thread+0x70/0xa0
[24438.676858][ T2] rest_init+0x4c/0x42c
[24438.676884][ T2] start_kernel+0x778/0x7c0
[24438.676902][ T2] start_here_common+0x1c/0x334
[24438.676910][ T2]
[24438.676910][ T2] -> #1 (&p->pi_lock){-.-.}:
[24438.676921][ T2] __lock_acquire+0x5b4/0xbf0
[24438.676929][ T2] lock_acquire+0x130/0x360
[24438.676947][ T2] _raw_spin_lock_irqsave+0x70/0xa0
[24438.676973][ T2] try_to_wake_up+0x70/0x1600
[24438.676991][ T2] pollwake+0x88/0xc0
[24438.677009][ T2] __wake_up_common+0xec/0x280
[24438.677026][ T2] __wake_up_common_lock+0xac/0x110
[24438.677044][ T2] account.constprop.8+0x284/0x430
[24438.677061][ T2] extract_entropy.constprop.7+0xd4/0x330
[24438.677080][ T2] _xfer_secondary_pool+0x104/0x3e0
[24438.677097][ T2] push_to_pool+0x58/0x310
[24438.677116][ T2] process_one_work+0x300/0x8e0
[24438.677133][ T2] worker_thread+0x78/0x530
[24438.677151][ T2] kthread+0x1a8/0x1b0
[24438.677180][ T2] ret_from_kernel_thread+0x5c/0x74
[24438.677245][ T2]
[24438.677245][ T2] -> #0 (random_write_wait.lock){..-.}:
[24438.677329][ T2] check_prev_add+0x100/0x11b0
[24438.677377][ T2] validate_chain+0x868/0x1530
[24438.677446][ T2] __lock_acquire+0x5b4/0xbf0
[24438.677516][ T2] lock_acquire+0x130/0x360
[24438.677563][ T2] _raw_spin_lock_irqsave+0x70/0xa0
[24438.677618][ T2] __wake_up_common_lock+0x88/0x110
[24438.677678][ T2] account.constprop.8+0x284/0x430
[24438.677743][ T2] extract_entropy.constprop.7+0xd4/0x330
[24438.677802][ T2] crng_reseed+0x68/0x490
[24438.677867][ T2] _extract_crng+0x104/0x110
[24438.677914][ T2] crng_reseed+0x284/0x490
[24438.677983][ T2] _extract_crng+0x104/0x110
[24438.678032][ T2] get_random_u64+0xdc/0x100
[24438.678101][ T2] copy_process+0x2d8/0x1bf0
[24438.678148][ T2] _do_fork+0xac/0xac0
[24438.678208][ T2] kernel_thread+0x70/0xa0
[24438.678246][ T2] kthreadd+0x270/0x330
[24438.678301][ T2] ret_from_kernel_thread+0x5c/0x74
[24438.678342][ T2]
[24438.678342][ T2] other info that might help us debug this:
[24438.678342][ T2]
[24438.678459][ T2] Chain exists of:
[24438.678459][ T2] random_write_wait.lock --> &(&zone->lock)->rlock -->
batched_entropy_u64.lock
[24438.678459][ T2]
[24438.678636][ T2] Possible unsafe locking scenario:
[24438.678636][ T2]
[24438.678692][ T2] CPU0 CPU1
[24438.678754][ T2] ---- ----
[24438.678814][ T2] lock(batched_entropy_u64.lock);
[24438.678878][ T2] lock(&(&zone->lock)->rlock);
[24438.678951][ T2] lock(batched_entropy_u64.lock);
[24438.679038][ T2] lock(random_write_wait.lock);
[24438.679098][ T2]
[24438.679098][ T2] *** DEADLOCK ***
[24438.679098][ T2]
[24438.679174][ T2] 1 lock held by kthreadd/2:
[24438.679230][ T2] #0: c000001ffd2f06e0 (batched_entropy_u64.lock){-...},
at: get_random_u64+0x60/0x100
[24438.679341][ T2]
[24438.679341][ T2] stack backtrace:
[24438.679413][ T2] CPU: 13 PID: 2 Comm: kthreadd Not tainted 5.3.0-next-20190924 #2
[24438.679485][ T2] Call Trace:
[24438.679507][ T2] [c00000002c84efe0] [c00000000091a574]
dump_stack+0xe8/0x164 (unreliable)
[24438.679618][ T2] [c00000002c84f030] [c0000000001cc9b8]
print_circular_bug+0x3a8/0x420
[24438.679701][ T2] [c00000002c84f0e0] [c0000000001ccc90]
check_noncircular+0x260/0x320
[24438.679769][ T2] [c00000002c84f1e0] [c0000000001ce7e0]
check_prev_add+0x100/0x11b0
[24438.679868][ T2] [c00000002c84f2c0] [c0000000001d00f8]
validate_chain+0x868/0x1530
[24438.679950][ T2] [c00000002c84f3f0] [c0000000001d3064]
__lock_acquire+0x5b4/0xbf0
[24438.680059][ T2] [c00000002c84f4f0] [c0000000001d3ed0]
lock_acquire+0x130/0x360
[24438.680122][ T2] [c00000002c84f5d0] [c000000000947d70]
_raw_spin_lock_irqsave+0x70/0xa0
[24438.680207][ T2] [c00000002c84f610] [c0000000001a9488]
__wake_up_common_lock+0x88/0x110
[24438.680298][ T2] [c00000002c84f690] [c0000000006f11a4]
account.constprop.8+0x284/0x430
[24438.680399][ T2] [c00000002c84f750] [c0000000006f1554]
extract_entropy.constprop.7+0xd4/0x330
[24438.680495][ T2] [c00000002c84f7d0] [c0000000006f1818]
crng_reseed+0x68/0x490
[24438.680590][ T2] [c00000002c84f910] [c0000000006f4094]
_extract_crng+0x104/0x110
[24438.680662][ T2] [c00000002c84f950] [c0000000006f1a34]
crng_reseed+0x284/0x490
[24438.680751][ T2] [c00000002c84fa90] [c0000000006f4094]
_extract_crng+0x104/0x110
[24438.680828][ T2] [c00000002c84fad0] [c0000000006f4c0c]
get_random_u64+0xdc/0x100
[24438.680931][ T2] [c00000002c84fb10] [c000000000106988]
copy_process+0x2d8/0x1bf0
[24438.681007][ T2] [c00000002c84fc30] [c00000000010861c] _do_fork+0xac/0xac0
[24438.681074][ T2] [c00000002c84fd10] [c0000000001090d0]
kernel_thread+0x70/0xa0
[24438.681170][ T2] [c00000002c84fd80] [c0000000001518f0]
kthreadd+0x270/0x330
[24438.681257][ T2] [c00000002c84fe20] [c00000000000b748]
ret_from_kernel_thread+0x5c/0x74
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 63900ca029e0..ec1d72f18b34 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6027,10 +6027,11 @@ void init_idle(struct task_struct *idle, int cpu)
> struct rq *rq = cpu_rq(cpu);
> unsigned long flags;
>
> + __sched_fork(0, idle);
> +
> raw_spin_lock_irqsave(&idle->pi_lock, flags);
> raw_spin_lock(&rq->lock);
>
> - __sched_fork(0, idle);
> idle->state = TASK_RUNNING;
> idle->se.exec_start = sched_clock();
> idle->flags |= PF_IDLE;
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] mm/slub: fix a deadlock in shuffle_freelist()
2019-09-25 15:18 ` Qian Cai
@ 2019-09-25 16:45 ` Peter Zijlstra
2019-09-26 12:29 ` Qian Cai
0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2019-09-25 16:45 UTC (permalink / raw)
To: Qian Cai
Cc: akpm, bigeasy, tglx, thgarnie, tytso, cl, penberg, rientjes,
mingo, will, linux-mm, linux-kernel, keescook
On Wed, Sep 25, 2019 at 11:18:47AM -0400, Qian Cai wrote:
> On Wed, 2019-09-25 at 11:31 +0200, Peter Zijlstra wrote:
> > On Fri, Sep 13, 2019 at 12:27:44PM -0400, Qian Cai wrote:
> > > -> #3 (batched_entropy_u32.lock){-.-.}:
> > > lock_acquire+0x31c/0x360
> > > _raw_spin_lock_irqsave+0x7c/0x9c
> > > get_random_u32+0x6c/0x1dc
> > > new_slab+0x234/0x6c0
> > > ___slab_alloc+0x3c8/0x650
> > > kmem_cache_alloc+0x4b0/0x590
> > > __debug_object_init+0x778/0x8b4
> > > debug_object_init+0x40/0x50
> > > debug_init+0x30/0x29c
> > > hrtimer_init+0x30/0x50
> > > init_dl_task_timer+0x24/0x44
> > > __sched_fork+0xc0/0x168
> > > init_idle+0x78/0x26c
> > > fork_idle+0x12c/0x178
> > > idle_threads_init+0x108/0x178
> > > smp_init+0x20/0x1bc
> > > kernel_init_freeable+0x198/0x26c
> > > kernel_init+0x18/0x334
> > > ret_from_fork+0x10/0x18
> > >
> > > -> #2 (&rq->lock){-.-.}:
> >
> > This relation is silly..
> >
> > I suspect the below 'works'...
>
> Unfortunately, the relation is still there,
>
> copy_process()->rt_mutex_init_task()->"&p->pi_lock"
>
> [24438.676716][ T2] -> #2 (&rq->lock){-.-.}:
> [24438.676727][ T2] __lock_acquire+0x5b4/0xbf0
> [24438.676736][ T2] lock_acquire+0x130/0x360
> [24438.676754][ T2] _raw_spin_lock+0x54/0x80
> [24438.676771][ T2] task_fork_fair+0x60/0x190
> [24438.676788][ T2] sched_fork+0x128/0x270
> [24438.676806][ T2] copy_process+0x7a4/0x1bf0
> [24438.676823][ T2] _do_fork+0xac/0xac0
> [24438.676841][ T2] kernel_thread+0x70/0xa0
> [24438.676858][ T2] rest_init+0x4c/0x42c
> [24438.676884][ T2] start_kernel+0x778/0x7c0
> [24438.676902][ T2] start_here_common+0x1c/0x334
That's the 'where we took #2 while holding #1' stacktrace and not
relevant to our discussion.
> [24438.675836][ T2] -> #4 (batched_entropy_u64.lock){-...}:
> [24438.675860][ T2] __lock_acquire+0x5b4/0xbf0
> [24438.675878][ T2] lock_acquire+0x130/0x360
> [24438.675906][ T2] _raw_spin_lock_irqsave+0x70/0xa0
> [24438.675923][ T2] get_random_u64+0x60/0x100
> [24438.675944][ T2] add_to_free_area_random+0x164/0x1b0
> [24438.675962][ T2] free_one_page+0xb24/0xcf0
> [24438.675980][ T2] __free_pages_ok+0x448/0xbf0
> [24438.675999][ T2] deferred_init_maxorder+0x404/0x4a4
> [24438.676018][ T2] deferred_grow_zone+0x158/0x1f0
> [24438.676035][ T2] get_page_from_freelist+0x1dc8/0x1e10
> [24438.676063][ T2] __alloc_pages_nodemask+0x1d8/0x1940
> [24438.676083][ T2] allocate_slab+0x130/0x2740
> [24438.676091][ T2] new_slab+0xa8/0xe0
> [24438.676101][ T2] kmem_cache_open+0x254/0x660
> [24438.676119][ T2] __kmem_cache_create+0x44/0x2a0
> [24438.676136][ T2] create_boot_cache+0xcc/0x110
> [24438.676154][ T2] kmem_cache_init+0x90/0x1f0
> [24438.676173][ T2] start_kernel+0x3b8/0x7c0
> [24438.676191][ T2] start_here_common+0x1c/0x334
> [24438.676208][ T2]
> [24438.676208][ T2] -> #3 (&(&zone->lock)->rlock){-.-.}:
> [24438.676221][ T2] __lock_acquire+0x5b4/0xbf0
> [24438.676247][ T2] lock_acquire+0x130/0x360
> [24438.676264][ T2] _raw_spin_lock+0x54/0x80
> [24438.676282][ T2] rmqueue_bulk.constprop.23+0x64/0xf20
> [24438.676300][ T2] get_page_from_freelist+0x718/0x1e10
> [24438.676319][ T2] __alloc_pages_nodemask+0x1d8/0x1940
> [24438.676339][ T2] alloc_page_interleave+0x34/0x170
> [24438.676356][ T2] allocate_slab+0xd1c/0x2740
> [24438.676374][ T2] new_slab+0xa8/0xe0
> [24438.676391][ T2] ___slab_alloc+0x580/0xef0
> [24438.676408][ T2] __slab_alloc+0x64/0xd0
> [24438.676426][ T2] kmem_cache_alloc+0x5c4/0x6c0
> [24438.676444][ T2] fill_pool+0x280/0x540
> [24438.676461][ T2] __debug_object_init+0x60/0x6b0
> [24438.676479][ T2] hrtimer_init+0x5c/0x310
> [24438.676497][ T2] init_dl_task_timer+0x34/0x60
> [24438.676516][ T2] __sched_fork+0x8c/0x110
> [24438.676535][ T2] init_idle+0xb4/0x3c0
> [24438.676553][ T2] idle_thread_get+0x78/0x120
> [24438.676572][ T2] bringup_cpu+0x30/0x230
> [24438.676590][ T2] cpuhp_invoke_callback+0x190/0x1580
> [24438.676618][ T2] do_cpu_up+0x248/0x460
> [24438.676636][ T2] smp_init+0x118/0x1c0
> [24438.676662][ T2] kernel_init_freeable+0x3f8/0x8dc
> [24438.676681][ T2] kernel_init+0x2c/0x154
> [24438.676699][ T2] ret_from_kernel_thread+0x5c/0x74
> [24438.676716][ T2]
> [24438.676716][ T2] -> #2 (&rq->lock){-.-.}:
This then shows we now have:
rq->lock
zone->lock.rlock
batched_entropy_u64.lock
Which, to me, appears to be distinctly different from the last time,
which was:
rq->lock
batched_entropy_u32.lock
Notable: "u32" != "u64".
But #3 has:
> [24438.676516][ T2] __sched_fork+0x8c/0x110
> [24438.676535][ T2] init_idle+0xb4/0x3c0
Which seems to suggest you didn't actually apply the patch; or rather,
if you did, I'm not immediately seeing where #2 is acquired.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] mm/slub: fix a deadlock in shuffle_freelist()
2019-09-25 16:45 ` Peter Zijlstra
@ 2019-09-26 12:29 ` Qian Cai
2019-10-01 9:18 ` [PATCH] sched: Avoid spurious lock dependencies Peter Zijlstra
0 siblings, 1 reply; 23+ messages in thread
From: Qian Cai @ 2019-09-26 12:29 UTC (permalink / raw)
To: Peter Zijlstra
Cc: akpm, bigeasy, tglx, thgarnie, tytso, cl, penberg, rientjes,
mingo, will, linux-mm, linux-kernel, keescook
On Wed, 2019-09-25 at 18:45 +0200, Peter Zijlstra wrote:
> On Wed, Sep 25, 2019 at 11:18:47AM -0400, Qian Cai wrote:
> > On Wed, 2019-09-25 at 11:31 +0200, Peter Zijlstra wrote:
> > > On Fri, Sep 13, 2019 at 12:27:44PM -0400, Qian Cai wrote:
> > > > -> #3 (batched_entropy_u32.lock){-.-.}:
> > > > lock_acquire+0x31c/0x360
> > > > _raw_spin_lock_irqsave+0x7c/0x9c
> > > > get_random_u32+0x6c/0x1dc
> > > > new_slab+0x234/0x6c0
> > > > ___slab_alloc+0x3c8/0x650
> > > > kmem_cache_alloc+0x4b0/0x590
> > > > __debug_object_init+0x778/0x8b4
> > > > debug_object_init+0x40/0x50
> > > > debug_init+0x30/0x29c
> > > > hrtimer_init+0x30/0x50
> > > > init_dl_task_timer+0x24/0x44
> > > > __sched_fork+0xc0/0x168
> > > > init_idle+0x78/0x26c
> > > > fork_idle+0x12c/0x178
> > > > idle_threads_init+0x108/0x178
> > > > smp_init+0x20/0x1bc
> > > > kernel_init_freeable+0x198/0x26c
> > > > kernel_init+0x18/0x334
> > > > ret_from_fork+0x10/0x18
> > > >
> > > > -> #2 (&rq->lock){-.-.}:
> > >
> > > This relation is silly..
> > >
> > > I suspect the below 'works'...
> >
> > Unfortunately, the relation is still there,
> >
> > copy_process()->rt_mutex_init_task()->"&p->pi_lock"
> >
> > [24438.676716][ T2] -> #2 (&rq->lock){-.-.}:
> > [24438.676727][ T2] __lock_acquire+0x5b4/0xbf0
> > [24438.676736][ T2] lock_acquire+0x130/0x360
> > [24438.676754][ T2] _raw_spin_lock+0x54/0x80
> > [24438.676771][ T2] task_fork_fair+0x60/0x190
> > [24438.676788][ T2] sched_fork+0x128/0x270
> > [24438.676806][ T2] copy_process+0x7a4/0x1bf0
> > [24438.676823][ T2] _do_fork+0xac/0xac0
> > [24438.676841][ T2] kernel_thread+0x70/0xa0
> > [24438.676858][ T2] rest_init+0x4c/0x42c
> > [24438.676884][ T2] start_kernel+0x778/0x7c0
> > [24438.676902][ T2] start_here_common+0x1c/0x334
>
> That's the 'where we took #2 while holding #1' stacktrace and not
> relevant to our discussion.
Oh, you were talking about taking #3 while holding #2. Anyway, your patch is
working fine so far. Care to post/merge it officially, or do you want me to
post it?
>
> > [24438.675836][ T2] -> #4 (batched_entropy_u64.lock){-...}:
> > [24438.675860][ T2] __lock_acquire+0x5b4/0xbf0
> > [24438.675878][ T2] lock_acquire+0x130/0x360
> > [24438.675906][ T2] _raw_spin_lock_irqsave+0x70/0xa0
> > [24438.675923][ T2] get_random_u64+0x60/0x100
> > [24438.675944][ T2] add_to_free_area_random+0x164/0x1b0
> > [24438.675962][ T2] free_one_page+0xb24/0xcf0
> > [24438.675980][ T2] __free_pages_ok+0x448/0xbf0
> > [24438.675999][ T2] deferred_init_maxorder+0x404/0x4a4
> > [24438.676018][ T2] deferred_grow_zone+0x158/0x1f0
> > [24438.676035][ T2] get_page_from_freelist+0x1dc8/0x1e10
> > [24438.676063][ T2] __alloc_pages_nodemask+0x1d8/0x1940
> > [24438.676083][ T2] allocate_slab+0x130/0x2740
> > [24438.676091][ T2] new_slab+0xa8/0xe0
> > [24438.676101][ T2] kmem_cache_open+0x254/0x660
> > [24438.676119][ T2] __kmem_cache_create+0x44/0x2a0
> > [24438.676136][ T2] create_boot_cache+0xcc/0x110
> > [24438.676154][ T2] kmem_cache_init+0x90/0x1f0
> > [24438.676173][ T2] start_kernel+0x3b8/0x7c0
> > [24438.676191][ T2] start_here_common+0x1c/0x334
> > [24438.676208][ T2]
> > [24438.676208][ T2] -> #3 (&(&zone->lock)->rlock){-.-.}:
> > [24438.676221][ T2] __lock_acquire+0x5b4/0xbf0
> > [24438.676247][ T2] lock_acquire+0x130/0x360
> > [24438.676264][ T2] _raw_spin_lock+0x54/0x80
> > [24438.676282][ T2] rmqueue_bulk.constprop.23+0x64/0xf20
> > [24438.676300][ T2] get_page_from_freelist+0x718/0x1e10
> > [24438.676319][ T2] __alloc_pages_nodemask+0x1d8/0x1940
> > [24438.676339][ T2] alloc_page_interleave+0x34/0x170
> > [24438.676356][ T2] allocate_slab+0xd1c/0x2740
> > [24438.676374][ T2] new_slab+0xa8/0xe0
> > [24438.676391][ T2] ___slab_alloc+0x580/0xef0
> > [24438.676408][ T2] __slab_alloc+0x64/0xd0
> > [24438.676426][ T2] kmem_cache_alloc+0x5c4/0x6c0
> > [24438.676444][ T2] fill_pool+0x280/0x540
> > [24438.676461][ T2] __debug_object_init+0x60/0x6b0
> > [24438.676479][ T2] hrtimer_init+0x5c/0x310
> > [24438.676497][ T2] init_dl_task_timer+0x34/0x60
> > [24438.676516][ T2] __sched_fork+0x8c/0x110
> > [24438.676535][ T2] init_idle+0xb4/0x3c0
> > [24438.676553][ T2] idle_thread_get+0x78/0x120
> > [24438.676572][ T2] bringup_cpu+0x30/0x230
> > [24438.676590][ T2] cpuhp_invoke_callback+0x190/0x1580
> > [24438.676618][ T2] do_cpu_up+0x248/0x460
> > [24438.676636][ T2] smp_init+0x118/0x1c0
> > [24438.676662][ T2] kernel_init_freeable+0x3f8/0x8dc
> > [24438.676681][ T2] kernel_init+0x2c/0x154
> > [24438.676699][ T2] ret_from_kernel_thread+0x5c/0x74
> > [24438.676716][ T2]
> > [24438.676716][ T2] -> #2 (&rq->lock){-.-.}:
>
> This then shows we now have:
>
> rq->lock
> zone->lock.rlock
> batched_entropy_u64.lock
>
> Which, to me, appears to be distinctly different from the last time,
> which was:
>
> rq->lock
> batched_entropy_u32.lock
>
> Notable: "u32" != "u64".
>
> But #3 has:
>
> > [24438.676516][ T2] __sched_fork+0x8c/0x110
> > [24438.676535][ T2] init_idle+0xb4/0x3c0
>
> Which seems to suggest you didn't actually apply the patch; or rather,
> if you did, i'm not immediately seeing where #2 is acquired.
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] sched: Avoid spurious lock dependencies
2019-09-26 12:29 ` Qian Cai
@ 2019-10-01 9:18 ` Peter Zijlstra
2019-10-01 10:01 ` Valentin Schneider
` (4 more replies)
0 siblings, 5 replies; 23+ messages in thread
From: Peter Zijlstra @ 2019-10-01 9:18 UTC (permalink / raw)
To: Qian Cai
Cc: akpm, bigeasy, tglx, thgarnie, tytso, cl, penberg, rientjes,
mingo, will, linux-mm, linux-kernel, keescook
On Thu, Sep 26, 2019 at 08:29:34AM -0400, Qian Cai wrote:
> Oh, you were talking about taking #3 while holding #2. Anyway, your patch is
> working fine so far. Care to post/merge it officially, or do you want me to
> post it?
Does the below adequately describe the situation?
---
Subject: sched: Avoid spurious lock dependencies
While seemingly harmless, __sched_fork() does hrtimer_init(), which,
when DEBUG_OBJETS, can end up doing allocations.
This then results in the following lock order:
rq->lock
zone->lock.rlock
batched_entropy_u64.lock
Which in turn causes deadlocks when we do wakeups while holding that
batched_entropy lock -- as the random code does.
Solve this by moving __sched_fork() out from under rq->lock. This is
safe because nothing there relies on rq->lock, as also evident from the
other __sched_fork() callsite.
Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7880f4f64d0e..1832fc0fbec5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6039,10 +6039,11 @@ void init_idle(struct task_struct *idle, int cpu)
struct rq *rq = cpu_rq(cpu);
unsigned long flags;
+ __sched_fork(0, idle);
+
raw_spin_lock_irqsave(&idle->pi_lock, flags);
raw_spin_lock(&rq->lock);
- __sched_fork(0, idle);
idle->state = TASK_RUNNING;
idle->se.exec_start = sched_clock();
idle->flags |= PF_IDLE;
^ permalink raw reply related [flat|nested] 23+ messages in thread
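Note that the patch does not remove the allocation; it only ensures the
allocation happens while no scheduler lock is held, so the rq->lock -->
zone->lock --> batched_entropy chain is never recorded. A condensed sketch of
the resulting shape of init_idle() (abridged, not the full function):

void init_idle_sketch(struct task_struct *idle, int cpu)
{
	struct rq *rq = cpu_rq(cpu);
	unsigned long flags;

	/* May allocate via hrtimer_init()/debugobjects: no locks held yet. */
	__sched_fork(0, idle);

	raw_spin_lock_irqsave(&idle->pi_lock, flags);
	raw_spin_lock(&rq->lock);

	/* Only plain state updates remain under the locks. */
	idle->state = TASK_RUNNING;
	idle->se.exec_start = sched_clock();
	idle->flags |= PF_IDLE;
	/* ... */

	raw_spin_unlock(&rq->lock);
	raw_spin_unlock_irqrestore(&idle->pi_lock, flags);
}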
* Re: [PATCH] sched: Avoid spurious lock dependencies
2019-10-01 9:18 ` [PATCH] sched: Avoid spurious lock dependencies Peter Zijlstra
@ 2019-10-01 10:01 ` Valentin Schneider
2019-10-01 11:22 ` Qian Cai
` (3 subsequent siblings)
4 siblings, 0 replies; 23+ messages in thread
From: Valentin Schneider @ 2019-10-01 10:01 UTC (permalink / raw)
To: Peter Zijlstra, Qian Cai
Cc: akpm, bigeasy, tglx, thgarnie, tytso, cl, penberg, rientjes,
mingo, will, linux-mm, linux-kernel, keescook
On 01/10/2019 10:18, Peter Zijlstra wrote:
> On Thu, Sep 26, 2019 at 08:29:34AM -0400, Qian Cai wrote:
>
>> Oh, you were talking about taking #3 while holding #2. Anyway, your patch is
>> working fine so far. Care to post/merge it officially, or do you want me to
>> post it?
>
> Does the below adequately describe the situation?
>
> ---
> Subject: sched: Avoid spurious lock dependencies
>
> While seemingly harmless, __sched_fork() does hrtimer_init(), which,
> when DEBUG_OBJETS, can end up doing allocations.
>
> This then results in the following lock order:
>
> rq->lock
> zone->lock.rlock
> batched_entropy_u64.lock
>
> Which in turn causes deadlocks when we do wakeups while holding that
> batched_entropy lock -- as the random code does.
>
> Solve this by moving __sched_fork() out from under rq->lock. This is
> safe because nothing there relies on rq->lock, as also evident from the
> other __sched_fork() callsite.
>
> Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Funky dependency, but the change looks fine to me.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
> kernel/sched/core.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7880f4f64d0e..1832fc0fbec5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6039,10 +6039,11 @@ void init_idle(struct task_struct *idle, int cpu)
> struct rq *rq = cpu_rq(cpu);
> unsigned long flags;
>
> + __sched_fork(0, idle);
> +
> raw_spin_lock_irqsave(&idle->pi_lock, flags);
> raw_spin_lock(&rq->lock);
>
> - __sched_fork(0, idle);
> idle->state = TASK_RUNNING;
> idle->se.exec_start = sched_clock();
> idle->flags |= PF_IDLE;
>
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] sched: Avoid spurious lock dependencies
2019-10-01 9:18 ` [PATCH] sched: Avoid spurious lock dependencies Peter Zijlstra
2019-10-01 10:01 ` Valentin Schneider
@ 2019-10-01 11:22 ` Qian Cai
2019-10-01 11:36 ` Srikar Dronamraju
` (2 subsequent siblings)
4 siblings, 0 replies; 23+ messages in thread
From: Qian Cai @ 2019-10-01 11:22 UTC (permalink / raw)
To: Peter Zijlstra
Cc: akpm, bigeasy, tglx, thgarnie, tytso, cl, penberg, rientjes,
mingo, will, linux-mm, linux-kernel, keescook
> On Oct 1, 2019, at 5:18 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Does the below adequately describe the situation?
Yes, looks fine.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] sched: Avoid spurious lock dependencies
2019-10-01 9:18 ` [PATCH] sched: Avoid spurious lock dependencies Peter Zijlstra
2019-10-01 10:01 ` Valentin Schneider
2019-10-01 11:22 ` Qian Cai
@ 2019-10-01 11:36 ` Srikar Dronamraju
2019-10-01 13:44 ` Peter Zijlstra
2019-10-29 11:10 ` Qian Cai
2019-11-13 10:06 ` [tip: sched/urgent] sched/core: " tip-bot2 for Peter Zijlstra
4 siblings, 1 reply; 23+ messages in thread
From: Srikar Dronamraju @ 2019-10-01 11:36 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Qian Cai, akpm, bigeasy, tglx, thgarnie, tytso, cl, penberg,
rientjes, mingo, will, linux-mm, linux-kernel, keescook
> Subject: sched: Avoid spurious lock dependencies
>
> While seemingly harmless, __sched_fork() does hrtimer_init(), which,
> when DEBUG_OBJETS, can end up doing allocations.
>
NIT: s/DEBUG_OBJETS/DEBUG_OBJECTS
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7880f4f64d0e..1832fc0fbec5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6039,10 +6039,11 @@ void init_idle(struct task_struct *idle, int cpu)
> struct rq *rq = cpu_rq(cpu);
> unsigned long flags;
>
> + __sched_fork(0, idle);
> +
> raw_spin_lock_irqsave(&idle->pi_lock, flags);
> raw_spin_lock(&rq->lock);
>
> - __sched_fork(0, idle);
> idle->state = TASK_RUNNING;
> idle->se.exec_start = sched_clock();
> idle->flags |= PF_IDLE;
>
Given that there is a comment just after this which says
"init_task() gets called multiple times on a task",
should we add a check if rq->idle is present and bail out?
if (rq->idle) {
raw_spin_unlock(&rq->lock);
raw_spin_unlock_irqrestore(&idle->pi_lock, flags);
return;
}
Also, can we move the above 3 statements to before the locks are taken?
--
Thanks and Regards
Srikar Dronamraju
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] sched: Avoid spurious lock dependencies
2019-10-01 11:36 ` Srikar Dronamraju
@ 2019-10-01 13:44 ` Peter Zijlstra
0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2019-10-01 13:44 UTC (permalink / raw)
To: Srikar Dronamraju
Cc: Qian Cai, akpm, bigeasy, tglx, thgarnie, tytso, cl, penberg,
rientjes, mingo, will, linux-mm, linux-kernel, keescook
On Tue, Oct 01, 2019 at 05:06:56PM +0530, Srikar Dronamraju wrote:
> > Subject: sched: Avoid spurious lock dependencies
> >
> > While seemingly harmless, __sched_fork() does hrtimer_init(), which,
> > when DEBUG_OBJETS, can end up doing allocations.
> >
>
> NIT: s/DEBUG_OBJETS/DEBUG_OBJECTS
>
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 7880f4f64d0e..1832fc0fbec5 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -6039,10 +6039,11 @@ void init_idle(struct task_struct *idle, int cpu)
> > struct rq *rq = cpu_rq(cpu);
> > unsigned long flags;
> >
> > + __sched_fork(0, idle);
> > +
> > raw_spin_lock_irqsave(&idle->pi_lock, flags);
> > raw_spin_lock(&rq->lock);
> >
> > - __sched_fork(0, idle);
> > idle->state = TASK_RUNNING;
> > idle->se.exec_start = sched_clock();
> > idle->flags |= PF_IDLE;
> >
>
> Given that there is a comment just after this which says
> "init_task() gets called multiple times on a task",
> should we add a check if rq->idle is present and bail out?
>
> if (rq->idle) {
> raw_spin_unlock(&rq->lock);
> raw_spin_unlock_irqrestore(&idle->pi_lock, flags);
> return;
> }
Not really worth it; the best solution is to fix the callchains leading
up to it. It's all hotplug related IIRC and so it's slow anyway.
> Also, can we move the above 3 statements to before the locks are taken?
Probably, but to what effect?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] sched: Avoid spurious lock dependencies
2019-10-01 9:18 ` [PATCH] sched: Avoid spurious lock dependencies Peter Zijlstra
` (2 preceding siblings ...)
2019-10-01 11:36 ` Srikar Dronamraju
@ 2019-10-29 11:10 ` Qian Cai
2019-10-29 12:44 ` Peter Zijlstra
2019-11-13 10:06 ` [tip: sched/urgent] sched/core: " tip-bot2 for Peter Zijlstra
4 siblings, 1 reply; 23+ messages in thread
From: Qian Cai @ 2019-10-29 11:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: akpm, bigeasy, tglx, thgarnie, tytso, cl, penberg, rientjes,
mingo, will, linux-mm, linux-kernel, keescook
> On Oct 1, 2019, at 5:18 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Does the below adequately describe the situation?
>
> ---
> Subject: sched: Avoid spurious lock dependencies
>
> While seemingly harmless, __sched_fork() does hrtimer_init(), which,
> when DEBUG_OBJETS, can end up doing allocations.
>
> This then results in the following lock order:
>
> rq->lock
> zone->lock.rlock
> batched_entropy_u64.lock
>
> Which in turn causes deadlocks when we do wakeups while holding that
> batched_entropy lock -- as the random code does.
>
> Solve this by moving __sched_fork() out from under rq->lock. This is
> safe because nothing there relies on rq->lock, as also evident from the
> other __sched_fork() callsite.
>
> Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> kernel/sched/core.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7880f4f64d0e..1832fc0fbec5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6039,10 +6039,11 @@ void init_idle(struct task_struct *idle, int cpu)
> struct rq *rq = cpu_rq(cpu);
> unsigned long flags;
>
> + __sched_fork(0, idle);
> +
> raw_spin_lock_irqsave(&idle->pi_lock, flags);
> raw_spin_lock(&rq->lock);
>
> - __sched_fork(0, idle);
> idle->state = TASK_RUNNING;
> idle->se.exec_start = sched_clock();
> idle->flags |= PF_IDLE;
It looks like this patch has been forgotten forever. Do you need to repost, so Ingo might have a better chance to pick it up?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] sched: Avoid spurious lock dependencies
2019-10-29 11:10 ` Qian Cai
@ 2019-10-29 12:44 ` Peter Zijlstra
2019-11-12 0:54 ` Qian Cai
0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2019-10-29 12:44 UTC (permalink / raw)
To: Qian Cai
Cc: akpm, bigeasy, tglx, thgarnie, tytso, cl, penberg, rientjes,
mingo, will, linux-mm, linux-kernel, keescook
On Tue, Oct 29, 2019 at 07:10:34AM -0400, Qian Cai wrote:
>
> It looks like this patch has been forgotten forever. Do you need to
> repost, so Ingo might have a better chance to pick it up?
I've queued it now, sorry!
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] sched: Avoid spurious lock dependencies
2019-10-29 12:44 ` Peter Zijlstra
@ 2019-11-12 0:54 ` Qian Cai
0 siblings, 0 replies; 23+ messages in thread
From: Qian Cai @ 2019-11-12 0:54 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrew Morton, bigeasy, Thomas Gleixner, thgarnie, tytso, cl,
penberg, rientjes, mingo, will, linux-mm, linux-kernel, keescook
> On Oct 29, 2019, at 8:44 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Oct 29, 2019 at 07:10:34AM -0400, Qian Cai wrote:
>>
>> It looks like this patch has been forgotten forever. Do you need to
>> repost, so Ingo might have a better chance to pick it up?
>
> I've queued it now, sorry!
Hmm, this is still not in linux-next after another two weeks. Not sure
what to do except carry the patch on my own.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [tip: sched/urgent] sched/core: Avoid spurious lock dependencies
2019-10-01 9:18 ` [PATCH] sched: Avoid spurious lock dependencies Peter Zijlstra
` (3 preceding siblings ...)
2019-10-29 11:10 ` Qian Cai
@ 2019-11-13 10:06 ` tip-bot2 for Peter Zijlstra
2019-11-22 20:01 ` Sebastian Andrzej Siewior
4 siblings, 1 reply; 23+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2019-11-13 10:06 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel),
Linus Torvalds, Qian Cai, Thomas Gleixner, akpm, bigeasy, cl,
keescook, penberg, rientjes, thgarnie, tytso, will, Ingo Molnar,
Borislav Petkov, linux-kernel
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: ff51ff84d82aea5a889b85f2b9fb3aa2b8691668
Gitweb: https://git.kernel.org/tip/ff51ff84d82aea5a889b85f2b9fb3aa2b8691668
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 01 Oct 2019 11:18:37 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 13 Nov 2019 08:01:30 +01:00
sched/core: Avoid spurious lock dependencies
While seemingly harmless, __sched_fork() does hrtimer_init(), which,
when DEBUG_OBJECTS is enabled, can end up doing allocations.
This then results in the following lock order:
rq->lock
zone->lock.rlock
batched_entropy_u64.lock
Which in turn causes deadlocks when we do wakeups while holding that
batched_entropy lock -- as the random code does.
Solve this by moving __sched_fork() out from under rq->lock. This is
safe because nothing there relies on rq->lock, as also evident from the
other __sched_fork() callsite.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: bigeasy@linutronix.de
Cc: cl@linux.com
Cc: keescook@chromium.org
Cc: penberg@kernel.org
Cc: rientjes@google.com
Cc: thgarnie@google.com
Cc: tytso@mit.edu
Cc: will@kernel.org
Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
Link: https://lkml.kernel.org/r/20191001091837.GK4536@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0f2eb36..33cd250 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6019,10 +6019,11 @@ void init_idle(struct task_struct *idle, int cpu)
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
 
+	__sched_fork(0, idle);
+
 	raw_spin_lock_irqsave(&idle->pi_lock, flags);
 	raw_spin_lock(&rq->lock);
 
-	__sched_fork(0, idle);
 	idle->state = TASK_RUNNING;
 	idle->se.exec_start = sched_clock();
 	idle->flags |= PF_IDLE;
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [tip: sched/urgent] sched/core: Avoid spurious lock dependencies
2019-11-13 10:06 ` [tip: sched/urgent] sched/core: " tip-bot2 for Peter Zijlstra
@ 2019-11-22 20:01 ` Sebastian Andrzej Siewior
2019-11-22 20:20 ` Peter Zijlstra
0 siblings, 1 reply; 23+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-11-22 20:01 UTC (permalink / raw)
To: linux-kernel
Cc: linux-tip-commits, Peter Zijlstra (Intel),
Linus Torvalds, Qian Cai, Thomas Gleixner, akpm, cl, keescook,
penberg, rientjes, thgarnie, tytso, will, Ingo Molnar,
Borislav Petkov
On 2019-11-13 10:06:28 [-0000], tip-bot2 for Peter Zijlstra wrote:
> sched/core: Avoid spurious lock dependencies
>
> While seemingly harmless, __sched_fork() does hrtimer_init(), which,
> when DEBUG_OBJECTS is enabled, can end up doing allocations.
>
> This then results in the following lock order:
>
> rq->lock
> zone->lock.rlock
> batched_entropy_u64.lock
>
> Which in turn causes deadlocks when we do wakeups while holding that
> batched_entropy lock -- as the random code does.
Peter, can it _really_ cause deadlocks? My understanding was that the
batched_entropy_u64.lock is a per-CPU lock and can _not_ cause a
deadlock because it can always be acquired on multiple CPUs
simultaneously (and it is never acquired cross-CPU).
Lockdep is simply not smart enough to see that and complains about it
like it would complain about a regular lock in this case.
Sebastian
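For context, the per-CPU batching being discussed looks roughly like the
below, modeled on the batched_entropy code added by commit b7d5dc21072c
(abbreviated, not the verbatim drivers/char/random.c source):

struct batched_entropy {
	union {
		u64 entropy_u64[CHACHA_BLOCK_SIZE / sizeof(u64)];
		u32 entropy_u32[CHACHA_BLOCK_SIZE / sizeof(u32)];
	};
	unsigned int position;
	spinlock_t batch_lock;
};

static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
	.batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
};

u64 get_random_u64(void)
{
	u64 ret;
	unsigned long flags;
	struct batched_entropy *batch;

	/*
	 * Each CPU only ever takes its own copy of batch_lock, so two
	 * CPUs never contend on the same instance. Lockdep, however,
	 * tracks the whole lock *class*, so a dependency formed under
	 * any instance applies to all of them.
	 */
	batch = raw_cpu_ptr(&batched_entropy_u64);
	spin_lock_irqsave(&batch->batch_lock, flags);
	if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
		extract_crng((u8 *)batch->entropy_u64); /* may reseed */
		batch->position = 0;
	}
	ret = batch->entropy_u64[batch->position++];
	spin_unlock_irqrestore(&batch->batch_lock, flags);
	return ret;
}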
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip: sched/urgent] sched/core: Avoid spurious lock dependencies
2019-11-22 20:01 ` Sebastian Andrzej Siewior
@ 2019-11-22 20:20 ` Peter Zijlstra
2019-11-22 21:03 ` Qian Cai
0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2019-11-22 20:20 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-kernel, linux-tip-commits, Linus Torvalds, Qian Cai,
Thomas Gleixner, akpm, cl, keescook, penberg, rientjes, thgarnie,
tytso, will, Ingo Molnar, Borislav Petkov
On Fri, Nov 22, 2019 at 09:01:22PM +0100, Sebastian Andrzej Siewior wrote:
> On 2019-11-13 10:06:28 [-0000], tip-bot2 for Peter Zijlstra wrote:
> > sched/core: Avoid spurious lock dependencies
> >
> > While seemingly harmless, __sched_fork() does hrtimer_init(), which,
> > when DEBUG_OBJECTS is enabled, can end up doing allocations.
> >
> > This then results in the following lock order:
> >
> > rq->lock
> > zone->lock.rlock
> > batched_entropy_u64.lock
> >
> > Which in turn causes deadlocks when we do wakeups while holding that
> > batched_entropy lock -- as the random code does.
>
> Peter, can it _really_ cause deadlocks? My understanding was that the
> batched_entropy_u64.lock is a per-CPU lock and can _not_ cause a
> deadlock because it can be always acquired on multiple CPUs
> simultaneously (and it is never acquired cross-CPU).
> Lockdep is simply not smart enough to see that and complains about it
> like it would complain about a regular lock in this case.
That part yes. That is, even holding a per-cpu lock you can do a wakeup
to the local cpu and recurse back onto rq->lock.
However, I don't think it can actually happen because this
is init_idle(), and we only ever do that on hotplug, so actually creating
the concurrency required for the deadlock might be tricky.
Still, moving that thing out from under the lock was simple and correct.
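Concretely, the local-CPU recursion would look roughly like this
(a hypothetical sketch, not kernel source):

static DEFINE_PER_CPU(spinlock_t, batch_lock) =
	__SPIN_LOCK_UNLOCKED(batch_lock);

static void batched_op_with_wakeup(struct wait_queue_head *wq)
{
	unsigned long flags;
	spinlock_t *lock = raw_cpu_ptr(&batch_lock);

	spin_lock_irqsave(lock, flags);
	/*
	 * wake_up() -> try_to_wake_up() -> lock(p->pi_lock) ->
	 * lock(rq->lock): rq->lock now nests inside the per-CPU lock.
	 * If this path were ever reached while rq->lock is already
	 * held (as in the init_idle() -> hrtimer_init() -> allocation
	 * chain with DEBUG_OBJECTS), the same CPU would try to
	 * re-acquire rq->lock and deadlock -- no cross-CPU concurrency
	 * needed.
	 */
	wake_up(wq);
	spin_unlock_irqrestore(lock, flags);
}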
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip: sched/urgent] sched/core: Avoid spurious lock dependencies
2019-11-22 20:20 ` Peter Zijlstra
@ 2019-11-22 21:03 ` Qian Cai
0 siblings, 0 replies; 23+ messages in thread
From: Qian Cai @ 2019-11-22 21:03 UTC (permalink / raw)
To: Peter Zijlstra, Sebastian Andrzej Siewior
Cc: linux-kernel, linux-tip-commits, Linus Torvalds, Thomas Gleixner,
akpm, cl, keescook, penberg, rientjes, thgarnie, tytso, will,
Ingo Molnar, Borislav Petkov
On Fri, 2019-11-22 at 21:20 +0100, Peter Zijlstra wrote:
> On Fri, Nov 22, 2019 at 09:01:22PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2019-11-13 10:06:28 [-0000], tip-bot2 for Peter Zijlstra wrote:
> > > sched/core: Avoid spurious lock dependencies
> > >
> > > While seemingly harmless, __sched_fork() does hrtimer_init(), which,
> > > when DEBUG_OBJECTS is enabled, can end up doing allocations.
> > >
> > > This then results in the following lock order:
> > >
> > > rq->lock
> > > zone->lock.rlock
> > > batched_entropy_u64.lock
> > >
> > > Which in turn causes deadlocks when we do wakeups while holding that
> > > batched_entropy lock -- as the random code does.
> >
> > Peter, can it _really_ cause deadlocks? My understanding was that the
> > batched_entropy_u64.lock is a per-CPU lock and can _not_ cause a
> > deadlock because it can be always acquired on multiple CPUs
> > simultaneously (and it is never acquired cross-CPU).
> > Lockdep is simply not smart enough to see that and complains about it
> > like it would complain about a regular lock in this case.
>
> That part yes. That is, even holding a per-cpu lock you can do a wakeup
> to the local cpu and recurse back onto rq->lock.
>
> However I don't think it can actually happen bceause this
> is init_idle, and we only ever do that on hotplug, so actually creating
> the concurrency required for the deadlock might be tricky.
>
> Still, moving that thing out from under the lock was simple and correct.
Well, the patch alone fixed a real deadlock during boot.
https://lore.kernel.org/lkml/1566509603.5576.10.camel@lca.pw/
It needs DEBUG_OBJECTS=y to trigger though.
Suppose it does:
CPU0: zone_lock -> printk() [1]
CPU1: printk() -> zone_lock [2]
[1]
[ 1078.599835][T43784] -> #1 (console_owner){-...}:
[ 1078.606618][T43784] __lock_acquire+0x5c8/0xbb0
[ 1078.611661][T43784] lock_acquire+0x154/0x428
[ 1078.616530][T43784] console_unlock+0x298/0x898
[ 1078.621573][T43784] vprintk_emit+0x2d4/0x460
[ 1078.626442][T43784] vprintk_default+0x48/0x58
[ 1078.631398][T43784] vprintk_func+0x194/0x250
[ 1078.636267][T43784] printk+0xbc/0xec
[ 1078.640443][T43784] _warn_unseeded_randomness+0xb4/0xd0
[ 1078.646267][T43784] get_random_u64+0x4c/0x100
[ 1078.651224][T43784] add_to_free_area_random+0x168/0x1a0
[ 1078.657047][T43784] free_one_page+0x3dc/0xd08
[2]
[ 317.337609] -> #3 (&(&zone->lock)->rlock){-.-.}:
[ 317.337612] __lock_acquire+0x5b3/0xb40
[ 317.337613] lock_acquire+0x126/0x280
[ 317.337613] _raw_spin_lock+0x2f/0x40
[ 317.337614] rmqueue_bulk.constprop.21+0xb6/0x1160
[ 317.337615] get_page_from_freelist+0x898/0x22c0
[ 317.337616] __alloc_pages_nodemask+0x2f3/0x1cd0
[ 317.337617] alloc_page_interleave+0x18/0x130
[ 317.337618] alloc_pages_current+0xf6/0x110
[ 317.337619] allocate_slab+0x4c6/0x19c0
[ 317.337620] new_slab+0x46/0x70
[ 317.337621] ___slab_alloc+0x58b/0x960
[ 317.337621] __slab_alloc+0x43/0x70
[ 317.337622] kmem_cache_alloc+0x354/0x460
[ 317.337623] fill_pool+0x272/0x4b0
[ 317.337624] __debug_object_init+0x86/0x790
[ 317.337624] debug_object_init+0x16/0x20
[ 317.337625] hrtimer_init+0x27/0x1e0
[ 317.337626] init_dl_task_timer+0x20/0x40
[ 317.337627] __sched_fork+0x10b/0x1f0
[ 317.337627] init_idle+0xac/0x520
[ 317.337628] idle_thread_get+0x7c/0xc0
[ 317.337629] bringup_cpu+0x1a/0x1e0
[ 317.337630] cpuhp_invoke_callback+0x197/0x1120
[ 317.337630] _cpu_up+0x171/0x280
[ 317.337631] do_cpu_up+0xb1/0x120
[ 317.337632] cpu_up+0x13/0x20
[ 317.337635] -> #2 (&rq->lock){-.-.}:
[ 317.337638] __lock_acquire+0x5b3/0xb40
[ 317.337639] lock_acquire+0x126/0x280
[ 317.337639] _raw_spin_lock+0x2f/0x40
[ 317.337640] task_fork_fair+0x43/0x200
[ 317.337641] sched_fork+0x29b/0x420
[ 317.337642] copy_process+0xf3c/0x2fd0
[ 317.337642] _do_fork+0xef/0x950
[ 317.337643] kernel_thread+0xa8/0xe0
[ 317.337649] -> #1 (&p->pi_lock){-.-.}:
[ 317.337651] __lock_acquire+0x5b3/0xb40
[ 317.337652] lock_acquire+0x126/0x280
[ 317.337653] _raw_spin_lock_irqsave+0x3a/0x50
[ 317.337653] try_to_wake_up+0xb4/0x1030
[ 317.337654] wake_up_process+0x15/0x20
[ 317.337655] __up+0xaa/0xc0
[ 317.337655] up+0x55/0x60
[ 317.337656] __up_console_sem+0x37/0x60
[ 317.337657] console_unlock+0x3a0/0x750
[ 317.337658] vprintk_emit+0x10d/0x340
[ 317.337658] vprintk_default+0x1f/0x30
[ 317.337659] vprintk_func+0x44/0xd4
[ 317.337660] printk+0x9f/0xc5
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2019-11-22 21:03 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-13 16:27 [PATCH] mm/slub: fix a deadlock in shuffle_freelist() Qian Cai
2019-09-16 9:03 ` Sebastian Andrzej Siewior
2019-09-16 14:01 ` Qian Cai
2019-09-16 19:51 ` Sebastian Andrzej Siewior
2019-09-16 21:31 ` Qian Cai
2019-09-17 7:16 ` Sebastian Andrzej Siewior
2019-09-18 19:59 ` Qian Cai
2019-09-25 9:31 ` Peter Zijlstra
2019-09-25 15:18 ` Qian Cai
2019-09-25 16:45 ` Peter Zijlstra
2019-09-26 12:29 ` Qian Cai
2019-10-01 9:18 ` [PATCH] sched: Avoid spurious lock dependencies Peter Zijlstra
2019-10-01 10:01 ` Valentin Schneider
2019-10-01 11:22 ` Qian Cai
2019-10-01 11:36 ` Srikar Dronamraju
2019-10-01 13:44 ` Peter Zijlstra
2019-10-29 11:10 ` Qian Cai
2019-10-29 12:44 ` Peter Zijlstra
2019-11-12 0:54 ` Qian Cai
2019-11-13 10:06 ` [tip: sched/urgent] sched/core: " tip-bot2 for Peter Zijlstra
2019-11-22 20:01 ` Sebastian Andrzej Siewior
2019-11-22 20:20 ` Peter Zijlstra
2019-11-22 21:03 ` Qian Cai