* hotplug lockdep splat (tip-rt) @ 2017-09-02 11:09 Mike Galbraith 2017-09-03 6:59 ` hotplug lockdep splat (tip) Mike Galbraith 0 siblings, 1 reply; 12+ messages in thread From: Mike Galbraith @ 2017-09-02 11:09 UTC (permalink / raw) To: LKML; +Cc: Thomas Gleixner 4.11-rt rolled forward, sprinkle liberally with seasoning of choice. [ 7514.772861] ====================================================== [ 7514.772862] WARNING: possible circular locking dependency detected [ 7514.772863] 4.13.0.g06260ca-rt11-tip-lockdep #20 Tainted: G E [ 7514.772863] ------------------------------------------------------ [ 7514.772867] stress-cpu-hotp/4102 is trying to acquire lock: [ 7514.772867] ((complete)&st->done){+.+.}, at: [<ffffffff8107208a>] takedown_cpu+0x9a/0x120 [ 7514.772877] [ 7514.772877] but task is already holding lock: [ 7514.772877] (sparse_irq_lock){+.+.}, at: [<ffffffff8107203a>] takedown_cpu+0x4a/0x120 [ 7514.772879] [ 7514.772879] which lock already depends on the new lock. [ 7514.772879] [ 7514.772879] [ 7514.772879] the existing dependency chain (in reverse order) is: [ 7514.772880] [ 7514.772880] -> #2 (sparse_irq_lock){+.+.}: [ 7514.772889] lock_acquire+0xbd/0x250 [ 7514.772908] _mutex_lock+0x31/0x50 [ 7514.772913] irq_affinity_online_cpu+0x13/0xc0 [ 7514.772914] cpuhp_invoke_callback+0x24c/0x9c0 [ 7514.772914] cpuhp_up_callbacks+0x30/0xb0 [ 7514.772915] cpuhp_thread_fun+0x159/0x170 [ 7514.772918] smpboot_thread_fn+0x268/0x310 [ 7514.772919] kthread+0x145/0x180 [ 7514.772921] ret_from_fork+0x2a/0x40 [ 7514.772922] [ 7514.772922] -> #1 (cpuhp_state){+.+.}: [ 7514.772926] smpboot_thread_fn+0x268/0x310 [ 7514.772927] kthread+0x145/0x180 [ 7514.772928] ret_from_fork+0x2a/0x40 [ 7514.772930] 0xffffffffffffffff [ 7514.772930] [ 7514.772930] -> #0 ((complete)&st->done){+.+.}: [ 7514.772932] __lock_acquire+0x113b/0x1190 [ 7514.772933] lock_acquire+0xbd/0x250 [ 7514.772934] wait_for_completion+0x51/0x120 [ 7514.772935] takedown_cpu+0x9a/0x120 [ 7514.772936] cpuhp_invoke_callback+0x24c/0x9c0 [ 7514.772937] cpuhp_down_callbacks+0x3b/0x80 [ 7514.772939] _cpu_down+0xba/0xf0 [ 7514.772940] do_cpu_down+0x35/0x50 [ 7514.772949] device_offline+0x7d/0xa0 [ 7514.772950] online_store+0x3a/0x70 [ 7514.772959] kernfs_fop_write+0x10a/0x190 [ 7514.772962] __vfs_write+0x23/0x150 [ 7514.772963] vfs_write+0xc2/0x1c0 [ 7514.772964] SyS_write+0x45/0xa0 [ 7514.772965] entry_SYSCALL_64_fastpath+0x1f/0xbe [ 7514.772966] [ 7514.772966] other info that might help us debug this: [ 7514.772966] [ 7514.772966] Chain exists of: [ 7514.772966] (complete)&st->done --> cpuhp_state --> sparse_irq_lock [ 7514.772966] [ 7514.772968] Possible unsafe locking scenario: [ 7514.772968] [ 7514.772968] CPU0 CPU1 [ 7514.772968] ---- ---- [ 7514.772968] lock(sparse_irq_lock); [ 7514.772969] lock(cpuhp_state); [ 7514.772970] lock(sparse_irq_lock); [ 7514.772970] lock((complete)&st->done); [ 7514.772971] [ 7514.772971] *** DEADLOCK *** [ 7514.772971] [ 7514.772972] 8 locks held by stress-cpu-hotp/4102: [ 7514.772972] #0: (sb_writers#4){.+.+}, at: [<ffffffff8126c410>] vfs_write+0x190/0x1c0 [ 7514.772974] #1: (&of->mutex){+.+.}, at: [<ffffffff8130187a>] kernfs_fop_write+0xda/0x190 [ 7514.772976] #2: (s_active#140){.+.+}, at: [<ffffffff81301882>] kernfs_fop_write+0xe2/0x190 [ 7514.772979] #3: (device_hotplug_lock){+.+.}, at: [<ffffffff8153aa71>] lock_device_hotplug_sysfs+0x11/0x40 [ 7514.772981] #4: (&dev->mutex){....}, at: [<ffffffff8153c20f>] device_offline+0x3f/0xa0 [ 7514.772983] #5: (cpu_add_remove_lock){+.+.}, at: 
[<ffffffff8107352f>] do_cpu_down+0x1f/0x50 [ 7514.772985] #6: (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff810d2101>] percpu_down_write+0x21/0x110 [ 7514.772987] #7: (sparse_irq_lock){+.+.}, at: [<ffffffff8107203a>] takedown_cpu+0x4a/0x120 [ 7514.772989] [ 7514.772989] stack backtrace: [ 7514.772990] CPU: 5 PID: 4102 Comm: stress-cpu-hotp Tainted: G E 4.13.0.g06260ca-rt11-tip-lockdep #20 [ 7514.772991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 [ 7514.772992] Call Trace: [ 7514.772995] dump_stack+0x7c/0xbf [ 7514.772997] print_circular_bug+0x2d3/0x2e0 [ 7514.772999] ? copy_trace+0xb0/0xb0 [ 7514.773001] check_prev_add+0x666/0x700 [ 7514.773002] ? copy_trace+0xb0/0xb0 [ 7514.773008] ? __stop_cpus+0x51/0x70 [ 7514.773010] ? copy_trace+0xb0/0xb0 [ 7514.773011] __lock_acquire+0x113b/0x1190 [ 7514.773013] ? trace_hardirqs_on_caller+0xf2/0x1a0 [ 7514.773015] lock_acquire+0xbd/0x250 [ 7514.773018] ? takedown_cpu+0x9a/0x120 [ 7514.773020] wait_for_completion+0x51/0x120 [ 7514.773021] ? takedown_cpu+0x9a/0x120 [ 7514.773022] ? cpuhp_invoke_callback+0x9c0/0x9c0 [ 7514.773023] takedown_cpu+0x9a/0x120 [ 7514.773025] ? cpuhp_complete_idle_dead+0x10/0x10 [ 7514.773026] cpuhp_invoke_callback+0x24c/0x9c0 [ 7514.773028] cpuhp_down_callbacks+0x3b/0x80 [ 7514.773030] _cpu_down+0xba/0xf0 [ 7514.773031] do_cpu_down+0x35/0x50 [ 7514.773033] device_offline+0x7d/0xa0 [ 7514.773034] online_store+0x3a/0x70 [ 7514.773036] kernfs_fop_write+0x10a/0x190 [ 7514.773037] __vfs_write+0x23/0x150 [ 7514.773039] ? rcu_read_lock_sched_held+0x9b/0xb0 [ 7514.773043] ? rcu_sync_lockdep_assert+0x2d/0x60 [ 7514.773045] ? __sb_start_write+0x190/0x240 [ 7514.773046] ? vfs_write+0x190/0x1c0 [ 7514.773048] vfs_write+0xc2/0x1c0 [ 7514.773050] SyS_write+0x45/0xa0 [ 7514.773051] entry_SYSCALL_64_fastpath+0x1f/0xbe [ 7514.773053] RIP: 0033:0x7fc5e51fd2d0 [ 7514.773053] RSP: 002b:00007ffd73d17678 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 7514.773054] RAX: ffffffffffffffda RBX: 00007fc5e54bd678 RCX: 00007fc5e51fd2d0 [ 7514.773055] RDX: 0000000000000002 RSI: 00007fc5e5d64000 RDI: 0000000000000001 [ 7514.773056] RBP: 00007fc5e54bd620 R08: 000000000000000a R09: 00007fc5e5d16700 [ 7514.773056] R10: 000000000198bc50 R11: 0000000000000246 R12: 0000000000000110 [ 7514.773057] R13: 00000000000000e4 R14: 0000000000002710 R15: 00000000000000f1 ^ permalink raw reply [flat|nested] 12+ messages in thread
* hotplug lockdep splat (tip) 2017-09-02 11:09 hotplug lockdep splat (tip-rt) Mike Galbraith @ 2017-09-03 6:59 ` Mike Galbraith 2017-09-04 7:55 ` Peter Zijlstra 0 siblings, 1 reply; 12+ messages in thread From: Mike Galbraith @ 2017-09-03 6:59 UTC (permalink / raw) To: LKML; +Cc: Thomas Gleixner, Peter Zijlstra, Byungchul Park [ 126.626908] Unregister pv shared memory for cpu 1 [ 126.631041] [ 126.631269] ====================================================== [ 126.632442] WARNING: possible circular locking dependency detected [ 126.633599] 4.13.0.g06260ca-tip-lockdep #2 Tainted: G E [ 126.634380] ------------------------------------------------------ [ 126.635124] stress-cpu-hotp/3156 is trying to acquire lock: [ 126.635804] ((complete)&st->done){+.+.}, at: [<ffffffff8108d19a>] takedown_cpu+0x8a/0xf0 [ 126.636809] [ 126.636809] but task is already holding lock: [ 126.637567] (sparse_irq_lock){+.+.}, at: [<ffffffff81107ac7>] irq_lock_sparse+0x17/0x20 [ 126.638665] [ 126.638665] which lock already depends on the new lock. [ 126.638665] [ 126.639679] [ 126.639679] the existing dependency chain (in reverse order) is: [ 126.641086] [ 126.641086] -> #2 (sparse_irq_lock){+.+.}: [ 126.641862] __lock_acquire+0x116a/0x11c0 [ 126.642456] lock_acquire+0xbd/0x200 [ 126.642975] __mutex_lock+0x70/0x8e0 [ 126.643499] mutex_lock_nested+0x1b/0x20 [ 126.644118] irq_lock_sparse+0x17/0x20 [ 126.644870] irq_affinity_online_cpu+0x18/0xd0 [ 126.645941] cpuhp_invoke_callback+0x1f6/0x830 [ 126.647000] cpuhp_up_callbacks+0x37/0xb0 [ 126.647961] cpuhp_thread_fun+0x14f/0x160 [ 126.648672] smpboot_thread_fn+0x11a/0x1e0 [ 126.649511] kthread+0x152/0x190 [ 126.650347] ret_from_fork+0x2a/0x40 [ 126.651239] [ 126.651239] -> #1 (cpuhp_state){+.+.}: [ 126.651956] cpuhp_thread_fun+0x54/0x160 [ 126.652523] smpboot_thread_fn+0x11a/0x1e0 [ 126.653233] kthread+0x152/0x190 [ 126.654090] ret_from_fork+0x2a/0x40 [ 126.656166] 0xffffffffffffffff [ 126.657242] [ 126.657242] -> #0 ((complete)&st->done){+.+.}: [ 126.658407] check_prev_add+0x666/0x700 [ 126.659570] __lock_acquire+0x116a/0x11c0 [ 126.660576] lock_acquire+0xbd/0x200 [ 126.663278] wait_for_completion+0x53/0x1a0 [ 126.666632] takedown_cpu+0x8a/0xf0 [ 126.669845] cpuhp_invoke_callback+0x1f6/0x830 [ 126.670623] cpuhp_down_callbacks+0x42/0x80 [ 126.673508] _cpu_down+0xc6/0x100 [ 126.674323] do_cpu_down+0x3e/0x60 [ 126.674825] cpu_down+0x10/0x20 [ 126.675306] cpu_subsys_offline+0x14/0x20 [ 126.675895] device_offline+0x8c/0xb0 [ 126.676561] online_store+0x40/0x80 [ 126.677081] dev_attr_store+0x18/0x30 [ 126.677674] sysfs_kf_write+0x44/0x60 [ 126.678338] kernfs_fop_write+0x113/0x1a0 [ 126.678922] __vfs_write+0x28/0x150 [ 126.679553] vfs_write+0xc7/0x1c0 [ 126.680034] SyS_write+0x49/0xa0 [ 126.680563] entry_SYSCALL_64_fastpath+0x1f/0xbe [ 126.681459] [ 126.681459] other info that might help us debug this: [ 126.681459] [ 126.683271] Chain exists of: [ 126.683271] (complete)&st->done --> cpuhp_state --> sparse_irq_lock [ 126.683271] [ 126.687307] Possible unsafe locking scenario: [ 126.687307] [ 126.688244] CPU0 CPU1 [ 126.689027] ---- ---- [ 126.689591] lock(sparse_irq_lock); [ 126.690036] lock(cpuhp_state); [ 126.690970] lock(sparse_irq_lock); [ 126.692057] lock((complete)&st->done); [ 126.693199] [ 126.693199] *** DEADLOCK *** [ 126.693199] [ 126.694283] 8 locks held by stress-cpu-hotp/3156: [ 126.694772] #0: (sb_writers#4){.+.+}, at: [<ffffffff8129b226>] vfs_write+0x196/0x1c0 [ 126.696285] #1: (&of->mutex){+.+.}, at: [<ffffffff81334fc3>] kernfs_fop_write+0xe3/0x1a0 [ 
126.697178] #2: (s_active#158){.+.+}, at: [<ffffffff81334fcb>] kernfs_fop_write+0xeb/0x1a0 [ 126.698367] #3: (device_hotplug_lock){+.+.}, at: [<ffffffff8158f255>] lock_device_hotplug_sysfs+0x15/0x40 [ 126.699632] #4: (&dev->mutex){....}, at: [<ffffffff81590aaa>] device_offline+0x4a/0xb0 [ 126.701229] #5: (cpu_add_remove_lock){+.+.}, at: [<ffffffff8108e587>] do_cpu_down+0x27/0x60 [ 126.703277] #6: (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff810ed7c6>] percpu_down_write+0x26/0x110 [ 126.706302] #7: (sparse_irq_lock){+.+.}, at: [<ffffffff81107ac7>] irq_lock_sparse+0x17/0x20 [ 126.708966] [ 126.708966] stack backtrace: [ 126.709558] CPU: 0 PID: 3156 Comm: stress-cpu-hotp Tainted: G E 4.13.0.g06260ca-tip-lockdep #2 [ 126.710790] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 [ 126.712075] Call Trace: [ 126.712523] dump_stack+0x85/0xc5 [ 126.712904] print_circular_bug+0x2e9/0x2f7 [ 126.713308] ? copy_trace+0xc0/0xc0 [ 126.713636] check_prev_add+0x666/0x700 [ 126.713978] ? copy_trace+0xc0/0xc0 [ 126.714426] ? copy_trace+0xc0/0xc0 [ 126.714847] __lock_acquire+0x116a/0x11c0 [ 126.715452] lock_acquire+0xbd/0x200 [ 126.716073] ? takedown_cpu+0x8a/0xf0 [ 126.716930] wait_for_completion+0x53/0x1a0 [ 126.717805] ? takedown_cpu+0x8a/0xf0 [ 126.718352] ? stop_machine_cpuslocked+0xee/0x100 [ 126.718867] ? cpuhp_invoke_callback+0x830/0x830 [ 126.719489] ? cpuhp_complete_idle_dead+0x20/0x20 [ 126.720109] takedown_cpu+0x8a/0xf0 [ 126.720597] cpuhp_invoke_callback+0x1f6/0x830 [ 126.721208] cpuhp_down_callbacks+0x42/0x80 [ 126.721805] _cpu_down+0xc6/0x100 [ 126.722292] do_cpu_down+0x3e/0x60 [ 126.722773] cpu_down+0x10/0x20 [ 126.723207] cpu_subsys_offline+0x14/0x20 [ 126.723762] device_offline+0x8c/0xb0 [ 126.724233] online_store+0x40/0x80 [ 126.724831] dev_attr_store+0x18/0x30 [ 126.725520] sysfs_kf_write+0x44/0x60 [ 126.726255] kernfs_fop_write+0x113/0x1a0 [ 126.727038] __vfs_write+0x28/0x150 [ 126.727753] ? rcu_read_lock_sched_held+0x98/0xa0 [ 126.728503] ? rcu_sync_lockdep_assert+0x2f/0x60 [ 126.729106] ? __sb_start_write+0x14e/0x1e0 [ 126.729509] ? vfs_write+0x196/0x1c0 [ 126.729974] ? security_file_permission+0x3b/0xc0 [ 126.730472] vfs_write+0xc7/0x1c0 [ 126.731070] ? trace_hardirqs_on_caller+0xf9/0x190 [ 126.731655] SyS_write+0x49/0xa0 [ 126.732054] entry_SYSCALL_64_fastpath+0x1f/0xbe [ 126.732666] RIP: 0033:0x7f1e159502d0 [ 126.733056] RSP: 002b:00007ffc1190efa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 126.734263] RAX: ffffffffffffffda RBX: 00007f1e15c10678 RCX: 00007f1e159502d0 [ 126.734973] RDX: 0000000000000002 RSI: 00007f1e164b7000 RDI: 0000000000000001 [ 126.735855] RBP: 00007f1e15c10620 R08: 000000000000000a R09: 00007f1e16469700 [ 126.736842] R10: 0000000001720c40 R11: 0000000000000246 R12: 0000000000000110 [ 126.738137] R13: 00000000000000e4 R14: 0000000000002710 R15: 00000000000000f1 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: hotplug lockdep splat (tip)
  2017-09-03 6:59 ` hotplug lockdep splat (tip) Mike Galbraith
@ 2017-09-04 7:55   ` Peter Zijlstra
  2017-09-04 13:27    ` Mike Galbraith
  2017-09-04 14:23    ` Peter Zijlstra
  0 siblings, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-09-04 7:55 UTC (permalink / raw)
To: Mike Galbraith; +Cc: LKML, Thomas Gleixner, Byungchul Park

On Sun, Sep 03, 2017 at 08:59:35AM +0200, Mike Galbraith wrote:
>
> [ 126.626908] Unregister pv shared memory for cpu 1
> [ 126.631041]
> [ 126.631269] ======================================================
> [ 126.632442] WARNING: possible circular locking dependency detected
> [ 126.633599] 4.13.0.g06260ca-tip-lockdep #2 Tainted: G E
> [ 126.634380] ------------------------------------------------------
> [ 126.635124] stress-cpu-hotp/3156 is trying to acquire lock:
> [ 126.635804] ((complete)&st->done){+.+.}, at: [<ffffffff8108d19a>] takedown_cpu+0x8a/0xf0
> [ 126.636809]
> [ 126.636809] but task is already holding lock:
> [ 126.637567] (sparse_irq_lock){+.+.}, at: [<ffffffff81107ac7>] irq_lock_sparse+0x17/0x20
> [ 126.638665]

https://lkml.kernel.org/r/20170829193416.GC32112@worktop.programming.kicks-ass.net

I still need to write a coherent Changelog and comments for that :/
* Re: hotplug lockdep splat (tip) 2017-09-04 7:55 ` Peter Zijlstra @ 2017-09-04 13:27 ` Mike Galbraith 2017-09-04 14:24 ` Peter Zijlstra 2017-09-04 15:37 ` Peter Zijlstra 2017-09-04 14:23 ` Peter Zijlstra 1 sibling, 2 replies; 12+ messages in thread From: Mike Galbraith @ 2017-09-04 13:27 UTC (permalink / raw) To: Peter Zijlstra; +Cc: LKML, Thomas Gleixner, Byungchul Park On Mon, 2017-09-04 at 09:55 +0200, Peter Zijlstra wrote: > On Sun, Sep 03, 2017 at 08:59:35AM +0200, Mike Galbraith wrote: > > > > [ 126.626908] Unregister pv shared memory for cpu 1 > > [ 126.631041] > > [ 126.631269] ====================================================== > > [ 126.632442] WARNING: possible circular locking dependency detected > > [ 126.633599] 4.13.0.g06260ca-tip-lockdep #2 Tainted: G E > > [ 126.634380] ------------------------------------------------------ > > [ 126.635124] stress-cpu-hotp/3156 is trying to acquire lock: > > [ 126.635804] ((complete)&st->done){+.+.}, at: [<ffffffff8108d19a>] takedown_cpu+0x8a/0xf0 > > [ 126.636809] > > [ 126.636809] but task is already holding lock: > > [ 126.637567] (sparse_irq_lock){+.+.}, at: [<ffffffff81107ac7>] irq_lock_sparse+0x17/0x20 > > [ 126.638665] > > https://lkml.kernel.org/r/20170829193416.GC32112@worktop.programming.kicks-ass.net Well, flavor of gripe changed. [ 164.114290] ====================================================== [ 164.115146] WARNING: possible circular locking dependency detected [ 164.115751] 4.13.0.g90abd70-tip-lockdep #4 Tainted: G E [ 164.116348] ------------------------------------------------------ [ 164.116919] cpuhp/0/12 is trying to acquire lock: [ 164.117381] (cpuhp_state){+.+.}, at: [<ffffffff8108e4aa>] cpuhp_thread_fun+0x2a/0x160 [ 164.118097] but now in release context of a crosslock acquired at the following: [ 164.118845] ((complete)&per_cpu_ptr(&cpuhp_state, i)->done#2){+.+.}, at: [<ffffffff8108e71f>] cpuhp_issue_call+0x13f/0x170 [ 164.119789] which lock already depends on the new lock. 
[ 164.120483] the existing dependency chain (in reverse order) is: [ 164.121121] -> #2 ((complete)&per_cpu_ptr(&cpuhp_state, i)->done#2){+.+.}: [ 164.121840] __lock_acquire+0x116a/0x11c0 [ 164.122382] lock_acquire+0xbd/0x200 [ 164.122741] wait_for_completion+0x53/0x1a0 [ 164.123181] takedown_cpu+0x8a/0xf0 [ 164.123533] cpuhp_invoke_callback+0x1f6/0x830 [ 164.123967] cpuhp_down_callbacks+0x42/0x80 [ 164.124433] _cpu_down+0x155/0x190 [ 164.124779] do_cpu_down+0x3e/0x60 [ 164.125127] cpu_down+0x10/0x20 [ 164.125456] cpu_subsys_offline+0x14/0x20 [ 164.125937] device_offline+0x8c/0xb0 [ 164.126487] online_store+0x40/0x80 [ 164.126943] dev_attr_store+0x18/0x30 [ 164.127335] sysfs_kf_write+0x44/0x60 [ 164.127703] kernfs_fop_write+0x113/0x1a0 [ 164.128141] __vfs_write+0x28/0x150 [ 164.128532] vfs_write+0xc7/0x1c0 [ 164.128871] SyS_write+0x49/0xa0 [ 164.129206] entry_SYSCALL_64_fastpath+0x1f/0xbe [ 164.129656] -> #1 (sparse_irq_lock){+.+.}: [ 164.130145] __lock_acquire+0x116a/0x11c0 [ 164.130546] lock_acquire+0xbd/0x200 [ 164.130913] __mutex_lock+0x70/0x8e0 [ 164.131276] mutex_lock_nested+0x1b/0x20 [ 164.131664] irq_lock_sparse+0x17/0x20 [ 164.132039] irq_affinity_online_cpu+0x18/0xd0 [ 164.132463] cpuhp_invoke_callback+0x1f6/0x830 [ 164.132928] cpuhp_up_callbacks+0x37/0xb0 [ 164.133487] cpuhp_thread_fun+0x14f/0x160 [ 164.133885] smpboot_thread_fn+0x11a/0x1e0 [ 164.134333] kthread+0x152/0x190 [ 164.134652] ret_from_fork+0x2a/0x40 [ 164.135010] -> #0 (cpuhp_state){+.+.}: [ 164.135454] cpuhp_thread_fun+0x54/0x160 [ 164.135834] smpboot_thread_fn+0x11a/0x1e0 [ 164.136235] kthread+0x152/0x190 [ 164.136558] ret_from_fork+0x2a/0x40 [ 164.136920] 0xffffffffffffffff [ 164.137239] other info that might help us debug this: [ 164.137927] Chain exists of: cpuhp_state --> sparse_irq_lock --> (complete)&per_cpu_ptr(&cpuhp_state, i)->done#2 [ 164.139136] Possible unsafe locking scenario by crosslock: [ 164.139898] CPU0 CPU1 [ 164.140298] ---- ---- [ 164.140689] lock(sparse_irq_lock); [ 164.141015] lock((complete)&per_cpu_ptr(&cpuhp_state, i)->done#2); [ 164.141565] lock(cpuhp_state); [ 164.142077] unlock((complete)&per_cpu_ptr(&cpuhp_state, i)->done#2); [ 164.142859] *** DEADLOCK *** [ 164.143382] 1 lock held by cpuhp/0/12: [ 164.143794] #0: (&x->wait#2){-...}, at: [<ffffffff810e1afd>] complete+0x1d/0x60 [ 164.144461] stack backtrace: [ 164.144837] CPU: 0 PID: 12 Comm: cpuhp/0 Tainted: G E 4.13.0.g90abd70-tip-lockdep #4 [ 164.145594] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 [ 164.146485] Call Trace: [ 164.146707] dump_stack+0x85/0xc5 [ 164.147004] print_circular_bug+0x2e9/0x2f7 [ 164.147369] check_prev_add+0x666/0x700 [ 164.147706] ? print_bfs_bug+0x40/0x40 [ 164.148045] ? trace_hardirqs_on+0xd/0x10 [ 164.148389] lock_commit_crosslock+0x410/0x5a0 [ 164.148780] complete+0x29/0x60 [ 164.149064] cpuhp_thread_fun+0xa1/0x160 [ 164.149433] ? cpuhp_thread_fun+0x2a/0x160 [ 164.149814] smpboot_thread_fn+0x11a/0x1e0 [ 164.150193] kthread+0x152/0x190 [ 164.150482] ? sort_range+0x30/0x30 [ 164.150781] ? kthread_create_on_node+0x40/0x40 [ 164.151184] ret_from_fork+0x2a/0x40 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: hotplug lockdep splat (tip) 2017-09-04 13:27 ` Mike Galbraith @ 2017-09-04 14:24 ` Peter Zijlstra 2017-09-04 16:31 ` Mike Galbraith 2017-09-04 15:37 ` Peter Zijlstra 1 sibling, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2017-09-04 14:24 UTC (permalink / raw) To: Mike Galbraith; +Cc: LKML, Thomas Gleixner, Byungchul Park On Mon, Sep 04, 2017 at 03:27:07PM +0200, Mike Galbraith wrote: > On Mon, 2017-09-04 at 09:55 +0200, Peter Zijlstra wrote: > > On Sun, Sep 03, 2017 at 08:59:35AM +0200, Mike Galbraith wrote: > > > > > > [ 126.626908] Unregister pv shared memory for cpu 1 > > > [ 126.631041] > > > [ 126.631269] ====================================================== > > > [ 126.632442] WARNING: possible circular locking dependency detected > > > [ 126.633599] 4.13.0.g06260ca-tip-lockdep #2 Tainted: G E > > > [ 126.634380] ------------------------------------------------------ > > > [ 126.635124] stress-cpu-hotp/3156 is trying to acquire lock: > > > [ 126.635804] ((complete)&st->done){+.+.}, at: [<ffffffff8108d19a>] takedown_cpu+0x8a/0xf0 > > > [ 126.636809] > > > [ 126.636809] but task is already holding lock: > > > [ 126.637567] (sparse_irq_lock){+.+.}, at: [<ffffffff81107ac7>] irq_lock_sparse+0x17/0x20 > > > [ 126.638665] > > > > https://lkml.kernel.org/r/20170829193416.GC32112@worktop.programming.kicks-ass.net > > Well, flavor of gripe changed. > > [ 164.114290] ====================================================== > [ 164.115146] WARNING: possible circular locking dependency detected > [ 164.115751] 4.13.0.g90abd70-tip-lockdep #4 Tainted: G E > [ 164.116348] ------------------------------------------------------ > [ 164.116919] cpuhp/0/12 is trying to acquire lock: > [ 164.117381] (cpuhp_state){+.+.}, at: [<ffffffff8108e4aa>] cpuhp_thread_fun+0x2a/0x160 > [ 164.118097] > but now in release context of a crosslock acquired at the following: > [ 164.118845] ((complete)&per_cpu_ptr(&cpuhp_state, i)->done#2){+.+.}, at: [<ffffffff8108e71f>] cpuhp_issue_call+0x13f/0x170 > [ 164.119789] Cute, I'd not seen that one.. anything special required to reproduce? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: hotplug lockdep splat (tip) 2017-09-04 14:24 ` Peter Zijlstra @ 2017-09-04 16:31 ` Mike Galbraith 2017-09-04 16:54 ` Mike Galbraith 0 siblings, 1 reply; 12+ messages in thread From: Mike Galbraith @ 2017-09-04 16:31 UTC (permalink / raw) To: Peter Zijlstra; +Cc: LKML, Thomas Gleixner, Byungchul Park On Mon, 2017-09-04 at 16:24 +0200, Peter Zijlstra wrote: > On Mon, Sep 04, 2017 at 03:27:07PM +0200, Mike Galbraith wrote: > > On Mon, 2017-09-04 at 09:55 +0200, Peter Zijlstra wrote: > > > On Sun, Sep 03, 2017 at 08:59:35AM +0200, Mike Galbraith wrote: > > > > > > > > [ 126.626908] Unregister pv shared memory for cpu 1 > > > > [ 126.631041] > > > > [ 126.631269] ====================================================== > > > > [ 126.632442] WARNING: possible circular locking dependency detected > > > > [ 126.633599] 4.13.0.g06260ca-tip-lockdep #2 Tainted: G E > > > > [ 126.634380] ------------------------------------------------------ > > > > [ 126.635124] stress-cpu-hotp/3156 is trying to acquire lock: > > > > [ 126.635804] ((complete)&st->done){+.+.}, at: [<ffffffff8108d19a>] takedown_cpu+0x8a/0xf0 > > > > [ 126.636809] > > > > [ 126.636809] but task is already holding lock: > > > > [ 126.637567] (sparse_irq_lock){+.+.}, at: [<ffffffff81107ac7>] irq_lock_sparse+0x17/0x20 > > > > [ 126.638665] > > > > > > https://lkml.kernel.org/r/20170829193416.GC32112@worktop.programming.kicks-ass.net > > > > Well, flavor of gripe changed. > > > > [ 164.114290] ====================================================== > > [ 164.115146] WARNING: possible circular locking dependency detected > > [ 164.115751] 4.13.0.g90abd70-tip-lockdep #4 Tainted: G E > > [ 164.116348] ------------------------------------------------------ > > [ 164.116919] cpuhp/0/12 is trying to acquire lock: > > [ 164.117381] (cpuhp_state){+.+.}, at: [<ffffffff8108e4aa>] cpuhp_thread_fun+0x2a/0x160 > > [ 164.118097] > > but now in release context of a crosslock acquired at the following: > > [ 164.118845] ((complete)&per_cpu_ptr(&cpuhp_state, i)->done#2){+.+.}, at: [<ffffffff8108e71f>] cpuhp_issue_call+0x13f/0x170 > > [ 164.119789] > > Cute, I'd not seen that one.. anything special required to reproduce? No. I build an enterprise(ish) config, run it in a KVM box, with the guest being a full clone of the host, just memory constrained so it can't oom the host. -Mike ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: hotplug lockdep splat (tip) 2017-09-04 16:31 ` Mike Galbraith @ 2017-09-04 16:54 ` Mike Galbraith 0 siblings, 0 replies; 12+ messages in thread From: Mike Galbraith @ 2017-09-04 16:54 UTC (permalink / raw) To: Peter Zijlstra; +Cc: LKML, Thomas Gleixner, Byungchul Park On Mon, 2017-09-04 at 18:31 +0200, Mike Galbraith wrote: > On Mon, 2017-09-04 at 16:24 +0200, Peter Zijlstra wrote: > > On Mon, Sep 04, 2017 at 03:27:07PM +0200, Mike Galbraith wrote: > > > On Mon, 2017-09-04 at 09:55 +0200, Peter Zijlstra wrote: > > > > On Sun, Sep 03, 2017 at 08:59:35AM +0200, Mike Galbraith wrote: > > > > > > > > > > [ 126.626908] Unregister pv shared memory for cpu 1 > > > > > [ 126.631041] > > > > > [ 126.631269] ====================================================== > > > > > [ 126.632442] WARNING: possible circular locking dependency detected > > > > > [ 126.633599] 4.13.0.g06260ca-tip-lockdep #2 Tainted: G E > > > > > [ 126.634380] ------------------------------------------------------ > > > > > [ 126.635124] stress-cpu-hotp/3156 is trying to acquire lock: > > > > > [ 126.635804] ((complete)&st->done){+.+.}, at: [<ffffffff8108d19a>] takedown_cpu+0x8a/0xf0 > > > > > [ 126.636809] > > > > > [ 126.636809] but task is already holding lock: > > > > > [ 126.637567] (sparse_irq_lock){+.+.}, at: [<ffffffff81107ac7>] irq_lock_sparse+0x17/0x20 > > > > > [ 126.638665] > > > > > > > > https://lkml.kernel.org/r/20170829193416.GC32112@worktop.programming.kicks-ass.net > > > > > > Well, flavor of gripe changed. > > > > > > [ 164.114290] ====================================================== > > > [ 164.115146] WARNING: possible circular locking dependency detected > > > [ 164.115751] 4.13.0.g90abd70-tip-lockdep #4 Tainted: G E > > > [ 164.116348] ------------------------------------------------------ > > > [ 164.116919] cpuhp/0/12 is trying to acquire lock: > > > [ 164.117381] (cpuhp_state){+.+.}, at: [<ffffffff8108e4aa>] cpuhp_thread_fun+0x2a/0x160 > > > [ 164.118097] > > > but now in release context of a crosslock acquired at the following: > > > [ 164.118845] ((complete)&per_cpu_ptr(&cpuhp_state, i)->done#2){+.+.}, at: [<ffffffff8108e71f>] cpuhp_issue_call+0x13f/0x170 > > > [ 164.119789] > > > > Cute, I'd not seen that one.. anything special required to reproduce? > > No. I build an enterprise(ish) config, run it in a KVM box, with the > guest being a full clone of the host, just memory constrained so it > can't oom the host. Qualifier: I run Steven's hotplug stress script, as that has killed oh so many kernels.. but I never run it alone. In this case, I run stockfish with it, initial light survival test. If I think things are looking pretty solid, I turn up the heat a bit, toss other things into the mix with random kill to make sure things don't get into a comfortable rut. -Mike ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: hotplug lockdep splat (tip) 2017-09-04 13:27 ` Mike Galbraith 2017-09-04 14:24 ` Peter Zijlstra @ 2017-09-04 15:37 ` Peter Zijlstra 2017-09-04 17:02 ` Mike Galbraith 1 sibling, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2017-09-04 15:37 UTC (permalink / raw) To: Mike Galbraith; +Cc: LKML, Thomas Gleixner, Byungchul Park On Mon, Sep 04, 2017 at 03:27:07PM +0200, Mike Galbraith wrote: > Well, flavor of gripe changed. > > [ 164.114290] ====================================================== > [ 164.115146] WARNING: possible circular locking dependency detected > [ 164.115751] 4.13.0.g90abd70-tip-lockdep #4 Tainted: G E > [ 164.116348] ------------------------------------------------------ > [ 164.116919] cpuhp/0/12 is trying to acquire lock: > [ 164.117381] (cpuhp_state){+.+.}, at: [<ffffffff8108e4aa>] cpuhp_thread_fun+0x2a/0x160 > [ 164.118097] > but now in release context of a crosslock acquired at the following: > [ 164.118845] ((complete)&per_cpu_ptr(&cpuhp_state, i)->done#2){+.+.}, at: [<ffffffff8108e71f>] cpuhp_issue_call+0x13f/0x170 > [ 164.119789] > which lock already depends on the new lock. > > [ 164.120483] > the existing dependency chain (in reverse order) is: > [ 164.121121] > -> #2 ((complete)&per_cpu_ptr(&cpuhp_state, i)->done#2){+.+.}: > [ 164.122741] wait_for_completion+0x53/0x1a0 > [ 164.123181] takedown_cpu+0x8a/0xf0 So this is: takedown_cpu() irq_lock_sparse wait_for_completion(st->done); > -> #1 (sparse_irq_lock){+.+.}: > [ 164.131664] irq_lock_sparse+0x17/0x20 > [ 164.132039] irq_affinity_online_cpu+0x18/0xd0 > [ 164.132463] cpuhp_invoke_callback+0x1f6/0x830 > [ 164.132928] cpuhp_up_callbacks+0x37/0xb0 > [ 164.133487] cpuhp_thread_fun+0x14f/0x160 This is: cpuhp_state irq_lock_sparse > [ 164.148780] complete+0x29/0x60 > [ 164.149064] cpuhp_thread_fun+0xa1/0x160 And this is I think: cpuhp_thread_fun() cpuhq_state complete(st->done) Which then spells deadlock like: CPU0 CPU1 CPU2 cpuhp_state irq_lock_sparse() cpuhp_state wait_for_completion() irq_lock_sparse() complete() or something, because CPU0 needs to wait for CPU1's irq_lock_sparse which will wait for CPU2's completion, which in turn waits for CPU0's cpuhp_state. Now, this again mixes up and down chains, but now on cpuhp_state. But I cannot reproduce this.. I've let: while :; do echo 0 > /sys/devices/system/cpu/cpu1/online ; echo 1 > /sys/devices/system/cpu/cpu1/online ; done run for a while, but nothing... :/ Doth teh beloweth make nice? 
--- kernel/cpu.c | 47 +++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 39 insertions(+), 8 deletions(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index acf5308fad51..2ab324d7ff7b 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -67,11 +67,15 @@ struct cpuhp_cpu_state { static DEFINE_PER_CPU(struct cpuhp_cpu_state, cpuhp_state); #if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP) -static struct lock_class_key cpuhp_state_key; +static struct lock_class_key cpuhp_state_up_key; +#ifdef CONFIG_HOTPLUG_CPU +static struct lock_class_key cpuhp_state_down_key; +#endif static struct lockdep_map cpuhp_state_lock_map = - STATIC_LOCKDEP_MAP_INIT("cpuhp_state", &cpuhp_state_key); + STATIC_LOCKDEP_MAP_INIT("cpuhp_state-up", &cpuhp_state_up_key); #endif + /** * cpuhp_step - Hotplug state machine step * @name: Name of the step @@ -533,6 +537,28 @@ void __init cpuhp_threads_init(void) kthread_unpark(this_cpu_read(cpuhp_state.thread)); } +/* + * _cpu_down() and _cpu_up() have different lock ordering wrt st->done, but + * because these two functions are globally serialized and st->done is private + * to them, we can simply re-init st->done for each of them to separate the + * lock chains. + * + * Must be macro to ensure we have two different call sites. + */ +#ifdef CONFIG_LOCKDEP +#define lockdep_reinit_st_done() \ +do { \ + int __cpu; \ + for_each_possible_cpu(__cpu) { \ + struct cpuhp_cpu_state *st = \ + per_cpu_ptr(&cpuhp_state, __cpu); \ + init_completion(&st->done); \ + } \ +} while(0) +#else +#define lockdep_reinit_st_done() +#endif + #ifdef CONFIG_HOTPLUG_CPU /** * clear_tasks_mm_cpumask - Safely clear tasks' mm_cpumask for a CPU @@ -676,12 +702,6 @@ void cpuhp_report_idle_dead(void) cpuhp_complete_idle_dead, st, 0); } -#else -#define takedown_cpu NULL -#endif - -#ifdef CONFIG_HOTPLUG_CPU - /* Requires cpu_add_remove_lock to be held */ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen, enum cpuhp_state target) @@ -697,6 +717,10 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen, cpus_write_lock(); + lockdep_reinit_st_done(); + lockdep_init_map(&cpuhp_state_lock_map, "cpuhp_state-down", + &cpuhp_state_down_key, 0); + cpuhp_tasks_frozen = tasks_frozen; prev_state = st->state; @@ -759,6 +783,9 @@ int cpu_down(unsigned int cpu) return do_cpu_down(cpu, CPUHP_OFFLINE); } EXPORT_SYMBOL(cpu_down); + +#else +#define takedown_cpu NULL #endif /*CONFIG_HOTPLUG_CPU*/ /** @@ -806,6 +833,10 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target) cpus_write_lock(); + lockdep_reinit_st_done(); + lockdep_init_map(&cpuhp_state_lock_map, "cpuhp_state-up", + &cpuhp_state_up_key, 0); + if (!cpu_present(cpu)) { ret = -EINVAL; goto out; ^ permalink raw reply related [flat|nested] 12+ messages in thread
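
The cycle Peter describes is easier to see with the hotplug machinery stripped away. Below is a minimal user-space sketch of the same shape (illustrative only: none of these names are kernel symbols, and a mutex/condvar pair stands in for the st->done completion). Each thread takes one resource and then blocks on the next one in the ring, so the program deadlocks by design; run it under a timeout.

/* cc -pthread cpuhp_cycle_demo.c -o cpuhp_cycle_demo ; timeout 2 ./cpuhp_cycle_demo */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t cpuhp_state_m = PTHREAD_MUTEX_INITIALIZER; /* stands in for cpuhp_state */
static pthread_mutex_t sparse_irq_m  = PTHREAD_MUTEX_INITIALIZER; /* stands in for sparse_irq_lock */
static pthread_mutex_t done_m        = PTHREAD_MUTEX_INITIALIZER; /* st->done "completion" */
static pthread_cond_t  done_cv       = PTHREAD_COND_INITIALIZER;
static int done;
static pthread_barrier_t start;			/* makes the interleaving deterministic */

static void *cpu0(void *arg)	/* up path: holds cpuhp_state, wants sparse_irq_lock */
{
	(void)arg;
	pthread_mutex_lock(&cpuhp_state_m);
	pthread_barrier_wait(&start);
	pthread_mutex_lock(&sparse_irq_m);	/* blocks: cpu1 holds it */
	pthread_mutex_unlock(&sparse_irq_m);
	pthread_mutex_unlock(&cpuhp_state_m);
	return NULL;
}

static void *cpu1(void *arg)	/* down path: holds sparse_irq_lock, waits for the completion */
{
	(void)arg;
	pthread_mutex_lock(&sparse_irq_m);
	pthread_barrier_wait(&start);
	pthread_mutex_lock(&done_m);
	while (!done)				/* blocks: only cpu2 ever signals it */
		pthread_cond_wait(&done_cv, &done_m);
	pthread_mutex_unlock(&done_m);
	pthread_mutex_unlock(&sparse_irq_m);
	return NULL;
}

static void *cpu2(void *arg)	/* hotplug thread: needs cpuhp_state before it can complete() */
{
	(void)arg;
	pthread_barrier_wait(&start);
	pthread_mutex_lock(&cpuhp_state_m);	/* blocks: cpu0 holds it */
	pthread_mutex_lock(&done_m);
	done = 1;
	pthread_cond_signal(&done_cv);
	pthread_mutex_unlock(&done_m);
	pthread_mutex_unlock(&cpuhp_state_m);
	return NULL;
}

int main(void)
{
	pthread_t t[3];

	pthread_barrier_init(&start, NULL, 3);
	pthread_create(&t[0], NULL, cpu0, NULL);
	pthread_create(&t[1], NULL, cpu1, NULL);
	pthread_create(&t[2], NULL, cpu2, NULL);
	for (int i = 0; i < 3; i++)
		pthread_join(t[i], NULL);	/* never returns: 0 -> 1 -> 2 -> 0 cycle */
	puts("no deadlock");			/* not reached */
	return 0;
}
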
* Re: hotplug lockdep splat (tip) 2017-09-04 15:37 ` Peter Zijlstra @ 2017-09-04 17:02 ` Mike Galbraith 2017-09-04 19:44 ` Peter Zijlstra 0 siblings, 1 reply; 12+ messages in thread From: Mike Galbraith @ 2017-09-04 17:02 UTC (permalink / raw) To: Peter Zijlstra; +Cc: LKML, Thomas Gleixner, Byungchul Park On Mon, 2017-09-04 at 17:37 +0200, Peter Zijlstra wrote: > > Doth teh beloweth make nice? Yes, no more insta-gripe. > --- > kernel/cpu.c | 47 +++++++++++++++++++++++++++++++++++++++-------- > 1 file changed, 39 insertions(+), 8 deletions(-) > > diff --git a/kernel/cpu.c b/kernel/cpu.c > index acf5308fad51..2ab324d7ff7b 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -67,11 +67,15 @@ struct cpuhp_cpu_state { > static DEFINE_PER_CPU(struct cpuhp_cpu_state, cpuhp_state); > > #if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP) > -static struct lock_class_key cpuhp_state_key; > +static struct lock_class_key cpuhp_state_up_key; > +#ifdef CONFIG_HOTPLUG_CPU > +static struct lock_class_key cpuhp_state_down_key; > +#endif > static struct lockdep_map cpuhp_state_lock_map = > - STATIC_LOCKDEP_MAP_INIT("cpuhp_state", &cpuhp_state_key); > + STATIC_LOCKDEP_MAP_INIT("cpuhp_state-up", &cpuhp_state_up_key); > #endif > > + > /** > * cpuhp_step - Hotplug state machine step > * @name: Name of the step > @@ -533,6 +537,28 @@ void __init cpuhp_threads_init(void) > kthread_unpark(this_cpu_read(cpuhp_state.thread)); > } > > +/* > + * _cpu_down() and _cpu_up() have different lock ordering wrt st->done, but > + * because these two functions are globally serialized and st->done is private > + * to them, we can simply re-init st->done for each of them to separate the > + * lock chains. > + * > + * Must be macro to ensure we have two different call sites. > + */ > +#ifdef CONFIG_LOCKDEP > +#define lockdep_reinit_st_done() \ > +do { \ > + int __cpu; \ > + for_each_possible_cpu(__cpu) { \ > + struct cpuhp_cpu_state *st = \ > + per_cpu_ptr(&cpuhp_state, __cpu); \ > + init_completion(&st->done); \ > + } \ > +} while(0) > +#else > +#define lockdep_reinit_st_done() > +#endif > + > #ifdef CONFIG_HOTPLUG_CPU > /** > * clear_tasks_mm_cpumask - Safely clear tasks' mm_cpumask for a CPU > @@ -676,12 +702,6 @@ void cpuhp_report_idle_dead(void) > cpuhp_complete_idle_dead, st, 0); > } > > -#else > -#define takedown_cpu NULL > -#endif > - > -#ifdef CONFIG_HOTPLUG_CPU > - > /* Requires cpu_add_remove_lock to be held */ > static int __ref _cpu_down(unsigned int cpu, int tasks_frozen, > enum cpuhp_state target) > @@ -697,6 +717,10 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen, > > cpus_write_lock(); > > + lockdep_reinit_st_done(); > + lockdep_init_map(&cpuhp_state_lock_map, "cpuhp_state-down", > + &cpuhp_state_down_key, 0); > + > cpuhp_tasks_frozen = tasks_frozen; > > prev_state = st->state; > @@ -759,6 +783,9 @@ int cpu_down(unsigned int cpu) > return do_cpu_down(cpu, CPUHP_OFFLINE); > } > EXPORT_SYMBOL(cpu_down); > + > +#else > +#define takedown_cpu NULL > #endif /*CONFIG_HOTPLUG_CPU*/ > > /** > @@ -806,6 +833,10 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target) > > cpus_write_lock(); > > + lockdep_reinit_st_done(); > + lockdep_init_map(&cpuhp_state_lock_map, "cpuhp_state-up", > + &cpuhp_state_up_key, 0); > + > if (!cpu_present(cpu)) { > ret = -EINVAL; > goto out; ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: hotplug lockdep splat (tip)
  2017-09-04 17:02 ` Mike Galbraith
@ 2017-09-04 19:44   ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-09-04 19:44 UTC (permalink / raw)
To: Mike Galbraith; +Cc: LKML, Thomas Gleixner, Byungchul Park

On Mon, Sep 04, 2017 at 07:02:00PM +0200, Mike Galbraith wrote:
> On Mon, 2017-09-04 at 17:37 +0200, Peter Zijlstra wrote:
> >
> > Doth teh beloweth make nice?
>
> Yes, no more insta-gripe.

Shiny, I'll go write a Changelog for that second part.
* Re: hotplug lockdep splat (tip) 2017-09-04 7:55 ` Peter Zijlstra 2017-09-04 13:27 ` Mike Galbraith @ 2017-09-04 14:23 ` Peter Zijlstra 2017-09-04 17:15 ` Mike Galbraith 1 sibling, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2017-09-04 14:23 UTC (permalink / raw) To: Mike Galbraith Cc: LKML, Thomas Gleixner, Byungchul Park, Sebastian Andrzej Siewior On Mon, Sep 04, 2017 at 09:55:02AM +0200, Peter Zijlstra wrote: > On Sun, Sep 03, 2017 at 08:59:35AM +0200, Mike Galbraith wrote: > > > > [ 126.626908] Unregister pv shared memory for cpu 1 > > [ 126.631041] > > [ 126.631269] ====================================================== > > [ 126.632442] WARNING: possible circular locking dependency detected > > [ 126.633599] 4.13.0.g06260ca-tip-lockdep #2 Tainted: G E > > [ 126.634380] ------------------------------------------------------ > > [ 126.635124] stress-cpu-hotp/3156 is trying to acquire lock: > > [ 126.635804] ((complete)&st->done){+.+.}, at: [<ffffffff8108d19a>] takedown_cpu+0x8a/0xf0 > > [ 126.636809] > > [ 126.636809] but task is already holding lock: > > [ 126.637567] (sparse_irq_lock){+.+.}, at: [<ffffffff81107ac7>] irq_lock_sparse+0x17/0x20 > > [ 126.638665] > > https://lkml.kernel.org/r/20170829193416.GC32112@worktop.programming.kicks-ass.net > > I still need to write a coherent Changelog and comments for that :/ How's this? --- Subject: smp/hotplug,lockdep: Annotate st->done With the new lockdep cross-release feature, cpu hotplug reports the following deadlock: takedown_cpu() irq_lock_sparse() wait_for_completion(&st->done) cpuhp_thread_fun cpuhp_up_callback cpuhp_invoke_callback irq_affinity_online_cpu irq_local_spare() irq_unlock_sparse() complete(&st->done) However, CPU-up and CPU-down are globally serialized, so the above scenario cannot in fact happen. Annotate this by splitting the st->done dependency chain for up and down. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- kernel/cpu.c | 35 +++++++++++++++++++++++++++++------ 1 file changed, 29 insertions(+), 6 deletions(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index acf5308fad51..0f44ddf64f24 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -533,6 +533,28 @@ void __init cpuhp_threads_init(void) kthread_unpark(this_cpu_read(cpuhp_state.thread)); } +/* + * _cpu_down() and _cpu_up() have different lock ordering wrt st->done, but + * because these two functions are globally serialized and st->done is private + * to them, we can simply re-init st->done for each of them to separate the + * lock chains. + * + * Must be macro to ensure we have two different call sites. 
+ */ +#ifdef CONFIG_LOCKDEP +#define lockdep_reinit_st_done() \ +do { \ + int __cpu; \ + for_each_possible_cpu(__cpu) { \ + struct cpuhp_cpu_state *st = \ + per_cpu_ptr(&cpuhp_state, __cpu); \ + init_completion(&st->done); \ + } \ +} while(0) +#else +#define lockdep_reinit_st_done() +#endif + #ifdef CONFIG_HOTPLUG_CPU /** * clear_tasks_mm_cpumask - Safely clear tasks' mm_cpumask for a CPU @@ -676,12 +698,6 @@ void cpuhp_report_idle_dead(void) cpuhp_complete_idle_dead, st, 0); } -#else -#define takedown_cpu NULL -#endif - -#ifdef CONFIG_HOTPLUG_CPU - /* Requires cpu_add_remove_lock to be held */ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen, enum cpuhp_state target) @@ -697,6 +713,8 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen, cpus_write_lock(); + lockdep_reinit_st_done(); + cpuhp_tasks_frozen = tasks_frozen; prev_state = st->state; @@ -759,6 +777,9 @@ int cpu_down(unsigned int cpu) return do_cpu_down(cpu, CPUHP_OFFLINE); } EXPORT_SYMBOL(cpu_down); + +#else +#define takedown_cpu NULL #endif /*CONFIG_HOTPLUG_CPU*/ /** @@ -806,6 +827,8 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target) cpus_write_lock(); + lockdep_reinit_st_done(); + if (!cpu_present(cpu)) { ret = -EINVAL; goto out; ^ permalink raw reply related [flat|nested] 12+ messages in thread
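
On the "Must be macro to ensure we have two different call sites" comment: under the cross-release lockdep code in this tree, init_completion() appears to create a static lock_class_key per textual call site (which is also consistent with the distinct done#2 / done#3 classes in the splats), so expanding lockdep_reinit_st_done() once in _cpu_down() and once in _cpu_up() puts st->done in two separate classes and keeps the down and up wait/complete chains apart. The underlying C trick, one static object per macro expansion, can be seen in a small stand-alone sketch (assumed behaviour, not kernel code; needs GCC/Clang statement expressions):

/* cc callsite_demo.c -o callsite_demo && ./callsite_demo */
#include <stdio.h>

/*
 * Each textual expansion of the macro contains its own static object,
 * analogous to lockdep handing each init_completion() call site its
 * own lock_class_key (assumption about the 4.13-era cross-release code).
 */
#define PER_CALLSITE_KEY()			\
({						\
	static int __key;			\
	(void *)&__key;				\
})

int main(void)
{
	void *down_key = PER_CALLSITE_KEY();	/* first "call site"  */
	void *up_key   = PER_CALLSITE_KEY();	/* second "call site" */

	/* two expansions -> two different statics -> two different "classes" */
	printf("down: %p\nup:   %p\n", down_key, up_key);
	return 0;
}
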
* Re: hotplug lockdep splat (tip) 2017-09-04 14:23 ` Peter Zijlstra @ 2017-09-04 17:15 ` Mike Galbraith 0 siblings, 0 replies; 12+ messages in thread From: Mike Galbraith @ 2017-09-04 17:15 UTC (permalink / raw) To: Peter Zijlstra Cc: LKML, Thomas Gleixner, Byungchul Park, Sebastian Andrzej Siewior On Mon, 2017-09-04 at 16:23 +0200, Peter Zijlstra wrote: > On Mon, Sep 04, 2017 at 09:55:02AM +0200, Peter Zijlstra wrote: > > On Sun, Sep 03, 2017 at 08:59:35AM +0200, Mike Galbraith wrote: > > > > > > [ 126.626908] Unregister pv shared memory for cpu 1 > > > [ 126.631041] > > > [ 126.631269] ====================================================== > > > [ 126.632442] WARNING: possible circular locking dependency detected > > > [ 126.633599] 4.13.0.g06260ca-tip-lockdep #2 Tainted: G E > > > [ 126.634380] ------------------------------------------------------ > > > [ 126.635124] stress-cpu-hotp/3156 is trying to acquire lock: > > > [ 126.635804] ((complete)&st->done){+.+.}, at: [<ffffffff8108d19a>] takedown_cpu+0x8a/0xf0 > > > [ 126.636809] > > > [ 126.636809] but task is already holding lock: > > > [ 126.637567] (sparse_irq_lock){+.+.}, at: [<ffffffff81107ac7>] irq_lock_sparse+0x17/0x20 > > > [ 126.638665] > > > > https://lkml.kernel.org/r/20170829193416.GC32112@worktop.programming.kicks-ass.net > > > > I still need to write a coherent Changelog and comments for that :/ > > How's this? [ 138.241466] ====================================================== [ 138.242014] WARNING: possible circular locking dependency detected [ 138.242802] 4.13.0.g90abd70-tip-lockdep #6 Tainted: G E [ 138.243701] ------------------------------------------------------ [ 138.244540] cpuhp/0/12 is trying to acquire lock: [ 138.245187] (cpuhp_state){+.+.}, at: [<ffffffff8108e50a>] cpuhp_thread_fun+0x2a/0x160 [ 138.246312] but now in release context of a crosslock acquired at the following: [ 138.247519] ((complete)&st->done#3){+.+.}, at: [<ffffffff8108e77f>] cpuhp_issue_call+0x13f/0x170 [ 138.248745] which lock already depends on the new lock. 
[ 138.249861] the existing dependency chain (in reverse order) is: [ 138.250836] -> #2 ((complete)&st->done#3){+.+.}: [ 138.251732] __lock_acquire+0x116a/0x11c0 [ 138.252339] lock_acquire+0xbd/0x200 [ 138.252906] wait_for_completion+0x53/0x1a0 [ 138.253587] takedown_cpu+0x8a/0xf0 [ 138.254194] cpuhp_invoke_callback+0x1f6/0x830 [ 138.254906] cpuhp_down_callbacks+0x42/0x80 [ 138.255586] _cpu_down+0x14d/0x180 [ 138.256179] do_cpu_down+0x3e/0x60 [ 138.256774] cpu_down+0x10/0x20 [ 138.257295] cpu_subsys_offline+0x14/0x20 [ 138.257973] device_offline+0x8c/0xb0 [ 138.258548] online_store+0x40/0x80 [ 138.259094] dev_attr_store+0x18/0x30 [ 138.259668] sysfs_kf_write+0x44/0x60 [ 138.260213] kernfs_fop_write+0x113/0x1a0 [ 138.260873] __vfs_write+0x28/0x150 [ 138.261447] vfs_write+0xc7/0x1c0 [ 138.262009] SyS_write+0x49/0xa0 [ 138.262535] entry_SYSCALL_64_fastpath+0x1f/0xbe [ 138.263276] -> #1 (sparse_irq_lock){+.+.}: [ 138.264070] __lock_acquire+0x116a/0x11c0 [ 138.264731] lock_acquire+0xbd/0x200 [ 138.265300] __mutex_lock+0x70/0x8e0 [ 138.265881] mutex_lock_nested+0x1b/0x20 [ 138.266515] irq_lock_sparse+0x17/0x20 [ 138.267127] irq_affinity_online_cpu+0x18/0xd0 [ 138.267813] cpuhp_invoke_callback+0x1f6/0x830 [ 138.268538] cpuhp_up_callbacks+0x37/0xb0 [ 138.269174] cpuhp_thread_fun+0x14f/0x160 [ 138.269829] smpboot_thread_fn+0x11a/0x1e0 [ 138.270490] kthread+0x152/0x190 [ 138.271055] ret_from_fork+0x2a/0x40 [ 138.271661] -> #0 (cpuhp_state){+.+.}: [ 138.272424] cpuhp_thread_fun+0x54/0x160 [ 138.273104] smpboot_thread_fn+0x11a/0x1e0 [ 138.273816] kthread+0x152/0x190 [ 138.274371] ret_from_fork+0x2a/0x40 [ 138.274958] 0xffffffffffffffff [ 138.275450] other info that might help us debug this: [ 138.276552] Chain exists of: cpuhp_state --> sparse_irq_lock --> (complete)&st- >done#3 [ 138.278114] Possible unsafe locking scenario by crosslock: [ 138.279122] CPU0 CPU1 [ 138.279777] ---- ---- [ 138.280418] lock(sparse_irq_lock); [ 138.280936] lock((complete)&st->done#3); [ 138.281511] lock(cpuhp_state); [ 138.282322] unlock((complete)&st- >done#3); [ 138.283269] *** DEADLOCK *** [ 138.284147] 1 lock held by cpuhp/0/12: [ 138.284688] #0: (&x->wait#2){-...}, at: [<ffffffff810e1b5d>] complete+0x1d/0x60 [ 138.285734] stack backtrace: [ 138.286368] CPU: 0 PID: 12 Comm: cpuhp/0 Tainted: G E 4.13.0.g90abd70-tip-lockdep #6 [ 138.287800] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 [ 138.289324] Call Trace: [ 138.289708] dump_stack+0x85/0xc5 [ 138.290199] print_circular_bug+0x2e9/0x2f7 [ 138.290805] check_prev_add+0x666/0x700 [ 138.291313] ? print_bfs_bug+0x40/0x40 [ 138.291866] ? trace_hardirqs_on+0xd/0x10 [ 138.292223] lock_commit_crosslock+0x410/0x5a0 [ 138.292620] complete+0x29/0x60 [ 138.292900] cpuhp_thread_fun+0xa1/0x160 [ 138.293233] ? cpuhp_thread_fun+0x2a/0x160 [ 138.293584] smpboot_thread_fn+0x11a/0x1e0 [ 138.293943] kthread+0x152/0x190 [ 138.294221] ? sort_range+0x30/0x30 [ 138.294521] ? kthread_create_on_node+0x40/0x40 [ 138.294920] ? kthread+0x10/0x190 [ 138.295203] ret_from_fork+0x2a/0x40 ^ permalink raw reply [flat|nested] 12+ messages in thread