* x86/thermal: AB-BA dependency between mvm->mutex and tz->lock @ 2017-07-31 11:18 Jiri Kosina 2017-08-03 9:10 ` Jiri Kosina 0 siblings, 1 reply; 20+ messages in thread From: Jiri Kosina @ 2017-07-31 11:18 UTC (permalink / raw) To: Zhang Rui, Eduardo Valentin; +Cc: linux-pm, linux-kernel Hi, booting current Linus' tree, I'm seeing lockdep splat (see the end of this mail). Apparently, there is AB-BA between tz->lock and mvm->mutex through the CPU hotplug lock. The obivous depency is: thermal_zone_get_temp() acquires tz->lock, and then calls iwl_mvm_tzone_get_temp() (through tz->ops->get_temp() callback), which acquires mvm->mutex The less obvious dependency is primarily caused by iwl_op_mode_mvm_start() allocating workqueue (#2 stacktrace) while holding mvm->mutex (which is broken, because that mutex is being taken also from CPU hotplug callback path, hence the AB-BA). ====================================================== WARNING: possible circular locking dependency detected 4.13.0-rc2-00110-g0b5477d #347 Not tainted ------------------------------------------------------ modprobe/881 is trying to acquire lock: (&mvm->mutex){+.+.+.}, at: [<ffffffffc0c504d2>] iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] but task is already holding lock: (&tz->lock){+.+.+.}, at: [<ffffffff926a9351>] thermal_zone_get_temp+0x41/0x70 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #5 (&tz->lock){+.+.+.}: lock_acquire+0xbd/0x220 __mutex_lock+0x6e/0x900 mutex_lock_nested+0x1b/0x20 thermal_zone_get_temp+0x41/0x70 thermal_zone_device_update+0x3c/0x280 thermal_zone_device_register+0x3b8/0x610 pkg_thermal_cpu_online+0x20b/0x284 [x86_pkg_temp_thermal] cpuhp_invoke_callback+0xac/0x900 cpuhp_thread_fun+0x79/0x160 smpboot_thread_fn+0x156/0x220 kthread+0x114/0x150 ret_from_fork+0x2a/0x40 -> #4 (cpuhp_state){+.+.+.}: lock_acquire+0xbd/0x220 cpuhp_issue_call+0xea/0x170 __cpuhp_setup_state_cpuslocked+0x12a/0x190 __cpuhp_setup_state+0x46/0xc0 page_writeback_init+0x43/0x67 pagecache_init+0x39/0x3c start_kernel+0x45a/0x4ae x86_64_start_reservations+0x24/0x26 x86_64_start_kernel+0x13d/0x14c verify_cpu+0x0/0xf1 -> #3 (cpuhp_state_mutex){+.+.+.}: lock_acquire+0xbd/0x220 __mutex_lock+0x6e/0x900 mutex_lock_nested+0x1b/0x20 __cpuhp_setup_state_cpuslocked+0x4f/0x190 __cpuhp_setup_state+0x46/0xc0 page_alloc_init+0x28/0x30 start_kernel+0x186/0x4ae x86_64_start_reservations+0x24/0x26 x86_64_start_kernel+0x13d/0x14c verify_cpu+0x0/0xf1 -> #2 (cpu_hotplug_lock.rw_sem){++++++}: lock_acquire+0xbd/0x220 cpus_read_lock+0x46/0x90 apply_workqueue_attrs+0x17/0x50 __alloc_workqueue_key+0x195/0x4d0 _iwl_pcie_rx_init+0x384/0x390 [iwlwifi] iwl_pcie_rx_init+0x1e/0x380 [iwlwifi] iwl_trans_pcie_start_fw+0x295/0x6f0 [iwlwifi] iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm] iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm] iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm] _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] iwl_opmode_register+0xaa/0xd0 [iwlwifi] iwl_mvm_init+0x37/0x1000 [iwlmvm] do_one_initcall+0x51/0x1a9 do_init_module+0x60/0x20e load_module+0x203f/0x2b50 SYSC_finit_module+0x96/0xd0 SyS_finit_module+0xe/0x10 entry_SYSCALL_64_fastpath+0x23/0xc2 -> #1 (&trans_pcie->mutex){+.+.+.}: lock_acquire+0xbd/0x220 __mutex_lock+0x6e/0x900 mutex_lock_nested+0x1b/0x20 iwl_trans_pcie_start_fw+0x130/0x6f0 [iwlwifi] iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm] iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm] iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm] _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] iwl_opmode_register+0xaa/0xd0 [iwlwifi] iwl_mvm_init+0x37/0x1000 [iwlmvm] do_one_initcall+0x51/0x1a9 do_init_module+0x60/0x20e load_module+0x203f/0x2b50 SYSC_finit_module+0x96/0xd0 SyS_finit_module+0xe/0x10 entry_SYSCALL_64_fastpath+0x23/0xc2 -> #0 (&mvm->mutex){+.+.+.}: __lock_acquire+0x13e1/0x1400 lock_acquire+0xbd/0x220 __mutex_lock+0x6e/0x900 mutex_lock_nested+0x1b/0x20 iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] thermal_zone_get_temp+0x51/0x70 thermal_zone_device_update+0x3c/0x280 thermal_zone_device_register+0x3b8/0x610 iwl_mvm_thermal_initialize+0x1d1/0x3a0 [iwlmvm] iwl_op_mode_mvm_start+0xa1d/0xd30 [iwlmvm] _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] iwl_opmode_register+0xaa/0xd0 [iwlwifi] iwl_mvm_init+0x37/0x1000 [iwlmvm] do_one_initcall+0x51/0x1a9 do_init_module+0x60/0x20e load_module+0x203f/0x2b50 SYSC_finit_module+0x96/0xd0 SyS_finit_module+0xe/0x10 entry_SYSCALL_64_fastpath+0x23/0xc2 other info that might help us debug this: Chain exists of: &mvm->mutex --> cpuhp_state --> &tz->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&tz->lock); lock(cpuhp_state); lock(&tz->lock); lock(&mvm->mutex); *** DEADLOCK *** 2 locks held by modprobe/881: #0: (&iwlwifi_opmode_table_mtx){+.+.+.}, at: [<ffffffffc0aa2df4>] iwl_opmode_register+0x24/0xd0 [iwlwifi] #1: (&tz->lock){+.+.+.}, at: [<ffffffff926a9351>] thermal_zone_get_temp+0x41/0x70 stack backtrace: CPU: 3 PID: 881 Comm: modprobe Not tainted 4.13.0-rc2-00110-g0b5477d #347 Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017 Call Trace: dump_stack+0x85/0xc9 print_circular_bug+0x1f9/0x207 __lock_acquire+0x13e1/0x1400 lock_acquire+0xbd/0x220 ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] __mutex_lock+0x6e/0x900 ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] ? thermal_zone_get_temp+0x41/0x70 ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] ? thermal_zone_get_temp+0x41/0x70 ? find_held_lock+0x39/0xb0 mutex_lock_nested+0x1b/0x20 iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] thermal_zone_get_temp+0x51/0x70 thermal_zone_device_update+0x3c/0x280 thermal_zone_device_register+0x3b8/0x610 iwl_mvm_thermal_initialize+0x1d1/0x3a0 [iwlmvm] iwl_op_mode_mvm_start+0xa1d/0xd30 [iwlmvm] _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] iwl_opmode_register+0xaa/0xd0 [iwlwifi] iwl_mvm_init+0x37/0x1000 [iwlmvm] ? 0xffffffffc0c87000 do_one_initcall+0x51/0x1a9 ? rcu_read_lock_sched_held+0x98/0xa0 ? kmem_cache_alloc_trace+0x2a5/0x340 do_init_module+0x60/0x20e load_module+0x203f/0x2b50 ? __symbol_put+0x50/0x50 SYSC_finit_module+0x96/0xd0 SyS_finit_module+0xe/0x10 entry_SYSCALL_64_fastpath+0x23/0xc2 RIP: 0033:0x7f2ef067cc89 RSP: 002b:00007ffea2ea3d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f2ef067cc89 RDX: 0000000000000000 RSI: 000000000041af06 RDI: 0000000000000001 RBP: 0000000000000005 R08: 0000000000000000 R09: 000000000096b230 R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffea2ea2d80 R13: 00007ffea2ea2d60 R14: 0000000000000005 R15: 000000000096f3c0 thermal thermal_zone3: failed to read out thermal zone (-5) -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: x86/thermal: AB-BA dependency between mvm->mutex and tz->lock 2017-07-31 11:18 x86/thermal: AB-BA dependency between mvm->mutex and tz->lock Jiri Kosina @ 2017-08-03 9:10 ` Jiri Kosina 2017-08-03 9:43 ` [linuxwifi] " Coelho, Luciano 0 siblings, 1 reply; 20+ messages in thread From: Jiri Kosina @ 2017-08-03 9:10 UTC (permalink / raw) To: Zhang Rui, Eduardo Valentin, Kalle Valo, Sara Sharon, Johannes Berg, Emmanuel Grumbach, Luca Coelho Cc: linux-pm, linux-kernel, Intel Linux Wireless On Mon, 31 Jul 2017, Jiri Kosina wrote: > Hi, > > booting current Linus' tree, I'm seeing lockdep splat (see the end of this > mail). > > Apparently, there is AB-BA between tz->lock and mvm->mutex through the CPU > hotplug lock. > > The obivous depency is: thermal_zone_get_temp() acquires tz->lock, and > then calls iwl_mvm_tzone_get_temp() (through tz->ops->get_temp() > callback), which acquires mvm->mutex > > The less obvious dependency is primarily caused by iwl_op_mode_mvm_start() > allocating workqueue (#2 stacktrace) while holding mvm->mutex (which is > broken, because that mutex is being taken also from CPU hotplug callback > path, hence the AB-BA). As the "central" part of the dependency is being added by iwlwifi driver (_iwl_pcie_rx_init() allocating workqueue while holding trans_pcie->mutex), I'm adding iwlwifi folks as well to CC. > > ====================================================== > WARNING: possible circular locking dependency detected > 4.13.0-rc2-00110-g0b5477d #347 Not tainted > ------------------------------------------------------ > modprobe/881 is trying to acquire lock: > (&mvm->mutex){+.+.+.}, at: [<ffffffffc0c504d2>] iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] > > but task is already holding lock: > (&tz->lock){+.+.+.}, at: [<ffffffff926a9351>] thermal_zone_get_temp+0x41/0x70 > > which lock already depends on the new lock. > > > the existing dependency chain (in reverse order) is: > > -> #5 (&tz->lock){+.+.+.}: > lock_acquire+0xbd/0x220 > __mutex_lock+0x6e/0x900 > mutex_lock_nested+0x1b/0x20 > thermal_zone_get_temp+0x41/0x70 > thermal_zone_device_update+0x3c/0x280 > thermal_zone_device_register+0x3b8/0x610 > pkg_thermal_cpu_online+0x20b/0x284 [x86_pkg_temp_thermal] > cpuhp_invoke_callback+0xac/0x900 > cpuhp_thread_fun+0x79/0x160 > smpboot_thread_fn+0x156/0x220 > kthread+0x114/0x150 > ret_from_fork+0x2a/0x40 > > -> #4 (cpuhp_state){+.+.+.}: > lock_acquire+0xbd/0x220 > cpuhp_issue_call+0xea/0x170 > __cpuhp_setup_state_cpuslocked+0x12a/0x190 > __cpuhp_setup_state+0x46/0xc0 > page_writeback_init+0x43/0x67 > pagecache_init+0x39/0x3c > start_kernel+0x45a/0x4ae > x86_64_start_reservations+0x24/0x26 > x86_64_start_kernel+0x13d/0x14c > verify_cpu+0x0/0xf1 > > -> #3 (cpuhp_state_mutex){+.+.+.}: > lock_acquire+0xbd/0x220 > __mutex_lock+0x6e/0x900 > mutex_lock_nested+0x1b/0x20 > __cpuhp_setup_state_cpuslocked+0x4f/0x190 > __cpuhp_setup_state+0x46/0xc0 > page_alloc_init+0x28/0x30 > start_kernel+0x186/0x4ae > x86_64_start_reservations+0x24/0x26 > x86_64_start_kernel+0x13d/0x14c > verify_cpu+0x0/0xf1 > > -> #2 (cpu_hotplug_lock.rw_sem){++++++}: > lock_acquire+0xbd/0x220 > cpus_read_lock+0x46/0x90 > apply_workqueue_attrs+0x17/0x50 > __alloc_workqueue_key+0x195/0x4d0 > _iwl_pcie_rx_init+0x384/0x390 [iwlwifi] > iwl_pcie_rx_init+0x1e/0x380 [iwlwifi] > iwl_trans_pcie_start_fw+0x295/0x6f0 [iwlwifi] > iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm] > iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm] > iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm] > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] > iwl_opmode_register+0xaa/0xd0 [iwlwifi] > iwl_mvm_init+0x37/0x1000 [iwlmvm] > do_one_initcall+0x51/0x1a9 > do_init_module+0x60/0x20e > load_module+0x203f/0x2b50 > SYSC_finit_module+0x96/0xd0 > SyS_finit_module+0xe/0x10 > entry_SYSCALL_64_fastpath+0x23/0xc2 > > -> #1 (&trans_pcie->mutex){+.+.+.}: > lock_acquire+0xbd/0x220 > __mutex_lock+0x6e/0x900 > mutex_lock_nested+0x1b/0x20 > iwl_trans_pcie_start_fw+0x130/0x6f0 [iwlwifi] > iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm] > iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm] > iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm] > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] > iwl_opmode_register+0xaa/0xd0 [iwlwifi] > iwl_mvm_init+0x37/0x1000 [iwlmvm] > do_one_initcall+0x51/0x1a9 > do_init_module+0x60/0x20e > load_module+0x203f/0x2b50 > SYSC_finit_module+0x96/0xd0 > SyS_finit_module+0xe/0x10 > entry_SYSCALL_64_fastpath+0x23/0xc2 > > -> #0 (&mvm->mutex){+.+.+.}: > __lock_acquire+0x13e1/0x1400 > lock_acquire+0xbd/0x220 > __mutex_lock+0x6e/0x900 > mutex_lock_nested+0x1b/0x20 > iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] > thermal_zone_get_temp+0x51/0x70 > thermal_zone_device_update+0x3c/0x280 > thermal_zone_device_register+0x3b8/0x610 > iwl_mvm_thermal_initialize+0x1d1/0x3a0 [iwlmvm] > iwl_op_mode_mvm_start+0xa1d/0xd30 [iwlmvm] > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] > iwl_opmode_register+0xaa/0xd0 [iwlwifi] > iwl_mvm_init+0x37/0x1000 [iwlmvm] > do_one_initcall+0x51/0x1a9 > do_init_module+0x60/0x20e > load_module+0x203f/0x2b50 > SYSC_finit_module+0x96/0xd0 > SyS_finit_module+0xe/0x10 > entry_SYSCALL_64_fastpath+0x23/0xc2 > > other info that might help us debug this: > > Chain exists of: > &mvm->mutex --> cpuhp_state --> &tz->lock > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(&tz->lock); > lock(cpuhp_state); > lock(&tz->lock); > lock(&mvm->mutex); > > *** DEADLOCK *** > > 2 locks held by modprobe/881: > #0: (&iwlwifi_opmode_table_mtx){+.+.+.}, at: [<ffffffffc0aa2df4>] iwl_opmode_register+0x24/0xd0 [iwlwifi] > #1: (&tz->lock){+.+.+.}, at: [<ffffffff926a9351>] thermal_zone_get_temp+0x41/0x70 > > stack backtrace: > CPU: 3 PID: 881 Comm: modprobe Not tainted 4.13.0-rc2-00110-g0b5477d #347 > Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017 > Call Trace: > dump_stack+0x85/0xc9 > print_circular_bug+0x1f9/0x207 > __lock_acquire+0x13e1/0x1400 > lock_acquire+0xbd/0x220 > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] > __mutex_lock+0x6e/0x900 > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] > ? thermal_zone_get_temp+0x41/0x70 > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] > ? thermal_zone_get_temp+0x41/0x70 > ? find_held_lock+0x39/0xb0 > mutex_lock_nested+0x1b/0x20 > iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] > thermal_zone_get_temp+0x51/0x70 > thermal_zone_device_update+0x3c/0x280 > thermal_zone_device_register+0x3b8/0x610 > iwl_mvm_thermal_initialize+0x1d1/0x3a0 [iwlmvm] > iwl_op_mode_mvm_start+0xa1d/0xd30 [iwlmvm] > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] > iwl_opmode_register+0xaa/0xd0 [iwlwifi] > iwl_mvm_init+0x37/0x1000 [iwlmvm] > ? 0xffffffffc0c87000 > do_one_initcall+0x51/0x1a9 > ? rcu_read_lock_sched_held+0x98/0xa0 > ? kmem_cache_alloc_trace+0x2a5/0x340 > do_init_module+0x60/0x20e > load_module+0x203f/0x2b50 > ? __symbol_put+0x50/0x50 > SYSC_finit_module+0x96/0xd0 > SyS_finit_module+0xe/0x10 > entry_SYSCALL_64_fastpath+0x23/0xc2 > RIP: 0033:0x7f2ef067cc89 > RSP: 002b:00007ffea2ea3d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 > RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f2ef067cc89 > RDX: 0000000000000000 RSI: 000000000041af06 RDI: 0000000000000001 > RBP: 0000000000000005 R08: 0000000000000000 R09: 000000000096b230 > R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffea2ea2d80 > R13: 00007ffea2ea2d60 R14: 0000000000000005 R15: 000000000096f3c0 > thermal thermal_zone3: failed to read out thermal zone (-5) > > > -- > Jiri Kosina > SUSE Labs > -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [linuxwifi] x86/thermal: AB-BA dependency between mvm->mutex and tz->lock 2017-08-03 9:10 ` Jiri Kosina @ 2017-08-03 9:43 ` Coelho, Luciano 2017-08-03 10:02 ` Kalle Valo 0 siblings, 1 reply; 20+ messages in thread From: Coelho, Luciano @ 2017-08-03 9:43 UTC (permalink / raw) To: jikos, Zhang, Rui, kvalo, edubezval, Sharon, Sara, Berg, Johannes, Grumbach, Emmanuel Cc: linuxwifi, linux-kernel, linux-pm, Weinehall, David On Thu, 2017-08-03 at 11:10 +0200, Jiri Kosina wrote: > On Mon, 31 Jul 2017, Jiri Kosina wrote: > > > Hi, > > > > booting current Linus' tree, I'm seeing lockdep splat (see the end of this > > mail). > > > > Apparently, there is AB-BA between tz->lock and mvm->mutex through the CPU > > hotplug lock. > > > > The obivous depency is: thermal_zone_get_temp() acquires tz->lock, and > > then calls iwl_mvm_tzone_get_temp() (through tz->ops->get_temp() > > callback), which acquires mvm->mutex > > > > The less obvious dependency is primarily caused by iwl_op_mode_mvm_start() > > allocating workqueue (#2 stacktrace) while holding mvm->mutex (which is > > broken, because that mutex is being taken also from CPU hotplug callback > > path, hence the AB-BA). > > As the "central" part of the dependency is being added by iwlwifi driver > (_iwl_pcie_rx_init() allocating workqueue while holding > trans_pcie->mutex), I'm adding iwlwifi folks as well to CC. > > > > > ====================================================== > > WARNING: possible circular locking dependency detected > > 4.13.0-rc2-00110-g0b5477d #347 Not tainted > > ------------------------------------------------------ > > modprobe/881 is trying to acquire lock: > > (&mvm->mutex){+.+.+.}, at: [<ffffffffc0c504d2>] iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] > > > > but task is already holding lock: > > (&tz->lock){+.+.+.}, at: [<ffffffff926a9351>] thermal_zone_get_temp+0x41/0x70 > > > > which lock already depends on the new lock. > > > > > > the existing dependency chain (in reverse order) is: > > > > -> #5 (&tz->lock){+.+.+.}: > > lock_acquire+0xbd/0x220 > > __mutex_lock+0x6e/0x900 > > mutex_lock_nested+0x1b/0x20 > > thermal_zone_get_temp+0x41/0x70 > > thermal_zone_device_update+0x3c/0x280 > > thermal_zone_device_register+0x3b8/0x610 > > pkg_thermal_cpu_online+0x20b/0x284 [x86_pkg_temp_thermal] > > cpuhp_invoke_callback+0xac/0x900 > > cpuhp_thread_fun+0x79/0x160 > > smpboot_thread_fn+0x156/0x220 > > kthread+0x114/0x150 > > ret_from_fork+0x2a/0x40 > > > > -> #4 (cpuhp_state){+.+.+.}: > > lock_acquire+0xbd/0x220 > > cpuhp_issue_call+0xea/0x170 > > __cpuhp_setup_state_cpuslocked+0x12a/0x190 > > __cpuhp_setup_state+0x46/0xc0 > > page_writeback_init+0x43/0x67 > > pagecache_init+0x39/0x3c > > start_kernel+0x45a/0x4ae > > x86_64_start_reservations+0x24/0x26 > > x86_64_start_kernel+0x13d/0x14c > > verify_cpu+0x0/0xf1 > > > > -> #3 (cpuhp_state_mutex){+.+.+.}: > > lock_acquire+0xbd/0x220 > > __mutex_lock+0x6e/0x900 > > mutex_lock_nested+0x1b/0x20 > > __cpuhp_setup_state_cpuslocked+0x4f/0x190 > > __cpuhp_setup_state+0x46/0xc0 > > page_alloc_init+0x28/0x30 > > start_kernel+0x186/0x4ae > > x86_64_start_reservations+0x24/0x26 > > x86_64_start_kernel+0x13d/0x14c > > verify_cpu+0x0/0xf1 > > > > -> #2 (cpu_hotplug_lock.rw_sem){++++++}: > > lock_acquire+0xbd/0x220 > > cpus_read_lock+0x46/0x90 > > apply_workqueue_attrs+0x17/0x50 > > __alloc_workqueue_key+0x195/0x4d0 > > _iwl_pcie_rx_init+0x384/0x390 [iwlwifi] > > iwl_pcie_rx_init+0x1e/0x380 [iwlwifi] > > iwl_trans_pcie_start_fw+0x295/0x6f0 [iwlwifi] > > iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm] > > iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm] > > iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm] > > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] > > iwl_opmode_register+0xaa/0xd0 [iwlwifi] > > iwl_mvm_init+0x37/0x1000 [iwlmvm] > > do_one_initcall+0x51/0x1a9 > > do_init_module+0x60/0x20e > > load_module+0x203f/0x2b50 > > SYSC_finit_module+0x96/0xd0 > > SyS_finit_module+0xe/0x10 > > entry_SYSCALL_64_fastpath+0x23/0xc2 > > > > -> #1 (&trans_pcie->mutex){+.+.+.}: > > lock_acquire+0xbd/0x220 > > __mutex_lock+0x6e/0x900 > > mutex_lock_nested+0x1b/0x20 > > iwl_trans_pcie_start_fw+0x130/0x6f0 [iwlwifi] > > iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm] > > iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm] > > iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm] > > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] > > iwl_opmode_register+0xaa/0xd0 [iwlwifi] > > iwl_mvm_init+0x37/0x1000 [iwlmvm] > > do_one_initcall+0x51/0x1a9 > > do_init_module+0x60/0x20e > > load_module+0x203f/0x2b50 > > SYSC_finit_module+0x96/0xd0 > > SyS_finit_module+0xe/0x10 > > entry_SYSCALL_64_fastpath+0x23/0xc2 > > > > -> #0 (&mvm->mutex){+.+.+.}: > > __lock_acquire+0x13e1/0x1400 > > lock_acquire+0xbd/0x220 > > __mutex_lock+0x6e/0x900 > > mutex_lock_nested+0x1b/0x20 > > iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] > > thermal_zone_get_temp+0x51/0x70 > > thermal_zone_device_update+0x3c/0x280 > > thermal_zone_device_register+0x3b8/0x610 > > iwl_mvm_thermal_initialize+0x1d1/0x3a0 [iwlmvm] > > iwl_op_mode_mvm_start+0xa1d/0xd30 [iwlmvm] > > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] > > iwl_opmode_register+0xaa/0xd0 [iwlwifi] > > iwl_mvm_init+0x37/0x1000 [iwlmvm] > > do_one_initcall+0x51/0x1a9 > > do_init_module+0x60/0x20e > > load_module+0x203f/0x2b50 > > SYSC_finit_module+0x96/0xd0 > > SyS_finit_module+0xe/0x10 > > entry_SYSCALL_64_fastpath+0x23/0xc2 > > > > other info that might help us debug this: > > > > Chain exists of: > > &mvm->mutex --> cpuhp_state --> &tz->lock > > > > Possible unsafe locking scenario: > > > > CPU0 CPU1 > > ---- ---- > > lock(&tz->lock); > > lock(cpuhp_state); > > lock(&tz->lock); > > lock(&mvm->mutex); > > > > *** DEADLOCK *** > > > > 2 locks held by modprobe/881: > > #0: (&iwlwifi_opmode_table_mtx){+.+.+.}, at: [<ffffffffc0aa2df4>] iwl_opmode_register+0x24/0xd0 [iwlwifi] > > #1: (&tz->lock){+.+.+.}, at: [<ffffffff926a9351>] thermal_zone_get_temp+0x41/0x70 > > > > stack backtrace: > > CPU: 3 PID: 881 Comm: modprobe Not tainted 4.13.0-rc2-00110-g0b5477d #347 > > Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017 > > Call Trace: > > dump_stack+0x85/0xc9 > > print_circular_bug+0x1f9/0x207 > > __lock_acquire+0x13e1/0x1400 > > lock_acquire+0xbd/0x220 > > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] > > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] > > __mutex_lock+0x6e/0x900 > > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] > > ? thermal_zone_get_temp+0x41/0x70 > > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] > > ? thermal_zone_get_temp+0x41/0x70 > > ? find_held_lock+0x39/0xb0 > > mutex_lock_nested+0x1b/0x20 > > iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] > > thermal_zone_get_temp+0x51/0x70 > > thermal_zone_device_update+0x3c/0x280 > > thermal_zone_device_register+0x3b8/0x610 > > iwl_mvm_thermal_initialize+0x1d1/0x3a0 [iwlmvm] > > iwl_op_mode_mvm_start+0xa1d/0xd30 [iwlmvm] > > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] > > iwl_opmode_register+0xaa/0xd0 [iwlwifi] > > iwl_mvm_init+0x37/0x1000 [iwlmvm] > > ? 0xffffffffc0c87000 > > do_one_initcall+0x51/0x1a9 > > ? rcu_read_lock_sched_held+0x98/0xa0 > > ? kmem_cache_alloc_trace+0x2a5/0x340 > > do_init_module+0x60/0x20e > > load_module+0x203f/0x2b50 > > ? __symbol_put+0x50/0x50 > > SYSC_finit_module+0x96/0xd0 > > SyS_finit_module+0xe/0x10 > > entry_SYSCALL_64_fastpath+0x23/0xc2 > > RIP: 0033:0x7f2ef067cc89 > > RSP: 002b:00007ffea2ea3d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 > > RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f2ef067cc89 > > RDX: 0000000000000000 RSI: 000000000041af06 RDI: 0000000000000001 > > RBP: 0000000000000005 R08: 0000000000000000 R09: 000000000096b230 > > R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffea2ea2d80 > > R13: 00007ffea2ea2d60 R14: 0000000000000005 R15: 000000000096f3c0 > > thermal thermal_zone3: failed to read out thermal zone (-5) CCing David Weinehall who also just reported this to me. We'll check this ASAP. Thanks for reporting! -- Luca. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [linuxwifi] x86/thermal: AB-BA dependency between mvm->mutex and tz->lock 2017-08-03 9:43 ` [linuxwifi] " Coelho, Luciano @ 2017-08-03 10:02 ` Kalle Valo 2017-08-03 11:28 ` Coelho, Luciano 0 siblings, 1 reply; 20+ messages in thread From: Kalle Valo @ 2017-08-03 10:02 UTC (permalink / raw) To: Coelho, Luciano Cc: jikos, Zhang, Rui, edubezval, Sharon, Sara, Berg, Johannes, Grumbach, Emmanuel, linuxwifi, linux-kernel, linux-pm, Weinehall, David, linux-wireless "Coelho, Luciano" <luciano.coelho@intel.com> writes: > On Thu, 2017-08-03 at 11:10 +0200, Jiri Kosina wrote: >> On Mon, 31 Jul 2017, Jiri Kosina wrote: >> >> > Hi, >> > >> > booting current Linus' tree, I'm seeing lockdep splat (see the end of this >> > mail). >> > >> > Apparently, there is AB-BA between tz->lock and mvm->mutex through the CPU >> > hotplug lock. >> > >> > The obivous depency is: thermal_zone_get_temp() acquires tz->lock, and >> > then calls iwl_mvm_tzone_get_temp() (through tz->ops->get_temp() >> > callback), which acquires mvm->mutex >> > >> > The less obvious dependency is primarily caused by iwl_op_mode_mvm_start() >> > allocating workqueue (#2 stacktrace) while holding mvm->mutex (which is >> > broken, because that mutex is being taken also from CPU hotplug callback >> > path, hence the AB-BA). >> >> As the "central" part of the dependency is being added by iwlwifi driver >> (_iwl_pcie_rx_init() allocating workqueue while holding >> trans_pcie->mutex), I'm adding iwlwifi folks as well to CC. >> >> > >> > ====================================================== >> > WARNING: possible circular locking dependency detected >> > 4.13.0-rc2-00110-g0b5477d #347 Not tainted >> > ------------------------------------------------------ >> > modprobe/881 is trying to acquire lock: >> > (&mvm->mutex){+.+.+.}, at: [<ffffffffc0c504d2>] iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] >> > >> > but task is already holding lock: >> > (&tz->lock){+.+.+.}, at: [<ffffffff926a9351>] thermal_zone_get_temp+0x41/0x70 >> > >> > which lock already depends on the new lock. >> > >> > >> > the existing dependency chain (in reverse order) is: >> > >> > -> #5 (&tz->lock){+.+.+.}: >> > lock_acquire+0xbd/0x220 >> > __mutex_lock+0x6e/0x900 >> > mutex_lock_nested+0x1b/0x20 >> > thermal_zone_get_temp+0x41/0x70 >> > thermal_zone_device_update+0x3c/0x280 >> > thermal_zone_device_register+0x3b8/0x610 >> > pkg_thermal_cpu_online+0x20b/0x284 [x86_pkg_temp_thermal] >> > cpuhp_invoke_callback+0xac/0x900 >> > cpuhp_thread_fun+0x79/0x160 >> > smpboot_thread_fn+0x156/0x220 >> > kthread+0x114/0x150 >> > ret_from_fork+0x2a/0x40 >> > >> > -> #4 (cpuhp_state){+.+.+.}: >> > lock_acquire+0xbd/0x220 >> > cpuhp_issue_call+0xea/0x170 >> > __cpuhp_setup_state_cpuslocked+0x12a/0x190 >> > __cpuhp_setup_state+0x46/0xc0 >> > page_writeback_init+0x43/0x67 >> > pagecache_init+0x39/0x3c >> > start_kernel+0x45a/0x4ae >> > x86_64_start_reservations+0x24/0x26 >> > x86_64_start_kernel+0x13d/0x14c >> > verify_cpu+0x0/0xf1 >> > >> > -> #3 (cpuhp_state_mutex){+.+.+.}: >> > lock_acquire+0xbd/0x220 >> > __mutex_lock+0x6e/0x900 >> > mutex_lock_nested+0x1b/0x20 >> > __cpuhp_setup_state_cpuslocked+0x4f/0x190 >> > __cpuhp_setup_state+0x46/0xc0 >> > page_alloc_init+0x28/0x30 >> > start_kernel+0x186/0x4ae >> > x86_64_start_reservations+0x24/0x26 >> > x86_64_start_kernel+0x13d/0x14c >> > verify_cpu+0x0/0xf1 >> > >> > -> #2 (cpu_hotplug_lock.rw_sem){++++++}: >> > lock_acquire+0xbd/0x220 >> > cpus_read_lock+0x46/0x90 >> > apply_workqueue_attrs+0x17/0x50 >> > __alloc_workqueue_key+0x195/0x4d0 >> > _iwl_pcie_rx_init+0x384/0x390 [iwlwifi] >> > iwl_pcie_rx_init+0x1e/0x380 [iwlwifi] >> > iwl_trans_pcie_start_fw+0x295/0x6f0 [iwlwifi] >> > iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm] >> > iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm] >> > iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm] >> > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] >> > iwl_opmode_register+0xaa/0xd0 [iwlwifi] >> > iwl_mvm_init+0x37/0x1000 [iwlmvm] >> > do_one_initcall+0x51/0x1a9 >> > do_init_module+0x60/0x20e >> > load_module+0x203f/0x2b50 >> > SYSC_finit_module+0x96/0xd0 >> > SyS_finit_module+0xe/0x10 >> > entry_SYSCALL_64_fastpath+0x23/0xc2 >> > >> > -> #1 (&trans_pcie->mutex){+.+.+.}: >> > lock_acquire+0xbd/0x220 >> > __mutex_lock+0x6e/0x900 >> > mutex_lock_nested+0x1b/0x20 >> > iwl_trans_pcie_start_fw+0x130/0x6f0 [iwlwifi] >> > iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm] >> > iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm] >> > iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm] >> > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] >> > iwl_opmode_register+0xaa/0xd0 [iwlwifi] >> > iwl_mvm_init+0x37/0x1000 [iwlmvm] >> > do_one_initcall+0x51/0x1a9 >> > do_init_module+0x60/0x20e >> > load_module+0x203f/0x2b50 >> > SYSC_finit_module+0x96/0xd0 >> > SyS_finit_module+0xe/0x10 >> > entry_SYSCALL_64_fastpath+0x23/0xc2 >> > >> > -> #0 (&mvm->mutex){+.+.+.}: >> > __lock_acquire+0x13e1/0x1400 >> > lock_acquire+0xbd/0x220 >> > __mutex_lock+0x6e/0x900 >> > mutex_lock_nested+0x1b/0x20 >> > iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] >> > thermal_zone_get_temp+0x51/0x70 >> > thermal_zone_device_update+0x3c/0x280 >> > thermal_zone_device_register+0x3b8/0x610 >> > iwl_mvm_thermal_initialize+0x1d1/0x3a0 [iwlmvm] >> > iwl_op_mode_mvm_start+0xa1d/0xd30 [iwlmvm] >> > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] >> > iwl_opmode_register+0xaa/0xd0 [iwlwifi] >> > iwl_mvm_init+0x37/0x1000 [iwlmvm] >> > do_one_initcall+0x51/0x1a9 >> > do_init_module+0x60/0x20e >> > load_module+0x203f/0x2b50 >> > SYSC_finit_module+0x96/0xd0 >> > SyS_finit_module+0xe/0x10 >> > entry_SYSCALL_64_fastpath+0x23/0xc2 >> > >> > other info that might help us debug this: >> > >> > Chain exists of: >> > &mvm->mutex --> cpuhp_state --> &tz->lock >> > >> > Possible unsafe locking scenario: >> > >> > CPU0 CPU1 >> > ---- ---- >> > lock(&tz->lock); >> > lock(cpuhp_state); >> > lock(&tz->lock); >> > lock(&mvm->mutex); >> > >> > *** DEADLOCK *** >> > >> > 2 locks held by modprobe/881: >> > #0: (&iwlwifi_opmode_table_mtx){+.+.+.}, at: [<ffffffffc0aa2df4>] iwl_opmode_register+0x24/0xd0 [iwlwifi] >> > #1: (&tz->lock){+.+.+.}, at: [<ffffffff926a9351>] thermal_zone_get_temp+0x41/0x70 >> > >> > stack backtrace: >> > CPU: 3 PID: 881 Comm: modprobe Not tainted 4.13.0-rc2-00110-g0b5477d #347 >> > Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017 >> > Call Trace: >> > dump_stack+0x85/0xc9 >> > print_circular_bug+0x1f9/0x207 >> > __lock_acquire+0x13e1/0x1400 >> > lock_acquire+0xbd/0x220 >> > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] >> > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] >> > __mutex_lock+0x6e/0x900 >> > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] >> > ? thermal_zone_get_temp+0x41/0x70 >> > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] >> > ? thermal_zone_get_temp+0x41/0x70 >> > ? find_held_lock+0x39/0xb0 >> > mutex_lock_nested+0x1b/0x20 >> > iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm] >> > thermal_zone_get_temp+0x51/0x70 >> > thermal_zone_device_update+0x3c/0x280 >> > thermal_zone_device_register+0x3b8/0x610 >> > iwl_mvm_thermal_initialize+0x1d1/0x3a0 [iwlmvm] >> > iwl_op_mode_mvm_start+0xa1d/0xd30 [iwlmvm] >> > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] >> > iwl_opmode_register+0xaa/0xd0 [iwlwifi] >> > iwl_mvm_init+0x37/0x1000 [iwlmvm] >> > ? 0xffffffffc0c87000 >> > do_one_initcall+0x51/0x1a9 >> > ? rcu_read_lock_sched_held+0x98/0xa0 >> > ? kmem_cache_alloc_trace+0x2a5/0x340 >> > do_init_module+0x60/0x20e >> > load_module+0x203f/0x2b50 >> > ? __symbol_put+0x50/0x50 >> > SYSC_finit_module+0x96/0xd0 >> > SyS_finit_module+0xe/0x10 >> > entry_SYSCALL_64_fastpath+0x23/0xc2 >> > RIP: 0033:0x7f2ef067cc89 >> > RSP: 002b:00007ffea2ea3d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 >> > RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f2ef067cc89 >> > RDX: 0000000000000000 RSI: 000000000041af06 RDI: 0000000000000001 >> > RBP: 0000000000000005 R08: 0000000000000000 R09: 000000000096b230 >> > R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffea2ea2d80 >> > R13: 00007ffea2ea2d60 R14: 0000000000000005 R15: 000000000096f3c0 >> > thermal thermal_zone3: failed to read out thermal zone (-5) > > CCing David Weinehall who also just reported this to me. > > We'll check this ASAP. Thanks for reporting! Adding linux-wireless also to the loop. -- Kalle Valo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [linuxwifi] x86/thermal: AB-BA dependency between mvm->mutex and tz->lock 2017-08-03 10:02 ` Kalle Valo @ 2017-08-03 11:28 ` Coelho, Luciano 2017-08-03 11:34 ` Jiri Kosina 0 siblings, 1 reply; 20+ messages in thread From: Coelho, Luciano @ 2017-08-03 11:28 UTC (permalink / raw) To: jikos Cc: linux-kernel, linuxwifi, Zhang, Rui, edubezval, linux-pm, Weinehall, David, Berg, Johannes, kvalo, Sharon, Sara, linux-wireless, Grumbach, Emmanuel On Thu, 2017-08-03 at 13:02 +0300, Kalle Valo wrote: > "Coelho, Luciano" <luciano.coelho@intel.com> writes: > > > On Thu, 2017-08-03 at 11:10 +0200, Jiri Kosina wrote: > > > On Mon, 31 Jul 2017, Jiri Kosina wrote: > > > > > > > Hi, > > > > > > > > booting current Linus' tree, I'm seeing lockdep splat (see the end of this > > > > mail). > > > > > > > > Apparently, there is AB-BA between tz->lock and mvm->mutex through the CPU > > > > hotplug lock. > > > > > > > > The obivous depency is: thermal_zone_get_temp() acquires tz->lock, and > > > > then calls iwl_mvm_tzone_get_temp() (through tz->ops->get_temp() > > > > callback), which acquires mvm->mutex > > > > > > > > The less obvious dependency is primarily caused by iwl_op_mode_mvm_start() > > > > allocating workqueue (#2 stacktrace) while holding mvm->mutex (which is > > > > broken, because that mutex is being taken also from CPU hotplug callback > > > > path, hence the AB-BA). > > > > > > As the "central" part of the dependency is being added by iwlwifi driver > > > (_iwl_pcie_rx_init() allocating workqueue while holding > > > trans_pcie->mutex), I'm adding iwlwifi folks as well to CC. [...] > > > > -> #2 (cpu_hotplug_lock.rw_sem){++++++}: > > > > lock_acquire+0xbd/0x220 > > > > cpus_read_lock+0x46/0x90 > > > > apply_workqueue_attrs+0x17/0x50 > > > > __alloc_workqueue_key+0x195/0x4d0 > > > > _iwl_pcie_rx_init+0x384/0x390 [iwlwifi] > > > > iwl_pcie_rx_init+0x1e/0x380 [iwlwifi] > > > > iwl_trans_pcie_start_fw+0x295/0x6f0 [iwlwifi] > > > > iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm] > > > > iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm] > > > > iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm] > > > > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi] > > > > iwl_opmode_register+0xaa/0xd0 [iwlwifi] > > > > iwl_mvm_init+0x37/0x1000 [iwlmvm] > > > > do_one_initcall+0x51/0x1a9 > > > > do_init_module+0x60/0x20e > > > > load_module+0x203f/0x2b50 > > > > SYSC_finit_module+0x96/0xd0 > > > > SyS_finit_module+0xe/0x10 > > > > entry_SYSCALL_64_fastpath+0x23/0xc2 Okay, so as I understand it the problem has been there for a long time, but the splat is only coming up now because of Thomas' patch that adds the lockdep map[1], right? I see the workqueue allocation you mentioned. I'll try to move this allocation out of the mutex and see how it goes. [1] http://lkml.kernel.org/r/20170524081549.709375845@linutronix.de -- Cheers, Luca. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [linuxwifi] x86/thermal: AB-BA dependency between mvm->mutex and tz->lock 2017-08-03 11:28 ` Coelho, Luciano @ 2017-08-03 11:34 ` Jiri Kosina 2017-08-03 13:09 ` Jiri Kosina 0 siblings, 1 reply; 20+ messages in thread From: Jiri Kosina @ 2017-08-03 11:34 UTC (permalink / raw) To: Coelho, Luciano Cc: linux-kernel, linuxwifi, Zhang, Rui, edubezval, linux-pm, Weinehall, David, Berg, Johannes, kvalo, Sharon, Sara, linux-wireless, Grumbach, Emmanuel On Thu, 3 Aug 2017, Coelho, Luciano wrote: > Okay, so as I understand it the problem has been there for a long time, > but the splat is only coming up now because of Thomas' patch that adds > the lockdep map[1], right? Yeah, sorry, forgot to mention that pre-49dfe2a67797 kernels wouldn't produce this, as there would not be aware of the fact that cpus_read_lock() is actually semantically a lock. > I see the workqueue allocation you mentioned. I'll try to move this > allocation out of the mutex and see how it goes. I have been briefly looking into this as well -- it'll basically have to be moved out of the trans_pcie->mutex context, but (a) I'm not sure whether that's actually safe (b) iwl_pcie_rx_reuse_rbd() (which is where corresponding work is being queued) is not a proper context either (it's atomic context) Thanks, -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [linuxwifi] x86/thermal: AB-BA dependency between mvm->mutex and tz->lock 2017-08-03 11:34 ` Jiri Kosina @ 2017-08-03 13:09 ` Jiri Kosina 2017-08-17 13:38 ` Jiri Kosina 0 siblings, 1 reply; 20+ messages in thread From: Jiri Kosina @ 2017-08-03 13:09 UTC (permalink / raw) To: Coelho, Luciano Cc: linux-kernel, linuxwifi, Zhang, Rui, edubezval, linux-pm, Weinehall, David, Berg, Johannes, kvalo, Sharon, Sara, linux-wireless, Grumbach, Emmanuel On Thu, 3 Aug 2017, Jiri Kosina wrote: > > I see the workqueue allocation you mentioned. I'll try to move this > > allocation out of the mutex and see how it goes. > > I have been briefly looking into this as well -- it'll basically have to > be moved out of the trans_pcie->mutex context, but > > (a) I'm not sure whether that's actually safe > (b) iwl_pcie_rx_reuse_rbd() (which is where corresponding work is being > queued) is not a proper context either (it's atomic context) Actually moving it out of trans_pcie->mutex is likely not to be enough, the dependency would still be there, just the graph will have one vertex less, with the dependency going directly from mvm mutex to cpu_hotplug_lock. -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [linuxwifi] x86/thermal: AB-BA dependency between mvm->mutex and tz->lock 2017-08-03 13:09 ` Jiri Kosina @ 2017-08-17 13:38 ` Jiri Kosina 2017-08-17 14:02 ` Coelho, Luciano ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Jiri Kosina @ 2017-08-17 13:38 UTC (permalink / raw) To: Coelho, Luciano Cc: linux-kernel, linuxwifi, Zhang, Rui, edubezval, linux-pm, Weinehall, David, Berg, Johannes, kvalo, Sharon, Sara, linux-wireless, Grumbach, Emmanuel Hi, anything new on this front please? The splat (and therefore deadlock potential) is still there with current Linus' tree. Thanks, -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [linuxwifi] x86/thermal: AB-BA dependency between mvm->mutex and tz->lock 2017-08-17 13:38 ` Jiri Kosina @ 2017-08-17 14:02 ` Coelho, Luciano 2017-08-22 7:32 ` [PATCH] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc() Luca Coelho 2017-08-22 7:37 ` [PATCH v2] " Luca Coelho 2 siblings, 0 replies; 20+ messages in thread From: Coelho, Luciano @ 2017-08-17 14:02 UTC (permalink / raw) To: jikos Cc: linux-kernel, linuxwifi, Zhang, Rui, Weinehall, David, linux-pm, edubezval, Berg, Johannes, kvalo, Sharon, Sara, linux-wireless, Grumbach, Emmanuel On Thu, 2017-08-17 at 15:38 +0200, Jiri Kosina wrote: > Hi, > > anything new on this front please? > > The splat (and therefore deadlock potential) is still there with current > Linus' tree. Sorry, haven't had more time to spend on it. I'll do it this evening. But, just to clarify, the deadlock potential has been there for a while, right? The only difference is that now we get the splat. Not saying we shouldn't fix it though. ;) -- Luca. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc() 2017-08-17 13:38 ` Jiri Kosina 2017-08-17 14:02 ` Coelho, Luciano @ 2017-08-22 7:32 ` Luca Coelho 2017-08-22 7:37 ` [PATCH v2] " Luca Coelho 2 siblings, 0 replies; 20+ messages in thread From: Luca Coelho @ 2017-08-22 7:32 UTC (permalink / raw) To: jikos, linux-wireless Cc: luciano.coelho, linux-kernel, linuxwifi, rui.zhang, edubezval, linux-pm, david.weinehall, johannes.berg, kvalo, sara.sharon, emmanuel.grumbach From: Luca Coelho <luciano.coelho@intel.com> Work queues cannot be allocated in when a mutex is held because the mutex may be in use and that would make it sleep. Doing so generates the following splat with 4.13+: [ 19.513298] ====================================================== [ 19.513429] WARNING: possible circular locking dependency detected [ 19.513557] 4.13.0-rc5+ #6 Not tainted [ 19.513638] ------------------------------------------------------ [ 19.513767] cpuhp/0/12 is trying to acquire lock: [ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0 [ 19.514047] [ 19.514047] but task is already holding lock: [ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210 [ 19.514338] [ 19.514338] which lock already depends on the new lock. This lock dependency already existed with previous kernel versions, but it was not detected until commit ... was introduced. Reported-by: David Weinehall <david.weinehall@intel.com> Reported-by: Jiri Kosina <jikos@kernel.org> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> --- drivers/net/wireless/intel/iwlwifi/pcie/internal.h | 2 ++ drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 10 +--------- drivers/net/wireless/intel/iwlwifi/pcie/trans.c | 9 +++++++++ 3 files changed, 12 insertions(+), 9 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h index fa315d84e98e..a1ea9ef97ed9 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h +++ b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h @@ -787,6 +787,8 @@ int iwl_pci_fw_enter_d0i3(struct iwl_trans *trans); void iwl_pcie_enable_rx_wake(struct iwl_trans *trans, bool enable); +void iwl_pcie_rx_allocator_work(struct work_struct *data); + /* common functions that are used by gen2 transport */ void iwl_pcie_apm_config(struct iwl_trans *trans); int iwl_pcie_prepare_card_hw(struct iwl_trans *trans); diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c index 351c4423125a..942736d3fa75 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c @@ -597,7 +597,7 @@ static void iwl_pcie_rx_allocator_get(struct iwl_trans *trans, rxq->free_count += RX_CLAIM_REQ_ALLOC; } -static void iwl_pcie_rx_allocator_work(struct work_struct *data) +void iwl_pcie_rx_allocator_work(struct work_struct *data) { struct iwl_rb_allocator *rba_p = container_of(data, struct iwl_rb_allocator, rx_alloc); @@ -900,10 +900,6 @@ static int _iwl_pcie_rx_init(struct iwl_trans *trans) return err; } def_rxq = trans_pcie->rxq; - if (!rba->alloc_wq) - rba->alloc_wq = alloc_workqueue("rb_allocator", - WQ_HIGHPRI | WQ_UNBOUND, 1); - INIT_WORK(&rba->rx_alloc, iwl_pcie_rx_allocator_work); spin_lock(&rba->lock); atomic_set(&rba->req_pending, 0); @@ -1017,10 +1013,6 @@ void iwl_pcie_rx_free(struct iwl_trans *trans) } cancel_work_sync(&rba->rx_alloc); - if (rba->alloc_wq) { - destroy_workqueue(rba->alloc_wq); - rba->alloc_wq = NULL; - } iwl_pcie_free_rbs_pool(trans); diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c index f95eec52508e..3927bbf04f72 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c @@ -1786,6 +1786,11 @@ void iwl_trans_pcie_free(struct iwl_trans *trans) iwl_pcie_tx_free(trans); iwl_pcie_rx_free(trans); + if (trans_pcie->rba.alloc_wq) { + destroy_workqueue(trans_pcie->rba.alloc_wq); + trans_pcie->rba.alloc_wq = NULL; + } + if (trans_pcie->msix_enabled) { for (i = 0; i < trans_pcie->alloc_vecs; i++) { irq_set_affinity_hint( @@ -3169,6 +3174,10 @@ struct iwl_trans *iwl_trans_pcie_alloc(struct pci_dev *pdev, trans_pcie->inta_mask = CSR_INI_SET_MASK; } + trans_pcie->rba.alloc_wq = alloc_workqueue("rb_allocator", + WQ_HIGHPRI | WQ_UNBOUND, 1); + INIT_WORK(&trans_pcie->rba.rx_alloc, iwl_pcie_rx_allocator_work); + #ifdef CONFIG_IWLWIFI_PCIE_RTPM trans->runtime_pm_mode = IWL_PLAT_PM_MODE_D0I3; #else -- 2.14.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc() 2017-08-17 13:38 ` Jiri Kosina 2017-08-17 14:02 ` Coelho, Luciano 2017-08-22 7:32 ` [PATCH] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc() Luca Coelho @ 2017-08-22 7:37 ` Luca Coelho 2017-08-24 6:00 ` Luca Coelho ` (2 more replies) 2 siblings, 3 replies; 20+ messages in thread From: Luca Coelho @ 2017-08-22 7:37 UTC (permalink / raw) To: jikos, linux-wireless Cc: luciano.coelho, linux-kernel, linuxwifi, rui.zhang, edubezval, linux-pm, david.weinehall, johannes.berg, kvalo, sara.sharon, emmanuel.grumbach From: Luca Coelho <luciano.coelho@intel.com> Work queues cannot be allocated when a mutex is held because the mutex may be in use and that would make it sleep. Doing so generates the following splat with 4.13+: [ 19.513298] ====================================================== [ 19.513429] WARNING: possible circular locking dependency detected [ 19.513557] 4.13.0-rc5+ #6 Not tainted [ 19.513638] ------------------------------------------------------ [ 19.513767] cpuhp/0/12 is trying to acquire lock: [ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0 [ 19.514047] [ 19.514047] but task is already holding lock: [ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210 [ 19.514338] [ 19.514338] which lock already depends on the new lock. This lock dependency already existed with previous kernel versions, but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link lock stacks for hotplug callbacks") was introduced. Reported-by: David Weinehall <david.weinehall@intel.com> Reported-by: Jiri Kosina <jikos@kernel.org> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> --- In v2: - updated the commit message to a new version, with a grammar fix and the actual commit that exposed the problem; drivers/net/wireless/intel/iwlwifi/pcie/internal.h | 2 ++ drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 10 +--------- drivers/net/wireless/intel/iwlwifi/pcie/trans.c | 9 +++++++++ 3 files changed, 12 insertions(+), 9 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h index fa315d84e98e..a1ea9ef97ed9 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h +++ b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h @@ -787,6 +787,8 @@ int iwl_pci_fw_enter_d0i3(struct iwl_trans *trans); void iwl_pcie_enable_rx_wake(struct iwl_trans *trans, bool enable); +void iwl_pcie_rx_allocator_work(struct work_struct *data); + /* common functions that are used by gen2 transport */ void iwl_pcie_apm_config(struct iwl_trans *trans); int iwl_pcie_prepare_card_hw(struct iwl_trans *trans); diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c index 351c4423125a..942736d3fa75 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c @@ -597,7 +597,7 @@ static void iwl_pcie_rx_allocator_get(struct iwl_trans *trans, rxq->free_count += RX_CLAIM_REQ_ALLOC; } -static void iwl_pcie_rx_allocator_work(struct work_struct *data) +void iwl_pcie_rx_allocator_work(struct work_struct *data) { struct iwl_rb_allocator *rba_p = container_of(data, struct iwl_rb_allocator, rx_alloc); @@ -900,10 +900,6 @@ static int _iwl_pcie_rx_init(struct iwl_trans *trans) return err; } def_rxq = trans_pcie->rxq; - if (!rba->alloc_wq) - rba->alloc_wq = alloc_workqueue("rb_allocator", - WQ_HIGHPRI | WQ_UNBOUND, 1); - INIT_WORK(&rba->rx_alloc, iwl_pcie_rx_allocator_work); spin_lock(&rba->lock); atomic_set(&rba->req_pending, 0); @@ -1017,10 +1013,6 @@ void iwl_pcie_rx_free(struct iwl_trans *trans) } cancel_work_sync(&rba->rx_alloc); - if (rba->alloc_wq) { - destroy_workqueue(rba->alloc_wq); - rba->alloc_wq = NULL; - } iwl_pcie_free_rbs_pool(trans); diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c index f95eec52508e..3927bbf04f72 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c @@ -1786,6 +1786,11 @@ void iwl_trans_pcie_free(struct iwl_trans *trans) iwl_pcie_tx_free(trans); iwl_pcie_rx_free(trans); + if (trans_pcie->rba.alloc_wq) { + destroy_workqueue(trans_pcie->rba.alloc_wq); + trans_pcie->rba.alloc_wq = NULL; + } + if (trans_pcie->msix_enabled) { for (i = 0; i < trans_pcie->alloc_vecs; i++) { irq_set_affinity_hint( @@ -3169,6 +3174,10 @@ struct iwl_trans *iwl_trans_pcie_alloc(struct pci_dev *pdev, trans_pcie->inta_mask = CSR_INI_SET_MASK; } + trans_pcie->rba.alloc_wq = alloc_workqueue("rb_allocator", + WQ_HIGHPRI | WQ_UNBOUND, 1); + INIT_WORK(&trans_pcie->rba.rx_alloc, iwl_pcie_rx_allocator_work); + #ifdef CONFIG_IWLWIFI_PCIE_RTPM trans->runtime_pm_mode = IWL_PLAT_PM_MODE_D0I3; #else -- 2.14.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc() 2017-08-22 7:37 ` [PATCH v2] " Luca Coelho @ 2017-08-24 6:00 ` Luca Coelho 2017-08-24 19:56 ` Jiri Kosina 2017-08-24 13:50 ` [v2] " Kalle Valo 2017-08-30 14:57 ` [PATCH v2] " David Weinehall 2 siblings, 1 reply; 20+ messages in thread From: Luca Coelho @ 2017-08-24 6:00 UTC (permalink / raw) To: jikos, linux-wireless Cc: linux-kernel, linuxwifi, rui.zhang, edubezval, linux-pm, david.weinehall, johannes.berg, kvalo, sara.sharon, emmanuel.grumbach On Tue, 2017-08-22 at 10:37 +0300, Luca Coelho wrote: > From: Luca Coelho <luciano.coelho@intel.com> > > Work queues cannot be allocated when a mutex is held because the mutex > may be in use and that would make it sleep. Doing so generates the > following splat with 4.13+: > > [ 19.513298] ====================================================== > [ 19.513429] WARNING: possible circular locking dependency detected > [ 19.513557] 4.13.0-rc5+ #6 Not tainted > [ 19.513638] ------------------------------------------------------ > [ 19.513767] cpuhp/0/12 is trying to acquire lock: > [ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0 > [ 19.514047] > [ 19.514047] but task is already holding lock: > [ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210 > [ 19.514338] > [ 19.514338] which lock already depends on the new lock. > > This lock dependency already existed with previous kernel versions, > but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link > lock stacks for hotplug callbacks") was introduced. > > Reported-by: David Weinehall <david.weinehall@intel.com> > Reported-by: Jiri Kosina <jikos@kernel.org> > Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Jiri, did you have a chance to try this out? I'm about to ask Kalle to merge this so it gets in in time for 4.13-rc7. -- Cheers, Luca. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc() 2017-08-24 6:00 ` Luca Coelho @ 2017-08-24 19:56 ` Jiri Kosina 2017-08-24 20:00 ` Luca Coelho 0 siblings, 1 reply; 20+ messages in thread From: Jiri Kosina @ 2017-08-24 19:56 UTC (permalink / raw) To: Luca Coelho Cc: linux-wireless, linux-kernel, linuxwifi, rui.zhang, edubezval, linux-pm, david.weinehall, johannes.berg, kvalo, sara.sharon, emmanuel.grumbach On Thu, 24 Aug 2017, Luca Coelho wrote: > > Work queues cannot be allocated when a mutex is held because the mutex > > may be in use and that would make it sleep. Doing so generates the > > following splat with 4.13+: > > > > [ 19.513298] ====================================================== > > [ 19.513429] WARNING: possible circular locking dependency detected > > [ 19.513557] 4.13.0-rc5+ #6 Not tainted > > [ 19.513638] ------------------------------------------------------ > > [ 19.513767] cpuhp/0/12 is trying to acquire lock: > > [ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0 > > [ 19.514047] > > [ 19.514047] but task is already holding lock: > > [ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210 > > [ 19.514338] > > [ 19.514338] which lock already depends on the new lock. > > > > This lock dependency already existed with previous kernel versions, > > but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link > > lock stacks for hotplug callbacks") was introduced. > > > > Reported-by: David Weinehall <david.weinehall@intel.com> > > Reported-by: Jiri Kosina <jikos@kernel.org> > > Signed-off-by: Luca Coelho <luciano.coelho@intel.com> > > Jiri, did you have a chance to try this out? I'm about to ask Kalle to > merge this so it gets in in time for 4.13-rc7. Sorry, I am almost completely offline for one more week (vacation), and will not have access to the affected system before that. But this indeed looks like a correct fix to me, so feel free to add Acked-by: Jiri Kosina <jkosina@suse.cz> I'll be able to provide my Tested-by: eventually only in ~10 days. Thanks, -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc() 2017-08-24 19:56 ` Jiri Kosina @ 2017-08-24 20:00 ` Luca Coelho 2017-08-29 8:19 ` Kalle Valo 2017-09-04 11:43 ` Jiri Kosina 0 siblings, 2 replies; 20+ messages in thread From: Luca Coelho @ 2017-08-24 20:00 UTC (permalink / raw) To: Jiri Kosina Cc: linux-wireless, linux-kernel, linuxwifi, rui.zhang, edubezval, linux-pm, david.weinehall, johannes.berg, kvalo, sara.sharon, emmanuel.grumbach On Thu, 2017-08-24 at 21:56 +0200, Jiri Kosina wrote: > On Thu, 24 Aug 2017, Luca Coelho wrote: > > > > Work queues cannot be allocated when a mutex is held because the mutex > > > may be in use and that would make it sleep. Doing so generates the > > > following splat with 4.13+: > > > > > > [ 19.513298] ====================================================== > > > [ 19.513429] WARNING: possible circular locking dependency detected > > > [ 19.513557] 4.13.0-rc5+ #6 Not tainted > > > [ 19.513638] ------------------------------------------------------ > > > [ 19.513767] cpuhp/0/12 is trying to acquire lock: > > > [ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0 > > > [ 19.514047] > > > [ 19.514047] but task is already holding lock: > > > [ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210 > > > [ 19.514338] > > > [ 19.514338] which lock already depends on the new lock. > > > > > > This lock dependency already existed with previous kernel versions, > > > but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link > > > lock stacks for hotplug callbacks") was introduced. > > > > > > Reported-by: David Weinehall <david.weinehall@intel.com> > > > Reported-by: Jiri Kosina <jikos@kernel.org> > > > Signed-off-by: Luca Coelho <luciano.coelho@intel.com> > > > > Jiri, did you have a chance to try this out? I'm about to ask Kalle to > > merge this so it gets in in time for 4.13-rc7. > > Sorry, I am almost completely offline for one more week (vacation), and > will not have access to the affected system before that. Sounds good! Enjoy! ;) > But this indeed > looks like a correct fix to me, so feel free to add > > Acked-by: Jiri Kosina <jkosina@suse.cz> > > I'll be able to provide my Tested-by: eventually only in ~10 days. Kalle already picked it up in wireless-drivers and this should make it's way to 4.13-rc7 (we hope). In any case, thanks for reporting and the help debugging it. -- Cheers, Luca. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc() 2017-08-24 20:00 ` Luca Coelho @ 2017-08-29 8:19 ` Kalle Valo 2017-09-04 11:43 ` Jiri Kosina 1 sibling, 0 replies; 20+ messages in thread From: Kalle Valo @ 2017-08-29 8:19 UTC (permalink / raw) To: Luca Coelho Cc: Jiri Kosina, linux-wireless, linux-kernel, linuxwifi, rui.zhang, edubezval, linux-pm, david.weinehall, johannes.berg, sara.sharon, emmanuel.grumbach Luca Coelho <luca@coelho.fi> writes: > On Thu, 2017-08-24 at 21:56 +0200, Jiri Kosina wrote: >> On Thu, 24 Aug 2017, Luca Coelho wrote: >> >> > > Work queues cannot be allocated when a mutex is held because the mutex >> > > may be in use and that would make it sleep. Doing so generates the >> > > following splat with 4.13+: >> > > >> > > [ 19.513298] ====================================================== >> > > [ 19.513429] WARNING: possible circular locking dependency detected >> > > [ 19.513557] 4.13.0-rc5+ #6 Not tainted >> > > [ 19.513638] ------------------------------------------------------ >> > > [ 19.513767] cpuhp/0/12 is trying to acquire lock: >> > > [ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] >> > > thermal_zone_get_temp+0x5b/0xb0 >> > > [ 19.514047] >> > > [ 19.514047] but task is already holding lock: >> > > [ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] >> > > cpuhp_thread_fun+0x3a/0x210 >> > > [ 19.514338] >> > > [ 19.514338] which lock already depends on the new lock. >> > > >> > > This lock dependency already existed with previous kernel versions, >> > > but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link >> > > lock stacks for hotplug callbacks") was introduced. >> > > >> > > Reported-by: David Weinehall <david.weinehall@intel.com> >> > > Reported-by: Jiri Kosina <jikos@kernel.org> >> > > Signed-off-by: Luca Coelho <luciano.coelho@intel.com> >> > >> > Jiri, did you have a chance to try this out? I'm about to ask Kalle to >> > merge this so it gets in in time for 4.13-rc7. >> >> Sorry, I am almost completely offline for one more week (vacation), and >> will not have access to the affected system before that. > > Sounds good! Enjoy! ;) > > >> But this indeed >> looks like a correct fix to me, so feel free to add >> >> Acked-by: Jiri Kosina <jkosina@suse.cz> >> >> I'll be able to provide my Tested-by: eventually only in ~10 days. > > > Kalle already picked it up in wireless-drivers and this should make it's > way to 4.13-rc7 (we hope). It (10a54d8196d1) didn't make it to -rc7 but it's in net tree now and should make it to the next release on Sunday (either -rc8 or the final). -- Kalle Valo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc() 2017-08-24 20:00 ` Luca Coelho 2017-08-29 8:19 ` Kalle Valo @ 2017-09-04 11:43 ` Jiri Kosina 2017-09-04 11:44 ` Luca Coelho 1 sibling, 1 reply; 20+ messages in thread From: Jiri Kosina @ 2017-09-04 11:43 UTC (permalink / raw) To: Luca Coelho Cc: linux-wireless, linux-kernel, linuxwifi, rui.zhang, edubezval, linux-pm, david.weinehall, johannes.berg, kvalo, sara.sharon, emmanuel.grumbach On Thu, 24 Aug 2017, Luca Coelho wrote: > > looks like a correct fix to me, so feel free to add > > > > Acked-by: Jiri Kosina <jkosina@suse.cz> > > > > I'll be able to provide my Tested-by: eventually only in ~10 days. > > > Kalle already picked it up in wireless-drivers and this should make it's > way to 4.13-rc7 (we hope). > > In any case, thanks for reporting and the help debugging it. I know it's pretty late for this to be added to the commit, but I don't want this to be left in the void, so for the sake of completness: Tested-by: Jiri Kosina <jkosina@suse.cz> Thanks, -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc() 2017-09-04 11:43 ` Jiri Kosina @ 2017-09-04 11:44 ` Luca Coelho 0 siblings, 0 replies; 20+ messages in thread From: Luca Coelho @ 2017-09-04 11:44 UTC (permalink / raw) To: Jiri Kosina Cc: linux-wireless, linux-kernel, linuxwifi, rui.zhang, edubezval, linux-pm, david.weinehall, johannes.berg, kvalo, sara.sharon, emmanuel.grumbach On Mon, 2017-09-04 at 13:43 +0200, Jiri Kosina wrote: > On Thu, 24 Aug 2017, Luca Coelho wrote: > > > > looks like a correct fix to me, so feel free to add > > > > > > Acked-by: Jiri Kosina <jkosina@suse.cz> > > > > > > I'll be able to provide my Tested-by: eventually only in ~10 days. > > > > > > Kalle already picked it up in wireless-drivers and this should make it's > > way to 4.13-rc7 (we hope). > > > > In any case, thanks for reporting and the help debugging it. > > I know it's pretty late for this to be added to the commit, but I don't > want this to be left in the void, so for the sake of completness: > > Tested-by: Jiri Kosina <jkosina@suse.cz> Thanks, Jiri, for reporting, debugging and testing! -- Cheers, Luca. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc() 2017-08-22 7:37 ` [PATCH v2] " Luca Coelho 2017-08-24 6:00 ` Luca Coelho @ 2017-08-24 13:50 ` Kalle Valo 2017-08-30 14:57 ` [PATCH v2] " David Weinehall 2 siblings, 0 replies; 20+ messages in thread From: Kalle Valo @ 2017-08-24 13:50 UTC (permalink / raw) To: Luciano Coelho Cc: jikos, linux-wireless, luciano.coelho, linux-kernel, linuxwifi, rui.zhang, edubezval, linux-pm, david.weinehall, johannes.berg, sara.sharon, emmanuel.grumbach Luciano Coelho <luca@coelho.fi> wrote: > From: Luca Coelho <luciano.coelho@intel.com> > > Work queues cannot be allocated when a mutex is held because the mutex > may be in use and that would make it sleep. Doing so generates the > following splat with 4.13+: > > [ 19.513298] ====================================================== > [ 19.513429] WARNING: possible circular locking dependency detected > [ 19.513557] 4.13.0-rc5+ #6 Not tainted > [ 19.513638] ------------------------------------------------------ > [ 19.513767] cpuhp/0/12 is trying to acquire lock: > [ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0 > [ 19.514047] > [ 19.514047] but task is already holding lock: > [ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210 > [ 19.514338] > [ 19.514338] which lock already depends on the new lock. > > This lock dependency already existed with previous kernel versions, > but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link > lock stacks for hotplug callbacks") was introduced. > > Reported-by: David Weinehall <david.weinehall@intel.com> > Reported-by: Jiri Kosina <jikos@kernel.org> > Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Patch applied to wireless-drivers.git, thanks. 10a54d8196d1 iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc() -- https://patchwork.kernel.org/patch/9914349/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc() 2017-08-22 7:37 ` [PATCH v2] " Luca Coelho 2017-08-24 6:00 ` Luca Coelho 2017-08-24 13:50 ` [v2] " Kalle Valo @ 2017-08-30 14:57 ` David Weinehall 2017-08-31 6:11 ` Luca Coelho 2 siblings, 1 reply; 20+ messages in thread From: David Weinehall @ 2017-08-30 14:57 UTC (permalink / raw) To: Luca Coelho Cc: jikos, linux-wireless, luciano.coelho, linux-kernel, linuxwifi, rui.zhang, edubezval, linux-pm, david.weinehall, johannes.berg, kvalo, sara.sharon, emmanuel.grumbach On Tue, Aug 22, 2017 at 10:37:29AM +0300, Luca Coelho wrote: > From: Luca Coelho <luciano.coelho@intel.com> > > Work queues cannot be allocated when a mutex is held because the mutex > may be in use and that would make it sleep. Doing so generates the > following splat with 4.13+: > > [ 19.513298] ====================================================== > [ 19.513429] WARNING: possible circular locking dependency detected > [ 19.513557] 4.13.0-rc5+ #6 Not tainted > [ 19.513638] ------------------------------------------------------ > [ 19.513767] cpuhp/0/12 is trying to acquire lock: > [ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0 > [ 19.514047] > [ 19.514047] but task is already holding lock: > [ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210 > [ 19.514338] > [ 19.514338] which lock already depends on the new lock. > > This lock dependency already existed with previous kernel versions, > but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link > lock stacks for hotplug callbacks") was introduced. > > Reported-by: David Weinehall <david.weinehall@intel.com> > Reported-by: Jiri Kosina <jikos@kernel.org> > Signed-off-by: Luca Coelho <luciano.coelho@intel.com> With this patch I no longer get the lockdep warning, and the driver seems to work just as well as before. Thanks! Tested-by: David Weinehall <david.weinehall@linux.intel.com> > --- > In v2: > - updated the commit message to a new version, with a grammar fix > and the actual commit that exposed the problem; > > drivers/net/wireless/intel/iwlwifi/pcie/internal.h | 2 ++ > drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 10 +--------- > drivers/net/wireless/intel/iwlwifi/pcie/trans.c | 9 +++++++++ > 3 files changed, 12 insertions(+), 9 deletions(-) > > diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h > index fa315d84e98e..a1ea9ef97ed9 100644 > --- a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h > +++ b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h > @@ -787,6 +787,8 @@ int iwl_pci_fw_enter_d0i3(struct iwl_trans *trans); > > void iwl_pcie_enable_rx_wake(struct iwl_trans *trans, bool enable); > > +void iwl_pcie_rx_allocator_work(struct work_struct *data); > + > /* common functions that are used by gen2 transport */ > void iwl_pcie_apm_config(struct iwl_trans *trans); > int iwl_pcie_prepare_card_hw(struct iwl_trans *trans); > diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c > index 351c4423125a..942736d3fa75 100644 > --- a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c > +++ b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c > @@ -597,7 +597,7 @@ static void iwl_pcie_rx_allocator_get(struct iwl_trans *trans, > rxq->free_count += RX_CLAIM_REQ_ALLOC; > } > > -static void iwl_pcie_rx_allocator_work(struct work_struct *data) > +void iwl_pcie_rx_allocator_work(struct work_struct *data) > { > struct iwl_rb_allocator *rba_p = > container_of(data, struct iwl_rb_allocator, rx_alloc); > @@ -900,10 +900,6 @@ static int _iwl_pcie_rx_init(struct iwl_trans *trans) > return err; > } > def_rxq = trans_pcie->rxq; > - if (!rba->alloc_wq) > - rba->alloc_wq = alloc_workqueue("rb_allocator", > - WQ_HIGHPRI | WQ_UNBOUND, 1); > - INIT_WORK(&rba->rx_alloc, iwl_pcie_rx_allocator_work); > > spin_lock(&rba->lock); > atomic_set(&rba->req_pending, 0); > @@ -1017,10 +1013,6 @@ void iwl_pcie_rx_free(struct iwl_trans *trans) > } > > cancel_work_sync(&rba->rx_alloc); > - if (rba->alloc_wq) { > - destroy_workqueue(rba->alloc_wq); > - rba->alloc_wq = NULL; > - } > > iwl_pcie_free_rbs_pool(trans); > > diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c > index f95eec52508e..3927bbf04f72 100644 > --- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c > +++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c > @@ -1786,6 +1786,11 @@ void iwl_trans_pcie_free(struct iwl_trans *trans) > iwl_pcie_tx_free(trans); > iwl_pcie_rx_free(trans); > > + if (trans_pcie->rba.alloc_wq) { > + destroy_workqueue(trans_pcie->rba.alloc_wq); > + trans_pcie->rba.alloc_wq = NULL; > + } > + > if (trans_pcie->msix_enabled) { > for (i = 0; i < trans_pcie->alloc_vecs; i++) { > irq_set_affinity_hint( > @@ -3169,6 +3174,10 @@ struct iwl_trans *iwl_trans_pcie_alloc(struct pci_dev *pdev, > trans_pcie->inta_mask = CSR_INI_SET_MASK; > } > > + trans_pcie->rba.alloc_wq = alloc_workqueue("rb_allocator", > + WQ_HIGHPRI | WQ_UNBOUND, 1); > + INIT_WORK(&trans_pcie->rba.rx_alloc, iwl_pcie_rx_allocator_work); > + > #ifdef CONFIG_IWLWIFI_PCIE_RTPM > trans->runtime_pm_mode = IWL_PLAT_PM_MODE_D0I3; > #else > -- > 2.14.1 > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc() 2017-08-30 14:57 ` [PATCH v2] " David Weinehall @ 2017-08-31 6:11 ` Luca Coelho 0 siblings, 0 replies; 20+ messages in thread From: Luca Coelho @ 2017-08-31 6:11 UTC (permalink / raw) To: David Weinehall Cc: jikos, linux-wireless, linux-kernel, linuxwifi, rui.zhang, edubezval, linux-pm, david.weinehall, johannes.berg, kvalo, sara.sharon, emmanuel.grumbach On Wed, 2017-08-30 at 17:57 +0300, David Weinehall wrote: > On Tue, Aug 22, 2017 at 10:37:29AM +0300, Luca Coelho wrote: > > From: Luca Coelho <luciano.coelho@intel.com> > > > > Work queues cannot be allocated when a mutex is held because the mutex > > may be in use and that would make it sleep. Doing so generates the > > following splat with 4.13+: > > > > [ 19.513298] ====================================================== > > [ 19.513429] WARNING: possible circular locking dependency detected > > [ 19.513557] 4.13.0-rc5+ #6 Not tainted > > [ 19.513638] ------------------------------------------------------ > > [ 19.513767] cpuhp/0/12 is trying to acquire lock: > > [ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0 > > [ 19.514047] > > [ 19.514047] but task is already holding lock: > > [ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210 > > [ 19.514338] > > [ 19.514338] which lock already depends on the new lock. > > > > This lock dependency already existed with previous kernel versions, > > but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link > > lock stacks for hotplug callbacks") was introduced. > > > > Reported-by: David Weinehall <david.weinehall@intel.com> > > Reported-by: Jiri Kosina <jikos@kernel.org> > > Signed-off-by: Luca Coelho <luciano.coelho@intel.com> > > With this patch I no longer get the lockdep warning, > and the driver seems to work just as well as before. Great! Thanks for reporting and testing, David! > Thanks! > > Tested-by: David Weinehall <david.weinehall@linux.intel.com> Thanks for this too, but unfortunately it's too late to add it, since the patch is already in net tree. -- Cheers, Luca. ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2017-09-04 11:45 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-07-31 11:18 x86/thermal: AB-BA dependency between mvm->mutex and tz->lock Jiri Kosina 2017-08-03 9:10 ` Jiri Kosina 2017-08-03 9:43 ` [linuxwifi] " Coelho, Luciano 2017-08-03 10:02 ` Kalle Valo 2017-08-03 11:28 ` Coelho, Luciano 2017-08-03 11:34 ` Jiri Kosina 2017-08-03 13:09 ` Jiri Kosina 2017-08-17 13:38 ` Jiri Kosina 2017-08-17 14:02 ` Coelho, Luciano 2017-08-22 7:32 ` [PATCH] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc() Luca Coelho 2017-08-22 7:37 ` [PATCH v2] " Luca Coelho 2017-08-24 6:00 ` Luca Coelho 2017-08-24 19:56 ` Jiri Kosina 2017-08-24 20:00 ` Luca Coelho 2017-08-29 8:19 ` Kalle Valo 2017-09-04 11:43 ` Jiri Kosina 2017-09-04 11:44 ` Luca Coelho 2017-08-24 13:50 ` [v2] " Kalle Valo 2017-08-30 14:57 ` [PATCH v2] " David Weinehall 2017-08-31 6:11 ` Luca Coelho
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).