* hung task while plugging in cfg80211 @ 2013-07-12 11:49 Jeff Layton 2013-07-24 9:59 ` Stanislaw Gruszka 0 siblings, 1 reply; 7+ messages in thread From: Jeff Layton @ 2013-07-12 11:49 UTC (permalink / raw) To: linux-wireless, linville I updated the kernel on my rawhide KVM guest, and noticed that the ethernet interface wasn't coming up at all. While poking around, I saw this stack trace pop up: Jul 12 07:29:25 rawhide kernel: [ 241.156196] INFO: task modprobe:501 blocked for more than 120 seconds. Jul 12 07:29:25 rawhide kernel: [ 241.157222] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 12 07:29:25 rawhide kernel: [ 241.158379] modprobe D ffff880118c526e0 4584 501 500 0x00000080 Jul 12 07:29:25 rawhide kernel: [ 241.159458] ffff8800d10c3ba8 0000000000000046 00000000001d5300 ffff8800d10c3fd8 Jul 12 07:29:25 rawhide kernel: [ 241.160644] ffff8800d10c3fd8 00000000001d5300 ffff8800d2ae4dc0 ffff8800d2ae4dc0 Jul 12 07:29:25 rawhide kernel: [ 241.161875] ffffffff81d0bfa0 ffffffff81d0bfa8 ffffffff00000000 ffffffff81d0bff0 Jul 12 07:29:25 rawhide kernel: [ 241.163096] Call Trace: Jul 12 07:29:25 rawhide kernel: [ 241.163460] [<ffffffff817377e9>] schedule+0x29/0x70 Jul 12 07:29:25 rawhide kernel: [ 241.164235] [<ffffffff817392ad>] rwsem_down_write_failed+0xed/0x1a0 Jul 12 07:29:25 rawhide kernel: [ 241.165489] [<ffffffff810bb600>] ? update_cpu_load_active+0xb0/0xb0 Jul 12 07:29:25 rawhide kernel: [ 241.166267] [<ffffffff8137d503>] call_rwsem_down_write_failed+0x13/0x20 Jul 12 07:29:25 rawhide kernel: [ 241.166760] [<ffffffff81736aad>] ? down_write+0x9d/0xb2 Jul 12 07:29:25 rawhide kernel: [ 241.167189] [<ffffffff8162d975>] ? genl_lock_all+0x15/0x30 Jul 12 07:29:25 rawhide kernel: [ 241.167603] [<ffffffff8162d975>] genl_lock_all+0x15/0x30 Jul 12 07:29:25 rawhide kernel: [ 241.168036] [<ffffffff8162ea83>] genl_register_family+0x53/0x1f0 Jul 12 07:29:25 rawhide kernel: [ 241.168490] [<ffffffffa015e000>] ? 0xffffffffa015dfff Jul 12 07:29:25 rawhide kernel: [ 241.168877] [<ffffffff8162f520>] genl_register_family_with_ops+0x20/0x80 Jul 12 07:29:25 rawhide kernel: [ 241.169406] [<ffffffffa015e000>] ? 0xffffffffa015dfff Jul 12 07:29:25 rawhide kernel: [ 241.169802] [<ffffffffa0101ec4>] nl80211_init+0x24/0xf0 [cfg80211] Jul 12 07:29:25 rawhide kernel: [ 241.170297] [<ffffffffa015e000>] ? 0xffffffffa015dfff Jul 12 07:29:25 rawhide kernel: [ 241.170686] [<ffffffffa015e043>] cfg80211_init+0x43/0xdb [cfg80211] Jul 12 07:29:25 rawhide kernel: [ 241.171230] [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0 Jul 12 07:29:25 rawhide kernel: [ 241.171651] [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50 Jul 12 07:29:25 rawhide kernel: [ 241.172099] [<ffffffff810fa2bf>] load_module+0x1c6f/0x27f0 Jul 12 07:29:25 rawhide kernel: [ 241.172515] [<ffffffff810f59a0>] ? store_uevent+0x40/0x40 Jul 12 07:29:25 rawhide kernel: [ 241.172933] [<ffffffff810fafd6>] SyS_finit_module+0x86/0xb0 Jul 12 07:29:25 rawhide kernel: [ 241.173379] [<ffffffff81744019>] system_call_fastpath+0x16/0x1b ...on a hunch, I blacklisted the cfg80211 module, and that seems to have worked around the problem for now. I suspect that NM was trying to plug in this module and it hung, and it couldn't proceed any further to configure the virtual ethernet interface. In the fedora rawhide kernels, last known good one is kernel-3.11.0-0.rc0.git3.1.fc20, and bad one is kernel-3.11.0-0.rc0.git6.1.fc20. I haven't bisected it further. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: hung task while plugging in cfg80211 2013-07-12 11:49 hung task while plugging in cfg80211 Jeff Layton @ 2013-07-24 9:59 ` Stanislaw Gruszka 2013-07-25 14:51 ` Jeff Layton 0 siblings, 1 reply; 7+ messages in thread From: Stanislaw Gruszka @ 2013-07-24 9:59 UTC (permalink / raw) To: Jeff Layton Cc: linux-wireless, linville, netdev, Thomas Graf, Stephen Hemminger, rjones, Marcel Holtmann On Fri, Jul 12, 2013 at 07:49:11AM -0400, Jeff Layton wrote: > I updated the kernel on my rawhide KVM guest, and noticed that the > ethernet interface wasn't coming up at all. While poking around, I saw > this stack trace pop up: > > Jul 12 07:29:25 rawhide kernel: [ 241.156196] INFO: task modprobe:501 blocked for more than 120 seconds. > Jul 12 07:29:25 rawhide kernel: [ 241.157222] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Jul 12 07:29:25 rawhide kernel: [ 241.158379] modprobe D ffff880118c526e0 4584 501 500 0x00000080 > Jul 12 07:29:25 rawhide kernel: [ 241.159458] ffff8800d10c3ba8 0000000000000046 00000000001d5300 ffff8800d10c3fd8 > Jul 12 07:29:25 rawhide kernel: [ 241.160644] ffff8800d10c3fd8 00000000001d5300 ffff8800d2ae4dc0 ffff8800d2ae4dc0 > Jul 12 07:29:25 rawhide kernel: [ 241.161875] ffffffff81d0bfa0 ffffffff81d0bfa8 ffffffff00000000 ffffffff81d0bff0 > Jul 12 07:29:25 rawhide kernel: [ 241.163096] Call Trace: > Jul 12 07:29:25 rawhide kernel: [ 241.163460] [<ffffffff817377e9>] schedule+0x29/0x70 > Jul 12 07:29:25 rawhide kernel: [ 241.164235] [<ffffffff817392ad>] rwsem_down_write_failed+0xed/0x1a0 > Jul 12 07:29:25 rawhide kernel: [ 241.165489] [<ffffffff810bb600>] ? update_cpu_load_active+0xb0/0xb0 > Jul 12 07:29:25 rawhide kernel: [ 241.166267] [<ffffffff8137d503>] call_rwsem_down_write_failed+0x13/0x20 > Jul 12 07:29:25 rawhide kernel: [ 241.166760] [<ffffffff81736aad>] ? down_write+0x9d/0xb2 > Jul 12 07:29:25 rawhide kernel: [ 241.167189] [<ffffffff8162d975>] ? genl_lock_all+0x15/0x30 > Jul 12 07:29:25 rawhide kernel: [ 241.167603] [<ffffffff8162d975>] genl_lock_all+0x15/0x30 I reproduced that on qemu-kvm with rawhide LiveCD. Below more information gathered using sysrq: (qemu) [ 2458.110724] SysRq : Show Locks Held [ 2458.111347] [ 2458.111347] Showing all locks held in the system: [ 2458.111347] 1 lock held by NetworkManager/582: [ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162bc79>] genl_rcv+0x19/0x40 [ 2458.111347] 1 lock held by modprobe/603: [ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162baa5>] genl_lock_all+0x15/0x30 [ 2458.111347] [ 2458.111347] ============================================= [ 2458.111347] (qemu) sendkey alt-sysrq-w (qemu) [ 2461.579457] SysRq : Show Blocked State [ 2461.580103] task PC stack pid father [ 2461.580103] NetworkManager D ffff880034b84500 4040 582 1 0x00000080 [ 2461.580103] ffff8800197ff720 0000000000000046 00000000001d5340 ffff8800197fffd8 [ 2461.580103] ffff8800197fffd8 00000000001d5340 ffff880019631700 7fffffffffffffff [ 2461.580103] ffff8800197ff880 ffff8800197ff878 ffff880019631700 ffff880019631700 [ 2461.580103] Call Trace: [ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70 [ 2461.580103] [<ffffffff81731ad1>] schedule_timeout+0x1c1/0x360 [ 2461.580103] [<ffffffff810e69eb>] ? mark_held_locks+0xbb/0x140 [ 2461.580103] [<ffffffff817377ac>] ? _raw_spin_unlock_irq+0x2c/0x50 [ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [ 2461.580103] [<ffffffff81736398>] wait_for_completion_killable+0xe8/0x170 [ 2461.580103] [<ffffffff810b7fa0>] ? wake_up_state+0x20/0x20 [ 2461.580103] [<ffffffff81095825>] call_usermodehelper_exec+0x1a5/0x210 [ 2461.580103] [<ffffffff817362ed>] ? wait_for_completion_killable+0x3d/0x170 [ 2461.580103] [<ffffffff81095cc3>] __request_module+0x1b3/0x370 [ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [ 2461.580103] [<ffffffff8162c5c9>] ctrl_getfamily+0x159/0x190 [ 2461.580103] [<ffffffff8162d8a4>] genl_family_rcv_msg+0x1f4/0x2e0 [ 2461.580103] [<ffffffff8162d990>] ? genl_family_rcv_msg+0x2e0/0x2e0 [ 2461.580103] [<ffffffff8162da1e>] genl_rcv_msg+0x8e/0xd0 [ 2461.580103] [<ffffffff8162b729>] netlink_rcv_skb+0xa9/0xc0 [ 2461.580103] [<ffffffff8162bc88>] genl_rcv+0x28/0x40 [ 2461.580103] [<ffffffff8162ad6d>] netlink_unicast+0xdd/0x190 [ 2461.580103] [<ffffffff8162b149>] netlink_sendmsg+0x329/0x750 [ 2461.580103] [<ffffffff815db849>] sock_sendmsg+0x99/0xd0 [ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70 [ 2461.580103] [<ffffffff810e96e8>] ? lock_release_non_nested+0x308/0x350 [ 2461.580103] [<ffffffff815dbc6e>] ___sys_sendmsg+0x39e/0x3b0 [ 2461.580103] [<ffffffff810565af>] ? kvm_clock_read+0x2f/0x50 [ 2461.580103] [<ffffffff810218b9>] ? sched_clock+0x9/0x10 [ 2461.580103] [<ffffffff810bb2bd>] ? sched_clock_local+0x1d/0x80 [ 2461.580103] [<ffffffff810bb448>] ? sched_clock_cpu+0xa8/0x100 [ 2461.580103] [<ffffffff810e33ad>] ? trace_hardirqs_off+0xd/0x10 [ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70 [ 2461.580103] [<ffffffff810e3f7f>] ? lock_release_holdtime.part.28+0xf/0x1a0 [ 2461.580103] [<ffffffff8120fec9>] ? fget_light+0xf9/0x510 [ 2461.580103] [<ffffffff8120fe0c>] ? fget_light+0x3c/0x510 [ 2461.580103] [<ffffffff815dd1d2>] __sys_sendmsg+0x42/0x80 [ 2461.580103] [<ffffffff815dd222>] SyS_sendmsg+0x12/0x20 [ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b [ 2461.580103] modprobe D ffff88000f2c8000 4632 603 602 0x00000080 [ 2461.580103] ffff88000f04fba8 0000000000000046 00000000001d5340 ffff88000f04ffd8 [ 2461.580103] ffff88000f04ffd8 00000000001d5340 ffff8800377d4500 ffff8800377d4500 [ 2461.580103] ffffffff81d0b260 ffffffff81d0b268 ffffffff00000000 ffffffff81d0b2b0 [ 2461.580103] Call Trace: [ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70 [ 2461.580103] [<ffffffff81736d4d>] rwsem_down_write_failed+0xed/0x1a0 [ 2461.580103] [<ffffffff810bb200>] ? update_cpu_load_active+0x10/0xb0 [ 2461.580103] [<ffffffff8137b473>] call_rwsem_down_write_failed+0x13/0x20 [ 2461.580103] [<ffffffff8173492d>] ? down_write+0x9d/0xb2 [ 2461.580103] [<ffffffff8162baa5>] ? genl_lock_all+0x15/0x30 [ 2461.580103] [<ffffffff8162baa5>] genl_lock_all+0x15/0x30 [ 2461.580103] [<ffffffff8162cbb3>] genl_register_family+0x53/0x1f0 [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff [ 2461.580103] [<ffffffff8162d650>] genl_register_family_with_ops+0x20/0x80 [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff [ 2461.580103] [<ffffffffa017fe84>] nl80211_init+0x24/0xf0 [cfg80211] [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff [ 2461.580103] [<ffffffffa01dc043>] cfg80211_init+0x43/0xdb [cfg80211] [ 2461.580103] [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0 [ 2461.580103] [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50 [ 2461.580103] [<ffffffff810f75af>] load_module+0x1c6f/0x27f0 [ 2461.580103] [<ffffffff810f2c90>] ? store_uevent+0x40/0x40 [ 2461.580103] [<ffffffff810f82c6>] SyS_finit_module+0x86/0xb0 [ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b [ 2461.580103] Sched Debug Version: v0.10, 3.11.0-0.rc1.git4.1.fc20.x86_64 #1 So call user space to load cfg80211 module (alias net-pf-16-proto-16-family-nl80211) from ctrl_getfamily() with read taken cb_lock semaphore. That deadlock in modprobe process in genl_register_family() when try to down_write cb_lock. This is consequence of commit: commit fb4e156886ce6e8309e912d8b370d192330d19d3 Author: Marcel Holtmann <marcel@holtmann.org> Date: Sun Apr 28 16:22:06 2013 -0700 nl80211: Add generic netlink module alias for cfg80211/nl80211 I prepared a possible fix for that (patch below), but I'm not sure if it do right thing, it can cause other oddity, please test (with CONFIG_LOCKDEP compiled). I think I would prefer rather remove autoloading genetlink modules from kernel or use just one lock in instead of two cb_lock & genl_mutex . Stanislaw diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c index 2fd6dbe..1076fe1 100644 --- a/net/netlink/genetlink.c +++ b/net/netlink/genetlink.c @@ -877,8 +877,10 @@ static int ctrl_getfamily(struct sk_buff *skb, struct genl_info *info) #ifdef CONFIG_MODULES if (res == NULL) { genl_unlock(); + up_read(&cb_lock); request_module("net-pf-%d-proto-%d-family-%s", PF_NETLINK, NETLINK_GENERIC, name); + down_read(&cb_lock); genl_lock(); res = genl_family_find_byname(name); } ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: hung task while plugging in cfg80211 @ 2013-07-25 14:51 ` Jeff Layton 0 siblings, 0 replies; 7+ messages in thread From: Jeff Layton @ 2013-07-25 14:51 UTC (permalink / raw) To: Stanislaw Gruszka Cc: linux-wireless, linville, netdev, Thomas Graf, Stephen Hemminger, rjones, Marcel Holtmann On Wed, 24 Jul 2013 11:59:12 +0200 Stanislaw Gruszka <sgruszka@redhat.com> wrote: > On Fri, Jul 12, 2013 at 07:49:11AM -0400, Jeff Layton wrote: > > I updated the kernel on my rawhide KVM guest, and noticed that the > > ethernet interface wasn't coming up at all. While poking around, I saw > > this stack trace pop up: > > > > Jul 12 07:29:25 rawhide kernel: [ 241.156196] INFO: task modprobe:501 blocked for more than 120 seconds. > > Jul 12 07:29:25 rawhide kernel: [ 241.157222] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > Jul 12 07:29:25 rawhide kernel: [ 241.158379] modprobe D ffff880118c526e0 4584 501 500 0x00000080 > > Jul 12 07:29:25 rawhide kernel: [ 241.159458] ffff8800d10c3ba8 0000000000000046 00000000001d5300 ffff8800d10c3fd8 > > Jul 12 07:29:25 rawhide kernel: [ 241.160644] ffff8800d10c3fd8 00000000001d5300 ffff8800d2ae4dc0 ffff8800d2ae4dc0 > > Jul 12 07:29:25 rawhide kernel: [ 241.161875] ffffffff81d0bfa0 ffffffff81d0bfa8 ffffffff00000000 ffffffff81d0bff0 > > Jul 12 07:29:25 rawhide kernel: [ 241.163096] Call Trace: > > Jul 12 07:29:25 rawhide kernel: [ 241.163460] [<ffffffff817377e9>] schedule+0x29/0x70 > > Jul 12 07:29:25 rawhide kernel: [ 241.164235] [<ffffffff817392ad>] rwsem_down_write_failed+0xed/0x1a0 > > Jul 12 07:29:25 rawhide kernel: [ 241.165489] [<ffffffff810bb600>] ? update_cpu_load_active+0xb0/0xb0 > > Jul 12 07:29:25 rawhide kernel: [ 241.166267] [<ffffffff8137d503>] call_rwsem_down_write_failed+0x13/0x20 > > Jul 12 07:29:25 rawhide kernel: [ 241.166760] [<ffffffff81736aad>] ? down_write+0x9d/0xb2 > > Jul 12 07:29:25 rawhide kernel: [ 241.167189] [<ffffffff8162d975>] ? genl_lock_all+0x15/0x30 > > Jul 12 07:29:25 rawhide kernel: [ 241.167603] [<ffffffff8162d975>] genl_lock_all+0x15/0x30 > > I reproduced that on qemu-kvm with rawhide LiveCD. Below more > information gathered using sysrq: > > (qemu) [ 2458.110724] SysRq : Show Locks Held > [ 2458.111347] > [ 2458.111347] Showing all locks held in the system: > [ 2458.111347] 1 lock held by NetworkManager/582: > [ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162bc79>] genl_rcv+0x19/0x40 > [ 2458.111347] 1 lock held by modprobe/603: > [ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162baa5>] genl_lock_all+0x15/0x30 > [ 2458.111347] > [ 2458.111347] ============================================= > [ 2458.111347] > > (qemu) sendkey alt-sysrq-w > (qemu) [ 2461.579457] SysRq : Show Blocked State > [ 2461.580103] task PC stack pid father > [ 2461.580103] NetworkManager D ffff880034b84500 4040 582 1 0x00000080 > [ 2461.580103] ffff8800197ff720 0000000000000046 00000000001d5340 ffff8800197fffd8 > [ 2461.580103] ffff8800197fffd8 00000000001d5340 ffff880019631700 7fffffffffffffff > [ 2461.580103] ffff8800197ff880 ffff8800197ff878 ffff880019631700 ffff880019631700 > [ 2461.580103] Call Trace: > [ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70 > [ 2461.580103] [<ffffffff81731ad1>] schedule_timeout+0x1c1/0x360 > [ 2461.580103] [<ffffffff810e69eb>] ? mark_held_locks+0xbb/0x140 > [ 2461.580103] [<ffffffff817377ac>] ? _raw_spin_unlock_irq+0x2c/0x50 > [ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 > [ 2461.580103] [<ffffffff81736398>] wait_for_completion_killable+0xe8/0x170 > [ 2461.580103] [<ffffffff810b7fa0>] ? wake_up_state+0x20/0x20 > [ 2461.580103] [<ffffffff81095825>] call_usermodehelper_exec+0x1a5/0x210 > [ 2461.580103] [<ffffffff817362ed>] ? wait_for_completion_killable+0x3d/0x170 > [ 2461.580103] [<ffffffff81095cc3>] __request_module+0x1b3/0x370 > [ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 > [ 2461.580103] [<ffffffff8162c5c9>] ctrl_getfamily+0x159/0x190 > [ 2461.580103] [<ffffffff8162d8a4>] genl_family_rcv_msg+0x1f4/0x2e0 > [ 2461.580103] [<ffffffff8162d990>] ? genl_family_rcv_msg+0x2e0/0x2e0 > [ 2461.580103] [<ffffffff8162da1e>] genl_rcv_msg+0x8e/0xd0 > [ 2461.580103] [<ffffffff8162b729>] netlink_rcv_skb+0xa9/0xc0 > [ 2461.580103] [<ffffffff8162bc88>] genl_rcv+0x28/0x40 > [ 2461.580103] [<ffffffff8162ad6d>] netlink_unicast+0xdd/0x190 > [ 2461.580103] [<ffffffff8162b149>] netlink_sendmsg+0x329/0x750 > [ 2461.580103] [<ffffffff815db849>] sock_sendmsg+0x99/0xd0 > [ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70 > [ 2461.580103] [<ffffffff810e96e8>] ? lock_release_non_nested+0x308/0x350 > [ 2461.580103] [<ffffffff815dbc6e>] ___sys_sendmsg+0x39e/0x3b0 > [ 2461.580103] [<ffffffff810565af>] ? kvm_clock_read+0x2f/0x50 > [ 2461.580103] [<ffffffff810218b9>] ? sched_clock+0x9/0x10 > [ 2461.580103] [<ffffffff810bb2bd>] ? sched_clock_local+0x1d/0x80 > [ 2461.580103] [<ffffffff810bb448>] ? sched_clock_cpu+0xa8/0x100 > [ 2461.580103] [<ffffffff810e33ad>] ? trace_hardirqs_off+0xd/0x10 > [ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70 > [ 2461.580103] [<ffffffff810e3f7f>] ? lock_release_holdtime.part.28+0xf/0x1a0 > [ 2461.580103] [<ffffffff8120fec9>] ? fget_light+0xf9/0x510 > [ 2461.580103] [<ffffffff8120fe0c>] ? fget_light+0x3c/0x510 > [ 2461.580103] [<ffffffff815dd1d2>] __sys_sendmsg+0x42/0x80 > [ 2461.580103] [<ffffffff815dd222>] SyS_sendmsg+0x12/0x20 > [ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b > [ 2461.580103] modprobe D ffff88000f2c8000 4632 603 602 0x00000080 > [ 2461.580103] ffff88000f04fba8 0000000000000046 00000000001d5340 ffff88000f04ffd8 > [ 2461.580103] ffff88000f04ffd8 00000000001d5340 ffff8800377d4500 ffff8800377d4500 > [ 2461.580103] ffffffff81d0b260 ffffffff81d0b268 ffffffff00000000 ffffffff81d0b2b0 > [ 2461.580103] Call Trace: > [ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70 > [ 2461.580103] [<ffffffff81736d4d>] rwsem_down_write_failed+0xed/0x1a0 > [ 2461.580103] [<ffffffff810bb200>] ? update_cpu_load_active+0x10/0xb0 > [ 2461.580103] [<ffffffff8137b473>] call_rwsem_down_write_failed+0x13/0x20 > [ 2461.580103] [<ffffffff8173492d>] ? down_write+0x9d/0xb2 > [ 2461.580103] [<ffffffff8162baa5>] ? genl_lock_all+0x15/0x30 > [ 2461.580103] [<ffffffff8162baa5>] genl_lock_all+0x15/0x30 > [ 2461.580103] [<ffffffff8162cbb3>] genl_register_family+0x53/0x1f0 > [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff > [ 2461.580103] [<ffffffff8162d650>] genl_register_family_with_ops+0x20/0x80 > [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff > [ 2461.580103] [<ffffffffa017fe84>] nl80211_init+0x24/0xf0 [cfg80211] > [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff > [ 2461.580103] [<ffffffffa01dc043>] cfg80211_init+0x43/0xdb [cfg80211] > [ 2461.580103] [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0 > [ 2461.580103] [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50 > [ 2461.580103] [<ffffffff810f75af>] load_module+0x1c6f/0x27f0 > [ 2461.580103] [<ffffffff810f2c90>] ? store_uevent+0x40/0x40 > [ 2461.580103] [<ffffffff810f82c6>] SyS_finit_module+0x86/0xb0 > [ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b > [ 2461.580103] Sched Debug Version: v0.10, 3.11.0-0.rc1.git4.1.fc20.x86_64 #1 > > So call user space to load cfg80211 module (alias > net-pf-16-proto-16-family-nl80211) from ctrl_getfamily() with read taken > cb_lock semaphore. That deadlock in modprobe process in > genl_register_family() when try to down_write cb_lock. > > This is consequence of commit: > > commit fb4e156886ce6e8309e912d8b370d192330d19d3 > Author: Marcel Holtmann <marcel@holtmann.org> > Date: Sun Apr 28 16:22:06 2013 -0700 > > nl80211: Add generic netlink module alias for cfg80211/nl80211 > > I prepared a possible fix for that (patch below), but I'm not sure if it > do right thing, it can cause other oddity, please test (with > CONFIG_LOCKDEP compiled). I think I would prefer rather remove > autoloading genetlink modules from kernel or use just one lock > in instead of two cb_lock & genl_mutex . > > Stanislaw > > diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c > index 2fd6dbe..1076fe1 100644 > --- a/net/netlink/genetlink.c > +++ b/net/netlink/genetlink.c > @@ -877,8 +877,10 @@ static int ctrl_getfamily(struct sk_buff *skb, struct genl_info *info) > #ifdef CONFIG_MODULES > if (res == NULL) { > genl_unlock(); > + up_read(&cb_lock); > request_module("net-pf-%d-proto-%d-family-%s", > PF_NETLINK, NETLINK_GENERIC, name); > + down_read(&cb_lock); > genl_lock(); > res = genl_family_find_byname(name); > } The patch above fixes the problem for me. I'll leave it to others to decide whether it's the best approach. Thanks! -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: hung task while plugging in cfg80211 @ 2013-07-25 14:51 ` Jeff Layton 0 siblings, 0 replies; 7+ messages in thread From: Jeff Layton @ 2013-07-25 14:51 UTC (permalink / raw) To: Stanislaw Gruszka Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA, linville-2XuSBdqkA4R54TAoqtyWWQ, netdev-u79uwXL29TY76Z2rM5mHXA, Thomas Graf, Stephen Hemminger, rjones-H+wXaHxf7aLQT0dZR+AlfA, Marcel Holtmann On Wed, 24 Jul 2013 11:59:12 +0200 Stanislaw Gruszka <sgruszka-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > On Fri, Jul 12, 2013 at 07:49:11AM -0400, Jeff Layton wrote: > > I updated the kernel on my rawhide KVM guest, and noticed that the > > ethernet interface wasn't coming up at all. While poking around, I saw > > this stack trace pop up: > > > > Jul 12 07:29:25 rawhide kernel: [ 241.156196] INFO: task modprobe:501 blocked for more than 120 seconds. > > Jul 12 07:29:25 rawhide kernel: [ 241.157222] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > Jul 12 07:29:25 rawhide kernel: [ 241.158379] modprobe D ffff880118c526e0 4584 501 500 0x00000080 > > Jul 12 07:29:25 rawhide kernel: [ 241.159458] ffff8800d10c3ba8 0000000000000046 00000000001d5300 ffff8800d10c3fd8 > > Jul 12 07:29:25 rawhide kernel: [ 241.160644] ffff8800d10c3fd8 00000000001d5300 ffff8800d2ae4dc0 ffff8800d2ae4dc0 > > Jul 12 07:29:25 rawhide kernel: [ 241.161875] ffffffff81d0bfa0 ffffffff81d0bfa8 ffffffff00000000 ffffffff81d0bff0 > > Jul 12 07:29:25 rawhide kernel: [ 241.163096] Call Trace: > > Jul 12 07:29:25 rawhide kernel: [ 241.163460] [<ffffffff817377e9>] schedule+0x29/0x70 > > Jul 12 07:29:25 rawhide kernel: [ 241.164235] [<ffffffff817392ad>] rwsem_down_write_failed+0xed/0x1a0 > > Jul 12 07:29:25 rawhide kernel: [ 241.165489] [<ffffffff810bb600>] ? update_cpu_load_active+0xb0/0xb0 > > Jul 12 07:29:25 rawhide kernel: [ 241.166267] [<ffffffff8137d503>] call_rwsem_down_write_failed+0x13/0x20 > > Jul 12 07:29:25 rawhide kernel: [ 241.166760] [<ffffffff81736aad>] ? down_write+0x9d/0xb2 > > Jul 12 07:29:25 rawhide kernel: [ 241.167189] [<ffffffff8162d975>] ? genl_lock_all+0x15/0x30 > > Jul 12 07:29:25 rawhide kernel: [ 241.167603] [<ffffffff8162d975>] genl_lock_all+0x15/0x30 > > I reproduced that on qemu-kvm with rawhide LiveCD. Below more > information gathered using sysrq: > > (qemu) [ 2458.110724] SysRq : Show Locks Held > [ 2458.111347] > [ 2458.111347] Showing all locks held in the system: > [ 2458.111347] 1 lock held by NetworkManager/582: > [ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162bc79>] genl_rcv+0x19/0x40 > [ 2458.111347] 1 lock held by modprobe/603: > [ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162baa5>] genl_lock_all+0x15/0x30 > [ 2458.111347] > [ 2458.111347] ============================================= > [ 2458.111347] > > (qemu) sendkey alt-sysrq-w > (qemu) [ 2461.579457] SysRq : Show Blocked State > [ 2461.580103] task PC stack pid father > [ 2461.580103] NetworkManager D ffff880034b84500 4040 582 1 0x00000080 > [ 2461.580103] ffff8800197ff720 0000000000000046 00000000001d5340 ffff8800197fffd8 > [ 2461.580103] ffff8800197fffd8 00000000001d5340 ffff880019631700 7fffffffffffffff > [ 2461.580103] ffff8800197ff880 ffff8800197ff878 ffff880019631700 ffff880019631700 > [ 2461.580103] Call Trace: > [ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70 > [ 2461.580103] [<ffffffff81731ad1>] schedule_timeout+0x1c1/0x360 > [ 2461.580103] [<ffffffff810e69eb>] ? mark_held_locks+0xbb/0x140 > [ 2461.580103] [<ffffffff817377ac>] ? _raw_spin_unlock_irq+0x2c/0x50 > [ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 > [ 2461.580103] [<ffffffff81736398>] wait_for_completion_killable+0xe8/0x170 > [ 2461.580103] [<ffffffff810b7fa0>] ? wake_up_state+0x20/0x20 > [ 2461.580103] [<ffffffff81095825>] call_usermodehelper_exec+0x1a5/0x210 > [ 2461.580103] [<ffffffff817362ed>] ? wait_for_completion_killable+0x3d/0x170 > [ 2461.580103] [<ffffffff81095cc3>] __request_module+0x1b3/0x370 > [ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 > [ 2461.580103] [<ffffffff8162c5c9>] ctrl_getfamily+0x159/0x190 > [ 2461.580103] [<ffffffff8162d8a4>] genl_family_rcv_msg+0x1f4/0x2e0 > [ 2461.580103] [<ffffffff8162d990>] ? genl_family_rcv_msg+0x2e0/0x2e0 > [ 2461.580103] [<ffffffff8162da1e>] genl_rcv_msg+0x8e/0xd0 > [ 2461.580103] [<ffffffff8162b729>] netlink_rcv_skb+0xa9/0xc0 > [ 2461.580103] [<ffffffff8162bc88>] genl_rcv+0x28/0x40 > [ 2461.580103] [<ffffffff8162ad6d>] netlink_unicast+0xdd/0x190 > [ 2461.580103] [<ffffffff8162b149>] netlink_sendmsg+0x329/0x750 > [ 2461.580103] [<ffffffff815db849>] sock_sendmsg+0x99/0xd0 > [ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70 > [ 2461.580103] [<ffffffff810e96e8>] ? lock_release_non_nested+0x308/0x350 > [ 2461.580103] [<ffffffff815dbc6e>] ___sys_sendmsg+0x39e/0x3b0 > [ 2461.580103] [<ffffffff810565af>] ? kvm_clock_read+0x2f/0x50 > [ 2461.580103] [<ffffffff810218b9>] ? sched_clock+0x9/0x10 > [ 2461.580103] [<ffffffff810bb2bd>] ? sched_clock_local+0x1d/0x80 > [ 2461.580103] [<ffffffff810bb448>] ? sched_clock_cpu+0xa8/0x100 > [ 2461.580103] [<ffffffff810e33ad>] ? trace_hardirqs_off+0xd/0x10 > [ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70 > [ 2461.580103] [<ffffffff810e3f7f>] ? lock_release_holdtime.part.28+0xf/0x1a0 > [ 2461.580103] [<ffffffff8120fec9>] ? fget_light+0xf9/0x510 > [ 2461.580103] [<ffffffff8120fe0c>] ? fget_light+0x3c/0x510 > [ 2461.580103] [<ffffffff815dd1d2>] __sys_sendmsg+0x42/0x80 > [ 2461.580103] [<ffffffff815dd222>] SyS_sendmsg+0x12/0x20 > [ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b > [ 2461.580103] modprobe D ffff88000f2c8000 4632 603 602 0x00000080 > [ 2461.580103] ffff88000f04fba8 0000000000000046 00000000001d5340 ffff88000f04ffd8 > [ 2461.580103] ffff88000f04ffd8 00000000001d5340 ffff8800377d4500 ffff8800377d4500 > [ 2461.580103] ffffffff81d0b260 ffffffff81d0b268 ffffffff00000000 ffffffff81d0b2b0 > [ 2461.580103] Call Trace: > [ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70 > [ 2461.580103] [<ffffffff81736d4d>] rwsem_down_write_failed+0xed/0x1a0 > [ 2461.580103] [<ffffffff810bb200>] ? update_cpu_load_active+0x10/0xb0 > [ 2461.580103] [<ffffffff8137b473>] call_rwsem_down_write_failed+0x13/0x20 > [ 2461.580103] [<ffffffff8173492d>] ? down_write+0x9d/0xb2 > [ 2461.580103] [<ffffffff8162baa5>] ? genl_lock_all+0x15/0x30 > [ 2461.580103] [<ffffffff8162baa5>] genl_lock_all+0x15/0x30 > [ 2461.580103] [<ffffffff8162cbb3>] genl_register_family+0x53/0x1f0 > [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff > [ 2461.580103] [<ffffffff8162d650>] genl_register_family_with_ops+0x20/0x80 > [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff > [ 2461.580103] [<ffffffffa017fe84>] nl80211_init+0x24/0xf0 [cfg80211] > [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff > [ 2461.580103] [<ffffffffa01dc043>] cfg80211_init+0x43/0xdb [cfg80211] > [ 2461.580103] [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0 > [ 2461.580103] [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50 > [ 2461.580103] [<ffffffff810f75af>] load_module+0x1c6f/0x27f0 > [ 2461.580103] [<ffffffff810f2c90>] ? store_uevent+0x40/0x40 > [ 2461.580103] [<ffffffff810f82c6>] SyS_finit_module+0x86/0xb0 > [ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b > [ 2461.580103] Sched Debug Version: v0.10, 3.11.0-0.rc1.git4.1.fc20.x86_64 #1 > > So call user space to load cfg80211 module (alias > net-pf-16-proto-16-family-nl80211) from ctrl_getfamily() with read taken > cb_lock semaphore. That deadlock in modprobe process in > genl_register_family() when try to down_write cb_lock. > > This is consequence of commit: > > commit fb4e156886ce6e8309e912d8b370d192330d19d3 > Author: Marcel Holtmann <marcel-kz+m5ild9QBg9hUCZPvPmw@public.gmane.org> > Date: Sun Apr 28 16:22:06 2013 -0700 > > nl80211: Add generic netlink module alias for cfg80211/nl80211 > > I prepared a possible fix for that (patch below), but I'm not sure if it > do right thing, it can cause other oddity, please test (with > CONFIG_LOCKDEP compiled). I think I would prefer rather remove > autoloading genetlink modules from kernel or use just one lock > in instead of two cb_lock & genl_mutex . > > Stanislaw > > diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c > index 2fd6dbe..1076fe1 100644 > --- a/net/netlink/genetlink.c > +++ b/net/netlink/genetlink.c > @@ -877,8 +877,10 @@ static int ctrl_getfamily(struct sk_buff *skb, struct genl_info *info) > #ifdef CONFIG_MODULES > if (res == NULL) { > genl_unlock(); > + up_read(&cb_lock); > request_module("net-pf-%d-proto-%d-family-%s", > PF_NETLINK, NETLINK_GENERIC, name); > + down_read(&cb_lock); > genl_lock(); > res = genl_family_find_byname(name); > } The patch above fixes the problem for me. I'll leave it to others to decide whether it's the best approach. Thanks! -- Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 3.11] genetlink: release cb_lock before requesting additional module 2013-07-25 14:51 ` Jeff Layton (?) @ 2013-07-26 9:00 ` Stanislaw Gruszka 2013-07-26 17:08 ` Pravin Shelar -1 siblings, 1 reply; 7+ messages in thread From: Stanislaw Gruszka @ 2013-07-26 9:00 UTC (permalink / raw) To: David S. Miller Cc: linux-wireless, linville, netdev, Thomas Graf, Stephen Hemminger, rjones, Marcel Holtmann, Jeff Layton Requesting external module with cb_lock taken can result in the deadlock like showed below: [ 2458.111347] Showing all locks held in the system: [ 2458.111347] 1 lock held by NetworkManager/582: [ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162bc79>] genl_rcv+0x19/0x40 [ 2458.111347] 1 lock held by modprobe/603: [ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162baa5>] genl_lock_all+0x15/0x30 [ 2461.579457] SysRq : Show Blocked State [ 2461.580103] task PC stack pid father [ 2461.580103] NetworkManager D ffff880034b84500 4040 582 1 0x00000080 [ 2461.580103] ffff8800197ff720 0000000000000046 00000000001d5340 ffff8800197fffd8 [ 2461.580103] ffff8800197fffd8 00000000001d5340 ffff880019631700 7fffffffffffffff [ 2461.580103] ffff8800197ff880 ffff8800197ff878 ffff880019631700 ffff880019631700 [ 2461.580103] Call Trace: [ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70 [ 2461.580103] [<ffffffff81731ad1>] schedule_timeout+0x1c1/0x360 [ 2461.580103] [<ffffffff810e69eb>] ? mark_held_locks+0xbb/0x140 [ 2461.580103] [<ffffffff817377ac>] ? _raw_spin_unlock_irq+0x2c/0x50 [ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [ 2461.580103] [<ffffffff81736398>] wait_for_completion_killable+0xe8/0x170 [ 2461.580103] [<ffffffff810b7fa0>] ? wake_up_state+0x20/0x20 [ 2461.580103] [<ffffffff81095825>] call_usermodehelper_exec+0x1a5/0x210 [ 2461.580103] [<ffffffff817362ed>] ? wait_for_completion_killable+0x3d/0x170 [ 2461.580103] [<ffffffff81095cc3>] __request_module+0x1b3/0x370 [ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [ 2461.580103] [<ffffffff8162c5c9>] ctrl_getfamily+0x159/0x190 [ 2461.580103] [<ffffffff8162d8a4>] genl_family_rcv_msg+0x1f4/0x2e0 [ 2461.580103] [<ffffffff8162d990>] ? genl_family_rcv_msg+0x2e0/0x2e0 [ 2461.580103] [<ffffffff8162da1e>] genl_rcv_msg+0x8e/0xd0 [ 2461.580103] [<ffffffff8162b729>] netlink_rcv_skb+0xa9/0xc0 [ 2461.580103] [<ffffffff8162bc88>] genl_rcv+0x28/0x40 [ 2461.580103] [<ffffffff8162ad6d>] netlink_unicast+0xdd/0x190 [ 2461.580103] [<ffffffff8162b149>] netlink_sendmsg+0x329/0x750 [ 2461.580103] [<ffffffff815db849>] sock_sendmsg+0x99/0xd0 [ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70 [ 2461.580103] [<ffffffff810e96e8>] ? lock_release_non_nested+0x308/0x350 [ 2461.580103] [<ffffffff815dbc6e>] ___sys_sendmsg+0x39e/0x3b0 [ 2461.580103] [<ffffffff810565af>] ? kvm_clock_read+0x2f/0x50 [ 2461.580103] [<ffffffff810218b9>] ? sched_clock+0x9/0x10 [ 2461.580103] [<ffffffff810bb2bd>] ? sched_clock_local+0x1d/0x80 [ 2461.580103] [<ffffffff810bb448>] ? sched_clock_cpu+0xa8/0x100 [ 2461.580103] [<ffffffff810e33ad>] ? trace_hardirqs_off+0xd/0x10 [ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70 [ 2461.580103] [<ffffffff810e3f7f>] ? lock_release_holdtime.part.28+0xf/0x1a0 [ 2461.580103] [<ffffffff8120fec9>] ? fget_light+0xf9/0x510 [ 2461.580103] [<ffffffff8120fe0c>] ? fget_light+0x3c/0x510 [ 2461.580103] [<ffffffff815dd1d2>] __sys_sendmsg+0x42/0x80 [ 2461.580103] [<ffffffff815dd222>] SyS_sendmsg+0x12/0x20 [ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b [ 2461.580103] modprobe D ffff88000f2c8000 4632 603 602 0x00000080 [ 2461.580103] ffff88000f04fba8 0000000000000046 00000000001d5340 ffff88000f04ffd8 [ 2461.580103] ffff88000f04ffd8 00000000001d5340 ffff8800377d4500 ffff8800377d4500 [ 2461.580103] ffffffff81d0b260 ffffffff81d0b268 ffffffff00000000 ffffffff81d0b2b0 [ 2461.580103] Call Trace: [ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70 [ 2461.580103] [<ffffffff81736d4d>] rwsem_down_write_failed+0xed/0x1a0 [ 2461.580103] [<ffffffff810bb200>] ? update_cpu_load_active+0x10/0xb0 [ 2461.580103] [<ffffffff8137b473>] call_rwsem_down_write_failed+0x13/0x20 [ 2461.580103] [<ffffffff8173492d>] ? down_write+0x9d/0xb2 [ 2461.580103] [<ffffffff8162baa5>] ? genl_lock_all+0x15/0x30 [ 2461.580103] [<ffffffff8162baa5>] genl_lock_all+0x15/0x30 [ 2461.580103] [<ffffffff8162cbb3>] genl_register_family+0x53/0x1f0 [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff [ 2461.580103] [<ffffffff8162d650>] genl_register_family_with_ops+0x20/0x80 [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff [ 2461.580103] [<ffffffffa017fe84>] nl80211_init+0x24/0xf0 [cfg80211] [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff [ 2461.580103] [<ffffffffa01dc043>] cfg80211_init+0x43/0xdb [cfg80211] [ 2461.580103] [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0 [ 2461.580103] [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50 [ 2461.580103] [<ffffffff810f75af>] load_module+0x1c6f/0x27f0 [ 2461.580103] [<ffffffff810f2c90>] ? store_uevent+0x40/0x40 [ 2461.580103] [<ffffffff810f82c6>] SyS_finit_module+0x86/0xb0 [ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b [ 2461.580103] Sched Debug Version: v0.10, 3.11.0-0.rc1.git4.1.fc20.x86_64 #1 Problem start to happen after adding net-pf-16-proto-16-family-nl80211 alias name to cfg80211 module by below commit (though that commit itself is perfectly fine): commit fb4e156886ce6e8309e912d8b370d192330d19d3 Author: Marcel Holtmann <marcel@holtmann.org> Date: Sun Apr 28 16:22:06 2013 -0700 nl80211: Add generic netlink module alias for cfg80211/nl80211 Reported-and-tested-by: Jeff Layton <jlayton@redhat.com> Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> --- net/netlink/genetlink.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c index 2fd6dbe..1076fe1 100644 --- a/net/netlink/genetlink.c +++ b/net/netlink/genetlink.c @@ -877,8 +877,10 @@ static int ctrl_getfamily(struct sk_buff *skb, struct genl_info *info) #ifdef CONFIG_MODULES if (res == NULL) { genl_unlock(); + up_read(&cb_lock); request_module("net-pf-%d-proto-%d-family-%s", PF_NETLINK, NETLINK_GENERIC, name); + down_read(&cb_lock); genl_lock(); res = genl_family_find_byname(name); } -- 1.7.11.7 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 3.11] genetlink: release cb_lock before requesting additional module 2013-07-26 9:00 ` [PATCH 3.11] genetlink: release cb_lock before requesting additional module Stanislaw Gruszka @ 2013-07-26 17:08 ` Pravin Shelar 2013-07-28 5:20 ` David Miller 0 siblings, 1 reply; 7+ messages in thread From: Pravin Shelar @ 2013-07-26 17:08 UTC (permalink / raw) To: Stanislaw Gruszka Cc: David S. Miller, linux-wireless, linville, netdev, Thomas Graf, Stephen Hemminger, rjones, Marcel Holtmann, Jeff Layton On Fri, Jul 26, 2013 at 2:00 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote: > Requesting external module with cb_lock taken can result in > the deadlock like showed below: > > [ 2458.111347] Showing all locks held in the system: > [ 2458.111347] 1 lock held by NetworkManager/582: > [ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162bc79>] genl_rcv+0x19/0x40 > [ 2458.111347] 1 lock held by modprobe/603: > [ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162baa5>] genl_lock_all+0x15/0x30 > > [ 2461.579457] SysRq : Show Blocked State > [ 2461.580103] task PC stack pid father > [ 2461.580103] NetworkManager D ffff880034b84500 4040 582 1 0x00000080 > [ 2461.580103] ffff8800197ff720 0000000000000046 00000000001d5340 ffff8800197fffd8 > [ 2461.580103] ffff8800197fffd8 00000000001d5340 ffff880019631700 7fffffffffffffff > [ 2461.580103] ffff8800197ff880 ffff8800197ff878 ffff880019631700 ffff880019631700 > [ 2461.580103] Call Trace: > [ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70 > [ 2461.580103] [<ffffffff81731ad1>] schedule_timeout+0x1c1/0x360 > [ 2461.580103] [<ffffffff810e69eb>] ? mark_held_locks+0xbb/0x140 > [ 2461.580103] [<ffffffff817377ac>] ? _raw_spin_unlock_irq+0x2c/0x50 > [ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 > [ 2461.580103] [<ffffffff81736398>] wait_for_completion_killable+0xe8/0x170 > [ 2461.580103] [<ffffffff810b7fa0>] ? wake_up_state+0x20/0x20 > [ 2461.580103] [<ffffffff81095825>] call_usermodehelper_exec+0x1a5/0x210 > [ 2461.580103] [<ffffffff817362ed>] ? wait_for_completion_killable+0x3d/0x170 > [ 2461.580103] [<ffffffff81095cc3>] __request_module+0x1b3/0x370 > [ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 > [ 2461.580103] [<ffffffff8162c5c9>] ctrl_getfamily+0x159/0x190 > [ 2461.580103] [<ffffffff8162d8a4>] genl_family_rcv_msg+0x1f4/0x2e0 > [ 2461.580103] [<ffffffff8162d990>] ? genl_family_rcv_msg+0x2e0/0x2e0 > [ 2461.580103] [<ffffffff8162da1e>] genl_rcv_msg+0x8e/0xd0 > [ 2461.580103] [<ffffffff8162b729>] netlink_rcv_skb+0xa9/0xc0 > [ 2461.580103] [<ffffffff8162bc88>] genl_rcv+0x28/0x40 > [ 2461.580103] [<ffffffff8162ad6d>] netlink_unicast+0xdd/0x190 > [ 2461.580103] [<ffffffff8162b149>] netlink_sendmsg+0x329/0x750 > [ 2461.580103] [<ffffffff815db849>] sock_sendmsg+0x99/0xd0 > [ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70 > [ 2461.580103] [<ffffffff810e96e8>] ? lock_release_non_nested+0x308/0x350 > [ 2461.580103] [<ffffffff815dbc6e>] ___sys_sendmsg+0x39e/0x3b0 > [ 2461.580103] [<ffffffff810565af>] ? kvm_clock_read+0x2f/0x50 > [ 2461.580103] [<ffffffff810218b9>] ? sched_clock+0x9/0x10 > [ 2461.580103] [<ffffffff810bb2bd>] ? sched_clock_local+0x1d/0x80 > [ 2461.580103] [<ffffffff810bb448>] ? sched_clock_cpu+0xa8/0x100 > [ 2461.580103] [<ffffffff810e33ad>] ? trace_hardirqs_off+0xd/0x10 > [ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70 > [ 2461.580103] [<ffffffff810e3f7f>] ? lock_release_holdtime.part.28+0xf/0x1a0 > [ 2461.580103] [<ffffffff8120fec9>] ? fget_light+0xf9/0x510 > [ 2461.580103] [<ffffffff8120fe0c>] ? fget_light+0x3c/0x510 > [ 2461.580103] [<ffffffff815dd1d2>] __sys_sendmsg+0x42/0x80 > [ 2461.580103] [<ffffffff815dd222>] SyS_sendmsg+0x12/0x20 > [ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b > [ 2461.580103] modprobe D ffff88000f2c8000 4632 603 602 0x00000080 > [ 2461.580103] ffff88000f04fba8 0000000000000046 00000000001d5340 ffff88000f04ffd8 > [ 2461.580103] ffff88000f04ffd8 00000000001d5340 ffff8800377d4500 ffff8800377d4500 > [ 2461.580103] ffffffff81d0b260 ffffffff81d0b268 ffffffff00000000 ffffffff81d0b2b0 > [ 2461.580103] Call Trace: > [ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70 > [ 2461.580103] [<ffffffff81736d4d>] rwsem_down_write_failed+0xed/0x1a0 > [ 2461.580103] [<ffffffff810bb200>] ? update_cpu_load_active+0x10/0xb0 > [ 2461.580103] [<ffffffff8137b473>] call_rwsem_down_write_failed+0x13/0x20 > [ 2461.580103] [<ffffffff8173492d>] ? down_write+0x9d/0xb2 > [ 2461.580103] [<ffffffff8162baa5>] ? genl_lock_all+0x15/0x30 > [ 2461.580103] [<ffffffff8162baa5>] genl_lock_all+0x15/0x30 > [ 2461.580103] [<ffffffff8162cbb3>] genl_register_family+0x53/0x1f0 > [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff > [ 2461.580103] [<ffffffff8162d650>] genl_register_family_with_ops+0x20/0x80 > [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff > [ 2461.580103] [<ffffffffa017fe84>] nl80211_init+0x24/0xf0 [cfg80211] > [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff > [ 2461.580103] [<ffffffffa01dc043>] cfg80211_init+0x43/0xdb [cfg80211] > [ 2461.580103] [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0 > [ 2461.580103] [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50 > [ 2461.580103] [<ffffffff810f75af>] load_module+0x1c6f/0x27f0 > [ 2461.580103] [<ffffffff810f2c90>] ? store_uevent+0x40/0x40 > [ 2461.580103] [<ffffffff810f82c6>] SyS_finit_module+0x86/0xb0 > [ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b > [ 2461.580103] Sched Debug Version: v0.10, 3.11.0-0.rc1.git4.1.fc20.x86_64 #1 > > Problem start to happen after adding net-pf-16-proto-16-family-nl80211 > alias name to cfg80211 module by below commit (though that commit > itself is perfectly fine): > > commit fb4e156886ce6e8309e912d8b370d192330d19d3 > Author: Marcel Holtmann <marcel@holtmann.org> > Date: Sun Apr 28 16:22:06 2013 -0700 > > nl80211: Add generic netlink module alias for cfg80211/nl80211 > > Reported-and-tested-by: Jeff Layton <jlayton@redhat.com> > Reported-by: Richard W.M. Jones <rjones@redhat.com> > Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> > --- > net/netlink/genetlink.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c > index 2fd6dbe..1076fe1 100644 > --- a/net/netlink/genetlink.c > +++ b/net/netlink/genetlink.c > @@ -877,8 +877,10 @@ static int ctrl_getfamily(struct sk_buff *skb, struct genl_info *info) > #ifdef CONFIG_MODULES > if (res == NULL) { > genl_unlock(); > + up_read(&cb_lock); > request_module("net-pf-%d-proto-%d-family-%s", > PF_NETLINK, NETLINK_GENERIC, name); > + down_read(&cb_lock); > genl_lock(); > res = genl_family_find_byname(name); > } This is genl issue and it was introduced by commit def3117493eafd (genl: Allow concurrent genl callbacks.). Reviewed-by: Pravin B Shelar <pshelar@nicira.com> > -- > 1.7.11.7 > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 3.11] genetlink: release cb_lock before requesting additional module 2013-07-26 17:08 ` Pravin Shelar @ 2013-07-28 5:20 ` David Miller 0 siblings, 0 replies; 7+ messages in thread From: David Miller @ 2013-07-28 5:20 UTC (permalink / raw) To: pshelar Cc: sgruszka, linux-wireless, linville, netdev, tgraf, shemminger, rjones, marcel, jlayton From: Pravin Shelar <pshelar@nicira.com> Date: Fri, 26 Jul 2013 10:08:25 -0700 > On Fri, Jul 26, 2013 at 2:00 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote: >> Requesting external module with cb_lock taken can result in >> the deadlock like showed below: ... >> Problem start to happen after adding net-pf-16-proto-16-family-nl80211 >> alias name to cfg80211 module by below commit (though that commit >> itself is perfectly fine): >> >> commit fb4e156886ce6e8309e912d8b370d192330d19d3 >> Author: Marcel Holtmann <marcel@holtmann.org> >> Date: Sun Apr 28 16:22:06 2013 -0700 >> >> nl80211: Add generic netlink module alias for cfg80211/nl80211 >> >> Reported-and-tested-by: Jeff Layton <jlayton@redhat.com> >> Reported-by: Richard W.M. Jones <rjones@redhat.com> >> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> ... > This is genl issue and it was introduced by commit def3117493eafd > (genl: Allow concurrent genl callbacks.). > > Reviewed-by: Pravin B Shelar <pshelar@nicira.com> Applied and queued up for -stable, thanks. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2013-07-28 5:20 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-07-12 11:49 hung task while plugging in cfg80211 Jeff Layton 2013-07-24 9:59 ` Stanislaw Gruszka 2013-07-25 14:51 ` Jeff Layton 2013-07-25 14:51 ` Jeff Layton 2013-07-26 9:00 ` [PATCH 3.11] genetlink: release cb_lock before requesting additional module Stanislaw Gruszka 2013-07-26 17:08 ` Pravin Shelar 2013-07-28 5:20 ` David Miller
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.