All of lore.kernel.org
 help / color / mirror / Atom feed
* hung task while plugging in cfg80211
@ 2013-07-12 11:49 Jeff Layton
  2013-07-24  9:59 ` Stanislaw Gruszka
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2013-07-12 11:49 UTC (permalink / raw)
  To: linux-wireless, linville

I updated the kernel on my rawhide KVM guest, and noticed that the
ethernet interface wasn't coming up at all. While poking around, I saw
this stack trace pop up:

Jul 12 07:29:25 rawhide kernel: [  241.156196] INFO: task modprobe:501 blocked for more than 120 seconds.
Jul 12 07:29:25 rawhide kernel: [  241.157222] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 12 07:29:25 rawhide kernel: [  241.158379] modprobe        D ffff880118c526e0  4584   501    500 0x00000080
Jul 12 07:29:25 rawhide kernel: [  241.159458]  ffff8800d10c3ba8 0000000000000046 00000000001d5300 ffff8800d10c3fd8
Jul 12 07:29:25 rawhide kernel: [  241.160644]  ffff8800d10c3fd8 00000000001d5300 ffff8800d2ae4dc0 ffff8800d2ae4dc0
Jul 12 07:29:25 rawhide kernel: [  241.161875]  ffffffff81d0bfa0 ffffffff81d0bfa8 ffffffff00000000 ffffffff81d0bff0
Jul 12 07:29:25 rawhide kernel: [  241.163096] Call Trace:
Jul 12 07:29:25 rawhide kernel: [  241.163460]  [<ffffffff817377e9>] schedule+0x29/0x70
Jul 12 07:29:25 rawhide kernel: [  241.164235]  [<ffffffff817392ad>] rwsem_down_write_failed+0xed/0x1a0
Jul 12 07:29:25 rawhide kernel: [  241.165489]  [<ffffffff810bb600>] ? update_cpu_load_active+0xb0/0xb0
Jul 12 07:29:25 rawhide kernel: [  241.166267]  [<ffffffff8137d503>] call_rwsem_down_write_failed+0x13/0x20
Jul 12 07:29:25 rawhide kernel: [  241.166760]  [<ffffffff81736aad>] ? down_write+0x9d/0xb2
Jul 12 07:29:25 rawhide kernel: [  241.167189]  [<ffffffff8162d975>] ? genl_lock_all+0x15/0x30
Jul 12 07:29:25 rawhide kernel: [  241.167603]  [<ffffffff8162d975>] genl_lock_all+0x15/0x30
Jul 12 07:29:25 rawhide kernel: [  241.168036]  [<ffffffff8162ea83>] genl_register_family+0x53/0x1f0
Jul 12 07:29:25 rawhide kernel: [  241.168490]  [<ffffffffa015e000>] ? 0xffffffffa015dfff
Jul 12 07:29:25 rawhide kernel: [  241.168877]  [<ffffffff8162f520>] genl_register_family_with_ops+0x20/0x80
Jul 12 07:29:25 rawhide kernel: [  241.169406]  [<ffffffffa015e000>] ? 0xffffffffa015dfff
Jul 12 07:29:25 rawhide kernel: [  241.169802]  [<ffffffffa0101ec4>] nl80211_init+0x24/0xf0 [cfg80211]
Jul 12 07:29:25 rawhide kernel: [  241.170297]  [<ffffffffa015e000>] ? 0xffffffffa015dfff
Jul 12 07:29:25 rawhide kernel: [  241.170686]  [<ffffffffa015e043>] cfg80211_init+0x43/0xdb [cfg80211]
Jul 12 07:29:25 rawhide kernel: [  241.171230]  [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0
Jul 12 07:29:25 rawhide kernel: [  241.171651]  [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50
Jul 12 07:29:25 rawhide kernel: [  241.172099]  [<ffffffff810fa2bf>] load_module+0x1c6f/0x27f0
Jul 12 07:29:25 rawhide kernel: [  241.172515]  [<ffffffff810f59a0>] ? store_uevent+0x40/0x40
Jul 12 07:29:25 rawhide kernel: [  241.172933]  [<ffffffff810fafd6>] SyS_finit_module+0x86/0xb0
Jul 12 07:29:25 rawhide kernel: [  241.173379]  [<ffffffff81744019>] system_call_fastpath+0x16/0x1b

...on a hunch, I blacklisted the cfg80211 module, and that seems to
have worked around the problem for now. I suspect that NM was trying to
plug in this module and it hung, and it couldn't proceed any further to
configure the virtual ethernet interface.

In the fedora rawhide kernels, last known good one is
kernel-3.11.0-0.rc0.git3.1.fc20, and bad one is
kernel-3.11.0-0.rc0.git6.1.fc20. I haven't bisected it further.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hung task while plugging in cfg80211
  2013-07-12 11:49 hung task while plugging in cfg80211 Jeff Layton
@ 2013-07-24  9:59 ` Stanislaw Gruszka
  2013-07-25 14:51     ` Jeff Layton
  0 siblings, 1 reply; 7+ messages in thread
From: Stanislaw Gruszka @ 2013-07-24  9:59 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-wireless, linville, netdev, Thomas Graf, Stephen Hemminger,
	rjones, Marcel Holtmann

On Fri, Jul 12, 2013 at 07:49:11AM -0400, Jeff Layton wrote:
> I updated the kernel on my rawhide KVM guest, and noticed that the
> ethernet interface wasn't coming up at all. While poking around, I saw
> this stack trace pop up:
> 
> Jul 12 07:29:25 rawhide kernel: [  241.156196] INFO: task modprobe:501 blocked for more than 120 seconds.
> Jul 12 07:29:25 rawhide kernel: [  241.157222] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jul 12 07:29:25 rawhide kernel: [  241.158379] modprobe        D ffff880118c526e0  4584   501    500 0x00000080
> Jul 12 07:29:25 rawhide kernel: [  241.159458]  ffff8800d10c3ba8 0000000000000046 00000000001d5300 ffff8800d10c3fd8
> Jul 12 07:29:25 rawhide kernel: [  241.160644]  ffff8800d10c3fd8 00000000001d5300 ffff8800d2ae4dc0 ffff8800d2ae4dc0
> Jul 12 07:29:25 rawhide kernel: [  241.161875]  ffffffff81d0bfa0 ffffffff81d0bfa8 ffffffff00000000 ffffffff81d0bff0
> Jul 12 07:29:25 rawhide kernel: [  241.163096] Call Trace:
> Jul 12 07:29:25 rawhide kernel: [  241.163460]  [<ffffffff817377e9>] schedule+0x29/0x70
> Jul 12 07:29:25 rawhide kernel: [  241.164235]  [<ffffffff817392ad>] rwsem_down_write_failed+0xed/0x1a0
> Jul 12 07:29:25 rawhide kernel: [  241.165489]  [<ffffffff810bb600>] ? update_cpu_load_active+0xb0/0xb0
> Jul 12 07:29:25 rawhide kernel: [  241.166267]  [<ffffffff8137d503>] call_rwsem_down_write_failed+0x13/0x20
> Jul 12 07:29:25 rawhide kernel: [  241.166760]  [<ffffffff81736aad>] ? down_write+0x9d/0xb2
> Jul 12 07:29:25 rawhide kernel: [  241.167189]  [<ffffffff8162d975>] ? genl_lock_all+0x15/0x30
> Jul 12 07:29:25 rawhide kernel: [  241.167603]  [<ffffffff8162d975>] genl_lock_all+0x15/0x30

I reproduced that on qemu-kvm with rawhide LiveCD. Below more
information gathered using sysrq:

(qemu) [ 2458.110724] SysRq : Show Locks Held
[ 2458.111347] 
[ 2458.111347] Showing all locks held in the system:
[ 2458.111347] 1 lock held by NetworkManager/582:
[ 2458.111347]  #0:  (cb_lock){++++++}, at: [<ffffffff8162bc79>] genl_rcv+0x19/0x40
[ 2458.111347] 1 lock held by modprobe/603:
[ 2458.111347]  #0:  (cb_lock){++++++}, at: [<ffffffff8162baa5>] genl_lock_all+0x15/0x30
[ 2458.111347] 
[ 2458.111347] =============================================
[ 2458.111347] 

(qemu) sendkey alt-sysrq-w
(qemu) [ 2461.579457] SysRq : Show Blocked State
[ 2461.580103]   task                        PC stack   pid father
[ 2461.580103] NetworkManager  D ffff880034b84500  4040   582      1 0x00000080
[ 2461.580103]  ffff8800197ff720 0000000000000046 00000000001d5340 ffff8800197fffd8
[ 2461.580103]  ffff8800197fffd8 00000000001d5340 ffff880019631700 7fffffffffffffff
[ 2461.580103]  ffff8800197ff880 ffff8800197ff878 ffff880019631700 ffff880019631700
[ 2461.580103] Call Trace:
[ 2461.580103]  [<ffffffff817355f9>] schedule+0x29/0x70
[ 2461.580103]  [<ffffffff81731ad1>] schedule_timeout+0x1c1/0x360
[ 2461.580103]  [<ffffffff810e69eb>] ? mark_held_locks+0xbb/0x140
[ 2461.580103]  [<ffffffff817377ac>] ? _raw_spin_unlock_irq+0x2c/0x50
[ 2461.580103]  [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 2461.580103]  [<ffffffff81736398>] wait_for_completion_killable+0xe8/0x170
[ 2461.580103]  [<ffffffff810b7fa0>] ? wake_up_state+0x20/0x20
[ 2461.580103]  [<ffffffff81095825>] call_usermodehelper_exec+0x1a5/0x210
[ 2461.580103]  [<ffffffff817362ed>] ? wait_for_completion_killable+0x3d/0x170
[ 2461.580103]  [<ffffffff81095cc3>] __request_module+0x1b3/0x370
[ 2461.580103]  [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 2461.580103]  [<ffffffff8162c5c9>] ctrl_getfamily+0x159/0x190
[ 2461.580103]  [<ffffffff8162d8a4>] genl_family_rcv_msg+0x1f4/0x2e0
[ 2461.580103]  [<ffffffff8162d990>] ? genl_family_rcv_msg+0x2e0/0x2e0
[ 2461.580103]  [<ffffffff8162da1e>] genl_rcv_msg+0x8e/0xd0
[ 2461.580103]  [<ffffffff8162b729>] netlink_rcv_skb+0xa9/0xc0
[ 2461.580103]  [<ffffffff8162bc88>] genl_rcv+0x28/0x40
[ 2461.580103]  [<ffffffff8162ad6d>] netlink_unicast+0xdd/0x190
[ 2461.580103]  [<ffffffff8162b149>] netlink_sendmsg+0x329/0x750
[ 2461.580103]  [<ffffffff815db849>] sock_sendmsg+0x99/0xd0
[ 2461.580103]  [<ffffffff810bb58f>] ? local_clock+0x5f/0x70
[ 2461.580103]  [<ffffffff810e96e8>] ? lock_release_non_nested+0x308/0x350
[ 2461.580103]  [<ffffffff815dbc6e>] ___sys_sendmsg+0x39e/0x3b0
[ 2461.580103]  [<ffffffff810565af>] ? kvm_clock_read+0x2f/0x50
[ 2461.580103]  [<ffffffff810218b9>] ? sched_clock+0x9/0x10
[ 2461.580103]  [<ffffffff810bb2bd>] ? sched_clock_local+0x1d/0x80
[ 2461.580103]  [<ffffffff810bb448>] ? sched_clock_cpu+0xa8/0x100
[ 2461.580103]  [<ffffffff810e33ad>] ? trace_hardirqs_off+0xd/0x10
[ 2461.580103]  [<ffffffff810bb58f>] ? local_clock+0x5f/0x70
[ 2461.580103]  [<ffffffff810e3f7f>] ? lock_release_holdtime.part.28+0xf/0x1a0
[ 2461.580103]  [<ffffffff8120fec9>] ? fget_light+0xf9/0x510
[ 2461.580103]  [<ffffffff8120fe0c>] ? fget_light+0x3c/0x510
[ 2461.580103]  [<ffffffff815dd1d2>] __sys_sendmsg+0x42/0x80
[ 2461.580103]  [<ffffffff815dd222>] SyS_sendmsg+0x12/0x20
[ 2461.580103]  [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b
[ 2461.580103] modprobe        D ffff88000f2c8000  4632   603    602 0x00000080
[ 2461.580103]  ffff88000f04fba8 0000000000000046 00000000001d5340 ffff88000f04ffd8
[ 2461.580103]  ffff88000f04ffd8 00000000001d5340 ffff8800377d4500 ffff8800377d4500
[ 2461.580103]  ffffffff81d0b260 ffffffff81d0b268 ffffffff00000000 ffffffff81d0b2b0
[ 2461.580103] Call Trace:
[ 2461.580103]  [<ffffffff817355f9>] schedule+0x29/0x70
[ 2461.580103]  [<ffffffff81736d4d>] rwsem_down_write_failed+0xed/0x1a0
[ 2461.580103]  [<ffffffff810bb200>] ? update_cpu_load_active+0x10/0xb0
[ 2461.580103]  [<ffffffff8137b473>] call_rwsem_down_write_failed+0x13/0x20
[ 2461.580103]  [<ffffffff8173492d>] ? down_write+0x9d/0xb2
[ 2461.580103]  [<ffffffff8162baa5>] ? genl_lock_all+0x15/0x30
[ 2461.580103]  [<ffffffff8162baa5>] genl_lock_all+0x15/0x30
[ 2461.580103]  [<ffffffff8162cbb3>] genl_register_family+0x53/0x1f0
[ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
[ 2461.580103]  [<ffffffff8162d650>] genl_register_family_with_ops+0x20/0x80
[ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
[ 2461.580103]  [<ffffffffa017fe84>] nl80211_init+0x24/0xf0 [cfg80211]
[ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
[ 2461.580103]  [<ffffffffa01dc043>] cfg80211_init+0x43/0xdb [cfg80211]
[ 2461.580103]  [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0
[ 2461.580103]  [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50
[ 2461.580103]  [<ffffffff810f75af>] load_module+0x1c6f/0x27f0
[ 2461.580103]  [<ffffffff810f2c90>] ? store_uevent+0x40/0x40
[ 2461.580103]  [<ffffffff810f82c6>] SyS_finit_module+0x86/0xb0
[ 2461.580103]  [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b
[ 2461.580103] Sched Debug Version: v0.10, 3.11.0-0.rc1.git4.1.fc20.x86_64 #1

So call user space to load cfg80211 module (alias
net-pf-16-proto-16-family-nl80211) from ctrl_getfamily() with read taken
cb_lock semaphore. That deadlock in modprobe process in
genl_register_family() when try to down_write cb_lock.

This is consequence of commit:

commit fb4e156886ce6e8309e912d8b370d192330d19d3
Author: Marcel Holtmann <marcel@holtmann.org>
Date:   Sun Apr 28 16:22:06 2013 -0700

    nl80211: Add generic netlink module alias for cfg80211/nl80211

I prepared a possible fix for that (patch below), but I'm not sure if it
do right thing, it can cause other oddity, please test (with
CONFIG_LOCKDEP compiled). I think I would prefer rather remove
autoloading genetlink modules from kernel or use just one lock
in instead of two cb_lock & genl_mutex .

Stanislaw

diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 2fd6dbe..1076fe1 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -877,8 +877,10 @@ static int ctrl_getfamily(struct sk_buff *skb, struct genl_info *info)
 #ifdef CONFIG_MODULES
 		if (res == NULL) {
 			genl_unlock();
+			up_read(&cb_lock);
 			request_module("net-pf-%d-proto-%d-family-%s",
 				       PF_NETLINK, NETLINK_GENERIC, name);
+			down_read(&cb_lock);
 			genl_lock();
 			res = genl_family_find_byname(name);
 		}

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: hung task while plugging in cfg80211
@ 2013-07-25 14:51     ` Jeff Layton
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2013-07-25 14:51 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: linux-wireless, linville, netdev, Thomas Graf, Stephen Hemminger,
	rjones, Marcel Holtmann

On Wed, 24 Jul 2013 11:59:12 +0200
Stanislaw Gruszka <sgruszka@redhat.com> wrote:

> On Fri, Jul 12, 2013 at 07:49:11AM -0400, Jeff Layton wrote:
> > I updated the kernel on my rawhide KVM guest, and noticed that the
> > ethernet interface wasn't coming up at all. While poking around, I saw
> > this stack trace pop up:
> > 
> > Jul 12 07:29:25 rawhide kernel: [  241.156196] INFO: task modprobe:501 blocked for more than 120 seconds.
> > Jul 12 07:29:25 rawhide kernel: [  241.157222] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Jul 12 07:29:25 rawhide kernel: [  241.158379] modprobe        D ffff880118c526e0  4584   501    500 0x00000080
> > Jul 12 07:29:25 rawhide kernel: [  241.159458]  ffff8800d10c3ba8 0000000000000046 00000000001d5300 ffff8800d10c3fd8
> > Jul 12 07:29:25 rawhide kernel: [  241.160644]  ffff8800d10c3fd8 00000000001d5300 ffff8800d2ae4dc0 ffff8800d2ae4dc0
> > Jul 12 07:29:25 rawhide kernel: [  241.161875]  ffffffff81d0bfa0 ffffffff81d0bfa8 ffffffff00000000 ffffffff81d0bff0
> > Jul 12 07:29:25 rawhide kernel: [  241.163096] Call Trace:
> > Jul 12 07:29:25 rawhide kernel: [  241.163460]  [<ffffffff817377e9>] schedule+0x29/0x70
> > Jul 12 07:29:25 rawhide kernel: [  241.164235]  [<ffffffff817392ad>] rwsem_down_write_failed+0xed/0x1a0
> > Jul 12 07:29:25 rawhide kernel: [  241.165489]  [<ffffffff810bb600>] ? update_cpu_load_active+0xb0/0xb0
> > Jul 12 07:29:25 rawhide kernel: [  241.166267]  [<ffffffff8137d503>] call_rwsem_down_write_failed+0x13/0x20
> > Jul 12 07:29:25 rawhide kernel: [  241.166760]  [<ffffffff81736aad>] ? down_write+0x9d/0xb2
> > Jul 12 07:29:25 rawhide kernel: [  241.167189]  [<ffffffff8162d975>] ? genl_lock_all+0x15/0x30
> > Jul 12 07:29:25 rawhide kernel: [  241.167603]  [<ffffffff8162d975>] genl_lock_all+0x15/0x30
> 
> I reproduced that on qemu-kvm with rawhide LiveCD. Below more
> information gathered using sysrq:
> 
> (qemu) [ 2458.110724] SysRq : Show Locks Held
> [ 2458.111347] 
> [ 2458.111347] Showing all locks held in the system:
> [ 2458.111347] 1 lock held by NetworkManager/582:
> [ 2458.111347]  #0:  (cb_lock){++++++}, at: [<ffffffff8162bc79>] genl_rcv+0x19/0x40
> [ 2458.111347] 1 lock held by modprobe/603:
> [ 2458.111347]  #0:  (cb_lock){++++++}, at: [<ffffffff8162baa5>] genl_lock_all+0x15/0x30
> [ 2458.111347] 
> [ 2458.111347] =============================================
> [ 2458.111347] 
> 
> (qemu) sendkey alt-sysrq-w
> (qemu) [ 2461.579457] SysRq : Show Blocked State
> [ 2461.580103]   task                        PC stack   pid father
> [ 2461.580103] NetworkManager  D ffff880034b84500  4040   582      1 0x00000080
> [ 2461.580103]  ffff8800197ff720 0000000000000046 00000000001d5340 ffff8800197fffd8
> [ 2461.580103]  ffff8800197fffd8 00000000001d5340 ffff880019631700 7fffffffffffffff
> [ 2461.580103]  ffff8800197ff880 ffff8800197ff878 ffff880019631700 ffff880019631700
> [ 2461.580103] Call Trace:
> [ 2461.580103]  [<ffffffff817355f9>] schedule+0x29/0x70
> [ 2461.580103]  [<ffffffff81731ad1>] schedule_timeout+0x1c1/0x360
> [ 2461.580103]  [<ffffffff810e69eb>] ? mark_held_locks+0xbb/0x140
> [ 2461.580103]  [<ffffffff817377ac>] ? _raw_spin_unlock_irq+0x2c/0x50
> [ 2461.580103]  [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [ 2461.580103]  [<ffffffff81736398>] wait_for_completion_killable+0xe8/0x170
> [ 2461.580103]  [<ffffffff810b7fa0>] ? wake_up_state+0x20/0x20
> [ 2461.580103]  [<ffffffff81095825>] call_usermodehelper_exec+0x1a5/0x210
> [ 2461.580103]  [<ffffffff817362ed>] ? wait_for_completion_killable+0x3d/0x170
> [ 2461.580103]  [<ffffffff81095cc3>] __request_module+0x1b3/0x370
> [ 2461.580103]  [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [ 2461.580103]  [<ffffffff8162c5c9>] ctrl_getfamily+0x159/0x190
> [ 2461.580103]  [<ffffffff8162d8a4>] genl_family_rcv_msg+0x1f4/0x2e0
> [ 2461.580103]  [<ffffffff8162d990>] ? genl_family_rcv_msg+0x2e0/0x2e0
> [ 2461.580103]  [<ffffffff8162da1e>] genl_rcv_msg+0x8e/0xd0
> [ 2461.580103]  [<ffffffff8162b729>] netlink_rcv_skb+0xa9/0xc0
> [ 2461.580103]  [<ffffffff8162bc88>] genl_rcv+0x28/0x40
> [ 2461.580103]  [<ffffffff8162ad6d>] netlink_unicast+0xdd/0x190
> [ 2461.580103]  [<ffffffff8162b149>] netlink_sendmsg+0x329/0x750
> [ 2461.580103]  [<ffffffff815db849>] sock_sendmsg+0x99/0xd0
> [ 2461.580103]  [<ffffffff810bb58f>] ? local_clock+0x5f/0x70
> [ 2461.580103]  [<ffffffff810e96e8>] ? lock_release_non_nested+0x308/0x350
> [ 2461.580103]  [<ffffffff815dbc6e>] ___sys_sendmsg+0x39e/0x3b0
> [ 2461.580103]  [<ffffffff810565af>] ? kvm_clock_read+0x2f/0x50
> [ 2461.580103]  [<ffffffff810218b9>] ? sched_clock+0x9/0x10
> [ 2461.580103]  [<ffffffff810bb2bd>] ? sched_clock_local+0x1d/0x80
> [ 2461.580103]  [<ffffffff810bb448>] ? sched_clock_cpu+0xa8/0x100
> [ 2461.580103]  [<ffffffff810e33ad>] ? trace_hardirqs_off+0xd/0x10
> [ 2461.580103]  [<ffffffff810bb58f>] ? local_clock+0x5f/0x70
> [ 2461.580103]  [<ffffffff810e3f7f>] ? lock_release_holdtime.part.28+0xf/0x1a0
> [ 2461.580103]  [<ffffffff8120fec9>] ? fget_light+0xf9/0x510
> [ 2461.580103]  [<ffffffff8120fe0c>] ? fget_light+0x3c/0x510
> [ 2461.580103]  [<ffffffff815dd1d2>] __sys_sendmsg+0x42/0x80
> [ 2461.580103]  [<ffffffff815dd222>] SyS_sendmsg+0x12/0x20
> [ 2461.580103]  [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b
> [ 2461.580103] modprobe        D ffff88000f2c8000  4632   603    602 0x00000080
> [ 2461.580103]  ffff88000f04fba8 0000000000000046 00000000001d5340 ffff88000f04ffd8
> [ 2461.580103]  ffff88000f04ffd8 00000000001d5340 ffff8800377d4500 ffff8800377d4500
> [ 2461.580103]  ffffffff81d0b260 ffffffff81d0b268 ffffffff00000000 ffffffff81d0b2b0
> [ 2461.580103] Call Trace:
> [ 2461.580103]  [<ffffffff817355f9>] schedule+0x29/0x70
> [ 2461.580103]  [<ffffffff81736d4d>] rwsem_down_write_failed+0xed/0x1a0
> [ 2461.580103]  [<ffffffff810bb200>] ? update_cpu_load_active+0x10/0xb0
> [ 2461.580103]  [<ffffffff8137b473>] call_rwsem_down_write_failed+0x13/0x20
> [ 2461.580103]  [<ffffffff8173492d>] ? down_write+0x9d/0xb2
> [ 2461.580103]  [<ffffffff8162baa5>] ? genl_lock_all+0x15/0x30
> [ 2461.580103]  [<ffffffff8162baa5>] genl_lock_all+0x15/0x30
> [ 2461.580103]  [<ffffffff8162cbb3>] genl_register_family+0x53/0x1f0
> [ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
> [ 2461.580103]  [<ffffffff8162d650>] genl_register_family_with_ops+0x20/0x80
> [ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
> [ 2461.580103]  [<ffffffffa017fe84>] nl80211_init+0x24/0xf0 [cfg80211]
> [ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
> [ 2461.580103]  [<ffffffffa01dc043>] cfg80211_init+0x43/0xdb [cfg80211]
> [ 2461.580103]  [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0
> [ 2461.580103]  [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50
> [ 2461.580103]  [<ffffffff810f75af>] load_module+0x1c6f/0x27f0
> [ 2461.580103]  [<ffffffff810f2c90>] ? store_uevent+0x40/0x40
> [ 2461.580103]  [<ffffffff810f82c6>] SyS_finit_module+0x86/0xb0
> [ 2461.580103]  [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b
> [ 2461.580103] Sched Debug Version: v0.10, 3.11.0-0.rc1.git4.1.fc20.x86_64 #1
> 
> So call user space to load cfg80211 module (alias
> net-pf-16-proto-16-family-nl80211) from ctrl_getfamily() with read taken
> cb_lock semaphore. That deadlock in modprobe process in
> genl_register_family() when try to down_write cb_lock.
> 
> This is consequence of commit:
> 
> commit fb4e156886ce6e8309e912d8b370d192330d19d3
> Author: Marcel Holtmann <marcel@holtmann.org>
> Date:   Sun Apr 28 16:22:06 2013 -0700
> 
>     nl80211: Add generic netlink module alias for cfg80211/nl80211
> 
> I prepared a possible fix for that (patch below), but I'm not sure if it
> do right thing, it can cause other oddity, please test (with
> CONFIG_LOCKDEP compiled). I think I would prefer rather remove
> autoloading genetlink modules from kernel or use just one lock
> in instead of two cb_lock & genl_mutex .
> 
> Stanislaw
> 
> diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
> index 2fd6dbe..1076fe1 100644
> --- a/net/netlink/genetlink.c
> +++ b/net/netlink/genetlink.c
> @@ -877,8 +877,10 @@ static int ctrl_getfamily(struct sk_buff *skb, struct genl_info *info)
>  #ifdef CONFIG_MODULES
>  		if (res == NULL) {
>  			genl_unlock();
> +			up_read(&cb_lock);
>  			request_module("net-pf-%d-proto-%d-family-%s",
>  				       PF_NETLINK, NETLINK_GENERIC, name);
> +			down_read(&cb_lock);
>  			genl_lock();
>  			res = genl_family_find_byname(name);
>  		}

The patch above fixes the problem for me. I'll leave it to others to
decide whether it's the best approach.

Thanks!
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hung task while plugging in cfg80211
@ 2013-07-25 14:51     ` Jeff Layton
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2013-07-25 14:51 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	linville-2XuSBdqkA4R54TAoqtyWWQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	Thomas Graf, Stephen Hemminger, rjones-H+wXaHxf7aLQT0dZR+AlfA,
	Marcel Holtmann

On Wed, 24 Jul 2013 11:59:12 +0200
Stanislaw Gruszka <sgruszka-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> On Fri, Jul 12, 2013 at 07:49:11AM -0400, Jeff Layton wrote:
> > I updated the kernel on my rawhide KVM guest, and noticed that the
> > ethernet interface wasn't coming up at all. While poking around, I saw
> > this stack trace pop up:
> > 
> > Jul 12 07:29:25 rawhide kernel: [  241.156196] INFO: task modprobe:501 blocked for more than 120 seconds.
> > Jul 12 07:29:25 rawhide kernel: [  241.157222] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Jul 12 07:29:25 rawhide kernel: [  241.158379] modprobe        D ffff880118c526e0  4584   501    500 0x00000080
> > Jul 12 07:29:25 rawhide kernel: [  241.159458]  ffff8800d10c3ba8 0000000000000046 00000000001d5300 ffff8800d10c3fd8
> > Jul 12 07:29:25 rawhide kernel: [  241.160644]  ffff8800d10c3fd8 00000000001d5300 ffff8800d2ae4dc0 ffff8800d2ae4dc0
> > Jul 12 07:29:25 rawhide kernel: [  241.161875]  ffffffff81d0bfa0 ffffffff81d0bfa8 ffffffff00000000 ffffffff81d0bff0
> > Jul 12 07:29:25 rawhide kernel: [  241.163096] Call Trace:
> > Jul 12 07:29:25 rawhide kernel: [  241.163460]  [<ffffffff817377e9>] schedule+0x29/0x70
> > Jul 12 07:29:25 rawhide kernel: [  241.164235]  [<ffffffff817392ad>] rwsem_down_write_failed+0xed/0x1a0
> > Jul 12 07:29:25 rawhide kernel: [  241.165489]  [<ffffffff810bb600>] ? update_cpu_load_active+0xb0/0xb0
> > Jul 12 07:29:25 rawhide kernel: [  241.166267]  [<ffffffff8137d503>] call_rwsem_down_write_failed+0x13/0x20
> > Jul 12 07:29:25 rawhide kernel: [  241.166760]  [<ffffffff81736aad>] ? down_write+0x9d/0xb2
> > Jul 12 07:29:25 rawhide kernel: [  241.167189]  [<ffffffff8162d975>] ? genl_lock_all+0x15/0x30
> > Jul 12 07:29:25 rawhide kernel: [  241.167603]  [<ffffffff8162d975>] genl_lock_all+0x15/0x30
> 
> I reproduced that on qemu-kvm with rawhide LiveCD. Below more
> information gathered using sysrq:
> 
> (qemu) [ 2458.110724] SysRq : Show Locks Held
> [ 2458.111347] 
> [ 2458.111347] Showing all locks held in the system:
> [ 2458.111347] 1 lock held by NetworkManager/582:
> [ 2458.111347]  #0:  (cb_lock){++++++}, at: [<ffffffff8162bc79>] genl_rcv+0x19/0x40
> [ 2458.111347] 1 lock held by modprobe/603:
> [ 2458.111347]  #0:  (cb_lock){++++++}, at: [<ffffffff8162baa5>] genl_lock_all+0x15/0x30
> [ 2458.111347] 
> [ 2458.111347] =============================================
> [ 2458.111347] 
> 
> (qemu) sendkey alt-sysrq-w
> (qemu) [ 2461.579457] SysRq : Show Blocked State
> [ 2461.580103]   task                        PC stack   pid father
> [ 2461.580103] NetworkManager  D ffff880034b84500  4040   582      1 0x00000080
> [ 2461.580103]  ffff8800197ff720 0000000000000046 00000000001d5340 ffff8800197fffd8
> [ 2461.580103]  ffff8800197fffd8 00000000001d5340 ffff880019631700 7fffffffffffffff
> [ 2461.580103]  ffff8800197ff880 ffff8800197ff878 ffff880019631700 ffff880019631700
> [ 2461.580103] Call Trace:
> [ 2461.580103]  [<ffffffff817355f9>] schedule+0x29/0x70
> [ 2461.580103]  [<ffffffff81731ad1>] schedule_timeout+0x1c1/0x360
> [ 2461.580103]  [<ffffffff810e69eb>] ? mark_held_locks+0xbb/0x140
> [ 2461.580103]  [<ffffffff817377ac>] ? _raw_spin_unlock_irq+0x2c/0x50
> [ 2461.580103]  [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [ 2461.580103]  [<ffffffff81736398>] wait_for_completion_killable+0xe8/0x170
> [ 2461.580103]  [<ffffffff810b7fa0>] ? wake_up_state+0x20/0x20
> [ 2461.580103]  [<ffffffff81095825>] call_usermodehelper_exec+0x1a5/0x210
> [ 2461.580103]  [<ffffffff817362ed>] ? wait_for_completion_killable+0x3d/0x170
> [ 2461.580103]  [<ffffffff81095cc3>] __request_module+0x1b3/0x370
> [ 2461.580103]  [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [ 2461.580103]  [<ffffffff8162c5c9>] ctrl_getfamily+0x159/0x190
> [ 2461.580103]  [<ffffffff8162d8a4>] genl_family_rcv_msg+0x1f4/0x2e0
> [ 2461.580103]  [<ffffffff8162d990>] ? genl_family_rcv_msg+0x2e0/0x2e0
> [ 2461.580103]  [<ffffffff8162da1e>] genl_rcv_msg+0x8e/0xd0
> [ 2461.580103]  [<ffffffff8162b729>] netlink_rcv_skb+0xa9/0xc0
> [ 2461.580103]  [<ffffffff8162bc88>] genl_rcv+0x28/0x40
> [ 2461.580103]  [<ffffffff8162ad6d>] netlink_unicast+0xdd/0x190
> [ 2461.580103]  [<ffffffff8162b149>] netlink_sendmsg+0x329/0x750
> [ 2461.580103]  [<ffffffff815db849>] sock_sendmsg+0x99/0xd0
> [ 2461.580103]  [<ffffffff810bb58f>] ? local_clock+0x5f/0x70
> [ 2461.580103]  [<ffffffff810e96e8>] ? lock_release_non_nested+0x308/0x350
> [ 2461.580103]  [<ffffffff815dbc6e>] ___sys_sendmsg+0x39e/0x3b0
> [ 2461.580103]  [<ffffffff810565af>] ? kvm_clock_read+0x2f/0x50
> [ 2461.580103]  [<ffffffff810218b9>] ? sched_clock+0x9/0x10
> [ 2461.580103]  [<ffffffff810bb2bd>] ? sched_clock_local+0x1d/0x80
> [ 2461.580103]  [<ffffffff810bb448>] ? sched_clock_cpu+0xa8/0x100
> [ 2461.580103]  [<ffffffff810e33ad>] ? trace_hardirqs_off+0xd/0x10
> [ 2461.580103]  [<ffffffff810bb58f>] ? local_clock+0x5f/0x70
> [ 2461.580103]  [<ffffffff810e3f7f>] ? lock_release_holdtime.part.28+0xf/0x1a0
> [ 2461.580103]  [<ffffffff8120fec9>] ? fget_light+0xf9/0x510
> [ 2461.580103]  [<ffffffff8120fe0c>] ? fget_light+0x3c/0x510
> [ 2461.580103]  [<ffffffff815dd1d2>] __sys_sendmsg+0x42/0x80
> [ 2461.580103]  [<ffffffff815dd222>] SyS_sendmsg+0x12/0x20
> [ 2461.580103]  [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b
> [ 2461.580103] modprobe        D ffff88000f2c8000  4632   603    602 0x00000080
> [ 2461.580103]  ffff88000f04fba8 0000000000000046 00000000001d5340 ffff88000f04ffd8
> [ 2461.580103]  ffff88000f04ffd8 00000000001d5340 ffff8800377d4500 ffff8800377d4500
> [ 2461.580103]  ffffffff81d0b260 ffffffff81d0b268 ffffffff00000000 ffffffff81d0b2b0
> [ 2461.580103] Call Trace:
> [ 2461.580103]  [<ffffffff817355f9>] schedule+0x29/0x70
> [ 2461.580103]  [<ffffffff81736d4d>] rwsem_down_write_failed+0xed/0x1a0
> [ 2461.580103]  [<ffffffff810bb200>] ? update_cpu_load_active+0x10/0xb0
> [ 2461.580103]  [<ffffffff8137b473>] call_rwsem_down_write_failed+0x13/0x20
> [ 2461.580103]  [<ffffffff8173492d>] ? down_write+0x9d/0xb2
> [ 2461.580103]  [<ffffffff8162baa5>] ? genl_lock_all+0x15/0x30
> [ 2461.580103]  [<ffffffff8162baa5>] genl_lock_all+0x15/0x30
> [ 2461.580103]  [<ffffffff8162cbb3>] genl_register_family+0x53/0x1f0
> [ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
> [ 2461.580103]  [<ffffffff8162d650>] genl_register_family_with_ops+0x20/0x80
> [ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
> [ 2461.580103]  [<ffffffffa017fe84>] nl80211_init+0x24/0xf0 [cfg80211]
> [ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
> [ 2461.580103]  [<ffffffffa01dc043>] cfg80211_init+0x43/0xdb [cfg80211]
> [ 2461.580103]  [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0
> [ 2461.580103]  [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50
> [ 2461.580103]  [<ffffffff810f75af>] load_module+0x1c6f/0x27f0
> [ 2461.580103]  [<ffffffff810f2c90>] ? store_uevent+0x40/0x40
> [ 2461.580103]  [<ffffffff810f82c6>] SyS_finit_module+0x86/0xb0
> [ 2461.580103]  [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b
> [ 2461.580103] Sched Debug Version: v0.10, 3.11.0-0.rc1.git4.1.fc20.x86_64 #1
> 
> So call user space to load cfg80211 module (alias
> net-pf-16-proto-16-family-nl80211) from ctrl_getfamily() with read taken
> cb_lock semaphore. That deadlock in modprobe process in
> genl_register_family() when try to down_write cb_lock.
> 
> This is consequence of commit:
> 
> commit fb4e156886ce6e8309e912d8b370d192330d19d3
> Author: Marcel Holtmann <marcel-kz+m5ild9QBg9hUCZPvPmw@public.gmane.org>
> Date:   Sun Apr 28 16:22:06 2013 -0700
> 
>     nl80211: Add generic netlink module alias for cfg80211/nl80211
> 
> I prepared a possible fix for that (patch below), but I'm not sure if it
> do right thing, it can cause other oddity, please test (with
> CONFIG_LOCKDEP compiled). I think I would prefer rather remove
> autoloading genetlink modules from kernel or use just one lock
> in instead of two cb_lock & genl_mutex .
> 
> Stanislaw
> 
> diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
> index 2fd6dbe..1076fe1 100644
> --- a/net/netlink/genetlink.c
> +++ b/net/netlink/genetlink.c
> @@ -877,8 +877,10 @@ static int ctrl_getfamily(struct sk_buff *skb, struct genl_info *info)
>  #ifdef CONFIG_MODULES
>  		if (res == NULL) {
>  			genl_unlock();
> +			up_read(&cb_lock);
>  			request_module("net-pf-%d-proto-%d-family-%s",
>  				       PF_NETLINK, NETLINK_GENERIC, name);
> +			down_read(&cb_lock);
>  			genl_lock();
>  			res = genl_family_find_byname(name);
>  		}

The patch above fixes the problem for me. I'll leave it to others to
decide whether it's the best approach.

Thanks!
-- 
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 3.11] genetlink: release cb_lock before requesting additional module
  2013-07-25 14:51     ` Jeff Layton
  (?)
@ 2013-07-26  9:00     ` Stanislaw Gruszka
  2013-07-26 17:08       ` Pravin Shelar
  -1 siblings, 1 reply; 7+ messages in thread
From: Stanislaw Gruszka @ 2013-07-26  9:00 UTC (permalink / raw)
  To: David S. Miller
  Cc: linux-wireless, linville, netdev, Thomas Graf, Stephen Hemminger,
	rjones, Marcel Holtmann, Jeff Layton

Requesting external module with cb_lock taken can result in
the deadlock like showed below:

[ 2458.111347] Showing all locks held in the system:
[ 2458.111347] 1 lock held by NetworkManager/582:
[ 2458.111347]  #0:  (cb_lock){++++++}, at: [<ffffffff8162bc79>] genl_rcv+0x19/0x40
[ 2458.111347] 1 lock held by modprobe/603:
[ 2458.111347]  #0:  (cb_lock){++++++}, at: [<ffffffff8162baa5>] genl_lock_all+0x15/0x30

[ 2461.579457] SysRq : Show Blocked State
[ 2461.580103]   task                        PC stack   pid father
[ 2461.580103] NetworkManager  D ffff880034b84500  4040   582      1 0x00000080
[ 2461.580103]  ffff8800197ff720 0000000000000046 00000000001d5340 ffff8800197fffd8
[ 2461.580103]  ffff8800197fffd8 00000000001d5340 ffff880019631700 7fffffffffffffff
[ 2461.580103]  ffff8800197ff880 ffff8800197ff878 ffff880019631700 ffff880019631700
[ 2461.580103] Call Trace:
[ 2461.580103]  [<ffffffff817355f9>] schedule+0x29/0x70
[ 2461.580103]  [<ffffffff81731ad1>] schedule_timeout+0x1c1/0x360
[ 2461.580103]  [<ffffffff810e69eb>] ? mark_held_locks+0xbb/0x140
[ 2461.580103]  [<ffffffff817377ac>] ? _raw_spin_unlock_irq+0x2c/0x50
[ 2461.580103]  [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 2461.580103]  [<ffffffff81736398>] wait_for_completion_killable+0xe8/0x170
[ 2461.580103]  [<ffffffff810b7fa0>] ? wake_up_state+0x20/0x20
[ 2461.580103]  [<ffffffff81095825>] call_usermodehelper_exec+0x1a5/0x210
[ 2461.580103]  [<ffffffff817362ed>] ? wait_for_completion_killable+0x3d/0x170
[ 2461.580103]  [<ffffffff81095cc3>] __request_module+0x1b3/0x370
[ 2461.580103]  [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 2461.580103]  [<ffffffff8162c5c9>] ctrl_getfamily+0x159/0x190
[ 2461.580103]  [<ffffffff8162d8a4>] genl_family_rcv_msg+0x1f4/0x2e0
[ 2461.580103]  [<ffffffff8162d990>] ? genl_family_rcv_msg+0x2e0/0x2e0
[ 2461.580103]  [<ffffffff8162da1e>] genl_rcv_msg+0x8e/0xd0
[ 2461.580103]  [<ffffffff8162b729>] netlink_rcv_skb+0xa9/0xc0
[ 2461.580103]  [<ffffffff8162bc88>] genl_rcv+0x28/0x40
[ 2461.580103]  [<ffffffff8162ad6d>] netlink_unicast+0xdd/0x190
[ 2461.580103]  [<ffffffff8162b149>] netlink_sendmsg+0x329/0x750
[ 2461.580103]  [<ffffffff815db849>] sock_sendmsg+0x99/0xd0
[ 2461.580103]  [<ffffffff810bb58f>] ? local_clock+0x5f/0x70
[ 2461.580103]  [<ffffffff810e96e8>] ? lock_release_non_nested+0x308/0x350
[ 2461.580103]  [<ffffffff815dbc6e>] ___sys_sendmsg+0x39e/0x3b0
[ 2461.580103]  [<ffffffff810565af>] ? kvm_clock_read+0x2f/0x50
[ 2461.580103]  [<ffffffff810218b9>] ? sched_clock+0x9/0x10
[ 2461.580103]  [<ffffffff810bb2bd>] ? sched_clock_local+0x1d/0x80
[ 2461.580103]  [<ffffffff810bb448>] ? sched_clock_cpu+0xa8/0x100
[ 2461.580103]  [<ffffffff810e33ad>] ? trace_hardirqs_off+0xd/0x10
[ 2461.580103]  [<ffffffff810bb58f>] ? local_clock+0x5f/0x70
[ 2461.580103]  [<ffffffff810e3f7f>] ? lock_release_holdtime.part.28+0xf/0x1a0
[ 2461.580103]  [<ffffffff8120fec9>] ? fget_light+0xf9/0x510
[ 2461.580103]  [<ffffffff8120fe0c>] ? fget_light+0x3c/0x510
[ 2461.580103]  [<ffffffff815dd1d2>] __sys_sendmsg+0x42/0x80
[ 2461.580103]  [<ffffffff815dd222>] SyS_sendmsg+0x12/0x20
[ 2461.580103]  [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b
[ 2461.580103] modprobe        D ffff88000f2c8000  4632   603    602 0x00000080
[ 2461.580103]  ffff88000f04fba8 0000000000000046 00000000001d5340 ffff88000f04ffd8
[ 2461.580103]  ffff88000f04ffd8 00000000001d5340 ffff8800377d4500 ffff8800377d4500
[ 2461.580103]  ffffffff81d0b260 ffffffff81d0b268 ffffffff00000000 ffffffff81d0b2b0
[ 2461.580103] Call Trace:
[ 2461.580103]  [<ffffffff817355f9>] schedule+0x29/0x70
[ 2461.580103]  [<ffffffff81736d4d>] rwsem_down_write_failed+0xed/0x1a0
[ 2461.580103]  [<ffffffff810bb200>] ? update_cpu_load_active+0x10/0xb0
[ 2461.580103]  [<ffffffff8137b473>] call_rwsem_down_write_failed+0x13/0x20
[ 2461.580103]  [<ffffffff8173492d>] ? down_write+0x9d/0xb2
[ 2461.580103]  [<ffffffff8162baa5>] ? genl_lock_all+0x15/0x30
[ 2461.580103]  [<ffffffff8162baa5>] genl_lock_all+0x15/0x30
[ 2461.580103]  [<ffffffff8162cbb3>] genl_register_family+0x53/0x1f0
[ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
[ 2461.580103]  [<ffffffff8162d650>] genl_register_family_with_ops+0x20/0x80
[ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
[ 2461.580103]  [<ffffffffa017fe84>] nl80211_init+0x24/0xf0 [cfg80211]
[ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
[ 2461.580103]  [<ffffffffa01dc043>] cfg80211_init+0x43/0xdb [cfg80211]
[ 2461.580103]  [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0
[ 2461.580103]  [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50
[ 2461.580103]  [<ffffffff810f75af>] load_module+0x1c6f/0x27f0
[ 2461.580103]  [<ffffffff810f2c90>] ? store_uevent+0x40/0x40
[ 2461.580103]  [<ffffffff810f82c6>] SyS_finit_module+0x86/0xb0
[ 2461.580103]  [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b
[ 2461.580103] Sched Debug Version: v0.10, 3.11.0-0.rc1.git4.1.fc20.x86_64 #1

Problem start to happen after adding net-pf-16-proto-16-family-nl80211
alias name to cfg80211 module by below commit (though that commit
itself is perfectly fine):

commit fb4e156886ce6e8309e912d8b370d192330d19d3
Author: Marcel Holtmann <marcel@holtmann.org>
Date:   Sun Apr 28 16:22:06 2013 -0700

    nl80211: Add generic netlink module alias for cfg80211/nl80211

Reported-and-tested-by: Jeff Layton <jlayton@redhat.com>
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
 net/netlink/genetlink.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 2fd6dbe..1076fe1 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -877,8 +877,10 @@ static int ctrl_getfamily(struct sk_buff *skb, struct genl_info *info)
 #ifdef CONFIG_MODULES
 		if (res == NULL) {
 			genl_unlock();
+			up_read(&cb_lock);
 			request_module("net-pf-%d-proto-%d-family-%s",
 				       PF_NETLINK, NETLINK_GENERIC, name);
+			down_read(&cb_lock);
 			genl_lock();
 			res = genl_family_find_byname(name);
 		}
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 3.11] genetlink: release cb_lock before requesting additional module
  2013-07-26  9:00     ` [PATCH 3.11] genetlink: release cb_lock before requesting additional module Stanislaw Gruszka
@ 2013-07-26 17:08       ` Pravin Shelar
  2013-07-28  5:20         ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Pravin Shelar @ 2013-07-26 17:08 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: David S. Miller, linux-wireless, linville, netdev, Thomas Graf,
	Stephen Hemminger, rjones, Marcel Holtmann, Jeff Layton

On Fri, Jul 26, 2013 at 2:00 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> Requesting external module with cb_lock taken can result in
> the deadlock like showed below:
>
> [ 2458.111347] Showing all locks held in the system:
> [ 2458.111347] 1 lock held by NetworkManager/582:
> [ 2458.111347]  #0:  (cb_lock){++++++}, at: [<ffffffff8162bc79>] genl_rcv+0x19/0x40
> [ 2458.111347] 1 lock held by modprobe/603:
> [ 2458.111347]  #0:  (cb_lock){++++++}, at: [<ffffffff8162baa5>] genl_lock_all+0x15/0x30
>
> [ 2461.579457] SysRq : Show Blocked State
> [ 2461.580103]   task                        PC stack   pid father
> [ 2461.580103] NetworkManager  D ffff880034b84500  4040   582      1 0x00000080
> [ 2461.580103]  ffff8800197ff720 0000000000000046 00000000001d5340 ffff8800197fffd8
> [ 2461.580103]  ffff8800197fffd8 00000000001d5340 ffff880019631700 7fffffffffffffff
> [ 2461.580103]  ffff8800197ff880 ffff8800197ff878 ffff880019631700 ffff880019631700
> [ 2461.580103] Call Trace:
> [ 2461.580103]  [<ffffffff817355f9>] schedule+0x29/0x70
> [ 2461.580103]  [<ffffffff81731ad1>] schedule_timeout+0x1c1/0x360
> [ 2461.580103]  [<ffffffff810e69eb>] ? mark_held_locks+0xbb/0x140
> [ 2461.580103]  [<ffffffff817377ac>] ? _raw_spin_unlock_irq+0x2c/0x50
> [ 2461.580103]  [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [ 2461.580103]  [<ffffffff81736398>] wait_for_completion_killable+0xe8/0x170
> [ 2461.580103]  [<ffffffff810b7fa0>] ? wake_up_state+0x20/0x20
> [ 2461.580103]  [<ffffffff81095825>] call_usermodehelper_exec+0x1a5/0x210
> [ 2461.580103]  [<ffffffff817362ed>] ? wait_for_completion_killable+0x3d/0x170
> [ 2461.580103]  [<ffffffff81095cc3>] __request_module+0x1b3/0x370
> [ 2461.580103]  [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [ 2461.580103]  [<ffffffff8162c5c9>] ctrl_getfamily+0x159/0x190
> [ 2461.580103]  [<ffffffff8162d8a4>] genl_family_rcv_msg+0x1f4/0x2e0
> [ 2461.580103]  [<ffffffff8162d990>] ? genl_family_rcv_msg+0x2e0/0x2e0
> [ 2461.580103]  [<ffffffff8162da1e>] genl_rcv_msg+0x8e/0xd0
> [ 2461.580103]  [<ffffffff8162b729>] netlink_rcv_skb+0xa9/0xc0
> [ 2461.580103]  [<ffffffff8162bc88>] genl_rcv+0x28/0x40
> [ 2461.580103]  [<ffffffff8162ad6d>] netlink_unicast+0xdd/0x190
> [ 2461.580103]  [<ffffffff8162b149>] netlink_sendmsg+0x329/0x750
> [ 2461.580103]  [<ffffffff815db849>] sock_sendmsg+0x99/0xd0
> [ 2461.580103]  [<ffffffff810bb58f>] ? local_clock+0x5f/0x70
> [ 2461.580103]  [<ffffffff810e96e8>] ? lock_release_non_nested+0x308/0x350
> [ 2461.580103]  [<ffffffff815dbc6e>] ___sys_sendmsg+0x39e/0x3b0
> [ 2461.580103]  [<ffffffff810565af>] ? kvm_clock_read+0x2f/0x50
> [ 2461.580103]  [<ffffffff810218b9>] ? sched_clock+0x9/0x10
> [ 2461.580103]  [<ffffffff810bb2bd>] ? sched_clock_local+0x1d/0x80
> [ 2461.580103]  [<ffffffff810bb448>] ? sched_clock_cpu+0xa8/0x100
> [ 2461.580103]  [<ffffffff810e33ad>] ? trace_hardirqs_off+0xd/0x10
> [ 2461.580103]  [<ffffffff810bb58f>] ? local_clock+0x5f/0x70
> [ 2461.580103]  [<ffffffff810e3f7f>] ? lock_release_holdtime.part.28+0xf/0x1a0
> [ 2461.580103]  [<ffffffff8120fec9>] ? fget_light+0xf9/0x510
> [ 2461.580103]  [<ffffffff8120fe0c>] ? fget_light+0x3c/0x510
> [ 2461.580103]  [<ffffffff815dd1d2>] __sys_sendmsg+0x42/0x80
> [ 2461.580103]  [<ffffffff815dd222>] SyS_sendmsg+0x12/0x20
> [ 2461.580103]  [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b
> [ 2461.580103] modprobe        D ffff88000f2c8000  4632   603    602 0x00000080
> [ 2461.580103]  ffff88000f04fba8 0000000000000046 00000000001d5340 ffff88000f04ffd8
> [ 2461.580103]  ffff88000f04ffd8 00000000001d5340 ffff8800377d4500 ffff8800377d4500
> [ 2461.580103]  ffffffff81d0b260 ffffffff81d0b268 ffffffff00000000 ffffffff81d0b2b0
> [ 2461.580103] Call Trace:
> [ 2461.580103]  [<ffffffff817355f9>] schedule+0x29/0x70
> [ 2461.580103]  [<ffffffff81736d4d>] rwsem_down_write_failed+0xed/0x1a0
> [ 2461.580103]  [<ffffffff810bb200>] ? update_cpu_load_active+0x10/0xb0
> [ 2461.580103]  [<ffffffff8137b473>] call_rwsem_down_write_failed+0x13/0x20
> [ 2461.580103]  [<ffffffff8173492d>] ? down_write+0x9d/0xb2
> [ 2461.580103]  [<ffffffff8162baa5>] ? genl_lock_all+0x15/0x30
> [ 2461.580103]  [<ffffffff8162baa5>] genl_lock_all+0x15/0x30
> [ 2461.580103]  [<ffffffff8162cbb3>] genl_register_family+0x53/0x1f0
> [ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
> [ 2461.580103]  [<ffffffff8162d650>] genl_register_family_with_ops+0x20/0x80
> [ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
> [ 2461.580103]  [<ffffffffa017fe84>] nl80211_init+0x24/0xf0 [cfg80211]
> [ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
> [ 2461.580103]  [<ffffffffa01dc043>] cfg80211_init+0x43/0xdb [cfg80211]
> [ 2461.580103]  [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0
> [ 2461.580103]  [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50
> [ 2461.580103]  [<ffffffff810f75af>] load_module+0x1c6f/0x27f0
> [ 2461.580103]  [<ffffffff810f2c90>] ? store_uevent+0x40/0x40
> [ 2461.580103]  [<ffffffff810f82c6>] SyS_finit_module+0x86/0xb0
> [ 2461.580103]  [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b
> [ 2461.580103] Sched Debug Version: v0.10, 3.11.0-0.rc1.git4.1.fc20.x86_64 #1
>
> Problem start to happen after adding net-pf-16-proto-16-family-nl80211
> alias name to cfg80211 module by below commit (though that commit
> itself is perfectly fine):
>
> commit fb4e156886ce6e8309e912d8b370d192330d19d3
> Author: Marcel Holtmann <marcel@holtmann.org>
> Date:   Sun Apr 28 16:22:06 2013 -0700
>
>     nl80211: Add generic netlink module alias for cfg80211/nl80211
>
> Reported-and-tested-by: Jeff Layton <jlayton@redhat.com>
> Reported-by: Richard W.M. Jones <rjones@redhat.com>
> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
> ---
>  net/netlink/genetlink.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
> index 2fd6dbe..1076fe1 100644
> --- a/net/netlink/genetlink.c
> +++ b/net/netlink/genetlink.c
> @@ -877,8 +877,10 @@ static int ctrl_getfamily(struct sk_buff *skb, struct genl_info *info)
>  #ifdef CONFIG_MODULES
>                 if (res == NULL) {
>                         genl_unlock();
> +                       up_read(&cb_lock);
>                         request_module("net-pf-%d-proto-%d-family-%s",
>                                        PF_NETLINK, NETLINK_GENERIC, name);
> +                       down_read(&cb_lock);
>                         genl_lock();
>                         res = genl_family_find_byname(name);
>                 }
This is genl issue and it was introduced by commit def3117493eafd
(genl: Allow concurrent genl callbacks.).

Reviewed-by: Pravin B Shelar <pshelar@nicira.com>

> --
> 1.7.11.7
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3.11] genetlink: release cb_lock before requesting additional module
  2013-07-26 17:08       ` Pravin Shelar
@ 2013-07-28  5:20         ` David Miller
  0 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2013-07-28  5:20 UTC (permalink / raw)
  To: pshelar
  Cc: sgruszka, linux-wireless, linville, netdev, tgraf, shemminger,
	rjones, marcel, jlayton

From: Pravin Shelar <pshelar@nicira.com>
Date: Fri, 26 Jul 2013 10:08:25 -0700

> On Fri, Jul 26, 2013 at 2:00 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
>> Requesting external module with cb_lock taken can result in
>> the deadlock like showed below:
 ...
>> Problem start to happen after adding net-pf-16-proto-16-family-nl80211
>> alias name to cfg80211 module by below commit (though that commit
>> itself is perfectly fine):
>>
>> commit fb4e156886ce6e8309e912d8b370d192330d19d3
>> Author: Marcel Holtmann <marcel@holtmann.org>
>> Date:   Sun Apr 28 16:22:06 2013 -0700
>>
>>     nl80211: Add generic netlink module alias for cfg80211/nl80211
>>
>> Reported-and-tested-by: Jeff Layton <jlayton@redhat.com>
>> Reported-by: Richard W.M. Jones <rjones@redhat.com>
>> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
 ...
> This is genl issue and it was introduced by commit def3117493eafd
> (genl: Allow concurrent genl callbacks.).
> 
> Reviewed-by: Pravin B Shelar <pshelar@nicira.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-07-28  5:20 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-12 11:49 hung task while plugging in cfg80211 Jeff Layton
2013-07-24  9:59 ` Stanislaw Gruszka
2013-07-25 14:51   ` Jeff Layton
2013-07-25 14:51     ` Jeff Layton
2013-07-26  9:00     ` [PATCH 3.11] genetlink: release cb_lock before requesting additional module Stanislaw Gruszka
2013-07-26 17:08       ` Pravin Shelar
2013-07-28  5:20         ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.