linux-kernel.vger.kernel.org archive mirror
* net: deadlock on genl_mutex
@ 2016-11-26 17:04 Dmitry Vyukov
  2016-11-26 17:12 ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Vyukov @ 2016-11-26 17:04 UTC (permalink / raw)
  To: David Miller, Matti Vaittinen, Tycho Andersen, Cong Wang,
	Florian Westphal, stephen hemminger, Tom Herbert, netdev, LKML,
	Eric Dumazet, rgb
  Cc: syzkaller

Hello,

The following program triggers deadlock warnings on genl_mutex:

https://gist.githubusercontent.com/dvyukov/65e33d053e507d2ab0bf6ae83d989585/raw/b3c640ec58e894b50bcbf255c471406466cfa5d0/gistfile1.txt

On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
in_atomic(): 1, irqs_disabled(): 0, pid: 32289, name: syz-executor
CPU: 0 PID: 32289 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 ffff88003ec06420 ffffffff834c2e39 ffffffff00000000 1ffff10007d80c17
 ffffed0007d80c0f 0000000041b58ab3 ffffffff89575550 ffffffff834c2b4b
 ffffffff8baab1a0 dffffc0000000000 0000000000000000 ffff880068f794e0
Call Trace:
 <IRQ> [  287.394552]  [<     inline     >] __dump_stack lib/dump_stack.c:15
 <IRQ> [  287.394552]  [<ffffffff834c2e39>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
 [<ffffffff814b6ac3>] ___might_sleep+0x483/0x660 kernel/sched/core.c:7761
 [<ffffffff814b6d3a>] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720
 [<ffffffff88139aaa>] mutex_lock_nested+0x1ea/0xf20 kernel/locking/mutex.c:620
 [<     inline     >] genl_lock net/netlink/genetlink.c:31
 [<ffffffff86cb5a11>] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531
 [<ffffffff86ca5458>] netlink_sock_destruct+0xf8/0x400 net/netlink/af_netlink.c:331
 [<ffffffff86a7b234>] __sk_destruct+0xf4/0x7f0 net/core/sock.c:1423
 [<ffffffff86a87d6c>] sk_destruct+0x4c/0x80 net/core/sock.c:1453
 [<ffffffff86a87dfc>] __sk_free+0x5c/0x230 net/core/sock.c:1461
 [<ffffffff86a87ff8>] sk_free+0x28/0x30 net/core/sock.c:1472
 [<     inline     >] sock_put include/net/sock.h:1591
 [<ffffffff86ca6cd1>] deferred_put_nlk_sk+0x31/0x40 net/netlink/af_netlink.c:652
 [<     inline     >] __rcu_reclaim kernel/rcu/rcu.h:118
 [<ffffffff815cbc9d>] rcu_do_batch.isra.70+0x9ed/0xe20 kernel/rcu/tree.c:2776
 [<     inline     >] invoke_rcu_callbacks kernel/rcu/tree.c:3040
 [<     inline     >] __rcu_process_callbacks kernel/rcu/tree.c:3007
 [<ffffffff815cc55c>] rcu_process_callbacks+0x48c/0xd70 kernel/rcu/tree.c:3024
 [<ffffffff8814d53b>] __do_softirq+0x32b/0xca8 kernel/softirq.c:284
 [<     inline     >] invoke_softirq kernel/softirq.c:364
 [<ffffffff8141a941>] irq_exit+0x1d1/0x210 kernel/softirq.c:405
 [<     inline     >] exiting_irq arch/x86/include/asm/apic.h:659
 [<ffffffff8814ca30>] smp_apic_timer_interrupt+0x80/0xa0 arch/x86/kernel/apic/apic.c:960
 [<ffffffff8814badc>] apic_timer_interrupt+0x8c/0xa0 arch/x86/entry/entry_64.S:489
 <EOI> [  287.403717]  [<ffffffff8155c987>] ? lock_is_held+0x247/0x310
 [<ffffffff814b6bde>] ___might_sleep+0x59e/0x660 kernel/sched/core.c:7729
 [<ffffffff814b6d3a>] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720
 [<ffffffff88142d08>] down_read+0x78/0x160 kernel/locking/rwsem.c:21
 [<     inline     >] anon_vma_lock_read include/linux/rmap.h:127
 [<ffffffff81968295>] validate_mm+0xe5/0x880 mm/mmap.c:347
 [<ffffffff8196bf0b>] vma_link+0x11b/0x180 mm/mmap.c:605
 [<ffffffff81977f46>] mmap_region+0x1076/0x1880 mm/mmap.c:1692
 [<ffffffff81978e4f>] do_mmap+0x6ff/0xe80 mm/mmap.c:1450
 [<     inline     >] do_mmap_pgoff include/linux/mm.h:2039
 [<ffffffff818fd527>] vm_mmap_pgoff+0x1b7/0x210 mm/util.c:305
 [<     inline     >] SYSC_mmap_pgoff mm/mmap.c:1500
 [<ffffffff8196f961>] SyS_mmap_pgoff+0x231/0x5e0 mm/mmap.c:1458
 [<     inline     >] SYSC_mmap arch/x86/kernel/sys_x86_64.c:95
 [<ffffffff8124bf4b>] SyS_mmap+0x1b/0x30 arch/x86/kernel/sys_x86_64.c:86
 [<ffffffff88149dc5>] entry_SYSCALL_64_fastpath+0x23/0xc6

=================================
[ INFO: inconsistent lock state ]
4.9.0-rc5+ #54 Tainted: G        W
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
syz-executor/32289 [HC0[0]:SC1[1]:HE1:SE0] takes:
 (genl_mutex), at:
[<     inline     >] genl_lock net/netlink/genetlink.c:31
[<ffffffff86cb5a11>] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531
{SOFTIRQ-ON-W} state was registered at:
  [  287.580014] [<     inline     >] mark_irqflags kernel/locking/lockdep.c:2938
  [  287.580014] [<ffffffff81567ad7>] __lock_acquire+0x6e7/0x3380 kernel/locking/lockdep.c:3292
  [  287.580014] [<ffffffff8156b642>] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3746
  [  287.580014] [<     inline     >] __mutex_lock_common kernel/locking/mutex.c:521
  [  287.580014] [<ffffffff88139aff>] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
  [  287.580014] [<     inline     >] genl_lock net/netlink/genetlink.c:31
  [  287.580014] [<     inline     >] genl_lock_all net/netlink/genetlink.c:52
  [  287.580014] [<ffffffff86cba52e>] __genl_register_family+0x2ce/0x1870 net/netlink/genetlink.c:374
  [  287.580014] [<     inline     >] _genl_register_family_with_ops_grps include/net/genetlink.h:173
  [  287.580014] [<ffffffff8ab90c02>] genl_init+0x11d/0x185 net/netlink/genetlink.c:1084
  [  287.580014] [<ffffffff8100244b>] do_one_initcall+0xfb/0x3f0 init/main.c:778
  [  287.580014] [<     inline     >] do_initcall_level init/main.c:844
  [  287.580014] [<     inline     >] do_initcalls init/main.c:852
  [  287.580014] [<     inline     >] do_basic_setup init/main.c:870
  [  287.580014] [<ffffffff8aa3d03d>] kernel_init_freeable+0x5c4/0x69e init/main.c:1017
  [  287.580014] [<ffffffff88129c88>] kernel_init+0x18/0x180 init/main.c:943
  [  287.580014] [<ffffffff8814a05a>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433

[   78.258919] [ INFO: inconsistent lock state ]
[   78.258919] 4.9.0-rc5+ #54 Tainted: G        W
[   78.258919] ---------------------------------
[   78.258919] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[   78.258919] syz-fuzzer/5211 [HC0[0]:SC1[1]:HE1:SE0] takes:
[   78.258919]  (genl_mutex){+.?.+.}, at:
[   78.258919] [<ffffffff86cb5a11>] genl_lock_done+0x71/0xd0
[   78.258919] {SOFTIRQ-ON-W} state was registered at:
[   78.258919]   [<ffffffff81567ad7>] __lock_acquire+0x6e7/0x3380
[   78.258919]   [<ffffffff8156b642>] lock_acquire+0x2a2/0x790
[   78.258919]   [<ffffffff88139aff>] mutex_lock_nested+0x23f/0xf20
[   78.258919]   [<ffffffff86cba52e>] __genl_register_family+0x2ce/0x1870
[   78.258919]   [<ffffffff8ab90c02>] genl_init+0x11d/0x185
[   78.258919]   [<ffffffff8100244b>] do_one_initcall+0xfb/0x3f0
[   78.258919]   [<ffffffff8aa3d03d>] kernel_init_freeable+0x5c4/0x69e
[   78.258919]   [<ffffffff88129c88>] kernel_init+0x18/0x180
[   78.258919]   [<ffffffff8814a05a>] ret_from_fork+0x2a/0x40
[   78.258919] irq event stamp: 149484
[   78.258919] hardirqs last  enabled at (149484): [<ffffffff8814a7df>] restore_regs_and_iret+0x0/0x1d
[   78.258919] hardirqs last disabled at (149483): [<ffffffff8814bad7>] apic_timer_interrupt+0x87/0xa0
[   78.258919] softirqs last  enabled at (149302): [<ffffffff8814da39>] __do_softirq+0x829/0xca8
[   78.258919] softirqs last disabled at (149437): [<ffffffff8141a941>] irq_exit+0x1d1/0x210

[   78.258919]
[   78.258919] other info that might help us debug this:
[   78.258919]  Possible unsafe locking scenario:
[   78.258919]
[   78.258919]        CPU0
[   78.258919]        ----
[   78.258919]   lock(genl_mutex);
[   78.258919]   <Interrupt>
[   78.258919]     lock(genl_mutex);
[   78.258919]
[   78.258919]  *** DEADLOCK ***
[   78.258919]
[   78.258919] 1 lock held by syz-fuzzer/5211:
[   78.258919]  #0:  (rcu_callback){......}, at: [<ffffffff815cbc43>] rcu_do_batch.isra.70+0x993/0xe20
[   78.258919]
[   78.258919] stack backtrace:

CPU: 0 PID: 32289 Comm: syz-executor Tainted: G        W       4.9.0-rc5+ #54
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 ffff88003ec05db8 ffffffff834c2e39 ffffffff00000000 1ffff10007d80b4a
 ffffed0007d80b42 0000000041b58ab3 ffffffff89575550 ffffffff834c2b4b
 ffff88003948a340 ffff88003ec22cc0 ffff8800384dd280 0000000041b58ab3
Call Trace:
 <IRQ> [  287.580014]  [<     inline     >] __dump_stack lib/dump_stack.c:15
 <IRQ> [  287.580014]  [<ffffffff834c2e39>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
 [<ffffffff815648df>] print_usage_bug+0x3ef/0x450 kernel/locking/lockdep.c:2388
 [<     inline     >] valid_state kernel/locking/lockdep.c:2401
 [<     inline     >] mark_lock_irq kernel/locking/lockdep.c:2599
 [<ffffffff81565870>] mark_lock+0xf30/0x1410 kernel/locking/lockdep.c:3062
 [<     inline     >] mark_irqflags kernel/locking/lockdep.c:2920
 [<ffffffff8156811e>] __lock_acquire+0xd2e/0x3380 kernel/locking/lockdep.c:3292
 [<ffffffff8156b642>] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3746
 [<     inline     >] __mutex_lock_common kernel/locking/mutex.c:521
 [<ffffffff88139aff>] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
 [<     inline     >] genl_lock net/netlink/genetlink.c:31
 [<ffffffff86cb5a11>] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531
 [<ffffffff86ca5458>] netlink_sock_destruct+0xf8/0x400 net/netlink/af_netlink.c:331
 [<ffffffff86a7b234>] __sk_destruct+0xf4/0x7f0 net/core/sock.c:1423
 [<ffffffff86a87d6c>] sk_destruct+0x4c/0x80 net/core/sock.c:1453
 [<ffffffff86a87dfc>] __sk_free+0x5c/0x230 net/core/sock.c:1461
 [<ffffffff86a87ff8>] sk_free+0x28/0x30 net/core/sock.c:1472
 [<     inline     >] sock_put include/net/sock.h:1591
 [<ffffffff86ca6cd1>] deferred_put_nlk_sk+0x31/0x40 net/netlink/af_netlink.c:652
 [<     inline     >] __rcu_reclaim kernel/rcu/rcu.h:118
 [<ffffffff815cbc9d>] rcu_do_batch.isra.70+0x9ed/0xe20 kernel/rcu/tree.c:2776
 [<     inline     >] invoke_rcu_callbacks kernel/rcu/tree.c:3040
 [<     inline     >] __rcu_process_callbacks kernel/rcu/tree.c:3007
 [<ffffffff815cc55c>] rcu_process_callbacks+0x48c/0xd70 kernel/rcu/tree.c:3024
 [<ffffffff8814d53b>] __do_softirq+0x32b/0xca8 kernel/softirq.c:284
 [<     inline     >] invoke_softirq kernel/softirq.c:364
 [<ffffffff8141a941>] irq_exit+0x1d1/0x210 kernel/softirq.c:405
 [<     inline     >] exiting_irq arch/x86/include/asm/apic.h:659
 [<ffffffff8814ca30>] smp_apic_timer_interrupt+0x80/0xa0 arch/x86/kernel/apic/apic.c:960
 [<ffffffff8814badc>] apic_timer_interrupt+0x8c/0xa0 arch/x86/entry/entry_64.S:489
 <EOI> [  287.580014]  [<ffffffff8155c987>] ? lock_is_held+0x247/0x310
 [<ffffffff814b6bde>] ___might_sleep+0x59e/0x660 kernel/sched/core.c:7729
 [<ffffffff814b6d3a>] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720
 [<ffffffff88142d08>] down_read+0x78/0x160 kernel/locking/rwsem.c:21
 [<     inline     >] anon_vma_lock_read include/linux/rmap.h:127
 [<ffffffff81968295>] validate_mm+0xe5/0x880 mm/mmap.c:347
 [<ffffffff8196bf0b>] vma_link+0x11b/0x180 mm/mmap.c:605
 [<ffffffff81977f46>] mmap_region+0x1076/0x1880 mm/mmap.c:1692
 [<ffffffff81978e4f>] do_mmap+0x6ff/0xe80 mm/mmap.c:1450
 [<     inline     >] do_mmap_pgoff include/linux/mm.h:2039
 [<ffffffff818fd527>] vm_mmap_pgoff+0x1b7/0x210 mm/util.c:305
 [<     inline     >] SYSC_mmap_pgoff mm/mmap.c:1500
 [<ffffffff8196f961>] SyS_mmap_pgoff+0x231/0x5e0 mm/mmap.c:1458
 [<     inline     >] SYSC_mmap arch/x86/kernel/sys_x86_64.c:95
 [<ffffffff8124bf4b>] SyS_mmap+0x1b/0x30 arch/x86/kernel/sys_x86_64.c:86
 [<ffffffff88149dc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
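
For anyone reading the lockdep output above: the `[HC0[0]:SC1[1]:HE1:SE0]` field and the `{+.?.+.}` usage string both summarize the IRQ context in which genl_mutex was taken. Below is a minimal sketch of a decoder for the two. It is a hypothetical helper, not part of any kernel tooling. The character meanings are paraphrased from the kernel's lockdep design notes; the six-column order (hardirq, softirq, reclaim-fs, each followed by its read variant) is an assumption for a 4.9-era kernel.

```python
import re

# Meaning of each usage character, paraphrased from the lockdep design notes.
USAGE_CHARS = {
    ".": "acquired while irqs disabled and not in irq context",
    "-": "acquired in irq context",
    "+": "acquired with irqs enabled",
    "?": "acquired in irq context with irqs enabled",
}

# Assumed column order for a 4.9-era kernel (labeled assumption, see lead-in).
COLUMNS = [
    "hardirq", "hardirq-read",
    "softirq", "softirq-read",
    "reclaim-fs", "reclaim-fs-read",
]

# Matches the "[HCx[n]:SCx[n]:HEx:SEx]" field printed in a usage-bug report.
CONTEXT_RE = re.compile(r"\[HC(\d+)\[(\d+)\]:SC(\d+)\[(\d+)\]:HE(\d+):SE(\d+)\]")


def decode_usage(bits):
    """Map a usage string such as '{+.?.+.}' to one description per column."""
    inner = bits.strip("{}")
    if len(inner) != len(COLUMNS):
        raise ValueError("expected %d usage characters, got %r" % (len(COLUMNS), inner))
    return {column: USAGE_CHARS[char] for column, char in zip(COLUMNS, inner)}


def decode_context(line):
    """Extract the hardirq/softirq context flags from a 'takes:' line."""
    match = CONTEXT_RE.search(line)
    if match is None:
        raise ValueError("no [HC..:SC..:HE..:SE..] field in %r" % line)
    hc, hc_count, sc, sc_count, he, se = (int(group) for group in match.groups())
    return {
        "in_hardirq": bool(hc),
        "hardirq_count": hc_count,
        "in_softirq": bool(sc),
        "softirq_count": sc_count,
        "hardirqs_enabled": bool(he),
        "softirqs_enabled": bool(se),
    }


if __name__ == "__main__":
    print(decode_context("syz-fuzzer/5211 [HC0[0]:SC1[1]:HE1:SE0] takes:"))
    for column, meaning in decode_usage("{+.?.+.}").items():
        print("%-16s %s" % (column, meaning))
```

Applied to this report, the context field says the lock was taken in softirq context (SC1) with softirqs disabled (SE0), and the `?` in the softirq column marks the lock class as used both inside softirq and with softirqs enabled, which is the inconsistency being reported.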


* Re: net: deadlock on genl_mutex
  2016-11-26 17:04 net: deadlock on genl_mutex Dmitry Vyukov
@ 2016-11-26 17:12 ` Eric Dumazet
  2016-11-29  5:59   ` subashab
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2016-11-26 17:12 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: David Miller, Matti Vaittinen, Tycho Andersen, Cong Wang,
	Florian Westphal, stephen hemminger, Tom Herbert, netdev, LKML,
	Richard Guy Briggs, syzkaller

On Sat, Nov 26, 2016 at 9:04 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Hello,
>
> The following program triggers deadlock warnings on genl_mutex:
>
> https://gist.githubusercontent.com/dvyukov/65e33d053e507d2ab0bf6ae83d989585/raw/b3c640ec58e894b50bcbf255c471406466cfa5d0/gistfile1.txt
>
> On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).


Issue was reported yesterday and is under investigation.


http://marc.info/?l=linux-netdev&m=148014004331663&w=2


Thanks !


* Re: net: deadlock on genl_mutex
  2016-11-26 17:12 ` Eric Dumazet
@ 2016-11-29  5:59   ` subashab
  2016-11-29  6:06     ` Eric Dumazet
  2016-12-08 16:16     ` Dmitry Vyukov
  0 siblings, 2 replies; 13+ messages in thread
From: subashab @ 2016-11-29  5:59 UTC (permalink / raw)
  To: Eric Dumazet, Dmitry Vyukov
  Cc: David Miller, Matti Vaittinen, Tycho Andersen, Cong Wang,
	Florian Westphal, stephen hemminger, Tom Herbert, netdev, LKML,
	Richard Guy Briggs, syzkaller, netdev-owner

> 
> Issue was reported yesterday and is under investigation.
> 
> 
> http://marc.info/?l=linux-netdev&m=148014004331663&w=2
> 
> 
> Thanks !

Hi Dmitry

Can you try the patch below with your reproducer? I haven't seen similar 
crashes reported after this (or even with Eric's patch).

https://patchwork.ozlabs.org/patch/699937/

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


* Re: net: deadlock on genl_mutex
  2016-11-29  5:59   ` subashab
@ 2016-11-29  6:06     ` Eric Dumazet
  2016-12-08 16:16     ` Dmitry Vyukov
  1 sibling, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2016-11-29  6:06 UTC (permalink / raw)
  To: subashab
  Cc: Eric Dumazet, Dmitry Vyukov, David Miller, Matti Vaittinen,
	Tycho Andersen, Cong Wang, Florian Westphal, stephen hemminger,
	Tom Herbert, netdev, LKML, Richard Guy Briggs, syzkaller,
	netdev-owner

On Mon, 2016-11-28 at 22:59 -0700, subashab@codeaurora.org wrote:
> > 
> > Issue was reported yesterday and is under investigation.
> > 
> > 
> > http://marc.info/?l=linux-netdev&m=148014004331663&w=2
> > 
> > 
> > Thanks !
> 
> Hi Dmitry
> 
> Can you try the patch below with your reproducer? I haven't seen similar 
> crashes reported after this (or even with Eric's patch).
> 
> https://patchwork.ozlabs.org/patch/699937/

Yeah, I will post my patch on top of this one.


* Re: net: deadlock on genl_mutex
  2016-11-29  5:59   ` subashab
  2016-11-29  6:06     ` Eric Dumazet
@ 2016-12-08 16:16     ` Dmitry Vyukov
  2016-12-08 17:16       ` Dmitry Vyukov
  1 sibling, 1 reply; 13+ messages in thread
From: Dmitry Vyukov @ 2016-12-08 16:16 UTC (permalink / raw)
  To: syzkaller
  Cc: Eric Dumazet, David Miller, Matti Vaittinen, Tycho Andersen,
	Cong Wang, Florian Westphal, stephen hemminger, Tom Herbert,
	netdev, LKML, Richard Guy Briggs, netdev-owner

On Tue, Nov 29, 2016 at 6:59 AM,  <subashab@codeaurora.org> wrote:
>>
>> Issue was reported yesterday and is under investigation.
>>
>>
>> http://marc.info/?l=linux-netdev&m=148014004331663&w=2
>>
>>
>> Thanks !
>
>
> Hi Dmitry
>
> Can you try the patch below with your reproducer? I haven't seen similar
> crashes reported after this (or even with Eric's patch).

I've synced to 318c8932ddec5c1c26a4af0f3c053784841c598e (Dec 7) and do
_not_ see this report happening anymore.
Thanks.


* Re: net: deadlock on genl_mutex
  2016-12-08 16:16     ` Dmitry Vyukov
@ 2016-12-08 17:16       ` Dmitry Vyukov
  2016-12-08 18:02         ` Dmitry Vyukov
  2016-12-09  0:32         ` Cong Wang
  0 siblings, 2 replies; 13+ messages in thread
From: Dmitry Vyukov @ 2016-12-08 17:16 UTC (permalink / raw)
  To: syzkaller
  Cc: Eric Dumazet, David Miller, Matti Vaittinen, Tycho Andersen,
	Cong Wang, Florian Westphal, stephen hemminger, Tom Herbert,
	netdev, LKML, Richard Guy Briggs, netdev-owner

On Thu, Dec 8, 2016 at 5:16 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Tue, Nov 29, 2016 at 6:59 AM,  <subashab@codeaurora.org> wrote:
>>>
>>> Issue was reported yesterday and is under investigation.
>>>
>>>
>>> http://marc.info/?l=linux-netdev&m=148014004331663&w=2
>>>
>>>
>>> Thanks !
>>
>>
>> Hi Dmitry
>>
>> Can you try the patch below with your reproducer? I haven't seen similar
>> crashes reported after this (or even with Eric's patch).
>
> I've synced to 318c8932ddec5c1c26a4af0f3c053784841c598e (Dec 7) and do
> _not_ see this report happening anymore.
> Thanks.


But now I am seeing "possible deadlock" warnings involving genl_lock:

[ INFO: possible circular locking dependency detected ]
4.9.0-rc8+ #77 Not tainted
-------------------------------------------------------
syz-executor7/18794 is trying to acquire lock:
 (rtnl_mutex){+.+.+.}, at: [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20 net/core/rtnetlink.c:70
but task is already holding lock:
 (genl_mutex){+.+.+.}, at: [<     inline     >] genl_lock net/netlink/genetlink.c:31
 (genl_mutex){+.+.+.}, at: [<ffffffff86cc27c9>] genl_rcv_msg+0x209/0x260 net/netlink/genetlink.c:658
which lock already depends on the new lock.
which lock already depends on the new lock.
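
Read as a lock-order graph, the header above says: the task holds genl_mutex and wants rtnl_mutex, but a previously recorded chain already leads from rtnl_mutex to genl_mutex, so the new acquisition would close a cycle. The sketch below models that check in a few lines of Python. It is a simplified illustration of the idea, not lockdep's actual algorithm. The class and method names are made up for this example, and `intermediate_lock` is a hypothetical placeholder for whatever locks sit in the middle of the real chain.

```python
from collections import defaultdict


class LockOrderGraph:
    """Records 'B was acquired while holding A' edges and looks for cycles."""

    def __init__(self):
        # lock name -> set of locks that were acquired while it was held
        self._after = defaultdict(set)

    def add_dependency(self, held, acquired):
        """Record that `acquired` was taken while `held` was already held."""
        self._after[held].add(acquired)

    def path(self, src, dst):
        """Return a list of locks leading from src to dst, or None if unreachable."""
        stack = [(src, [src])]
        seen = set()
        while stack:
            node, trail = stack.pop()
            if node == dst:
                return trail
            if node in seen:
                continue
            seen.add(node)
            for nxt in sorted(self._after[node]):
                stack.append((nxt, trail + [nxt]))
        return None

    def would_deadlock(self, held, wanted):
        """Taking `wanted` under `held` closes a cycle if `held` is already
        reachable from `wanted`. Returns that existing chain, or None."""
        return self.path(wanted, held)


if __name__ == "__main__":
    graph = LockOrderGraph()
    # Previously observed ordering; the middle lock is a hypothetical stand-in.
    graph.add_dependency("rtnl_mutex", "intermediate_lock")
    graph.add_dependency("intermediate_lock", "genl_mutex")
    # The acquisition from the report: holding genl_mutex, trying rtnl_mutex.
    print(graph.would_deadlock(held="genl_mutex", wanted="rtnl_mutex"))
```

The check runs before the new edge is recorded, which mirrors why the report fires on the attempt itself: no deadlock has to happen for the inverted ordering to be flagged.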


the existing dependency chain (in reverse order) is:

       [  315.403815] [<     inline     >] validate_chain kernel/locking/lockdep.c:2265
       [  315.403815] [<ffffffff81569576>] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
       [  315.403815] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749
       [  315.403815] [<     inline     >] __mutex_lock_common kernel/locking/mutex.c:521
       [  315.403815] [<ffffffff88195bcf>] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
       [  315.403815] [<     inline     >] genl_lock net/netlink/genetlink.c:31
       [  315.403815] [<ffffffff86cc0c26>] genl_lock_dumpit+0x46/0xa0 net/netlink/genetlink.c:518
       [  315.403815] [<ffffffff86cb33ac>] netlink_dump+0x57c/0xd70 net/netlink/af_netlink.c:2127
       [  315.403815] [<ffffffff86cb7b6a>] __netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217
       [  315.403815] [<ffffffff86cc2319>] genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586
       [  315.403815] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260 net/netlink/genetlink.c:660
       [  315.403815] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0 net/netlink/af_netlink.c:2298
       [  315.403815] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40 net/netlink/genetlink.c:671
       [  315.403815] [<     inline     >] netlink_unicast_kernel net/netlink/af_netlink.c:1231
       [  315.403815] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740 net/netlink/af_netlink.c:1257
       [  315.403815] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1803
       [  315.403815] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
       [  315.403815] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
       [  315.403815] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620 net/socket.c:829
       [  315.403815] [<     inline     >] new_sync_write fs/read_write.c:499
       [  315.403815] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830 fs/read_write.c:512
       [  315.403815] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0 fs/read_write.c:560
       [  315.403815] [<     inline     >] SYSC_write fs/read_write.c:607
       [  315.403815] [<ffffffff81a760e0>] SyS_write+0x100/0x240 fs/read_write.c:599
       [  315.403815] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

       [  315.403815] [<     inline     >] validate_chain kernel/locking/lockdep.c:2265
       [  315.403815] [<ffffffff81569576>] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
       [  315.403815] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749
       [  315.403815] [<     inline     >] __mutex_lock_common kernel/locking/mutex.c:521
       [  315.403815] [<ffffffff88195bcf>] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
       [  315.403815] [<ffffffff86cb7779>] __netlink_dump_start+0xf9/0x760 net/netlink/af_netlink.c:2187
       [  315.403815] [<     inline     >] netlink_dump_start include/linux/netlink.h:165
       [  315.403815] [<ffffffff86d14d48>] ctnetlink_stat_ct_cpu+0x198/0x1e0 net/netfilter/nf_conntrack_netlink.c:2045
       [  315.403815] [<ffffffff86cd313e>] nfnetlink_rcv_msg+0x9be/0xd60 net/netfilter/nfnetlink.c:212
       [  315.403815] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0 net/netlink/af_netlink.c:2298
       [  315.403815] [<ffffffff86cd1b71>] nfnetlink_rcv+0x7e1/0x10d0 net/netfilter/nfnetlink.c:474
       [  315.403815] [<     inline     >] netlink_unicast_kernel net/netlink/af_netlink.c:1231
       [  315.403815] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740 net/netlink/af_netlink.c:1257
       [  315.403815] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1803
       [  315.403815] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
       [  315.403815] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
       [  315.403815] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620 net/socket.c:829
       [  315.403815] [<     inline     >] new_sync_write fs/read_write.c:499
       [  315.403815] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830 fs/read_write.c:512
       [  315.403815] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0 fs/read_write.c:560
       [  315.403815] [<     inline     >] SYSC_write fs/read_write.c:607
       [  315.403815] [<ffffffff81a760e0>] SyS_write+0x100/0x240 fs/read_write.c:599
       [  315.403815] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

       [  315.403815] [<     inline     >] validate_chain
kernel/locking/lockdep.c:2265
       [  315.403815] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
       [  315.403815] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
       [  315.403815] [<     inline     >] __mutex_lock_common
kernel/locking/mutex.c:521
       [  315.403815] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
       [  315.403815] [<ffffffff86cd083d>] nfnl_lock+0x2d/0x30
net/netfilter/nfnetlink.c:61
       [  315.403815] [<ffffffff86d7c5b1>]
nf_tables_netdev_event+0x1f1/0x720
net/netfilter/nf_tables_netdev.c:122
       [  315.403815] [<ffffffff8149095a>]
notifier_call_chain+0x14a/0x2f0 kernel/notifier.c:93
       [  315.403815] [<     inline     >] __raw_notifier_call_chain
kernel/notifier.c:394
       [  315.403815] [<ffffffff81490b82>]
raw_notifier_call_chain+0x32/0x40 kernel/notifier.c:401
       [  315.403815] [<ffffffff86ae4af6>]
call_netdevice_notifiers_info+0x56/0x90 net/core/dev.c:1645
       [  315.403815] [<     inline     >] call_netdevice_notifiers
net/core/dev.c:1661
       [  315.403815] [<ffffffff86af898d>]
rollback_registered_many+0x73d/0xba0 net/core/dev.c:6759
       [  315.403815] [<ffffffff86af8e9e>]
rollback_registered+0xae/0x100 net/core/dev.c:6800
       [  315.403815] [<ffffffff86af8f76>]
unregister_netdevice_queue+0x86/0x140 net/core/dev.c:7787
       [  315.403815] [<     inline     >] unregister_netdevice
include/linux/netdevice.h:2455
       [  315.403815] [<ffffffff84912be6>] __tun_detach+0xc66/0xea0
drivers/net/tun.c:567
       [  315.808015] [<     inline     >] tun_detach drivers/net/tun.c:578
       [  315.808015] [<ffffffff84912e69>] tun_chr_close+0x49/0x60
drivers/net/tun.c:2350
       [  315.808015] [<ffffffff81a77f7e>] __fput+0x34e/0x910
fs/file_table.c:208
       [  315.808015] [<ffffffff81a785ca>] ____fput+0x1a/0x20
fs/file_table.c:244
       [  315.808015] [<ffffffff81483c20>] task_work_run+0x1a0/0x280
kernel/task_work.c:116
       [  315.808015] [<     inline     >] exit_task_work
include/linux/task_work.h:21
       [  315.808015] [<ffffffff814129e2>] do_exit+0x1842/0x2650
kernel/exit.c:828
       [  315.808015] [<ffffffff814139ae>] do_group_exit+0x14e/0x420
kernel/exit.c:932
       [  315.808015] [<ffffffff81442b43>] get_signal+0x663/0x1880
kernel/signal.c:2307
       [  315.808015] [<ffffffff81239b45>] do_signal+0xc5/0x2190
arch/x86/kernel/signal.c:807
       [  315.808015] [<ffffffff8100666a>]
exit_to_usermode_loop+0x1ea/0x2d0 arch/x86/entry/common.c:156
       [  315.808015] [<     inline     >] prepare_exit_to_usermode
arch/x86/entry/common.c:190
       [  315.808015] [<ffffffff81009693>]
syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
       [  315.808015] [<ffffffff881a6026>] entry_SYSCALL_64_fastpath+0xc4/0xc6

       [  315.808015] [<     inline     >] check_prev_add
kernel/locking/lockdep.c:1828
       [  315.808015] [<ffffffff8156309b>]
check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
       [  315.808015] [<     inline     >] validate_chain
kernel/locking/lockdep.c:2265
       [  315.808015] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
       [  315.808015] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
       [  315.808015] [<     inline     >] __mutex_lock_common
kernel/locking/mutex.c:521
       [  315.808015] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
       [  315.808015] [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20
net/core/rtnetlink.c:70
       [  315.808015] [<ffffffff87b5cdf9>]
nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
       [  315.808015] [<ffffffff86cc1cd0>]
genl_family_rcv_msg+0x780/0x1070 net/netlink/genetlink.c:631
       [  315.808015] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
       [  315.808015] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
       [  315.808015] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
       [  315.808015] [<     inline     >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
       [  315.808015] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
       [  315.808015] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
       [  315.808015] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
       [  315.808015] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
       [  315.808015] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
net/socket.c:829
       [  315.808015] [<ffffffff81a6f9a3>]
do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
       [  315.808015] [<ffffffff81a723f1>] do_readv_writev+0x431/0x9b0
fs/read_write.c:872
       [  315.808015] [<ffffffff81a72f2c>] vfs_writev+0x8c/0xc0
fs/read_write.c:911
       [  315.808015] [<ffffffff81a73075>] do_writev+0x115/0x2d0
fs/read_write.c:944
       [  315.808015] [<     inline     >] SYSC_writev fs/read_write.c:1017
       [  315.808015] [<ffffffff81a7682c>] SyS_writev+0x2c/0x40
fs/read_write.c:1014
       [  315.808015] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

other info that might help us debug this:

Chain exists of:
 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(genl_mutex);
                               lock(nlk->cb_mutex);
                               lock(genl_mutex);
  lock(rtnl_mutex);

 *** DEADLOCK ***

2 locks held by syz-executor7/18794:
 #0:  (cb_lock){++++++}, at: [<ffffffff86cc152e>] genl_rcv+0x1e/0x40
net/netlink/genetlink.c:670
 #1:  (genl_mutex){+.+.+.}, at: [<     inline     >] genl_lock
net/netlink/genetlink.c:31
 #1:  (genl_mutex){+.+.+.}, at: [<ffffffff86cc27c9>]
genl_rcv_msg+0x209/0x260 net/netlink/genetlink.c:658

stack backtrace:
CPU: 0 PID: 18794 Comm: syz-executor7 Not tainted 4.9.0-rc8+ #77
Hardware name: Google Google/Google, BIOS Google 01/01/2011
 ffff88004add6468 ffffffff834c44f9 ffffffff00000000 1ffff100095bac20
 ffffed00095bac18 0000000041b58ab3 ffffffff895816f0 ffffffff834c420b
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff834c44f9>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
 [<ffffffff81560cb0>] print_circular_bug+0x310/0x3c0
kernel/locking/lockdep.c:1202
 [<     inline     >] check_prev_add kernel/locking/lockdep.c:1828
 [<ffffffff8156309b>] check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
 [<     inline     >] validate_chain kernel/locking/lockdep.c:2265
 [<ffffffff81569576>] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
 [<ffffffff8156b672>] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749
 [<     inline     >] __mutex_lock_common kernel/locking/mutex.c:521
 [<ffffffff88195bcf>] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
 [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20 net/core/rtnetlink.c:70
 [<ffffffff87b5cdf9>] nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
 [<ffffffff86cc1cd0>] genl_family_rcv_msg+0x780/0x1070
net/netlink/genetlink.c:631
 [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260 net/netlink/genetlink.c:660
 [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0 net/netlink/af_netlink.c:2298
 [<ffffffff86cc153d>] genl_rcv+0x2d/0x40 net/netlink/genetlink.c:671
 [<     inline     >] netlink_unicast_kernel net/netlink/af_netlink.c:1231
 [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740 net/netlink/af_netlink.c:1257
 [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1803
 [<     inline     >] sock_sendmsg_nosec net/socket.c:621
 [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
 [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620 net/socket.c:829
 [<ffffffff81a6f9a3>] do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
 [<ffffffff81a723f1>] do_readv_writev+0x431/0x9b0 fs/read_write.c:872
 [<ffffffff81a72f2c>] vfs_writev+0x8c/0xc0 fs/read_write.c:911
 [<ffffffff81a73075>] do_writev+0x115/0x2d0 fs/read_write.c:944
 [<     inline     >] SYSC_writev fs/read_write.c:1017
 [<ffffffff81a7682c>] SyS_writev+0x2c/0x40 fs/read_write.c:1014
 [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
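
To make the reported cycle easier to follow, here is a minimal sketch (not kernel code, and not how lockdep itself is implemented) that models the four stacks in the dependency chain above as "held lock -> newly taken lock" edges and searches for a cycle. The edge list is my reading of the stacks. The name nfnl_mutex is a made-up label for whatever mutex nfnl_lock() takes, and the nfnl_mutex -> nlk->cb_mutex edge is inferred from the order of the stacks rather than shown explicitly in the log.

```python
from typing import Dict, List, Optional

# Edge "A -> B" means: some task acquired B while already holding A.
# These edges are an interpretation of the lockdep stacks, not lockdep output.
LOCK_ORDER: Dict[str, List[str]] = {
    # genl_rcv_msg holds genl_mutex, nl80211_pre_doit calls rtnl_lock
    "genl_mutex": ["rtnl_mutex"],
    # unregister_netdevice runs under rtnl, nf_tables_netdev_event calls nfnl_lock
    "rtnl_mutex": ["nfnl_mutex"],  # "nfnl_mutex" is a hypothetical label
    # nfnetlink_rcv_msg path reaches __netlink_dump_start (assumed under nfnl lock)
    "nfnl_mutex": ["nlk->cb_mutex"],
    # netlink_dump holds cb_mutex, genl_lock_dumpit calls genl_lock
    "nlk->cb_mutex": ["genl_mutex"],
}


def find_cycle(graph: Dict[str, List[str]], start: str) -> Optional[List[str]]:
    """Return a lock-order cycle through `start`, or None if there is none."""
    path = [start]
    seen = {start}

    def dfs(node: str) -> Optional[List[str]]:
        for nxt in graph.get(node, []):
            if nxt == start:
                # Came back to the starting lock: the current path is a cycle.
                return path + [start]
            if nxt not in seen:
                seen.add(nxt)
                path.append(nxt)
                found = dfs(nxt)
                if found:
                    return found
                path.pop()
        return None

    return dfs(start)


cycle = find_cycle(LOCK_ORDER, "genl_mutex")
print(" -> ".join(cycle) if cycle else "no cycle")
```

In this model, dropping any one edge (for example, not taking rtnl_mutex while genl_mutex is held) makes the search return None, which is the property a fix would need to restore.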


* Re: net: deadlock on genl_mutex
  2016-12-08 17:16       ` Dmitry Vyukov
@ 2016-12-08 18:02         ` Dmitry Vyukov
  2016-12-09  0:13           ` Cong Wang
  2016-12-09  0:32         ` Cong Wang
  1 sibling, 1 reply; 13+ messages in thread
From: Dmitry Vyukov @ 2016-12-08 18:02 UTC (permalink / raw)
  To: syzkaller
  Cc: Eric Dumazet, David Miller, Matti Vaittinen, Tycho Andersen,
	Cong Wang, Florian Westphal, stephen hemminger, Tom Herbert,
	netdev, LKML, Richard Guy Briggs, netdev-owner

On Thu, Dec 8, 2016 at 6:16 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Thu, Dec 8, 2016 at 5:16 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Tue, Nov 29, 2016 at 6:59 AM,  <subashab@codeaurora.org> wrote:
>>>>
>>>> Issue was reported yesterday and is under investigation.
>>>>
>>>>
>>>> http://marc.info/?l=linux-netdev&m=148014004331663&w=2
>>>>
>>>>
>>>> Thanks !
>>>
>>>
>>> Hi Dmitry
>>>
>>> Can you try the patch below with your reproducer? I haven't seen similar
>>> crashes reported after this (or even with Eric's patch).
>>
>> I've synced to 318c8932ddec5c1c26a4af0f3c053784841c598e (Dec 7) and do
>> _not_ see this report happening anymore.
>> Thanks.
>
>
> But now I am seeing "possible deadlock" warnings involving genl_lock:
>
> [ INFO: possible circular locking dependency detected ]
> 4.9.0-rc8+ #77 Not tainted
> -------------------------------------------------------
> syz-executor7/18794 is trying to acquire lock:
>  (rtnl_mutex){+.+.+.}, at: [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20
> net/core/rtnetlink.c:70
> but task is already holding lock:
>  (genl_mutex){+.+.+.}, at: [<     inline     >] genl_lock
> net/netlink/genetlink.c:31
>  (genl_mutex){+.+.+.}, at: [<ffffffff86cc27c9>]
> genl_rcv_msg+0x209/0x260 net/netlink/genetlink.c:658
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
>        [  315.403815] [<     inline     >] validate_chain
> kernel/locking/lockdep.c:2265
>        [  315.403815] [<ffffffff81569576>]
> __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
>        [  315.403815] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
> kernel/locking/lockdep.c:3749
>        [  315.403815] [<     inline     >] __mutex_lock_common
> kernel/locking/mutex.c:521
>        [  315.403815] [<ffffffff88195bcf>]
> mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
>        [  315.403815] [<     inline     >] genl_lock net/netlink/genetlink.c:31
>        [  315.403815] [<ffffffff86cc0c26>] genl_lock_dumpit+0x46/0xa0
> net/netlink/genetlink.c:518
>        [  315.403815] [<ffffffff86cb33ac>] netlink_dump+0x57c/0xd70
> net/netlink/af_netlink.c:2127
>        [  315.403815] [<ffffffff86cb7b6a>]
> __netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217
>        [  315.403815] [<ffffffff86cc2319>]
> genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586
>        [  315.403815] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
> net/netlink/genetlink.c:660
>        [  315.403815] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
> net/netlink/af_netlink.c:2298
>        [  315.403815] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
> net/netlink/genetlink.c:671
>        [  315.403815] [<     inline     >] netlink_unicast_kernel
> net/netlink/af_netlink.c:1231
>        [  315.403815] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
> net/netlink/af_netlink.c:1257
>        [  315.403815] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
> net/netlink/af_netlink.c:1803
>        [  315.403815] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
>        [  315.403815] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
> net/socket.c:631
>        [  315.403815] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
> net/socket.c:829
>        [  315.403815] [<     inline     >] new_sync_write fs/read_write.c:499
>        [  315.403815] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830
> fs/read_write.c:512
>        [  315.403815] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0
> fs/read_write.c:560
>        [  315.403815] [<     inline     >] SYSC_write fs/read_write.c:607
>        [  315.403815] [<ffffffff81a760e0>] SyS_write+0x100/0x240
> fs/read_write.c:599
>        [  315.403815] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
>
>        [  315.403815] [<     inline     >] validate_chain
> kernel/locking/lockdep.c:2265
>        [  315.403815] [<ffffffff81569576>]
> __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
>        [  315.403815] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
> kernel/locking/lockdep.c:3749
>        [  315.403815] [<     inline     >] __mutex_lock_common
> kernel/locking/mutex.c:521
>        [  315.403815] [<ffffffff88195bcf>]
> mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
>        [  315.403815] [<ffffffff86cb7779>]
> __netlink_dump_start+0xf9/0x760 net/netlink/af_netlink.c:2187
>        [  315.403815] [<     inline     >] netlink_dump_start
> include/linux/netlink.h:165
>        [  315.403815] [<ffffffff86d14d48>]
> ctnetlink_stat_ct_cpu+0x198/0x1e0
> net/netfilter/nf_conntrack_netlink.c:2045
>        [  315.403815] [<ffffffff86cd313e>]
> nfnetlink_rcv_msg+0x9be/0xd60 net/netfilter/nfnetlink.c:212
>        [  315.403815] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
> net/netlink/af_netlink.c:2298
>        [  315.403815] [<ffffffff86cd1b71>] nfnetlink_rcv+0x7e1/0x10d0
> net/netfilter/nfnetlink.c:474
>        [  315.403815] [<     inline     >] netlink_unicast_kernel
> net/netlink/af_netlink.c:1231
>        [  315.403815] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
> net/netlink/af_netlink.c:1257
>        [  315.403815] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
> net/netlink/af_netlink.c:1803
>        [  315.403815] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
>        [  315.403815] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
> net/socket.c:631
>        [  315.403815] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
> net/socket.c:829
>        [  315.403815] [<     inline     >] new_sync_write fs/read_write.c:499
>        [  315.403815] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830
> fs/read_write.c:512
>        [  315.403815] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0
> fs/read_write.c:560
>        [  315.403815] [<     inline     >] SYSC_write fs/read_write.c:607
>        [  315.403815] [<ffffffff81a760e0>] SyS_write+0x100/0x240
> fs/read_write.c:599
>        [  315.403815] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
>
>        [  315.403815] [<     inline     >] validate_chain
> kernel/locking/lockdep.c:2265
>        [  315.403815] [<ffffffff81569576>]
> __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
>        [  315.403815] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
> kernel/locking/lockdep.c:3749
>        [  315.403815] [<     inline     >] __mutex_lock_common
> kernel/locking/mutex.c:521
>        [  315.403815] [<ffffffff88195bcf>]
> mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
>        [  315.403815] [<ffffffff86cd083d>] nfnl_lock+0x2d/0x30
> net/netfilter/nfnetlink.c:61
>        [  315.403815] [<ffffffff86d7c5b1>]
> nf_tables_netdev_event+0x1f1/0x720
> net/netfilter/nf_tables_netdev.c:122
>        [  315.403815] [<ffffffff8149095a>]
> notifier_call_chain+0x14a/0x2f0 kernel/notifier.c:93
>        [  315.403815] [<     inline     >] __raw_notifier_call_chain
> kernel/notifier.c:394
>        [  315.403815] [<ffffffff81490b82>]
> raw_notifier_call_chain+0x32/0x40 kernel/notifier.c:401
>        [  315.403815] [<ffffffff86ae4af6>]
> call_netdevice_notifiers_info+0x56/0x90 net/core/dev.c:1645
>        [  315.403815] [<     inline     >] call_netdevice_notifiers
> net/core/dev.c:1661
>        [  315.403815] [<ffffffff86af898d>]
> rollback_registered_many+0x73d/0xba0 net/core/dev.c:6759
>        [  315.403815] [<ffffffff86af8e9e>]
> rollback_registered+0xae/0x100 net/core/dev.c:6800
>        [  315.403815] [<ffffffff86af8f76>]
> unregister_netdevice_queue+0x86/0x140 net/core/dev.c:7787
>        [  315.403815] [<     inline     >] unregister_netdevice
> include/linux/netdevice.h:2455
>        [  315.403815] [<ffffffff84912be6>] __tun_detach+0xc66/0xea0
> drivers/net/tun.c:567
>        [  315.808015] [<     inline     >] tun_detach drivers/net/tun.c:578
>        [  315.808015] [<ffffffff84912e69>] tun_chr_close+0x49/0x60
> drivers/net/tun.c:2350
>        [  315.808015] [<ffffffff81a77f7e>] __fput+0x34e/0x910
> fs/file_table.c:208
>        [  315.808015] [<ffffffff81a785ca>] ____fput+0x1a/0x20
> fs/file_table.c:244
>        [  315.808015] [<ffffffff81483c20>] task_work_run+0x1a0/0x280
> kernel/task_work.c:116
>        [  315.808015] [<     inline     >] exit_task_work
> include/linux/task_work.h:21
>        [  315.808015] [<ffffffff814129e2>] do_exit+0x1842/0x2650
> kernel/exit.c:828
>        [  315.808015] [<ffffffff814139ae>] do_group_exit+0x14e/0x420
> kernel/exit.c:932
>        [  315.808015] [<ffffffff81442b43>] get_signal+0x663/0x1880
> kernel/signal.c:2307
>        [  315.808015] [<ffffffff81239b45>] do_signal+0xc5/0x2190
> arch/x86/kernel/signal.c:807
>        [  315.808015] [<ffffffff8100666a>]
> exit_to_usermode_loop+0x1ea/0x2d0 arch/x86/entry/common.c:156
>        [  315.808015] [<     inline     >] prepare_exit_to_usermode
> arch/x86/entry/common.c:190
>        [  315.808015] [<ffffffff81009693>]
> syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
>        [  315.808015] [<ffffffff881a6026>] entry_SYSCALL_64_fastpath+0xc4/0xc6
>
>        [  315.808015] [<     inline     >] check_prev_add
> kernel/locking/lockdep.c:1828
>        [  315.808015] [<ffffffff8156309b>]
> check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
>        [  315.808015] [<     inline     >] validate_chain
> kernel/locking/lockdep.c:2265
>        [  315.808015] [<ffffffff81569576>]
> __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
>        [  315.808015] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
> kernel/locking/lockdep.c:3749
>        [  315.808015] [<     inline     >] __mutex_lock_common
> kernel/locking/mutex.c:521
>        [  315.808015] [<ffffffff88195bcf>]
> mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
>        [  315.808015] [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20
> net/core/rtnetlink.c:70
>        [  315.808015] [<ffffffff87b5cdf9>]
> nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
>        [  315.808015] [<ffffffff86cc1cd0>]
> genl_family_rcv_msg+0x780/0x1070 net/netlink/genetlink.c:631
>        [  315.808015] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
> net/netlink/genetlink.c:660
>        [  315.808015] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
> net/netlink/af_netlink.c:2298
>        [  315.808015] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
> net/netlink/genetlink.c:671
>        [  315.808015] [<     inline     >] netlink_unicast_kernel
> net/netlink/af_netlink.c:1231
>        [  315.808015] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
> net/netlink/af_netlink.c:1257
>        [  315.808015] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
> net/netlink/af_netlink.c:1803
>        [  315.808015] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
>        [  315.808015] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
> net/socket.c:631
>        [  315.808015] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
> net/socket.c:829
>        [  315.808015] [<ffffffff81a6f9a3>]
> do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
>        [  315.808015] [<ffffffff81a723f1>] do_readv_writev+0x431/0x9b0
> fs/read_write.c:872
>        [  315.808015] [<ffffffff81a72f2c>] vfs_writev+0x8c/0xc0
> fs/read_write.c:911
>        [  315.808015] [<ffffffff81a73075>] do_writev+0x115/0x2d0
> fs/read_write.c:944
>        [  315.808015] [<     inline     >] SYSC_writev fs/read_write.c:1017
>        [  315.808015] [<ffffffff81a7682c>] SyS_writev+0x2c/0x40
> fs/read_write.c:1014
>        [  315.808015] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
>
> other info that might help us debug this:
>
> Chain exists of:
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(genl_mutex);
>                                lock(nlk->cb_mutex);
>                                lock(genl_mutex);
>   lock(rtnl_mutex);
>
>  *** DEADLOCK ***
>
> 2 locks held by syz-executor7/18794:
>  #0:  (cb_lock){++++++}, at: [<ffffffff86cc152e>] genl_rcv+0x1e/0x40
> net/netlink/genetlink.c:670
>  #1:  (genl_mutex){+.+.+.}, at: [<     inline     >] genl_lock
> net/netlink/genetlink.c:31
>  #1:  (genl_mutex){+.+.+.}, at: [<ffffffff86cc27c9>]
> genl_rcv_msg+0x209/0x260 net/netlink/genetlink.c:658
>
> stack backtrace:
> CPU: 0 PID: 18794 Comm: syz-executor7 Not tainted 4.9.0-rc8+ #77
> Hardware name: Google Google/Google, BIOS Google 01/01/2011
>  ffff88004add6468 ffffffff834c44f9 ffffffff00000000 1ffff100095bac20
>  ffffed00095bac18 0000000041b58ab3 ffffffff895816f0 ffffffff834c420b
>  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> Call Trace:
>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>  [<ffffffff834c44f9>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
>  [<ffffffff81560cb0>] print_circular_bug+0x310/0x3c0
> kernel/locking/lockdep.c:1202
>  [<     inline     >] check_prev_add kernel/locking/lockdep.c:1828
>  [<ffffffff8156309b>] check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
>  [<     inline     >] validate_chain kernel/locking/lockdep.c:2265
>  [<ffffffff81569576>] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
>  [<ffffffff8156b672>] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749
>  [<     inline     >] __mutex_lock_common kernel/locking/mutex.c:521
>  [<ffffffff88195bcf>] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
>  [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20 net/core/rtnetlink.c:70
>  [<ffffffff87b5cdf9>] nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
>  [<ffffffff86cc1cd0>] genl_family_rcv_msg+0x780/0x1070
> net/netlink/genetlink.c:631
>  [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260 net/netlink/genetlink.c:660
>  [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0 net/netlink/af_netlink.c:2298
>  [<ffffffff86cc153d>] genl_rcv+0x2d/0x40 net/netlink/genetlink.c:671
>  [<     inline     >] netlink_unicast_kernel net/netlink/af_netlink.c:1231
>  [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740 net/netlink/af_netlink.c:1257
>  [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1803
>  [<     inline     >] sock_sendmsg_nosec net/socket.c:621
>  [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
>  [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620 net/socket.c:829
>  [<ffffffff81a6f9a3>] do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
>  [<ffffffff81a723f1>] do_readv_writev+0x431/0x9b0 fs/read_write.c:872
>  [<ffffffff81a72f2c>] vfs_writev+0x8c/0xc0 fs/read_write.c:911
>  [<ffffffff81a73075>] do_writev+0x115/0x2d0 fs/read_write.c:944
>  [<     inline     >] SYSC_writev fs/read_write.c:1017
>  [<ffffffff81a7682c>] SyS_writev+0x2c/0x40 fs/read_write.c:1014
>  [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6



Probably a related one:

[ INFO: possible circular locking dependency detected ]
4.9.0-rc8+ #77 Not tainted
-------------------------------------------------------
syz-executor5/5777 is trying to acquire lock:
 (genl_mutex){+.+.+.}, at: [<     inline     >] genl_lock
net/netlink/genetlink.c:31
 (genl_mutex){+.+.+.}, at: [<ffffffff86cc0c26>]
genl_lock_dumpit+0x46/0xa0 net/netlink/genetlink.c:518
but task is already holding lock:
 (nlk->cb_mutex){+.+.+.}, at: [<ffffffff86cb2f08>]
netlink_dump+0xd8/0xd70 net/netlink/af_netlink.c:2084
which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

       [  158.966653] [<     inline     >] validate_chain
kernel/locking/lockdep.c:2265
       [  158.966653] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
       [  158.966653] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
       [  158.966653] [<     inline     >] __mutex_lock_common
kernel/locking/mutex.c:521
       [  158.966653] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
       [  158.966653] [<ffffffff86cb7779>]
__netlink_dump_start+0xf9/0x760 net/netlink/af_netlink.c:2187
       [  158.966653] [<     inline     >] netlink_dump_start
include/linux/netlink.h:165
       [  158.966653] [<ffffffff86d1395f>]
ctnetlink_get_ct_unconfirmed+0x17f/0x220
net/netfilter/nf_conntrack_netlink.c:1369
       [  158.966653] [<ffffffff86cd313e>]
nfnetlink_rcv_msg+0x9be/0xd60 net/netfilter/nfnetlink.c:212
       [  158.966653] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
       [  158.966653] [<ffffffff86cd1b71>] nfnetlink_rcv+0x7e1/0x10d0
net/netfilter/nfnetlink.c:474
       [  158.966653] [<     inline     >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
       [  158.966653] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
       [  158.966653] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
       [  158.966653] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
       [  158.966653] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
       [  158.966653] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
net/socket.c:829
       [  158.966653] [<     inline     >] new_sync_write fs/read_write.c:499
       [  158.966653] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830
fs/read_write.c:512
       [  158.966653] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0
fs/read_write.c:560
       [  158.966653] [<     inline     >] SYSC_write fs/read_write.c:607
       [  158.966653] [<ffffffff81a760e0>] SyS_write+0x100/0x240
fs/read_write.c:599
       [  158.966653] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

       [  158.966653] [<     inline     >] validate_chain
kernel/locking/lockdep.c:2265
       [  158.966653] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
       [  158.966653] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
       [  158.966653] [<     inline     >] __mutex_lock_common
kernel/locking/mutex.c:521
       [  158.966653] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
       [  158.966653] [<ffffffff86cd083d>] nfnl_lock+0x2d/0x30
net/netfilter/nfnetlink.c:61
       [  158.966653] [<ffffffff86d7c5b1>]
nf_tables_netdev_event+0x1f1/0x720
net/netfilter/nf_tables_netdev.c:122
       [  158.966653] [<ffffffff8149095a>]
notifier_call_chain+0x14a/0x2f0 kernel/notifier.c:93
       [  158.966653] [<     inline     >] __raw_notifier_call_chain
kernel/notifier.c:394
       [  158.966653] [<ffffffff81490b82>]
raw_notifier_call_chain+0x32/0x40 kernel/notifier.c:401
       [  158.966653] [<ffffffff86ae4af6>]
call_netdevice_notifiers_info+0x56/0x90 net/core/dev.c:1645
       [  158.966653] [<     inline     >] call_netdevice_notifiers
net/core/dev.c:1661
       [  158.966653] [<ffffffff86af898d>]
rollback_registered_many+0x73d/0xba0 net/core/dev.c:6759
       [  158.966653] [<ffffffff86af8e9e>]
rollback_registered+0xae/0x100 net/core/dev.c:6800
       [  158.966653] [<ffffffff86af8f76>]
unregister_netdevice_queue+0x86/0x140 net/core/dev.c:7787
       [  158.966653] [<     inline     >] unregister_netdevice
include/linux/netdevice.h:2455
       [  158.966653] [<ffffffff84912be6>] __tun_detach+0xc66/0xea0
drivers/net/tun.c:567
       [  158.966653] [<     inline     >] tun_detach drivers/net/tun.c:578
       [  158.966653] [<ffffffff84912e69>] tun_chr_close+0x49/0x60
drivers/net/tun.c:2350
       [  158.966653] [<ffffffff81a77f7e>] __fput+0x34e/0x910
fs/file_table.c:208
       [  158.966653] [<ffffffff81a785ca>] ____fput+0x1a/0x20
fs/file_table.c:244
       [  158.966653] [<ffffffff81483c20>] task_work_run+0x1a0/0x280
kernel/task_work.c:116
       [  158.966653] [<     inline     >] exit_task_work
include/linux/task_work.h:21
       [  158.966653] [<ffffffff814129e2>] do_exit+0x1842/0x2650
kernel/exit.c:828
       [  158.966653] [<ffffffff814139ae>] do_group_exit+0x14e/0x420
kernel/exit.c:932
       [  159.308048] [<ffffffff81442b43>] get_signal+0x663/0x1880
kernel/signal.c:2307
       [  159.308048] [<ffffffff81239b45>] do_signal+0xc5/0x2190
arch/x86/kernel/signal.c:807
       [  159.308048] [<ffffffff8100666a>]
exit_to_usermode_loop+0x1ea/0x2d0 arch/x86/entry/common.c:156
       [  159.308048] [<     inline     >] prepare_exit_to_usermode
arch/x86/entry/common.c:190
       [  159.308048] [<ffffffff81009693>]
syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
       [  159.308048] [<ffffffff881a6026>] entry_SYSCALL_64_fastpath+0xc4/0xc6

       [  159.308048] [<     inline     >] validate_chain
kernel/locking/lockdep.c:2265
       [  159.308048] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
       [  159.308048] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
       [  159.308048] [<     inline     >] __mutex_lock_common
kernel/locking/mutex.c:521
       [  159.308048] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
       [  159.308048] [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20
net/core/rtnetlink.c:70
       [  159.308048] [<ffffffff87b5cdf9>]
nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
       [  159.308048] [<ffffffff86cc1cd0>]
genl_family_rcv_msg+0x780/0x1070 net/netlink/genetlink.c:631
       [  159.308048] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
       [  159.308048] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
       [  159.308048] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
       [  159.308048] [<     inline     >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
       [  159.308048] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
       [  159.308048] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
       [  159.308048] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
       [  159.308048] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
       [  159.308048] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
net/socket.c:829
       [  159.308048] [<ffffffff81a6f9a3>]
do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
       [  159.308048] [<ffffffff81a723f1>] do_readv_writev+0x431/0x9b0
fs/read_write.c:872
       [  159.308048] [<ffffffff81a72f2c>] vfs_writev+0x8c/0xc0
fs/read_write.c:911
       [  159.308048] [<ffffffff81a73075>] do_writev+0x115/0x2d0
fs/read_write.c:944
       [  159.308048] [<     inline     >] SYSC_writev fs/read_write.c:1017
       [  159.308048] [<ffffffff81a7682c>] SyS_writev+0x2c/0x40
fs/read_write.c:1014
       [  159.308048] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

       [  159.308048] [<     inline     >] check_prev_add
kernel/locking/lockdep.c:1828
       [  159.308048] [<ffffffff8156309b>]
check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
       [  159.308048] [<     inline     >] validate_chain
kernel/locking/lockdep.c:2265
       [  159.308048] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
       [  159.308048] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
       [  159.308048] [<     inline     >] __mutex_lock_common
kernel/locking/mutex.c:521
       [  159.308048] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
       [  159.308048] [<     inline     >] genl_lock net/netlink/genetlink.c:31
       [  159.308048] [<ffffffff86cc0c26>] genl_lock_dumpit+0x46/0xa0
net/netlink/genetlink.c:518
       [  159.308048] [<ffffffff86cb33ac>] netlink_dump+0x57c/0xd70
net/netlink/af_netlink.c:2127
       [  159.308048] [<ffffffff86cb7b6a>]
__netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217
       [  159.308048] [<ffffffff86cc2319>]
genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586
       [  159.308048] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
       [  159.308048] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
       [  159.308048] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
       [  159.308048] [<     inline     >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
       [  159.308048] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
       [  159.308048] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
       [  159.308048] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
       [  159.308048] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
       [  159.308048] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
net/socket.c:829
       [  159.308048] [<     inline     >] new_sync_write fs/read_write.c:499
       [  159.308048] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830
fs/read_write.c:512
       [  159.308048] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0
fs/read_write.c:560
       [  159.308048] [<     inline     >] SYSC_write fs/read_write.c:607
       [  159.308048] [<ffffffff81a760e0>] SyS_write+0x100/0x240
fs/read_write.c:599
       [  159.308048] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

other info that might help us debug this:

Chain exists of:
 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(nlk->cb_mutex);
                               lock(&table[i].mutex);
                               lock(nlk->cb_mutex);
  lock(genl_mutex);

 *** DEADLOCK ***

2 locks held by syz-executor5/5777:
 #0:  (cb_lock){++++++}, at: [<ffffffff86cc152e>] genl_rcv+0x1e/0x40
net/netlink/genetlink.c:670
 #1:  (nlk->cb_mutex){+.+.+.}, at: [<ffffffff86cb2f08>]
netlink_dump+0xd8/0xd70 net/netlink/af_netlink.c:2084

stack backtrace:
CPU: 1 PID: 5777 Comm: syz-executor5 Not tainted 4.9.0-rc8+ #77
Hardware name: Google Google/Google, BIOS Google 01/01/2011
 ffff88005fe363e8 ffffffff834c44f9 ffffffff00000001 1ffff1000bfc6c10
 ffffed000bfc6c08 0000000041b58ab3 ffffffff895816f0 ffffffff834c420b
 0000000000000000 0000000000000000 0000000000000000 dffffc0000000000
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff834c44f9>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
 [<ffffffff81560cb0>] print_circular_bug+0x310/0x3c0
kernel/locking/lockdep.c:1202
 [<     inline     >] check_prev_add kernel/locking/lockdep.c:1828
 [<ffffffff8156309b>] check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
 [<     inline     >] validate_chain kernel/locking/lockdep.c:2265
 [<ffffffff81569576>] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
 [<ffffffff8156b672>] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749
 [<     inline     >] __mutex_lock_common kernel/locking/mutex.c:521
 [<ffffffff88195bcf>] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
 [<     inline     >] genl_lock net/netlink/genetlink.c:31
 [<ffffffff86cc0c26>] genl_lock_dumpit+0x46/0xa0 net/netlink/genetlink.c:518
 [<ffffffff86cb33ac>] netlink_dump+0x57c/0xd70 net/netlink/af_netlink.c:2127
 [<ffffffff86cb7b6a>] __netlink_dump_start+0x4ea/0x760
net/netlink/af_netlink.c:2217
 [<ffffffff86cc2319>] genl_family_rcv_msg+0xdc9/0x1070
net/netlink/genetlink.c:586
 [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260 net/netlink/genetlink.c:660
 [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0 net/netlink/af_netlink.c:2298
 [<ffffffff86cc153d>] genl_rcv+0x2d/0x40 net/netlink/genetlink.c:671
 [<     inline     >] netlink_unicast_kernel net/netlink/af_netlink.c:1231
 [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740 net/netlink/af_netlink.c:1257
 [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1803
 [<     inline     >] sock_sendmsg_nosec net/socket.c:621
 [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
 [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620 net/socket.c:829
 [<     inline     >] new_sync_write fs/read_write.c:499
 [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830 fs/read_write.c:512
 [<ffffffff81a71c55>] vfs_write+0x175/0x4e0 fs/read_write.c:560
 [<     inline     >] SYSC_write fs/read_write.c:607
 [<ffffffff81a760e0>] SyS_write+0x100/0x240 fs/read_write.c:599
 [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

* Re: net: deadlock on genl_mutex
  2016-12-08 18:02         ` Dmitry Vyukov
@ 2016-12-09  0:13           ` Cong Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Cong Wang @ 2016-12-09  0:13 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, Eric Dumazet, David Miller, Matti Vaittinen,
	Tycho Andersen, Florian Westphal, stephen hemminger, Tom Herbert,
	netdev, LKML, Richard Guy Briggs, netdev-owner

On Thu, Dec 8, 2016 at 10:02 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Chain exists of:
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(nlk->cb_mutex);
>                                lock(&table[i].mutex);
>                                lock(nlk->cb_mutex);
>   lock(genl_mutex);

Similar to the unix bindlock, this one looks false positive to me too.

* Re: net: deadlock on genl_mutex
  2016-12-08 17:16       ` Dmitry Vyukov
  2016-12-08 18:02         ` Dmitry Vyukov
@ 2016-12-09  0:32         ` Cong Wang
  2016-12-09  5:08           ` Cong Wang
  1 sibling, 1 reply; 13+ messages in thread
From: Cong Wang @ 2016-12-09  0:32 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, Eric Dumazet, David Miller, Matti Vaittinen,
	Tycho Andersen, Florian Westphal, stephen hemminger, Tom Herbert,
	netdev, LKML, Richard Guy Briggs, netdev-owner

On Thu, Dec 8, 2016 at 9:16 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Chain exists of:
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(genl_mutex);
>                                lock(nlk->cb_mutex);
>                                lock(genl_mutex);
>   lock(rtnl_mutex);
>
>  *** DEADLOCK ***

This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
Let me think about it.

* Re: net: deadlock on genl_mutex
  2016-12-09  0:32         ` Cong Wang
@ 2016-12-09  5:08           ` Cong Wang
  2016-12-11  9:40             ` Dmitry Vyukov
  2017-01-29 10:11             ` Dmitry Vyukov
  0 siblings, 2 replies; 13+ messages in thread
From: Cong Wang @ 2016-12-09  5:08 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, Eric Dumazet, David Miller, Matti Vaittinen,
	Tycho Andersen, Florian Westphal, stephen hemminger, Tom Herbert,
	netdev, LKML, Richard Guy Briggs, netdev-owner

On Thu, Dec 8, 2016 at 4:32 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Thu, Dec 8, 2016 at 9:16 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> Chain exists of:
>>  Possible unsafe locking scenario:
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(genl_mutex);
>>                                lock(nlk->cb_mutex);
>>                                lock(genl_mutex);
>>   lock(rtnl_mutex);
>>
>>  *** DEADLOCK ***
>
> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
> Let me think about it.

Never mind. Actually both reports in this thread are legitimate.

I know what happened now, the lock chain is so long, 4 locks are involved
to form a chain!!!

Let me think about how to break the chain.
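
For reference, the four-lock chain can be modelled as a small lock-order
graph. This is only an illustrative sketch, not kernel code: the edges are
read off the call paths in the lockdep traces posted in this thread (an edge
A -> B means B was taken while A was held), and `LOCK_ORDER` and `find_cycle`
are hypothetical names made up for the example.

```python
from typing import Dict, List, Optional

# Lock-order edges: "A -> B" means "B was acquired while A was held".
# Each edge is taken from a call path in the lockdep traces in this thread;
# the dictionary itself is an illustrative model, not kernel data.
LOCK_ORDER: Dict[str, List[str]] = {
    "rtnl_mutex": ["&table[i].mutex"],     # nf_tables_netdev_event() -> nfnl_lock()
    "&table[i].mutex": ["nlk->cb_mutex"],  # ip_set_dump() -> __netlink_dump_start()
    "nlk->cb_mutex": ["genl_mutex"],       # netlink_dump() -> genl_lock_dumpit()
    "genl_mutex": ["rtnl_mutex"],          # genl_family_rcv_msg() -> nl80211_pre_doit()
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of locks (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # Back edge: nxt is already on the current path, so the
                # slice from nxt to here closes a cycle.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in graph:
        if colour[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None


if __name__ == "__main__":
    cycle = find_cycle(LOCK_ORDER)
    print(" -> ".join(cycle) if cycle else "no cycle")
```

In this model, dropping any single edge (for example, not taking rtnl_mutex
while genl_mutex is held) makes `find_cycle` return None, which is what
"breaking the chain" amounts to.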

* Re: net: deadlock on genl_mutex
  2016-12-09  5:08           ` Cong Wang
@ 2016-12-11  9:40             ` Dmitry Vyukov
  2017-01-29 10:11             ` Dmitry Vyukov
  1 sibling, 0 replies; 13+ messages in thread
From: Dmitry Vyukov @ 2016-12-11  9:40 UTC (permalink / raw)
  To: Cong Wang
  Cc: syzkaller, Eric Dumazet, David Miller, Matti Vaittinen,
	Tycho Andersen, Florian Westphal, stephen hemminger, Tom Herbert,
	netdev, LKML, Richard Guy Briggs, netdev-owner

On Fri, Dec 9, 2016 at 6:08 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Thu, Dec 8, 2016 at 4:32 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Thu, Dec 8, 2016 at 9:16 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>> Chain exists of:
>>>  Possible unsafe locking scenario:
>>>
>>>        CPU0                    CPU1
>>>        ----                    ----
>>>   lock(genl_mutex);
>>>                                lock(nlk->cb_mutex);
>>>                                lock(genl_mutex);
>>>   lock(rtnl_mutex);
>>>
>>>  *** DEADLOCK ***
>>
>> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
>> Let me think about it.
>
> Never mind. Actually both reports in this thread are legitimate.
>
> I know what happened now, the lock chain is so long, 4 locks are involved
> to form a chain!!!
>
> Let me think about how to break the chain.



This looks like a related report, now on nfnl_lock:



[ INFO: possible circular locking dependency detected ]
4.9.0-rc8+ #82 Not tainted
-------------------------------------------------------
syz-executor3/10151 is trying to acquire lock:
 (&table[i].mutex){+.+.+.}, at: [<ffffffff86c96f1d>]
nfnl_lock+0x2d/0x30 net/netfilter/nfnetlink.c:61
but task is already holding lock:
 (rtnl_mutex){+.+.+.}, at: [<ffffffff86b0cf0c>] rtnl_lock+0x1c/0x20
net/core/rtnetlink.c:70
which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

       [  231.942041] [<     inline     >] validate_chain
kernel/locking/lockdep.c:2265
       [  231.942041] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
       [  231.950342] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
       [  231.950342] [<     inline     >] __mutex_lock_common
kernel/locking/mutex.c:521
       [  231.950342] [<ffffffff8815c2bf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
       [  231.950342] [<ffffffff86b0cf0c>] rtnl_lock+0x1c/0x20
net/core/rtnetlink.c:70
       [  231.950342] [<ffffffff87b234e9>]
nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
       [  231.950342] [<ffffffff86c883b0>]
genl_family_rcv_msg+0x780/0x1070 net/netlink/genetlink.c:631
       [  231.950342] [<ffffffff86c88e50>] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
       [  231.950342] [<ffffffff86c86a2c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
       [  231.950342] [<ffffffff86c87c1d>] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
       [  231.950342] [<     inline     >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
       [  231.950342] [<ffffffff86c8524a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
       [  231.950342] [<ffffffff86c85f14>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
       [  231.950342] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
       [  231.950342] [<ffffffff86a3c86f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
       [  231.950342] [<ffffffff86a3cbdb>] sock_write_iter+0x32b/0x620
net/socket.c:829
       [  231.950342] [<     inline     >] new_sync_write fs/read_write.c:499
       [  231.950342] [<ffffffff81a7021e>] __vfs_write+0x4fe/0x830
fs/read_write.c:512
       [  231.950342] [<ffffffff81a71cc5>] vfs_write+0x175/0x4e0
fs/read_write.c:560
       [  231.950342] [<     inline     >] SYSC_write fs/read_write.c:607
       [  231.950342] [<ffffffff81a76150>] SyS_write+0x100/0x240
fs/read_write.c:599
       [  231.950342] [<ffffffff8816c685>] entry_SYSCALL_64_fastpath+0x23/0xc6

       [  231.950342] [<     inline     >] validate_chain
kernel/locking/lockdep.c:2265
       [  231.950342] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
       [  231.950342] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
       [  231.950342] [<     inline     >] __mutex_lock_common
kernel/locking/mutex.c:521
       [  231.950342] [<ffffffff8815c2bf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
       [  231.950342] [<     inline     >] genl_lock net/netlink/genetlink.c:31
       [  231.950342] [<ffffffff86c87306>] genl_lock_dumpit+0x46/0xa0
net/netlink/genetlink.c:518
       [  231.950342] [<ffffffff86c79a8c>] netlink_dump+0x57c/0xd70
net/netlink/af_netlink.c:2127
       [  231.950342] [<ffffffff86c7e24a>]
__netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217
       [  231.950342] [<ffffffff86c889f9>]
genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586
       [  231.950342] [<ffffffff86c88e50>] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
       [  231.950342] [<ffffffff86c86a2c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
       [  231.950342] [<ffffffff86c87c1d>] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
       [  231.950342] [<     inline     >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
       [  231.950342] [<ffffffff86c8524a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
       [  231.950342] [<ffffffff86c85f14>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
       [  231.950342] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
       [  231.950342] [<ffffffff86a3c86f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
       [  231.950342] [<ffffffff86a3cbdb>] sock_write_iter+0x32b/0x620
net/socket.c:829
       [  231.950342] [<ffffffff81a6fa13>]
do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
       [  231.950342] [<ffffffff81a72461>] do_readv_writev+0x431/0x9b0
fs/read_write.c:872
       [  231.950342] [<ffffffff81a72f9c>] vfs_writev+0x8c/0xc0
fs/read_write.c:911
       [  231.950342] [<ffffffff81a730e5>] do_writev+0x115/0x2d0
fs/read_write.c:944
       [  231.950342] [<     inline     >] SYSC_writev fs/read_write.c:1017
       [  231.950342] [<ffffffff81a7689c>] SyS_writev+0x2c/0x40
fs/read_write.c:1014
       [  231.950342] [<ffffffff8816c685>] entry_SYSCALL_64_fastpath+0x23/0xc6

       [  231.950342] [<     inline     >] validate_chain
kernel/locking/lockdep.c:2265
       [  231.950342] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
       [  231.950342] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
       [  231.950342] [<     inline     >] __mutex_lock_common
kernel/locking/mutex.c:521
       [  231.950342] [<ffffffff8815c2bf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
       [  231.950342] [<ffffffff86c7de59>]
__netlink_dump_start+0xf9/0x760 net/netlink/af_netlink.c:2187
       [  231.950342] [<     inline     >] netlink_dump_start
include/linux/netlink.h:165
       [  231.950342] [<ffffffff86d9d964>] ip_set_dump+0x204/0x2b0
net/netfilter/ipset/ip_set_core.c:1447
       [  231.950342] [<ffffffff86c9981e>]
nfnetlink_rcv_msg+0x9be/0xd60 net/netfilter/nfnetlink.c:212
       [  231.950342] [<ffffffff86c86a2c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
       [  231.950342] [<ffffffff86c98251>] nfnetlink_rcv+0x7e1/0x10d0
net/netfilter/nfnetlink.c:474
       [  231.950342] [<     inline     >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
       [  231.950342] [<ffffffff86c8524a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
       [  231.950342] [<ffffffff86c85f14>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
       [  231.950342] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
       [  231.950342] [<ffffffff86a3c86f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
       [  231.950342] [<ffffffff86a3cbdb>] sock_write_iter+0x32b/0x620
net/socket.c:829
       [  231.950342] [<     inline     >] new_sync_write fs/read_write.c:499
       [  231.950342] [<ffffffff81a7021e>] __vfs_write+0x4fe/0x830
fs/read_write.c:512
       [  231.950342] [<ffffffff81a71cc5>] vfs_write+0x175/0x4e0
fs/read_write.c:560
       [  231.950342] [<     inline     >] SYSC_write fs/read_write.c:607
       [  231.950342] [<ffffffff81a76150>] SyS_write+0x100/0x240
fs/read_write.c:599
       [  231.950342] [<ffffffff8816c685>] entry_SYSCALL_64_fastpath+0x23/0xc6

       [  231.950342] [<     inline     >] check_prev_add
kernel/locking/lockdep.c:1828
       [  231.950342] [<ffffffff8156309b>]
check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
       [  231.950342] [<     inline     >] validate_chain
kernel/locking/lockdep.c:2265
       [  231.950342] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
       [  231.950342] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
       [  231.950342] [<     inline     >] __mutex_lock_common
kernel/locking/mutex.c:521
       [  231.950342] [<ffffffff8815c2bf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
       [  231.950342] [<ffffffff86c96f1d>] nfnl_lock+0x2d/0x30
net/netfilter/nfnetlink.c:61
       [  231.950342] [<ffffffff86d42c91>]
nf_tables_netdev_event+0x1f1/0x720
net/netfilter/nf_tables_netdev.c:122
       [  231.950342] [<ffffffff8149095a>]
notifier_call_chain+0x14a/0x2f0 kernel/notifier.c:93
       [  231.950342] [<     inline     >] __raw_notifier_call_chain
kernel/notifier.c:394
       [  231.950342] [<ffffffff81490b82>]
raw_notifier_call_chain+0x32/0x40 kernel/notifier.c:401
       [  231.950342] [<ffffffff86aab1d6>]
call_netdevice_notifiers_info+0x56/0x90 net/core/dev.c:1645
       [  231.950342] [<     inline     >] call_netdevice_notifiers
net/core/dev.c:1661
       [  231.950342] [<ffffffff86abf06d>]
rollback_registered_many+0x73d/0xba0 net/core/dev.c:6759
       [  231.950342] [<ffffffff86abf57e>]
rollback_registered+0xae/0x100 net/core/dev.c:6800
       [  231.950342] [<ffffffff86abf656>]
unregister_netdevice_queue+0x86/0x140 net/core/dev.c:7787
       [  231.950342] [<     inline     >] unregister_netdevice
include/linux/netdevice.h:2455
       [  231.950342] [<ffffffff848d9296>] __tun_detach+0xc66/0xea0
drivers/net/tun.c:567
       [  231.950342] [<     inline     >] tun_detach drivers/net/tun.c:578
       [  231.950342] [<ffffffff848d9519>] tun_chr_close+0x49/0x60
drivers/net/tun.c:2350
       [  231.950342] [<ffffffff81a77fee>] __fput+0x34e/0x910
fs/file_table.c:208
       [  231.950342] [<ffffffff81a7863a>] ____fput+0x1a/0x20
fs/file_table.c:244
       [  231.950342] [<ffffffff81483c20>] task_work_run+0x1a0/0x280
kernel/task_work.c:116
       [  231.950342] [<     inline     >] exit_task_work
include/linux/task_work.h:21
       [  231.950342] [<ffffffff814129e2>] do_exit+0x1842/0x2650
kernel/exit.c:828
       [  231.950342] [<ffffffff814139ae>] do_group_exit+0x14e/0x420
kernel/exit.c:932
       [  231.950342] [<ffffffff81442b43>] get_signal+0x663/0x1880
kernel/signal.c:2307
       [  231.950342] [<ffffffff81239b45>] do_signal+0xc5/0x2190
arch/x86/kernel/signal.c:807
       [  231.950342] [<ffffffff8100666a>]
exit_to_usermode_loop+0x1ea/0x2d0 arch/x86/entry/common.c:156
       [  231.950342] [<     inline     >] prepare_exit_to_usermode
arch/x86/entry/common.c:190
       [  231.950342] [<ffffffff81009693>]
syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
       [  231.950342] [<ffffffff8816c726>] entry_SYSCALL_64_fastpath+0xc4/0xc6

other info that might help us debug this:

Chain exists of:
 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(rtnl_mutex);
                               lock(genl_mutex);
                               lock(rtnl_mutex);
  lock(&table[i].mutex);

 *** DEADLOCK ***

1 lock held by syz-executor3/10151:
 #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff86b0cf0c>]
rtnl_lock+0x1c/0x20 net/core/rtnetlink.c:70

stack backtrace:
CPU: 2 PID: 10151 Comm: syz-executor3 Not tainted 4.9.0-rc8+ #82
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 ffff8800311057f8 ffffffff8348fc59 ffffffff00000002 1ffff10006220a92
 ffffed0006220a8a 0000000041b58ab3 ffffffff8957cf18 ffffffff8348f96b
 ffffffff894eb258 ffffffff81564970 ffffffff8b565c30 ffffffff8b8e5020
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff8348fc59>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
 [<ffffffff81560cb0>] print_circular_bug+0x310/0x3c0
kernel/locking/lockdep.c:1202
 [<     inline     >] check_prev_add kernel/locking/lockdep.c:1828
 [<ffffffff8156309b>] check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
 [<     inline     >] validate_chain kernel/locking/lockdep.c:2265
 [<ffffffff81569576>] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
 [<ffffffff8156b672>] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749
 [<     inline     >] __mutex_lock_common kernel/locking/mutex.c:521
 [<ffffffff8815c2bf>] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
 [<ffffffff86c96f1d>] nfnl_lock+0x2d/0x30 net/netfilter/nfnetlink.c:61
 [<ffffffff86d42c91>] nf_tables_netdev_event+0x1f1/0x720
net/netfilter/nf_tables_netdev.c:122
 [<ffffffff8149095a>] notifier_call_chain+0x14a/0x2f0 kernel/notifier.c:93
 [<     inline     >] __raw_notifier_call_chain kernel/notifier.c:394
 [<ffffffff81490b82>] raw_notifier_call_chain+0x32/0x40 kernel/notifier.c:401
 [<ffffffff86aab1d6>] call_netdevice_notifiers_info+0x56/0x90
net/core/dev.c:1645
 [<     inline     >] call_netdevice_notifiers net/core/dev.c:1661
 [<ffffffff86abf06d>] rollback_registered_many+0x73d/0xba0 net/core/dev.c:6759
 [<ffffffff86abf57e>] rollback_registered+0xae/0x100 net/core/dev.c:6800
 [<ffffffff86abf656>] unregister_netdevice_queue+0x86/0x140 net/core/dev.c:7787
 [<     inline     >] unregister_netdevice include/linux/netdevice.h:2455
 [<ffffffff848d9296>] __tun_detach+0xc66/0xea0 drivers/net/tun.c:567
 [<     inline     >] tun_detach drivers/net/tun.c:578
 [<ffffffff848d9519>] tun_chr_close+0x49/0x60 drivers/net/tun.c:2350
 [<ffffffff81a77fee>] __fput+0x34e/0x910 fs/file_table.c:208
 [<ffffffff81a7863a>] ____fput+0x1a/0x20 fs/file_table.c:244
 [<ffffffff81483c20>] task_work_run+0x1a0/0x280 kernel/task_work.c:116
 [<     inline     >] exit_task_work include/linux/task_work.h:21
 [<ffffffff814129e2>] do_exit+0x1842/0x2650 kernel/exit.c:828
 [<ffffffff814139ae>] do_group_exit+0x14e/0x420 kernel/exit.c:932
 [<ffffffff81442b43>] get_signal+0x663/0x1880 kernel/signal.c:2307
 [<ffffffff81239b45>] do_signal+0xc5/0x2190 arch/x86/kernel/signal.c:807
 [<ffffffff8100666a>] exit_to_usermode_loop+0x1ea/0x2d0
arch/x86/entry/common.c:156
 [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
 [<ffffffff81009693>] syscall_return_slowpath+0x4d3/0x570
arch/x86/entry/common.c:259
 [<ffffffff8816c726>] entry_SYSCALL_64_fastpath+0xc4/0xc6

* Re: net: deadlock on genl_mutex
  2016-12-09  5:08           ` Cong Wang
  2016-12-11  9:40             ` Dmitry Vyukov
@ 2017-01-29 10:11             ` Dmitry Vyukov
  2017-02-06  6:32               ` Cong Wang
  1 sibling, 1 reply; 13+ messages in thread
From: Dmitry Vyukov @ 2017-01-29 10:11 UTC (permalink / raw)
  To: Cong Wang
  Cc: syzkaller, Eric Dumazet, David Miller, Matti Vaittinen,
	Tycho Andersen, Florian Westphal, stephen hemminger, Tom Herbert,
	netdev, LKML, Richard Guy Briggs, netdev-owner

On Fri, Dec 9, 2016 at 6:08 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> Chain exists of:
>>>  Possible unsafe locking scenario:
>>>
>>>        CPU0                    CPU1
>>>        ----                    ----
>>>   lock(genl_mutex);
>>>                                lock(nlk->cb_mutex);
>>>                                lock(genl_mutex);
>>>   lock(rtnl_mutex);
>>>
>>>  *** DEADLOCK ***
>>
>> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
>> Let me think about it.
>
> Never mind. Actually both reports in this thread are legitimate.
>
> I know what happened now, the lock chain is so long, 4 locks are involved
> to form a chain!!!
>
> Let me think about how to break the chain.


Cong, any success with breaking the chain?

Still happening on f0ad17712b9f71c24e2b8b9725230ef57232377f. Or is it
a different one?


[ INFO: possible circular locking dependency detected ]
4.10.0-rc3+ #4 Not tainted
-------------------------------------------------------
syz-executor9/2705 is trying to acquire lock:
 (genl_mutex){+.+.+.}, at: [<ffffffff836f58fe>] genl_lock
net/netlink/genetlink.c:32 [inline]
 (genl_mutex){+.+.+.}, at: [<ffffffff836f58fe>]
genl_family_rcv_msg+0xdae/0x1040 net/netlink/genetlink.c:547

but task is already holding lock:
 (rtnl_mutex){+.+.+.}, at: [<ffffffff836416e7>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:70

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (rtnl_mutex){+.+.+.}:

[<ffffffff8157e729>] validate_chain kernel/locking/lockdep.c:2265 [inline]
[<ffffffff8157e729>] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[<ffffffff815808b1>] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[<ffffffff843f9de0>] __mutex_lock_common kernel/locking/mutex.c:639 [inline]
[<ffffffff843f9de0>] mutex_lock_nested+0x290/0x1730 kernel/locking/mutex.c:753
[<ffffffff836416e7>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
[<ffffffff83fd5e9e>] nl80211_pre_doit+0x2fe/0x570 net/wireless/nl80211.c:11847
[<ffffffff836f52b0>] genl_family_rcv_msg+0x760/0x1040
net/netlink/genetlink.c:591
[<ffffffff836f807a>] genl_rcv_msg+0x19a/0x330 net/netlink/genetlink.c:620
[<ffffffff836f36cb>] netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
[<ffffffff836f4b38>] genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
[<ffffffff836f1f14>] netlink_unicast_kernel
net/netlink/af_netlink.c:1231 [inline]
[<ffffffff836f1f14>] netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
[<ffffffff836f2bcf>] netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
[<ffffffff83572d3a>] sock_sendmsg_nosec net/socket.c:635 [inline]
[<ffffffff83572d3a>] sock_sendmsg+0xca/0x110 net/socket.c:645
[<ffffffff8357557a>] ___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1985
[<ffffffff83578138>] __sys_sendmsg+0x138/0x300 net/socket.c:2019
[<ffffffff8357832d>] SYSC_sendmsg net/socket.c:2030 [inline]
[<ffffffff8357832d>] SyS_sendmsg+0x2d/0x50 net/socket.c:2026
[<ffffffff8440e7c1>] entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #0 (genl_mutex){+.+.+.}:

[<ffffffff8157847f>] check_prev_add kernel/locking/lockdep.c:1828 [inline]
[<ffffffff8157847f>] check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
[<ffffffff8157e729>] validate_chain kernel/locking/lockdep.c:2265 [inline]
[<ffffffff8157e729>] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[<ffffffff815808b1>] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[<ffffffff843f9de0>] __mutex_lock_common kernel/locking/mutex.c:639 [inline]
[<ffffffff843f9de0>] mutex_lock_nested+0x290/0x1730 kernel/locking/mutex.c:753
[<ffffffff836f58fe>] genl_lock net/netlink/genetlink.c:32 [inline]
[<ffffffff836f58fe>] genl_family_rcv_msg+0xdae/0x1040
net/netlink/genetlink.c:547
[<ffffffff836f807a>] genl_rcv_msg+0x19a/0x330 net/netlink/genetlink.c:620
[<ffffffff836f36cb>] netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
[<ffffffff836f4b38>] genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
[<ffffffff836f1f14>] netlink_unicast_kernel
net/netlink/af_netlink.c:1231 [inline]
[<ffffffff836f1f14>] netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
[<ffffffff836f2bcf>] netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
[<ffffffff83572d3a>] sock_sendmsg_nosec net/socket.c:635 [inline]
[<ffffffff83572d3a>] sock_sendmsg+0xca/0x110 net/socket.c:645
[<ffffffff835730a6>] sock_write_iter+0x326/0x600 net/socket.c:848
[<ffffffff81a3c493>] new_sync_write fs/read_write.c:499 [inline]
[<ffffffff81a3c493>] __vfs_write+0x483/0x740 fs/read_write.c:512
[<ffffffff81a42227>] vfs_write+0x187/0x530 fs/read_write.c:560
[<ffffffff81a4675b>] SYSC_write fs/read_write.c:607 [inline]
[<ffffffff81a4675b>] SyS_write+0xfb/0x230 fs/read_write.c:599
[<ffffffff8440e7c1>] entry_SYSCALL_64_fastpath+0x1f/0xc2

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(rtnl_mutex);
                               lock(genl_mutex);
                               lock(rtnl_mutex);
  lock(genl_mutex);

 *** DEADLOCK ***

2 locks held by syz-executor9/2705:
 #0:  (cb_lock){++++++}, at: [<ffffffff836f4b29>] genl_rcv+0x19/0x40
net/netlink/genetlink.c:630
 #1:  (rtnl_mutex){+.+.+.}, at: [<ffffffff836416e7>]
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70

stack backtrace:
CPU: 1 PID: 2705 Comm: syz-executor9 Not tainted 4.10.0-rc3+ #4
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1202
 check_prev_add kernel/locking/lockdep.c:1828 [inline]
 check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
 validate_chain kernel/locking/lockdep.c:2265 [inline]
 __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
 __mutex_lock_common kernel/locking/mutex.c:639 [inline]
 mutex_lock_nested+0x290/0x1730 kernel/locking/mutex.c:753
 genl_lock net/netlink/genetlink.c:32 [inline]
 genl_family_rcv_msg+0xdae/0x1040 net/netlink/genetlink.c:547
 genl_rcv_msg+0x19a/0x330 net/netlink/genetlink.c:620
 netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
 netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
 netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
 netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
 sock_sendmsg_nosec net/socket.c:635 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:645
 sock_write_iter+0x326/0x600 net/socket.c:848
 new_sync_write fs/read_write.c:499 [inline]
 __vfs_write+0x483/0x740 fs/read_write.c:512
 vfs_write+0x187/0x530 fs/read_write.c:560
 SYSC_write fs/read_write.c:607 [inline]
 SyS_write+0xfb/0x230 fs/read_write.c:599
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x44f5e9
RSP: 002b:00007fdba138cb58 EFLAGS: 00000212 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000020000fdc RCX: 000000000044f5e9
RDX: 0000000000000024 RSI: 0000000020000fdc RDI: 0000000000000006
RBP: 0000000000000006 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000212 R12: 0000000000700000
R13: 0000000000000002 R14: 0000000000000010 R15: 0000000000000000

* Re: net: deadlock on genl_mutex
  2017-01-29 10:11             ` Dmitry Vyukov
@ 2017-02-06  6:32               ` Cong Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Cong Wang @ 2017-02-06  6:32 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, Eric Dumazet, David Miller, Matti Vaittinen,
	Tycho Andersen, Florian Westphal, stephen hemminger, Tom Herbert,
	netdev, LKML, Richard Guy Briggs, netdev-owner

On Sun, Jan 29, 2017 at 2:11 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Dec 9, 2016 at 6:08 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>> Chain exists of:
>>>>  Possible unsafe locking scenario:
>>>>
>>>>        CPU0                    CPU1
>>>>        ----                    ----
>>>>   lock(genl_mutex);
>>>>                                lock(nlk->cb_mutex);
>>>>                                lock(genl_mutex);
>>>>   lock(rtnl_mutex);
>>>>
>>>>  *** DEADLOCK ***
>>>
>>> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
>>> Let me think about it.
>>
>> Never mind. Actually both reports in this thread are legitimate.
>>
>> I know what happened now, the lock chain is so long, 4 locks are involved
>> to form a chain!!!
>>
>> Let me think about how to break the chain.
>
>
> Cong, any success with breaking the chain?

No luck yet. Each part of the chain seems legit, not sure which
one could be reordered. :-/
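
For readers following the chain: the reports above come down to a cycle
in the "lock B was taken while holding lock A" graph. Below is a minimal
userspace sketch of that check, assuming a fixed set of lock classes and
a plain adjacency matrix. It is not the kernel's lockdep code; the names
lock_class, dep, reset_deps, reachable and record_dep are hypothetical
and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Lock classes named in the reports in this thread (illustrative enum). */
enum lock_class { CB_LOCK, GENL_MUTEX, RTNL_MUTEX, NLK_CB_MUTEX, NR_CLASSES };

/* dep[a][b] is true when some code path took b while already holding a. */
bool dep[NR_CLASSES][NR_CLASSES];

/* Forget all recorded dependencies. */
void reset_deps(void)
{
	memset(dep, 0, sizeof(dep));
}

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] &&
		    reachable((enum lock_class)next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired was taken while holding held".
 * Returns false, and records nothing, when the new edge would close a
 * cycle, i.e. when 'held' is already reachable from 'acquired'.
 */
bool record_dep(enum lock_class held, enum lock_class acquired)
{
	bool seen[NR_CLASSES] = { false };

	assert(held < NR_CLASSES && acquired < NR_CLASSES);
	if (reachable(acquired, held, seen))
		return false;
	dep[held][acquired] = true;
	return true;
}
```

With this sketch, record_dep(GENL_MUTEX, RTNL_MUTEX) followed by
record_dep(RTNL_MUTEX, GENL_MUTEX) rejects the second call, which is the
CPU0/CPU1 scenario in the 4.10.0-rc3 report. If nlk->cb_mutex is the same
lock as rtnl_mutex, the quoted second scenario reduces to the same two
edges. Longer chains are caught the same way, since reachable() follows
any number of intermediate locks.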

end of thread, other threads:[~2017-02-06  6:32 UTC | newest]

Thread overview: 13+ messages
2016-11-26 17:04 net: deadlock on genl_mutex Dmitry Vyukov
2016-11-26 17:12 ` Eric Dumazet
2016-11-29  5:59   ` subashab
2016-11-29  6:06     ` Eric Dumazet
2016-12-08 16:16     ` Dmitry Vyukov
2016-12-08 17:16       ` Dmitry Vyukov
2016-12-08 18:02         ` Dmitry Vyukov
2016-12-09  0:13           ` Cong Wang
2016-12-09  0:32         ` Cong Wang
2016-12-09  5:08           ` Cong Wang
2016-12-11  9:40             ` Dmitry Vyukov
2017-01-29 10:11             ` Dmitry Vyukov
2017-02-06  6:32               ` Cong Wang
