[syzbot] possible deadlock in worker_thread

* [syzbot] possible deadlock in worker_thread
@ 2022-02-10 19:27 syzbot
  2022-02-11 18:59 ` Bart Van Assche
  0 siblings, 1 reply; 27+ messages in thread
From: syzbot @ 2022-02-10 19:27 UTC (permalink / raw)
  To: bvanassche, jgg, linux-kernel, linux-rdma, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    d8ad2ce873ab Merge tag 'ext4_for_linus_stable' of git://gi..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=17823662700000
kernel config:  https://syzkaller.appspot.com/x/.config?x=6f043113811433a5
dashboard link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+831661966588c802aae9@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
5.17.0-rc2-syzkaller-00398-gd8ad2ce873ab #0 Not tainted
------------------------------------------------------
kworker/0:7/17139 is trying to acquire lock:
ffff888077a89938 ((wq_completion)loop1){+.+.}-{0:0}, at: flush_workqueue+0xe1/0x13a0 kernel/workqueue.c:2824

but task is already holding lock:
ffffc9000fa07db8 ((work_completion)(&lo->rundown_work)){+.+.}-{0:0}, at: process_one_work+0x8c4/0x1650 kernel/workqueue.c:2282

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #7 ((work_completion)(&lo->rundown_work)){+.+.}-{0:0}:
       process_one_work+0x91b/0x1650 kernel/workqueue.c:2283
       worker_thread+0x657/0x1110 kernel/workqueue.c:2454
       kthread+0x2e9/0x3a0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

-> #6 ((wq_completion)events_long){+.+.}-{0:0}:
       flush_workqueue+0x110/0x13a0 kernel/workqueue.c:2827
       srp_remove_one+0x1cf/0x320 drivers/infiniband/ulp/srp/ib_srp.c:4052
       remove_client_context+0xbe/0x110 drivers/infiniband/core/device.c:775
       disable_device+0x13b/0x270 drivers/infiniband/core/device.c:1281
       __ib_unregister_device+0x98/0x1a0 drivers/infiniband/core/device.c:1474
       ib_unregister_device_and_put+0x57/0x80 drivers/infiniband/core/device.c:1538
       nldev_dellink+0x1fb/0x300 drivers/infiniband/core/nldev.c:1747
       rdma_nl_rcv_msg+0x36d/0x690 drivers/infiniband/core/netlink.c:195
       rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
       rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #5 (&device->unregistration_lock){+.+.}-{3:3}:
       __mutex_lock_common kernel/locking/mutex.c:600 [inline]
       __mutex_lock+0x12f/0x12f0 kernel/locking/mutex.c:733
       __ib_unregister_device+0x25/0x1a0 drivers/infiniband/core/device.c:1470
       ib_unregister_device_and_put+0x57/0x80 drivers/infiniband/core/device.c:1538
       nldev_dellink+0x1fb/0x300 drivers/infiniband/core/nldev.c:1747
       rdma_nl_rcv_msg+0x36d/0x690 drivers/infiniband/core/netlink.c:195
       rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
       rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #4 (&rdma_nl_types[idx].sem){.+.+}-{3:3}:
       down_read+0x98/0x440 kernel/locking/rwsem.c:1461
       rdma_nl_rcv_msg+0x161/0x690 drivers/infiniband/core/netlink.c:164
       rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
       rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       sock_no_sendpage+0xf6/0x140 net/core/sock.c:3091
       kernel_sendpage.part.0+0x1a0/0x340 net/socket.c:3492
       kernel_sendpage net/socket.c:3489 [inline]
       sock_sendpage+0xe5/0x140 net/socket.c:1007
       pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364
       splice_from_pipe_feed fs/splice.c:418 [inline]
       __splice_from_pipe+0x43e/0x8a0 fs/splice.c:562
       splice_from_pipe fs/splice.c:597 [inline]
       generic_splice_sendpage+0xd4/0x140 fs/splice.c:746
       do_splice_from fs/splice.c:767 [inline]
       do_splice+0xb7e/0x1960 fs/splice.c:1079
       __do_splice+0x134/0x250 fs/splice.c:1144
       __do_sys_splice fs/splice.c:1350 [inline]
       __se_sys_splice fs/splice.c:1332 [inline]
       __x64_sys_splice+0x198/0x250 fs/splice.c:1332
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (&pipe->mutex/1){+.+.}-{3:3}:
       __mutex_lock_common kernel/locking/mutex.c:600 [inline]
       __mutex_lock+0x12f/0x12f0 kernel/locking/mutex.c:733
       pipe_lock_nested fs/pipe.c:82 [inline]
       pipe_lock+0x5a/0x70 fs/pipe.c:90
       iter_file_splice_write+0x15a/0xc10 fs/splice.c:635
       do_splice_from fs/splice.c:767 [inline]
       do_splice+0xb7e/0x1960 fs/splice.c:1079
       __do_splice+0x134/0x250 fs/splice.c:1144
       __do_sys_splice fs/splice.c:1350 [inline]
       __se_sys_splice fs/splice.c:1332 [inline]
       __x64_sys_splice+0x198/0x250 fs/splice.c:1332
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#3){.+.+}-{0:0}:
       percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
       __sb_start_write include/linux/fs.h:1722 [inline]
       sb_start_write include/linux/fs.h:1792 [inline]
       file_start_write include/linux/fs.h:2937 [inline]
       lo_write_bvec drivers/block/loop.c:242 [inline]
       lo_write_simple drivers/block/loop.c:265 [inline]
       do_req_filebacked drivers/block/loop.c:494 [inline]
       loop_handle_cmd drivers/block/loop.c:1853 [inline]
       loop_process_work+0x1499/0x1db0 drivers/block/loop.c:1893
       process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
       worker_thread+0x657/0x1110 kernel/workqueue.c:2454
       kthread+0x2e9/0x3a0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

-> #1 ((work_completion)(&worker->work)){+.+.}-{0:0}:
       process_one_work+0x91b/0x1650 kernel/workqueue.c:2283
       worker_thread+0x657/0x1110 kernel/workqueue.c:2454
       kthread+0x2e9/0x3a0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

-> #0 ((wq_completion)loop1){+.+.}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3063 [inline]
       check_prevs_add kernel/locking/lockdep.c:3186 [inline]
       validate_chain kernel/locking/lockdep.c:3801 [inline]
       __lock_acquire+0x2a2c/0x5470 kernel/locking/lockdep.c:5027
       lock_acquire kernel/locking/lockdep.c:5639 [inline]
       lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
       flush_workqueue+0x110/0x13a0 kernel/workqueue.c:2827
       drain_workqueue+0x1a5/0x3c0 kernel/workqueue.c:2992
       destroy_workqueue+0x71/0x800 kernel/workqueue.c:4429
       __loop_clr_fd+0x19b/0xd80 drivers/block/loop.c:1118
       loop_rundown_workfn+0x6f/0x150 drivers/block/loop.c:1185
       process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
       worker_thread+0x657/0x1110 kernel/workqueue.c:2454
       kthread+0x2e9/0x3a0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

other info that might help us debug this:

Chain exists of:
  (wq_completion)loop1 --> (wq_completion)events_long --> (work_completion)(&lo->rundown_work)

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&lo->rundown_work));
                               lock((wq_completion)events_long);
                               lock((work_completion)(&lo->rundown_work));
  lock((wq_completion)loop1);

 *** DEADLOCK ***

2 locks held by kworker/0:7/17139:
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1280 [inline]
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:631 [inline]
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:658 [inline]
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: process_one_work+0x890/0x1650 kernel/workqueue.c:2278
 #1: ffffc9000fa07db8 ((work_completion)(&lo->rundown_work)){+.+.}-{0:0}, at: process_one_work+0x8c4/0x1650 kernel/workqueue.c:2282

stack backtrace:
CPU: 0 PID: 17139 Comm: kworker/0:7 Not tainted 5.17.0-rc2-syzkaller-00398-gd8ad2ce873ab #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_long loop_rundown_workfn
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2143
 check_prev_add kernel/locking/lockdep.c:3063 [inline]
 check_prevs_add kernel/locking/lockdep.c:3186 [inline]
 validate_chain kernel/locking/lockdep.c:3801 [inline]
 __lock_acquire+0x2a2c/0x5470 kernel/locking/lockdep.c:5027
 lock_acquire kernel/locking/lockdep.c:5639 [inline]
 lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
 flush_workqueue+0x110/0x13a0 kernel/workqueue.c:2827
 drain_workqueue+0x1a5/0x3c0 kernel/workqueue.c:2992
 destroy_workqueue+0x71/0x800 kernel/workqueue.c:4429
 __loop_clr_fd+0x19b/0xd80 drivers/block/loop.c:1118
 loop_rundown_workfn+0x6f/0x150 drivers/block/loop.c:1185
 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
 worker_thread+0x657/0x1110 kernel/workqueue.c:2454
 kthread+0x2e9/0x3a0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
usb 4-1: new high-speed USB device number 63 using dummy_hcd
usb 4-1: New USB device found, idVendor=0af0, idProduct=d058, bcdDevice= 0.00
usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 4-1: Product: syz
usb 4-1: Manufacturer: syz
usb 4-1: SerialNumber: syz
usb 4-1: config 0 descriptor??
usb-storage 4-1:0.0: USB Mass Storage device detected
usb 1-1: new high-speed USB device number 103 using dummy_hcd
usb 1-1: New USB device found, idVendor=0af0, idProduct=d058, bcdDevice= 0.00
usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-1: Product: syz
usb 1-1: Manufacturer: syz
usb 1-1: SerialNumber: syz
usb 1-1: config 0 descriptor??
usb-storage 1-1:0.0: USB Mass Storage device detected
usb 4-1: USB disconnect, device number 65

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 27+ messages in thread