* [syzbot] possible deadlock in worker_thread
@ 2022-02-10 19:27 syzbot
  2022-02-11 18:59 ` Bart Van Assche
  0 siblings, 1 reply; 27+ messages in thread
From: syzbot @ 2022-02-10 19:27 UTC (permalink / raw)
  To: bvanassche, jgg, linux-kernel, linux-rdma, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    d8ad2ce873ab Merge tag 'ext4_for_linus_stable' of git://gi..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=17823662700000
kernel config:  https://syzkaller.appspot.com/x/.config?x=6f043113811433a5
dashboard link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+831661966588c802aae9@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
5.17.0-rc2-syzkaller-00398-gd8ad2ce873ab #0 Not tainted
------------------------------------------------------
kworker/0:7/17139 is trying to acquire lock:
ffff888077a89938 ((wq_completion)loop1){+.+.}-{0:0}, at: flush_workqueue+0xe1/0x13a0 kernel/workqueue.c:2824

but task is already holding lock:
ffffc9000fa07db8 ((work_completion)(&lo->rundown_work)){+.+.}-{0:0}, at: process_one_work+0x8c4/0x1650 kernel/workqueue.c:2282

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #7 ((work_completion)(&lo->rundown_work)){+.+.}-{0:0}:
       process_one_work+0x91b/0x1650 kernel/workqueue.c:2283
       worker_thread+0x657/0x1110 kernel/workqueue.c:2454
       kthread+0x2e9/0x3a0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

-> #6 ((wq_completion)events_long){+.+.}-{0:0}:
       flush_workqueue+0x110/0x13a0 kernel/workqueue.c:2827
       srp_remove_one+0x1cf/0x320 drivers/infiniband/ulp/srp/ib_srp.c:4052
       remove_client_context+0xbe/0x110 drivers/infiniband/core/device.c:775
       disable_device+0x13b/0x270 drivers/infiniband/core/device.c:1281
       __ib_unregister_device+0x98/0x1a0 drivers/infiniband/core/device.c:1474
       ib_unregister_device_and_put+0x57/0x80 drivers/infiniband/core/device.c:1538
       nldev_dellink+0x1fb/0x300 drivers/infiniband/core/nldev.c:1747
       rdma_nl_rcv_msg+0x36d/0x690 drivers/infiniband/core/netlink.c:195
       rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
       rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #5 (&device->unregistration_lock){+.+.}-{3:3}:
       __mutex_lock_common kernel/locking/mutex.c:600 [inline]
       __mutex_lock+0x12f/0x12f0 kernel/locking/mutex.c:733
       __ib_unregister_device+0x25/0x1a0 drivers/infiniband/core/device.c:1470
       ib_unregister_device_and_put+0x57/0x80 drivers/infiniband/core/device.c:1538
       nldev_dellink+0x1fb/0x300 drivers/infiniband/core/nldev.c:1747
       rdma_nl_rcv_msg+0x36d/0x690 drivers/infiniband/core/netlink.c:195
       rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
       rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #4 (&rdma_nl_types[idx].sem){.+.+}-{3:3}:
       down_read+0x98/0x440 kernel/locking/rwsem.c:1461
       rdma_nl_rcv_msg+0x161/0x690 drivers/infiniband/core/netlink.c:164
       rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
       rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       sock_no_sendpage+0xf6/0x140 net/core/sock.c:3091
       kernel_sendpage.part.0+0x1a0/0x340 net/socket.c:3492
       kernel_sendpage net/socket.c:3489 [inline]
       sock_sendpage+0xe5/0x140 net/socket.c:1007
       pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364
       splice_from_pipe_feed fs/splice.c:418 [inline]
       __splice_from_pipe+0x43e/0x8a0 fs/splice.c:562
       splice_from_pipe fs/splice.c:597 [inline]
       generic_splice_sendpage+0xd4/0x140 fs/splice.c:746
       do_splice_from fs/splice.c:767 [inline]
       do_splice+0xb7e/0x1960 fs/splice.c:1079
       __do_splice+0x134/0x250 fs/splice.c:1144
       __do_sys_splice fs/splice.c:1350 [inline]
       __se_sys_splice fs/splice.c:1332 [inline]
       __x64_sys_splice+0x198/0x250 fs/splice.c:1332
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (&pipe->mutex/1){+.+.}-{3:3}:
       __mutex_lock_common kernel/locking/mutex.c:600 [inline]
       __mutex_lock+0x12f/0x12f0 kernel/locking/mutex.c:733
       pipe_lock_nested fs/pipe.c:82 [inline]
       pipe_lock+0x5a/0x70 fs/pipe.c:90
       iter_file_splice_write+0x15a/0xc10 fs/splice.c:635
       do_splice_from fs/splice.c:767 [inline]
       do_splice+0xb7e/0x1960 fs/splice.c:1079
       __do_splice+0x134/0x250 fs/splice.c:1144
       __do_sys_splice fs/splice.c:1350 [inline]
       __se_sys_splice fs/splice.c:1332 [inline]
       __x64_sys_splice+0x198/0x250 fs/splice.c:1332
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#3){.+.+}-{0:0}:
       percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
       __sb_start_write include/linux/fs.h:1722 [inline]
       sb_start_write include/linux/fs.h:1792 [inline]
       file_start_write include/linux/fs.h:2937 [inline]
       lo_write_bvec drivers/block/loop.c:242 [inline]
       lo_write_simple drivers/block/loop.c:265 [inline]
       do_req_filebacked drivers/block/loop.c:494 [inline]
       loop_handle_cmd drivers/block/loop.c:1853 [inline]
       loop_process_work+0x1499/0x1db0 drivers/block/loop.c:1893
       process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
       worker_thread+0x657/0x1110 kernel/workqueue.c:2454
       kthread+0x2e9/0x3a0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

-> #1 ((work_completion)(&worker->work)){+.+.}-{0:0}:
       process_one_work+0x91b/0x1650 kernel/workqueue.c:2283
       worker_thread+0x657/0x1110 kernel/workqueue.c:2454
       kthread+0x2e9/0x3a0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

-> #0 ((wq_completion)loop1){+.+.}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3063 [inline]
       check_prevs_add kernel/locking/lockdep.c:3186 [inline]
       validate_chain kernel/locking/lockdep.c:3801 [inline]
       __lock_acquire+0x2a2c/0x5470 kernel/locking/lockdep.c:5027
       lock_acquire kernel/locking/lockdep.c:5639 [inline]
       lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
       flush_workqueue+0x110/0x13a0 kernel/workqueue.c:2827
       drain_workqueue+0x1a5/0x3c0 kernel/workqueue.c:2992
       destroy_workqueue+0x71/0x800 kernel/workqueue.c:4429
       __loop_clr_fd+0x19b/0xd80 drivers/block/loop.c:1118
       loop_rundown_workfn+0x6f/0x150 drivers/block/loop.c:1185
       process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
       worker_thread+0x657/0x1110 kernel/workqueue.c:2454
       kthread+0x2e9/0x3a0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

other info that might help us debug this:

Chain exists of:
  (wq_completion)loop1 --> (wq_completion)events_long --> (work_completion)(&lo->rundown_work)

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&lo->rundown_work));
                               lock((wq_completion)events_long);
                               lock((work_completion)(&lo->rundown_work));
  lock((wq_completion)loop1);

 *** DEADLOCK ***

2 locks held by kworker/0:7/17139:
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1280 [inline]
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:631 [inline]
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:658 [inline]
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: process_one_work+0x890/0x1650 kernel/workqueue.c:2278
 #1: ffffc9000fa07db8 ((work_completion)(&lo->rundown_work)){+.+.}-{0:0}, at: process_one_work+0x8c4/0x1650 kernel/workqueue.c:2282

stack backtrace:
CPU: 0 PID: 17139 Comm: kworker/0:7 Not tainted 5.17.0-rc2-syzkaller-00398-gd8ad2ce873ab #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_long loop_rundown_workfn
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2143
 check_prev_add kernel/locking/lockdep.c:3063 [inline]
 check_prevs_add kernel/locking/lockdep.c:3186 [inline]
 validate_chain kernel/locking/lockdep.c:3801 [inline]
 __lock_acquire+0x2a2c/0x5470 kernel/locking/lockdep.c:5027
 lock_acquire kernel/locking/lockdep.c:5639 [inline]
 lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
 flush_workqueue+0x110/0x13a0 kernel/workqueue.c:2827
 drain_workqueue+0x1a5/0x3c0 kernel/workqueue.c:2992
 destroy_workqueue+0x71/0x800 kernel/workqueue.c:4429
 __loop_clr_fd+0x19b/0xd80 drivers/block/loop.c:1118
 loop_rundown_workfn+0x6f/0x150 drivers/block/loop.c:1185
 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
 worker_thread+0x657/0x1110 kernel/workqueue.c:2454
 kthread+0x2e9/0x3a0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
usb 4-1: new high-speed USB device number 63 using dummy_hcd
usb 4-1: New USB device found, idVendor=0af0, idProduct=d058, bcdDevice= 0.00
usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 4-1: Product: syz
usb 4-1: Manufacturer: syz
usb 4-1: SerialNumber: syz
usb 4-1: config 0 descriptor??
usb-storage 4-1:0.0: USB Mass Storage device detected
usb 1-1: new high-speed USB device number 103 using dummy_hcd
usb 1-1: New USB device found, idVendor=0af0, idProduct=d058, bcdDevice= 0.00
usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-1: Product: syz
usb 1-1: Manufacturer: syz
usb 1-1: SerialNumber: syz
usb 1-1: config 0 descriptor??
usb-storage 1-1:0.0: USB Mass Storage device detected
usb 4-1: USB disconnect, device number 65


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [syzbot] possible deadlock in worker_thread
  2022-02-10 19:27 [syzbot] possible deadlock in worker_thread syzbot
@ 2022-02-11 18:59 ` Bart Van Assche
  2022-02-12  5:31   ` Tetsuo Handa
  0 siblings, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2022-02-11 18:59 UTC (permalink / raw)
  To: syzbot, jgg, linux-kernel, linux-rdma, syzkaller-bugs

On 2/10/22 11:27, syzbot wrote:
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.17.0-rc2-syzkaller-00398-gd8ad2ce873ab #0 Not tainted
> ------------------------------------------------------

Since the SRP initiator driver is involved, I will take a look. However, 
I'm not sure yet when I will have the time to post a fix.

Thanks,

Bart.


* Re: [syzbot] possible deadlock in worker_thread
  2022-02-11 18:59 ` Bart Van Assche
@ 2022-02-12  5:31   ` Tetsuo Handa
  2022-02-12 16:37     ` Bart Van Assche
  0 siblings, 1 reply; 27+ messages in thread
From: Tetsuo Handa @ 2022-02-12  5:31 UTC (permalink / raw)
  To: Bart Van Assche, syzbot, jgg, linux-kernel, linux-rdma,
	syzkaller-bugs, Tejun Heo, Lai Jiangshan

On 2022/02/12 3:59, Bart Van Assche wrote:
> On 2/10/22 11:27, syzbot wrote:
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 5.17.0-rc2-syzkaller-00398-gd8ad2ce873ab #0 Not tainted
>> ------------------------------------------------------
> 
> Since the SRP initiator driver is involved, I will take a look.
> However, I'm not sure yet when I will have the time to post a fix.
> 
> Thanks,
> 
> Bart.
> 

This problem was already handled by commit bf23747ee0532090 ("loop:
revert "make autoclear operation asynchronous"").

But this report might be suggesting to us that we should consider
deprecating (and eventually getting rid of) system-wide workqueues
(declared in include/linux/workqueue.h): since flush_workqueue()
synchronously waits for completion, sharing system-wide workqueues
among multiple modules can generate an unexpected locking dependency
chain (like the one in this report).

If some module needs flush_workqueue() or flush_*_work(), shouldn't
such module create and use their own workqueues?

Tejun, what do you think?



* Re: [syzbot] possible deadlock in worker_thread
  2022-02-12  5:31   ` Tetsuo Handa
@ 2022-02-12 16:37     ` Bart Van Assche
  2022-02-12 17:14       ` Tetsuo Handa
  0 siblings, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2022-02-12 16:37 UTC (permalink / raw)
  To: Tetsuo Handa, syzbot, jgg, linux-kernel, linux-rdma,
	syzkaller-bugs, Tejun Heo, Lai Jiangshan

On 2/11/22 21:31, Tetsuo Handa wrote:
> But this report might be suggesting to us that we should consider
> deprecating (and eventually getting rid of) system-wide workqueues
> (declared in include/linux/workqueue.h): since flush_workqueue()
> synchronously waits for completion, sharing system-wide workqueues
> among multiple modules can generate an unexpected locking dependency
> chain (like the one in this report).

I do not agree with deprecating system-wide workqueues. I think that all 
flush_workqueue(system_long_wq) calls should be reviewed since these are 
deadlock-prone.

Thanks,

Bart.


* Re: [syzbot] possible deadlock in worker_thread
  2022-02-12 16:37     ` Bart Van Assche
@ 2022-02-12 17:14       ` Tetsuo Handa
  2022-02-13 15:33         ` Leon Romanovsky
  2022-02-13 23:06         ` Bart Van Assche
  0 siblings, 2 replies; 27+ messages in thread
From: Tetsuo Handa @ 2022-02-12 17:14 UTC (permalink / raw)
  To: Bart Van Assche, syzbot, jgg, linux-kernel, linux-rdma,
	syzkaller-bugs, Tejun Heo, Lai Jiangshan

On 2022/02/13 1:37, Bart Van Assche wrote:
> On 2/11/22 21:31, Tetsuo Handa wrote:
>> But this report might be suggesting to us that we should consider
>> deprecating (and eventually getting rid of) system-wide workqueues
>> (declared in include/linux/workqueue.h): since flush_workqueue()
>> synchronously waits for completion, sharing system-wide workqueues
>> among multiple modules can generate an unexpected locking dependency
>> chain (like the one in this report).
> 
> I do not agree with deprecating system-wide workqueues. I think that
> all flush_workqueue(system_long_wq) calls should be reviewed since
> these are deadlock-prone.
> 
> Thanks,
> 
> Bart.

The loop module does not call flush_workqueue(system_long_wq); it only
schedules a work item via system_long_wq, and that work function calls
destroy_workqueue() on a local workqueue, which in turn calls
flush_workqueue() via drain_workqueue().

How can reviewing all flush_workqueue(system_long_wq) calls help?



* Re: [syzbot] possible deadlock in worker_thread
  2022-02-12 17:14       ` Tetsuo Handa
@ 2022-02-13 15:33         ` Leon Romanovsky
  2022-02-13 23:06         ` Bart Van Assche
  1 sibling, 0 replies; 27+ messages in thread
From: Leon Romanovsky @ 2022-02-13 15:33 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Bart Van Assche, syzbot, jgg, linux-kernel, linux-rdma,
	syzkaller-bugs, Tejun Heo, Lai Jiangshan

On Sun, Feb 13, 2022 at 02:14:09AM +0900, Tetsuo Handa wrote:
> On 2022/02/13 1:37, Bart Van Assche wrote:
> > On 2/11/22 21:31, Tetsuo Handa wrote:
> >> But this report might be suggesting to us that we should consider
> >> deprecating (and eventually getting rid of) system-wide workqueues
> >> (declared in include/linux/workqueue.h): since flush_workqueue()
> >> synchronously waits for completion, sharing system-wide workqueues
> >> among multiple modules can generate an unexpected locking dependency
> >> chain (like the one in this report).
> > 
> > I do not agree with deprecating system-wide workqueues. I think that
> > all flush_workqueue(system_long_wq) calls should be reviewed since
> > these are deadlock-prone.
> > 
> > Thanks,
> > 
> > Bart.
> 

I second Bart's request not to deprecate system-wide workqueues.

Thanks

> 


* Re: [syzbot] possible deadlock in worker_thread
  2022-02-12 17:14       ` Tetsuo Handa
  2022-02-13 15:33         ` Leon Romanovsky
@ 2022-02-13 23:06         ` Bart Van Assche
  2022-02-14  1:08           ` Tetsuo Handa
  1 sibling, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2022-02-13 23:06 UTC (permalink / raw)
  To: Tetsuo Handa, syzbot, jgg, linux-kernel, linux-rdma,
	syzkaller-bugs, Tejun Heo, Lai Jiangshan

On 2/12/22 09:14, Tetsuo Handa wrote:
> How can reviewing all flush_workqueue(system_long_wq) calls help?

It is allowed to queue blocking actions on system_long_wq. 
flush_workqueue(system_long_wq) can make a lower layer (e.g. ib_srp) 
wait on a blocking action from a higher layer (e.g. the loop driver) and 
thereby cause a deadlock. Hence my proposal to review all 
flush_workqueue(system_long_wq) calls.
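The dependency Bart describes can be sketched with a minimal userspace analogy (Python only; ThreadPoolExecutor stands in for a shared kernel workqueue, and the helper below mimics the kernel's queue_work() in spirit only — none of these names are kernel APIs): whoever waits for everything queued on a shared pool also waits for other modules' unrelated, possibly slow work.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Userspace analogy only: a shared single-worker pool stands in for a
# shared kernel workqueue such as system_long_wq.
shared_pool = ThreadPoolExecutor(max_workers=1)
pending = []

def queue_work(fn):
    """Analogy for queue_work(system_long_wq, ...)."""
    pending.append(shared_pool.submit(fn))

# "Higher layer" (think: loop driver) queues a slow, blocking action.
queue_work(lambda: time.sleep(1.0))

# "Lower layer" (think: ib_srp) queues its own trivial work item...
queue_work(lambda: None)

# ...then flushes the SHARED pool: it must wait for everything queued,
# including the unrelated one-second action from the other module.
start = time.monotonic()
wait(pending)  # analogy for flush_workqueue(system_long_wq)
elapsed = time.monotonic() - start
print(f"flush took {elapsed:.2f}s even though our own work was trivial")
```

In the kernel the foreign work item is not merely slow but may itself block on a lock the flusher already holds, which is how the wait above becomes the deadlock in the report.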

Thanks,

Bart.


* Re: [syzbot] possible deadlock in worker_thread
  2022-02-13 23:06         ` Bart Van Assche
@ 2022-02-14  1:08           ` Tetsuo Handa
  2022-02-14  3:44             ` Tejun Heo
  0 siblings, 1 reply; 27+ messages in thread
From: Tetsuo Handa @ 2022-02-14  1:08 UTC (permalink / raw)
  To: Bart Van Assche, syzbot, jgg, linux-kernel, linux-rdma,
	syzkaller-bugs, Tejun Heo, Lai Jiangshan

On 2022/02/14 8:06, Bart Van Assche wrote:
> On 2/12/22 09:14, Tetsuo Handa wrote:
>> How can reviewing all flush_workqueue(system_long_wq) calls help?
> 
> It is allowed to queue blocking actions on system_long_wq.

Correct.

> flush_workqueue(system_long_wq) can make a lower layer (e.g. ib_srp)
> wait on a blocking action from a higher layer (e.g. the loop driver)
> and thereby cause a deadlock.

Correct.

> Hence my proposal to review all flush_workqueue(system_long_wq) calls.

Maybe I'm misunderstanding what "review" means here.

My proposal is to "rewrite" any module that needs to call flush_workqueue()
on system-wide workqueues, or flush_work()/flush_*_work() calls that
depend on system-wide workqueues.

That is, for example, "rewrite" the ib_srp module so that it does not call flush_workqueue(system_long_wq).

+	srp_tl_err_wq = alloc_workqueue("srp_tl_err_wq", 0, 0);

-	queue_work(system_long_wq, &target->tl_err_work);
+	queue_work(srp_tl_err_wq, &target->tl_err_work);

-	flush_workqueue(system_long_wq);
+	flush_workqueue(srp_tl_err_wq);

+	destroy_workqueue(srp_tl_err_wq);

Then, we can call WARN_ON() if e.g. flush_workqueue() is called on system-wide workqueues.
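The conversion sketched in the hunks above can be illustrated with a userspace analogy (Python only; srp_pool and the other names are made-up stand-ins, not kernel objects): a module that owns a private pool can flush it without ever waiting on foreign work queued by other modules.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Userspace analogy only: pools stand in for kernel workqueues.
shared_pool = ThreadPoolExecutor(max_workers=1)  # other modules queue here
srp_pool = ThreadPoolExecutor(max_workers=1)     # analogy: srp_tl_err_wq

# Some unrelated module queues a slow action on the shared pool.
shared_pool.submit(lambda: time.sleep(1.0))

# The converted module queues its error work on its OWN pool...
own_work = [srp_pool.submit(lambda: None)]

# ...so flushing waits only for its own items, never for foreign work.
start = time.monotonic()
wait(own_work)  # analogy for flush_workqueue(srp_tl_err_wq)
elapsed = time.monotonic() - start
print(f"private flush returned in {elapsed:.3f}s")

srp_pool.shutdown()  # analogy for destroy_workqueue(srp_tl_err_wq)
```

Constructing srp_pool corresponds to the alloc_workqueue() line in the diff, and shutdown() to destroy_workqueue(); the private flush cannot pick up locking dependencies from other modules.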



* Re: [syzbot] possible deadlock in worker_thread
  2022-02-14  1:08           ` Tetsuo Handa
@ 2022-02-14  3:44             ` Tejun Heo
  2022-02-14 13:36               ` Tetsuo Handa
  2022-02-17 12:27               ` [syzbot] possible deadlock in worker_thread Fabio M. De Francesco
  0 siblings, 2 replies; 27+ messages in thread
From: Tejun Heo @ 2022-02-14  3:44 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Bart Van Assche, syzbot, jgg, linux-kernel, linux-rdma,
	syzkaller-bugs, Lai Jiangshan

Hello,

On Mon, Feb 14, 2022 at 10:08:00AM +0900, Tetsuo Handa wrote:
> +	destroy_workqueue(srp_tl_err_wq);
> 
> Then, we can call WARN_ON() if e.g. flush_workqueue() is called on system-wide workqueues.

Yeah, this is the right thing to do. It makes no sense at all to call
flush_workqueue() on the shared workqueues as the caller has no idea what
it's gonna end up waiting for. It was on my todo list a long while ago but
slipped through the cracks. If anyone wanna take a stab at it (including
scrubbing the existing users, of course), please be my guest.

Thanks.

-- 
tejun


* Re: [syzbot] possible deadlock in worker_thread
  2022-02-14  3:44             ` Tejun Heo
@ 2022-02-14 13:36               ` Tetsuo Handa
  2022-02-14 17:34                 ` Tejun Heo
  2022-02-17 12:27               ` [syzbot] possible deadlock in worker_thread Fabio M. De Francesco
  1 sibling, 1 reply; 27+ messages in thread
From: Tetsuo Handa @ 2022-02-14 13:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Bart Van Assche, syzbot, jgg, linux-kernel, linux-rdma,
	syzkaller-bugs, Lai Jiangshan

On 2022/02/14 12:44, Tejun Heo wrote:
> Hello,
> 
> On Mon, Feb 14, 2022 at 10:08:00AM +0900, Tetsuo Handa wrote:
>> +	destroy_workqueue(srp_tl_err_wq);
>>
>> Then, we can call WARN_ON() if e.g. flush_workqueue() is called on system-wide workqueues.
> 
> Yeah, this is the right thing to do. It makes no sense at all to call
> flush_workqueue() on the shared workqueues as the caller has no idea what
> it's gonna end up waiting for. It was on my todo list a long while ago but
> slipped through the crack. If anyone wanna take a stab at it (including
> scrubbing the existing users, of course), please be my guest.
> 
> Thanks.
> 

OK. Then, I propose the patch below. If you are OK with this approach, I can
carry it via my tree as a linux-next-only experimental patch for one or
two weeks, to see if anyone complains.

From 95a3aa8d46c8479c95672305645247ba70312113 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Mon, 14 Feb 2022 22:28:21 +0900
Subject: [PATCH] workqueue: Warn on flushing system-wide workqueues

syzbot found a circular locking dependency which is caused by flushing
system_long_wq WQ [1]. Tejun Heo commented that it makes no sense at all
to call flush_workqueue() on the shared workqueues as the caller has no
idea what it's gonna end up waiting for.

Although there is flush_scheduled_work() which flushes system_wq WQ with
"Think twice before calling this function! It's very easy to get into
trouble if you don't take great care." warning message, it will be too
difficult to guarantee that all users safely flush system-wide WQs.

Therefore, let's change direction: developers should use their own WQs if
flushing is inevitable. To give developers time to update
their modules, for now just emit a warning message when flush_workqueue()
is called on system-wide WQs. We will eventually convert this warning
message into WARN_ON() and kill flush_scheduled_work().

Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 kernel/workqueue.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 33f1106b4f99..5ef40b9a1842 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2805,6 +2805,37 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
 	return wait;
 }
 
+static void warn_if_flushing_global_workqueue(struct workqueue_struct *wq)
+{
+#ifdef CONFIG_PROVE_LOCKING
+	static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
+	const char *name;
+
+	if (wq == system_wq)
+		name = "system_wq";
+	else if (wq == system_highpri_wq)
+		name = "system_highpri_wq";
+	else if (wq == system_long_wq)
+		name = "system_long_wq";
+	else if (wq == system_unbound_wq)
+		name = "system_unbound_wq";
+	else if (wq == system_freezable_wq)
+		name = "system_freezable_wq";
+	else if (wq == system_power_efficient_wq)
+		name = "system_power_efficient_wq";
+	else if (wq == system_freezable_power_efficient_wq)
+		name = "system_freezable_power_efficient_wq";
+	else
+		return;
+	ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
+	if (!__ratelimit(&flush_warn_rs))
+		return;
+	pr_warn("Since system-wide WQ is shared, flushing system-wide WQ can introduce unexpected locking dependency. Please replace %s usage in your code with your local WQ.\n",
+		name);
+	dump_stack();
+#endif
+}
+
 /**
  * flush_workqueue - ensure that any scheduled work has run to completion.
  * @wq: workqueue to flush
@@ -2824,6 +2855,8 @@ void flush_workqueue(struct workqueue_struct *wq)
 	if (WARN_ON(!wq_online))
 		return;
 
+	warn_if_flushing_global_workqueue(wq);
+
 	lock_map_acquire(&wq->lockdep_map);
 	lock_map_release(&wq->lockdep_map);
 
-- 
2.32.0



* Re: [syzbot] possible deadlock in worker_thread
  2022-02-14 13:36               ` Tetsuo Handa
@ 2022-02-14 17:34                 ` Tejun Heo
  2022-02-15 10:26                   ` Tetsuo Handa
  2022-02-17 11:22                   ` [PATCH v2] workqueue: Warn flush attempt using system-wide workqueues Tetsuo Handa
  0 siblings, 2 replies; 27+ messages in thread
From: Tejun Heo @ 2022-02-14 17:34 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Bart Van Assche, syzbot, jgg, linux-kernel, linux-rdma,
	syzkaller-bugs, Lai Jiangshan

Hello,

On Mon, Feb 14, 2022 at 10:36:57PM +0900, Tetsuo Handa wrote:
> OK. Then, I propose below patch. If you are OK with this approach, I can
> keep this via my tree as a linux-next only experimental patch for one or
> two weeks, in order to see if someone complains.

I don't mind you testing that way, but I would much prefer this and
related changes to go through the wq tree.
related changes in the wq tree.

> +static void warn_if_flushing_global_workqueue(struct workqueue_struct *wq)
> +{
> +#ifdef CONFIG_PROVE_LOCKING
> +	static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
> +	const char *name;
> +
> +	if (wq == system_wq)
> +		name = "system_wq";
> +	else if (wq == system_highpri_wq)
> +		name = "system_highpri_wq";
> +	else if (wq == system_long_wq)
> +		name = "system_long_wq";
> +	else if (wq == system_unbound_wq)
> +		name = "system_unbound_wq";
> +	else if (wq == system_freezable_wq)
> +		name = "system_freezable_wq";
> +	else if (wq == system_power_efficient_wq)
> +		name = "system_power_efficient_wq";
> +	else if (wq == system_freezable_power_efficient_wq)
> +		name = "system_freezable_power_efficient_wq";
> +	else
> +		return;
> +	ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
> +	if (!__ratelimit(&flush_warn_rs))
> +		return;
> +	pr_warn("Since system-wide WQ is shared, flushing system-wide WQ can introduce unexpected locking dependency. Please replace %s usage in your code with your local WQ.\n",
> +		name);
> +	dump_stack();
> +#endif

Instead of doing the above, please add a wq flag to mark system wqs and
trigger the warning that way and I'd leave it regardless of PROVE_LOCKING.

Thanks.

-- 
tejun


* Re: [syzbot] possible deadlock in worker_thread
  2022-02-14 17:34                 ` Tejun Heo
@ 2022-02-15 10:26                   ` Tetsuo Handa
  2022-02-15 10:43                     ` Haakon Bugge
  2022-02-17 11:22                   ` [PATCH v2] workqueue: Warn flush attempt using system-wide workqueues Tetsuo Handa
  1 sibling, 1 reply; 27+ messages in thread
From: Tetsuo Handa @ 2022-02-15 10:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Bart Van Assche, syzbot, jgg, linux-kernel, linux-rdma,
	syzkaller-bugs, Lai Jiangshan

On 2022/02/15 2:34, Tejun Heo wrote:
> 
> Instead of doing the above, please add a wq flag to mark system wqs and
> trigger the warning that way and I'd leave it regardless of PROVE_LOCKING.
> 

Do you mean something like the below?
I think the above is easier for developers to understand, because:

  The above prints the variable's name (one of 'system_wq', 'system_highpri_wq',
  'system_long_wq', 'system_unbound_wq', 'system_freezable_wq', 'system_power_efficient_wq'
  or 'system_freezable_power_efficient_wq') with a backtrace (which will be translated into
  filename:line format), while the below prints the queue's name (one of 'events', 'events_highpri',
  'events_long', 'events_unbound', 'events_freezable', 'events_power_efficient' or
  'events_freezable_power_efficient'). With CONFIG_KALLSYMS_ALL=y we could print the
  variable's name using "%ps", but CONFIG_KALLSYMS_ALL is not always enabled.

  The flag must not be used by user-defined WQs, because destroy_workqueue() involves
  flush_workqueue(). If this flag is used on user-defined WQs by mistake, a pointless
  warning will be printed. I didn't pass this flag when creating the system-wide WQs
  because some developer might copy&paste the
    system_wq = alloc_workqueue("events", 0, 0);
  lines when converting.

---
 include/linux/workqueue.h |  1 +
 kernel/workqueue.c        | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7fee9b6cfede..9e33dcd417d3 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -339,6 +339,7 @@ enum {
 	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
 	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
 	__WQ_ORDERED_EXPLICIT	= 1 << 19, /* internal: alloc_ordered_workqueue() */
+	__WQ_SYSTEM_WIDE        = 1 << 20, /* internal: don't flush this workqueue */
 
 	WQ_MAX_ACTIVE		= 512,	  /* I like 512, better ideas? */
 	WQ_MAX_UNBOUND_PER_CPU	= 4,	  /* 4 * #cpus for unbound wq */
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 33f1106b4f99..dbb9c6bb54a7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2805,6 +2805,21 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
 	return wait;
 }
 
+static void warn_if_flushing_global_workqueue(struct workqueue_struct *wq)
+{
+	static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
+
+	if (likely(!(wq->flags & __WQ_SYSTEM_WIDE)))
+		return;
+
+	ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
+	if (!__ratelimit(&flush_warn_rs))
+		return;
+	pr_warn("Since system-wide WQ is shared, flushing system-wide WQ can introduce unexpected locking dependency. Please replace %s usage in your code with your local WQ.\n",
+		wq->name);
+	dump_stack();
+}
+
 /**
  * flush_workqueue - ensure that any scheduled work has run to completion.
  * @wq: workqueue to flush
@@ -2824,6 +2839,8 @@ void flush_workqueue(struct workqueue_struct *wq)
 	if (WARN_ON(!wq_online))
 		return;
 
+	warn_if_flushing_global_workqueue(wq);
+
 	lock_map_acquire(&wq->lockdep_map);
 	lock_map_release(&wq->lockdep_map);
 
@@ -6070,6 +6087,13 @@ void __init workqueue_init_early(void)
 	       !system_unbound_wq || !system_freezable_wq ||
 	       !system_power_efficient_wq ||
 	       !system_freezable_power_efficient_wq);
+	system_wq->flags |= __WQ_SYSTEM_WIDE;
+	system_highpri_wq->flags |= __WQ_SYSTEM_WIDE;
+	system_long_wq->flags |= __WQ_SYSTEM_WIDE;
+	system_unbound_wq->flags |= __WQ_SYSTEM_WIDE;
+	system_freezable_wq->flags |= __WQ_SYSTEM_WIDE;
+	system_power_efficient_wq->flags |= __WQ_SYSTEM_WIDE;
+	system_freezable_power_efficient_wq->flags |= __WQ_SYSTEM_WIDE;
 }
 
 /**
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [syzbot] possible deadlock in worker_thread
  2022-02-15 10:26                   ` Tetsuo Handa
@ 2022-02-15 10:43                     ` Haakon Bugge
  2022-02-15 12:48                       ` Tetsuo Handa
  0 siblings, 1 reply; 27+ messages in thread
From: Haakon Bugge @ 2022-02-15 10:43 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Tejun Heo, Bart Van Assche, syzbot, Jason Gunthorpe, LKML,
	OFED mailing list, syzkaller-bugs, Lai Jiangshan



> On 15 Feb 2022, at 11:26, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
> 
> On 2022/02/15 2:34, Tejun Heo wrote:
>> 
>> Instead of doing the above, please add a wq flag to mark system wqs and
>> trigger the warning that way and I'd leave it regardless of PROVE_LOCKING.
>> 
> 
> Do you mean something like below?
> I think the above is easier to understand (for developers) because
> 
>  The above prints variable's name (one of 'system_wq', 'system_highpri_wq',
>  'system_long_wq', 'system_unbound_wq', 'system_freezable_wq', 'system_power_efficient_wq'
>  or 'system_freezable_power_efficient_wq') with backtrace (which will be translated into
>  filename:line format) while the below prints queue's name (one of 'events', 'events_highpri',
>  'events_long', 'events_unbound', 'events_freezable', 'events_power_efficient' or
>  'events_freezable_power_efficient'). If CONFIG_KALLSYMS_ALL=y, we can print
>  variable's name using "%ps", but CONFIG_KALLSYMS_ALL is not always enabled.
> 
>  The flag must not be used by user-defined WQs, for destroy_workqueue() involves
>  flush_workqueue(). If this flag is by error used on user-defined WQs, pointless
>  warning will be printed. I didn't pass this flag when creating system-wide WQs
>  because some developer might copy&paste the
>    system_wq = alloc_workqueue("events", 0, 0);
>  lines when converting.
> 
> ---
> include/linux/workqueue.h |  1 +
> kernel/workqueue.c        | 24 ++++++++++++++++++++++++
> 2 files changed, 25 insertions(+)
> 
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 7fee9b6cfede..9e33dcd417d3 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -339,6 +339,7 @@ enum {
> 	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
> 	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
> 	__WQ_ORDERED_EXPLICIT	= 1 << 19, /* internal: alloc_ordered_workqueue() */
> +	__WQ_SYSTEM_WIDE        = 1 << 20, /* interbal: don't flush this workqueue */

s/interbal/internal/

> 
> 	WQ_MAX_ACTIVE		= 512,	  /* I like 512, better ideas? */
> 	WQ_MAX_UNBOUND_PER_CPU	= 4,	  /* 4 * #cpus for unbound wq */
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 33f1106b4f99..dbb9c6bb54a7 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2805,6 +2805,21 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
> 	return wait;
> }
> 
> +static void warn_if_flushing_global_workqueue(struct workqueue_struct *wq)
> +{
> +	static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
> +
> +	if (likely(!(wq->flags & __WQ_SYSTEM_WIDE)))
> +		return;
> +
> +	ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
> +	if (!__ratelimit(&flush_warn_rs))
> +		return;
> +	pr_warn("Since system-wide WQ is shared, flushing system-wide WQ can introduce unexpected locking dependency. Please replace %s usage in your code with your local WQ.\n",
> +		wq->name);
> +	dump_stack();
> +}
> +
> /**
>  * flush_workqueue - ensure that any scheduled work has run to completion.
>  * @wq: workqueue to flush
> @@ -2824,6 +2839,8 @@ void flush_workqueue(struct workqueue_struct *wq)
> 	if (WARN_ON(!wq_online))
> 		return;
> 
> +	warn_if_flushing_global_workqueue(wq);
> +
> 	lock_map_acquire(&wq->lockdep_map);
> 	lock_map_release(&wq->lockdep_map);
> 
> @@ -6070,6 +6087,13 @@ void __init workqueue_init_early(void)
> 	       !system_unbound_wq || !system_freezable_wq ||
> 	       !system_power_efficient_wq ||
> 	       !system_freezable_power_efficient_wq);
> +	system_wq->flags |= __WQ_SYSTEM_WIDE;
> +	system_highpri_wq->flags |= __WQ_SYSTEM_WIDE;
> +	system_long_wq->flags |= __WQ_SYSTEM_WIDE;
> +	system_unbound_wq->flags |= __WQ_SYSTEM_WIDE;
> +	system_freezable_wq->flags |= __WQ_SYSTEM_WIDE;
> +	system_power_efficient_wq->flags |= __WQ_SYSTEM_WIDE;
> +	system_freezable_power_efficient_wq->flags |= __WQ_SYSTEM_WIDE;

Better to OR this in in the alloc_workqueue() call, to preserve the notion of an opaque object?


Thxs, Håkon

> }
> 
> /**
> -- 
> 2.32.0
> 
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [syzbot] possible deadlock in worker_thread
  2022-02-15 10:43                     ` Haakon Bugge
@ 2022-02-15 12:48                       ` Tetsuo Handa
  2022-02-15 17:05                         ` Bart Van Assche
  0 siblings, 1 reply; 27+ messages in thread
From: Tetsuo Handa @ 2022-02-15 12:48 UTC (permalink / raw)
  To: Haakon Bugge
  Cc: Tejun Heo, Bart Van Assche, syzbot, Jason Gunthorpe, LKML,
	OFED mailing list, syzkaller-bugs, Lai Jiangshan

On 2022/02/15 19:43, Haakon Bugge wrote:
>> @@ -6070,6 +6087,13 @@ void __init workqueue_init_early(void)
>> 	       !system_unbound_wq || !system_freezable_wq ||
>> 	       !system_power_efficient_wq ||
>> 	       !system_freezable_power_efficient_wq);
>> +	system_wq->flags |= __WQ_SYSTEM_WIDE;
>> +	system_highpri_wq->flags |= __WQ_SYSTEM_WIDE;
>> +	system_long_wq->flags |= __WQ_SYSTEM_WIDE;
>> +	system_unbound_wq->flags |= __WQ_SYSTEM_WIDE;
>> +	system_freezable_wq->flags |= __WQ_SYSTEM_WIDE;
>> +	system_power_efficient_wq->flags |= __WQ_SYSTEM_WIDE;
>> +	system_freezable_power_efficient_wq->flags |= __WQ_SYSTEM_WIDE;
> 
> Better to OR this in, in the alloc_workqueue() call? Perceive the notion of an opaque object?
> 

I do not want to do something like

-	system_wq = alloc_workqueue("events", 0, 0);
+	system_wq = alloc_workqueue("events", __WQ_SYSTEM_WIDE, 0);

because the intent of this change is to ask developers to create their own WQs.
If I pass __WQ_SYSTEM_WIDE to alloc_workqueue(), developers might mistakenly create something like

	srp_tl_err_wq = alloc_workqueue("srp_tl_err_wq", __WQ_SYSTEM_WIDE, 0);

because of

	system_wq = alloc_workqueue("events", __WQ_SYSTEM_WIDE, 0);

line. __WQ_SYSTEM_WIDE is meant to be applied only to the 'system_wq',
'system_highpri_wq', 'system_long_wq', 'system_unbound_wq', 'system_freezable_wq',
'system_power_efficient_wq' and 'system_freezable_power_efficient_wq' WQs, in order
to avoid calling flush_workqueue() on these system-wide WQs.

I wish I could define __WQ_SYSTEM_WIDE inside kernel/workqueue_internal.h, but
it seems that kernel/workqueue_internal.h does not define internal flags.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [syzbot] possible deadlock in worker_thread
  2022-02-15 12:48                       ` Tetsuo Handa
@ 2022-02-15 17:05                         ` Bart Van Assche
  2022-02-15 22:05                           ` Tetsuo Handa
  2022-02-22 18:26                           ` Tejun Heo
  0 siblings, 2 replies; 27+ messages in thread
From: Bart Van Assche @ 2022-02-15 17:05 UTC (permalink / raw)
  To: Tetsuo Handa, Haakon Bugge
  Cc: Tejun Heo, syzbot, Jason Gunthorpe, LKML, OFED mailing list,
	syzkaller-bugs, Lai Jiangshan

On 2/15/22 04:48, Tetsuo Handa wrote:
> I do not want to do like
> 
> -	system_wq = alloc_workqueue("events", 0, 0);
> +	system_wq = alloc_workqueue("events", __WQ_SYSTEM_WIDE, 0);
> 
> because the intent of this change is to ask developers to create their own WQs.

I want more developers to use the system-wide workqueues since that 
reduces memory usage. That matters for embedded devices running Linux.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [syzbot] possible deadlock in worker_thread
  2022-02-15 17:05                         ` Bart Van Assche
@ 2022-02-15 22:05                           ` Tetsuo Handa
  2022-02-22 18:26                           ` Tejun Heo
  1 sibling, 0 replies; 27+ messages in thread
From: Tetsuo Handa @ 2022-02-15 22:05 UTC (permalink / raw)
  To: Bart Van Assche, Haakon Bugge
  Cc: Tejun Heo, syzbot, Jason Gunthorpe, LKML, OFED mailing list,
	syzkaller-bugs, Lai Jiangshan

On 2022/02/16 2:05, Bart Van Assche wrote:
> On 2/15/22 04:48, Tetsuo Handa wrote:
>> I do not want to do like
>>
>> -    system_wq = alloc_workqueue("events", 0, 0);
>> +    system_wq = alloc_workqueue("events", __WQ_SYSTEM_WIDE, 0);
>>
>> because the intent of this change is to ask developers to create their own WQs.
> 
> I want more developers to use the system-wide workqueues since that reduces memory usage. That matters for embedded devices running Linux.

Reserving a kernel thread for WQ_MEM_RECLAIM WQ might consume some memory,
but I don't think that creating a !WQ_MEM_RECLAIM WQ consumes much memory.
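
To make the cost difference concrete, a sketch (the 'foo' names are made up):

```c
static struct workqueue_struct *foo_wq, *foo_reclaim_wq;

static int __init foo_init(void)
{
	/* A plain WQ is just a frontend to the shared worker pools: cheap. */
	foo_wq = alloc_workqueue("foo_wq", 0, 0);

	/* WQ_MEM_RECLAIM additionally reserves a dedicated rescuer kthread
	 * to guarantee forward progress under memory pressure; that kthread
	 * is where the real memory cost comes from. */
	foo_reclaim_wq = alloc_workqueue("foo_reclaim_wq", WQ_MEM_RECLAIM, 0);

	if (!foo_wq || !foo_reclaim_wq)
		return -ENOMEM;
	return 0;
}
```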

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2] workqueue: Warn flush attempt using system-wide workqueues
  2022-02-14 17:34                 ` Tejun Heo
  2022-02-15 10:26                   ` Tetsuo Handa
@ 2022-02-17 11:22                   ` Tetsuo Handa
  2022-02-22 18:36                     ` Tejun Heo
       [not found]                     ` <CGME20220223212048eucas1p1fab5e35ff398eff57808a8f1125dd15f@eucas1p1.samsung.com>
  1 sibling, 2 replies; 27+ messages in thread
From: Tetsuo Handa @ 2022-02-17 11:22 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Bart Van Assche, jgg, linux-kernel, Lai Jiangshan, Haakon Bugge

syzbot found a circular locking dependency which is caused by flushing
system_long_wq WQ [1]. Tejun Heo commented that it makes no sense at all
to call flush_workqueue() on the shared workqueues as the caller has no
idea what it's gonna end up waiting for.

Although there is flush_scheduled_work() which flushes system_wq WQ with
"Think twice before calling this function! It's very easy to get into
trouble if you don't take great care." warning message, it will be too
difficult to guarantee that all users safely flush system-wide WQs.

Therefore, let's change direction: developers should use their own WQs if
flushing is inevitable. To give developers time to update
their modules, for now just emit a warning message when flush_workqueue()
or flush_work() is called on system-wide WQs. We will eventually convert
this warning message into WARN_ON() and kill flush_scheduled_work().

Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
Changes in v2:
  Removed #ifdef CONFIG_PROVE_LOCKING=y check.
  Also check flush_work() attempt.
  Shorten warning message.
  Introduced a public WQ_ flag, which is initially meant for use by
  only system-wide WQs, but allows private WQs used by built-in modules
  to use this flag for detecting unexpected flush attempts if they want.

 include/linux/workqueue.h | 26 +++++++++++++------------
 kernel/workqueue.c        | 41 ++++++++++++++++++++++++++++-----------
 2 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7fee9b6cfede..4b698917b9d5 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -335,6 +335,18 @@ enum {
 	 */
 	WQ_POWER_EFFICIENT	= 1 << 7,
 
+	/*
+	 * Since flush operation synchronously waits for completion, flushing
+	 * system-wide workqueues (e.g. system_wq) or a work on a system-wide
+	 * workqueue might introduce possibility of deadlock due to unexpected
+	 * locking dependency.
+	 *
+	 * This flag emits warning if flush operation is attempted. Don't set
+	 * this flag on user-defined workqueues, for destroy_workqueue() will
+	 * involve flush operation.
+	 */
+	WQ_WARN_FLUSH_ATTEMPT   = 1 << 8,
+
 	__WQ_DRAINING		= 1 << 16, /* internal: workqueue is draining */
 	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
 	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
@@ -569,18 +581,8 @@ static inline bool schedule_work(struct work_struct *work)
  * Forces execution of the kernel-global workqueue and blocks until its
  * completion.
  *
- * Think twice before calling this function!  It's very easy to get into
- * trouble if you don't take great care.  Either of the following situations
- * will lead to deadlock:
- *
- *	One of the work items currently on the workqueue needs to acquire
- *	a lock held by your code or its caller.
- *
- *	Your code is running in the context of a work routine.
- *
- * They will be detected by lockdep when they occur, but the first might not
- * occur very often.  It depends on what work items are on the workqueue and
- * what locks they need, which you have no control over.
+ * Please stop calling this function. If you need to flush, please use your
+ * own workqueue.
  *
  * In most situations flushing the entire workqueue is overkill; you merely
  * need to know that a particular work item isn't queued and isn't running.
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 33f1106b4f99..8e6e64372441 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2618,6 +2618,20 @@ static int rescuer_thread(void *__rescuer)
 	goto repeat;
 }
 
+static void warn_flush_attempt(struct workqueue_struct *wq)
+{
+	static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
+
+
+	/* Use ratelimit for now in order not to flood warning messages. */
+	ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
+	if (!__ratelimit(&flush_warn_rs))
+		return;
+	/* Don't use WARN_ON() for now in order not to break kernel testing. */
+	pr_warn("Please do not flush %s WQ.\n", wq->name);
+	dump_stack();
+}
+
 /**
  * check_flush_dependency - check for flush dependency sanity
  * @target_wq: workqueue being flushed
@@ -2635,6 +2649,9 @@ static void check_flush_dependency(struct workqueue_struct *target_wq,
 	work_func_t target_func = target_work ? target_work->func : NULL;
 	struct worker *worker;
 
+	if (unlikely(target_wq->flags & WQ_WARN_FLUSH_ATTEMPT))
+		warn_flush_attempt(target_wq);
+
 	if (target_wq->flags & WQ_MEM_RECLAIM)
 		return;
 
@@ -6054,18 +6071,20 @@ void __init workqueue_init_early(void)
 		ordered_wq_attrs[i] = attrs;
 	}
 
-	system_wq = alloc_workqueue("events", 0, 0);
-	system_highpri_wq = alloc_workqueue("events_highpri", WQ_HIGHPRI, 0);
-	system_long_wq = alloc_workqueue("events_long", 0, 0);
-	system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND,
+	system_wq = alloc_workqueue("events", WQ_WARN_FLUSH_ATTEMPT, 0);
+	system_highpri_wq = alloc_workqueue("events_highpri",
+					    WQ_WARN_FLUSH_ATTEMPT | WQ_HIGHPRI, 0);
+	system_long_wq = alloc_workqueue("events_long", WQ_WARN_FLUSH_ATTEMPT, 0);
+	system_unbound_wq = alloc_workqueue("events_unbound", WQ_WARN_FLUSH_ATTEMPT | WQ_UNBOUND,
 					    WQ_UNBOUND_MAX_ACTIVE);
-	system_freezable_wq = alloc_workqueue("events_freezable",
-					      WQ_FREEZABLE, 0);
-	system_power_efficient_wq = alloc_workqueue("events_power_efficient",
-					      WQ_POWER_EFFICIENT, 0);
-	system_freezable_power_efficient_wq = alloc_workqueue("events_freezable_power_efficient",
-					      WQ_FREEZABLE | WQ_POWER_EFFICIENT,
-					      0);
+	system_freezable_wq =
+		alloc_workqueue("events_freezable", WQ_WARN_FLUSH_ATTEMPT | WQ_FREEZABLE, 0);
+	system_power_efficient_wq =
+		alloc_workqueue("events_power_efficient",
+				WQ_WARN_FLUSH_ATTEMPT | WQ_POWER_EFFICIENT, 0);
+	system_freezable_power_efficient_wq =
+		alloc_workqueue("events_freezable_power_efficient",
+				WQ_WARN_FLUSH_ATTEMPT | WQ_FREEZABLE | WQ_POWER_EFFICIENT, 0);
 	BUG_ON(!system_wq || !system_highpri_wq || !system_long_wq ||
 	       !system_unbound_wq || !system_freezable_wq ||
 	       !system_power_efficient_wq ||
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [syzbot] possible deadlock in worker_thread
  2022-02-14  3:44             ` Tejun Heo
  2022-02-14 13:36               ` Tetsuo Handa
@ 2022-02-17 12:27               ` Fabio M. De Francesco
  2022-02-22 18:30                 ` Tejun Heo
  1 sibling, 1 reply; 27+ messages in thread
From: Fabio M. De Francesco @ 2022-02-17 12:27 UTC (permalink / raw)
  To: Tetsuo Handa, syzkaller-bugs
  Cc: Bart Van Assche, syzbot, jgg, linux-kernel, linux-rdma,
	syzkaller-bugs, Lai Jiangshan, Tejun Heo

On Monday, 14 February 2022 04:44:25 CET Tejun Heo wrote:
> Hello,
> 
> On Mon, Feb 14, 2022 at 10:08:00AM +0900, Tetsuo Handa wrote:
> > +	destroy_workqueue(srp_tl_err_wq);
> > 
> > Then, we can call WARN_ON() if e.g. flush_workqueue() is called on system-wide workqueues.
> 
> Yeah, this is the right thing to do. It makes no sense at all to call
> flush_workqueue() on the shared workqueues as the caller has no idea what
> it's gonna end up waiting for. It was on my todo list a long while ago but
> slipped through the crack. If anyone wanna take a stab at it (including
> scrubbing the existing users, of course), please be my guest.
> 

Just to think and understand... what if the system-wide WQ were allocated as unbound 
ordered (i.e., as in alloc_ordered_workqueue()) with "max_active" of one?

1) Would it solve the locks dependency problem?
2) Would it introduce performance penalties (bottlenecks)?

Greetings,

Fabio

>
> Thanks.
> 
> -- 
> tejun
> 




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [syzbot] possible deadlock in worker_thread
  2022-02-15 17:05                         ` Bart Van Assche
  2022-02-15 22:05                           ` Tetsuo Handa
@ 2022-02-22 18:26                           ` Tejun Heo
  1 sibling, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2022-02-22 18:26 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Tetsuo Handa, Haakon Bugge, syzbot, Jason Gunthorpe, LKML,
	OFED mailing list, syzkaller-bugs, Lai Jiangshan

On Tue, Feb 15, 2022 at 09:05:38AM -0800, Bart Van Assche wrote:
> On 2/15/22 04:48, Tetsuo Handa wrote:
> > I do not want to do like
> > 
> > -	system_wq = alloc_workqueue("events", 0, 0);
> > +	system_wq = alloc_workqueue("events", __WQ_SYSTEM_WIDE, 0);
> > 
> > because the intent of this change is to ask developers to create their own WQs.
> 
> I want more developers to use the system-wide workqueues since that reduces
> memory usage. That matters for embedded devices running Linux.

Each wq is just a frontend interface to backend shard pool and doesn't
consume a lot of memory. The only consumption which would matter is when
WQ_MEM_RECLAIM is specified which creates its dedicated rescuer thread to
guarantee forward progress under memory contention, but we aren't talking
about those here.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [syzbot] possible deadlock in worker_thread
  2022-02-17 12:27               ` [syzbot] possible deadlock in worker_thread Fabio M. De Francesco
@ 2022-02-22 18:30                 ` Tejun Heo
  0 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2022-02-22 18:30 UTC (permalink / raw)
  To: Fabio M. De Francesco
  Cc: Tetsuo Handa, syzkaller-bugs, Bart Van Assche, syzbot, jgg,
	linux-kernel, linux-rdma, Lai Jiangshan

Hello,

On Thu, Feb 17, 2022 at 01:27:08PM +0100, Fabio M. De Francesco wrote:
> Just to think and understand... what if the system-wide WQ were allocated as unbound 
> ordered (i.e., as in alloc_ordered_workqueue()) with "max_active" of one?
> 
> 1) Would it solve the locks dependency problem?

It'll actually make deadlocks a lot more prevalent. Some work items need
more than one work item to complete (e.g. flushing another work directly, or
waiting for something which must be completed by something else that may
involve a system work item), and the system wq's max_active must be high
enough that all those chains taking place at the same time require fewer
work items than max_active.
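
A minimal sketch of the self-deadlock such a chain would produce on an
ordered (max_active == 1) queue; the names are hypothetical:

```c
static struct workqueue_struct *ordered_wq;	/* from alloc_ordered_workqueue() */

static void work_b_fn(struct work_struct *work) { /* ... */ }
static DECLARE_WORK(work_b, work_b_fn);

static void work_a_fn(struct work_struct *work)
{
	queue_work(ordered_wq, &work_b);
	/* Deadlock: A occupies the only active slot while waiting for B,
	 * and B cannot start until A releases that slot. */
	flush_work(&work_b);
}
```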

> 2) Would it introduce performance penalties (bottlenecks)?

I'd be surprised if it didn't cause at least noticeable latency increases for
some workloads.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] workqueue: Warn flush attempt using system-wide workqueues
  2022-02-17 11:22                   ` [PATCH v2] workqueue: Warn flush attempt using system-wide workqueues Tetsuo Handa
@ 2022-02-22 18:36                     ` Tejun Heo
       [not found]                     ` <CGME20220223212048eucas1p1fab5e35ff398eff57808a8f1125dd15f@eucas1p1.samsung.com>
  1 sibling, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2022-02-22 18:36 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Bart Van Assche, jgg, linux-kernel, Lai Jiangshan, Haakon Bugge

On Thu, Feb 17, 2022 at 08:22:30PM +0900, Tetsuo Handa wrote:
> @@ -335,6 +335,18 @@ enum {
>  	 */
>  	WQ_POWER_EFFICIENT	= 1 << 7,
>  
> +	/*
> +	 * Since flush operation synchronously waits for completion, flushing
> +	 * system-wide workqueues (e.g. system_wq) or a work on a system-wide
> +	 * workqueue might introduce possibility of deadlock due to unexpected
> +	 * locking dependency.
> +	 *
> +	 * This flag emits warning if flush operation is attempted. Don't set
> +	 * this flag on user-defined workqueues, for destroy_workqueue() will
> +	 * involve flush operation.
> +	 */
> +	WQ_WARN_FLUSH_ATTEMPT   = 1 << 8,

Maybe just __WQ_NO_FLUSH?

>  	__WQ_DRAINING		= 1 << 16, /* internal: workqueue is draining */
>  	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
>  	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
...
> +static void warn_flush_attempt(struct workqueue_struct *wq)
> +{
> +	static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
> +
> +
> +	/* Use ratelimit for now in order not to flood warning messages. */
> +	ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
> +	if (!__ratelimit(&flush_warn_rs))
> +		return;

If you're worried about spamming console while conversion is in progress, we
can just print the immediate (and maybe one more) caller with %pf and
__builtin_return_address() so that it only prints out one line.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] workqueue: Warn flush attempt using system-wide workqueues
       [not found]                     ` <CGME20220223212048eucas1p1fab5e35ff398eff57808a8f1125dd15f@eucas1p1.samsung.com>
@ 2022-02-23 21:20                         ` Marek Szyprowski
  0 siblings, 0 replies; 27+ messages in thread
From: Marek Szyprowski @ 2022-02-23 21:20 UTC (permalink / raw)
  To: Tetsuo Handa, Tejun Heo, linux-rpi-kernel
  Cc: Nicolas Saenz Julienne, Bart Van Assche, Lai Jiangshan,
	linux-kernel, DRI mailing list, jgg, Haakon Bugge

Hi All,

On 17.02.2022 12:22, Tetsuo Handa wrote:
> syzbot found a circular locking dependency which is caused by flushing
> system_long_wq WQ [1]. Tejun Heo commented that it makes no sense at all
> to call flush_workqueue() on the shared workqueues as the caller has no
> idea what it's gonna end up waiting for.
>
> Although there is flush_scheduled_work() which flushes system_wq WQ with
> "Think twice before calling this function! It's very easy to get into
> trouble if you don't take great care." warning message, it will be too
> difficult to guarantee that all users safely flush system-wide WQs.
>
> Therefore, let's change the direction to that developers had better use
> their own WQs if flushing is inevitable. To give developers time to update
> their modules, for now just emit a warning message when flush_workqueue()
> or flush_work() is called on system-wide WQs. We will eventually convert
> this warning message into WARN_ON() and kill flush_scheduled_work().
>
> Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1]
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

This patch landed in linux next-20220222 as commit 4a6a0ce060e4 
("workqueue: Warn flush attempt using system-wide workqueues"). As might 
be expected, it exposed some calls that flush work. However, it also 
causes a boot failure on the Raspberry Pi 3 and 4 boards (kernel compiled 
from arm64/defconfig). In the log I see one call from 
deferred_probe_initcall(), but it isn't critical for the boot process. 
The deadlock occurs when DRM registers the emulated framebuffer on the 
RPi4. The RPi3 boots a bit further, to the shell prompt, but then the 
console freezes. Reverting this patch on top of linux-next 'fixes' the boot.

> ---
> Changes in v2:
>    Removed #ifdef CONFIG_PROVE_LOCKING=y check.
>    Also check flush_work() attempt.
>    Shorten warning message.
>    Introduced a public WQ_ flag, which is initially meant for use by
>    only system-wide WQs, but allows private WQs used by built-in modules
>    to use this flag for detecting unexpected flush attempts if they want.
>
>   include/linux/workqueue.h | 26 +++++++++++++------------
>   kernel/workqueue.c        | 41 ++++++++++++++++++++++++++++-----------
>   2 files changed, 44 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 7fee9b6cfede..4b698917b9d5 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -335,6 +335,18 @@ enum {
>   	 */
>   	WQ_POWER_EFFICIENT	= 1 << 7,
>   
> +	/*
> +	 * Since flush operation synchronously waits for completion, flushing
> +	 * system-wide workqueues (e.g. system_wq) or a work on a system-wide
> +	 * workqueue might introduce possibility of deadlock due to unexpected
> +	 * locking dependency.
> +	 *
> +	 * This flag emits warning if flush operation is attempted. Don't set
> +	 * this flag on user-defined workqueues, for destroy_workqueue() will
> +	 * involve flush operation.
> +	 */
> +	WQ_WARN_FLUSH_ATTEMPT   = 1 << 8,
> +
>   	__WQ_DRAINING		= 1 << 16, /* internal: workqueue is draining */
>   	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
>   	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
> @@ -569,18 +581,8 @@ static inline bool schedule_work(struct work_struct *work)
>    * Forces execution of the kernel-global workqueue and blocks until its
>    * completion.
>    *
> - * Think twice before calling this function!  It's very easy to get into
> - * trouble if you don't take great care.  Either of the following situations
> - * will lead to deadlock:
> - *
> - *	One of the work items currently on the workqueue needs to acquire
> - *	a lock held by your code or its caller.
> - *
> - *	Your code is running in the context of a work routine.
> - *
> - * They will be detected by lockdep when they occur, but the first might not
> - * occur very often.  It depends on what work items are on the workqueue and
> - * what locks they need, which you have no control over.
> + * Please stop calling this function. If you need to flush, please use your
> + * own workqueue.
>    *
>    * In most situations flushing the entire workqueue is overkill; you merely
>    * need to know that a particular work item isn't queued and isn't running.
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 33f1106b4f99..8e6e64372441 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2618,6 +2618,20 @@ static int rescuer_thread(void *__rescuer)
>   	goto repeat;
>   }
>   
> +static void warn_flush_attempt(struct workqueue_struct *wq)
> +{
> +	static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
> +
> +
> +	/* Use ratelimit for now in order not to flood warning messages. */
> +	ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
> +	if (!__ratelimit(&flush_warn_rs))
> +		return;
> +	/* Don't use WARN_ON() for now in order not to break kernel testing. */
> +	pr_warn("Please do not flush %s WQ.\n", wq->name);
> +	dump_stack();
> +}
> +
>   /**
>    * check_flush_dependency - check for flush dependency sanity
>    * @target_wq: workqueue being flushed
> @@ -2635,6 +2649,9 @@ static void check_flush_dependency(struct workqueue_struct *target_wq,
>   	work_func_t target_func = target_work ? target_work->func : NULL;
>   	struct worker *worker;
>   
> +	if (unlikely(target_wq->flags & WQ_WARN_FLUSH_ATTEMPT))
> +		warn_flush_attempt(target_wq);
> +
>   	if (target_wq->flags & WQ_MEM_RECLAIM)
>   		return;
>   
> @@ -6054,18 +6071,20 @@ void __init workqueue_init_early(void)
>   		ordered_wq_attrs[i] = attrs;
>   	}
>   
> -	system_wq = alloc_workqueue("events", 0, 0);
> -	system_highpri_wq = alloc_workqueue("events_highpri", WQ_HIGHPRI, 0);
> -	system_long_wq = alloc_workqueue("events_long", 0, 0);
> -	system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND,
> +	system_wq = alloc_workqueue("events", WQ_WARN_FLUSH_ATTEMPT, 0);
> +	system_highpri_wq = alloc_workqueue("events_highpri",
> +					    WQ_WARN_FLUSH_ATTEMPT | WQ_HIGHPRI, 0);
> +	system_long_wq = alloc_workqueue("events_long", WQ_WARN_FLUSH_ATTEMPT, 0);
> +	system_unbound_wq = alloc_workqueue("events_unbound", WQ_WARN_FLUSH_ATTEMPT | WQ_UNBOUND,
>   					    WQ_UNBOUND_MAX_ACTIVE);
> -	system_freezable_wq = alloc_workqueue("events_freezable",
> -					      WQ_FREEZABLE, 0);
> -	system_power_efficient_wq = alloc_workqueue("events_power_efficient",
> -					      WQ_POWER_EFFICIENT, 0);
> -	system_freezable_power_efficient_wq = alloc_workqueue("events_freezable_power_efficient",
> -					      WQ_FREEZABLE | WQ_POWER_EFFICIENT,
> -					      0);
> +	system_freezable_wq =
> +		alloc_workqueue("events_freezable", WQ_WARN_FLUSH_ATTEMPT | WQ_FREEZABLE, 0);
> +	system_power_efficient_wq =
> +		alloc_workqueue("events_power_efficient",
> +				WQ_WARN_FLUSH_ATTEMPT | WQ_POWER_EFFICIENT, 0);
> +	system_freezable_power_efficient_wq =
> +		alloc_workqueue("events_freezable_power_efficient",
> +				WQ_WARN_FLUSH_ATTEMPT | WQ_FREEZABLE | WQ_POWER_EFFICIENT, 0);
>   	BUG_ON(!system_wq || !system_highpri_wq || !system_long_wq ||
>   	       !system_unbound_wq || !system_freezable_wq ||
>   	       !system_power_efficient_wq ||

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] workqueue: Warn flush attempt using system-wide workqueues
@ 2022-02-23 21:20                         ` Marek Szyprowski
  0 siblings, 0 replies; 27+ messages in thread
From: Marek Szyprowski @ 2022-02-23 21:20 UTC (permalink / raw)
  To: Tetsuo Handa, Tejun Heo, linux-rpi-kernel
  Cc: Bart Van Assche, jgg, linux-kernel, Lai Jiangshan, Haakon Bugge,
	DRI mailing list, Nicolas Saenz Julienne

Hi All,

On 17.02.2022 12:22, Tetsuo Handa wrote:
> syzbot found a circular locking dependency which is caused by flushing
> system_long_wq WQ [1]. Tejun Heo commented that it makes no sense at all
> to call flush_workqueue() on the shared workqueues as the caller has no
> idea what it's gonna end up waiting for.
>
> Although there is flush_scheduled_work() which flushes system_wq WQ with
> "Think twice before calling this function! It's very easy to get into
> trouble if you don't take great care." warning message, it will be too
> difficult to guarantee that all users safely flush system-wide WQs.
>
> Therefore, let's change direction: developers should use their own WQs
> if flushing is inevitable. To give developers time to update
> their modules, for now just emit a warning message when flush_workqueue()
> or flush_work() is called on system-wide WQs. We will eventually convert
> this warning message into WARN_ON() and kill flush_scheduled_work().
>
> Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1]
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

This patch landed in linux next-20220222 as commit 4a6a0ce060e4 
("workqueue: Warn flush attempt using system-wide workqueues"). As might 
be expected, it exposed some calls that flush work. However, it also 
causes a boot failure on Raspberry Pi 3 and 4 boards (kernel compiled 
from arm64/defconfig). In the log I see one call from 
deferred_probe_initcall(), but it isn't critical for the boot process. 
The deadlock occurs when DRM registers the emulated framebuffer on the 
RPi4. The RPi3 boots a bit further, to the shell prompt, but then the 
console freezes. Reverting this patch on top of linux-next 'fixes' the boot.
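
For reference, the direction the commit message advocates (flush a private 
workqueue instead of a system-wide one) looks roughly like the sketch 
below; the driver name and work handler are hypothetical:

```c
/* Hypothetical driver following the pattern the patch pushes toward:
 * allocate a private workqueue so that flushing it waits only for work
 * items this driver queued itself.
 */
static struct workqueue_struct *foo_wq;

static void foo_work_fn(struct work_struct *work)
{
	/* ... driver work ... */
}
static DECLARE_WORK(foo_work, foo_work_fn);

static int __init foo_init(void)
{
	foo_wq = alloc_workqueue("foo_wq", 0, 0);
	if (!foo_wq)
		return -ENOMEM;
	queue_work(foo_wq, &foo_work);	/* instead of schedule_work() */
	return 0;
}

static void __exit foo_exit(void)
{
	/*
	 * Waits only for foo's own work items, unlike
	 * flush_scheduled_work(), which waits for everything on system_wq.
	 */
	flush_workqueue(foo_wq);
	destroy_workqueue(foo_wq);
}
```

Because foo_wq is private, the flush cannot end up waiting on unrelated 
work items holding unknown locks.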

> ---
> Changes in v2:
>    Removed #ifdef CONFIG_PROVE_LOCKING=y check.
>    Also check flush_work() attempt.
>    Shorten warning message.
>    Introduced a public WQ_ flag, which is initially meant for use by
>    only system-wide WQs, but allows private WQs used by built-in modules
>    to use this flag for detecting unexpected flush attempts if they want.
>
>   include/linux/workqueue.h | 26 +++++++++++++------------
>   kernel/workqueue.c        | 41 ++++++++++++++++++++++++++++-----------
>   2 files changed, 44 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 7fee9b6cfede..4b698917b9d5 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -335,6 +335,18 @@ enum {
>   	 */
>   	WQ_POWER_EFFICIENT	= 1 << 7,
>   
> +	/*
> +	 * Since flush operation synchronously waits for completion, flushing
> +	 * system-wide workqueues (e.g. system_wq) or a work on a system-wide
> +	 * workqueue might introduce possibility of deadlock due to unexpected
> +	 * locking dependency.
> +	 *
> +	 * This flag emits warning if flush operation is attempted. Don't set
> +	 * this flag on user-defined workqueues, for destroy_workqueue() will
> +	 * involve flush operation.
> +	 */
> +	WQ_WARN_FLUSH_ATTEMPT   = 1 << 8,
> +
>   	__WQ_DRAINING		= 1 << 16, /* internal: workqueue is draining */
>   	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
>   	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
> @@ -569,18 +581,8 @@ static inline bool schedule_work(struct work_struct *work)
>    * Forces execution of the kernel-global workqueue and blocks until its
>    * completion.
>    *
> - * Think twice before calling this function!  It's very easy to get into
> - * trouble if you don't take great care.  Either of the following situations
> - * will lead to deadlock:
> - *
> - *	One of the work items currently on the workqueue needs to acquire
> - *	a lock held by your code or its caller.
> - *
> - *	Your code is running in the context of a work routine.
> - *
> - * They will be detected by lockdep when they occur, but the first might not
> - * occur very often.  It depends on what work items are on the workqueue and
> - * what locks they need, which you have no control over.
> + * Please stop calling this function. If you need to flush, please use your
> + * own workqueue.
>    *
>    * In most situations flushing the entire workqueue is overkill; you merely
>    * need to know that a particular work item isn't queued and isn't running.
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 33f1106b4f99..8e6e64372441 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2618,6 +2618,20 @@ static int rescuer_thread(void *__rescuer)
>   	goto repeat;
>   }
>   
> +static void warn_flush_attempt(struct workqueue_struct *wq)
> +{
> +	static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
> +
> +
> +	/* Use ratelimit for now in order not to flood warning messages. */
> +	ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
> +	if (!__ratelimit(&flush_warn_rs))
> +		return;
> +	/* Don't use WARN_ON() for now in order not to break kernel testing. */
> +	pr_warn("Please do not flush %s WQ.\n", wq->name);
> +	dump_stack();
> +}
> +
>   /**
>    * check_flush_dependency - check for flush dependency sanity
>    * @target_wq: workqueue being flushed
> @@ -2635,6 +2649,9 @@ static void check_flush_dependency(struct workqueue_struct *target_wq,
>   	work_func_t target_func = target_work ? target_work->func : NULL;
>   	struct worker *worker;
>   
> +	if (unlikely(target_wq->flags & WQ_WARN_FLUSH_ATTEMPT))
> +		warn_flush_attempt(target_wq);
> +
>   	if (target_wq->flags & WQ_MEM_RECLAIM)
>   		return;
>   
> @@ -6054,18 +6071,20 @@ void __init workqueue_init_early(void)
>   		ordered_wq_attrs[i] = attrs;
>   	}
>   
> -	system_wq = alloc_workqueue("events", 0, 0);
> -	system_highpri_wq = alloc_workqueue("events_highpri", WQ_HIGHPRI, 0);
> -	system_long_wq = alloc_workqueue("events_long", 0, 0);
> -	system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND,
> +	system_wq = alloc_workqueue("events", WQ_WARN_FLUSH_ATTEMPT, 0);
> +	system_highpri_wq = alloc_workqueue("events_highpri",
> +					    WQ_WARN_FLUSH_ATTEMPT | WQ_HIGHPRI, 0);
> +	system_long_wq = alloc_workqueue("events_long", WQ_WARN_FLUSH_ATTEMPT, 0);
> +	system_unbound_wq = alloc_workqueue("events_unbound", WQ_WARN_FLUSH_ATTEMPT | WQ_UNBOUND,
>   					    WQ_UNBOUND_MAX_ACTIVE);
> -	system_freezable_wq = alloc_workqueue("events_freezable",
> -					      WQ_FREEZABLE, 0);
> -	system_power_efficient_wq = alloc_workqueue("events_power_efficient",
> -					      WQ_POWER_EFFICIENT, 0);
> -	system_freezable_power_efficient_wq = alloc_workqueue("events_freezable_power_efficient",
> -					      WQ_FREEZABLE | WQ_POWER_EFFICIENT,
> -					      0);
> +	system_freezable_wq =
> +		alloc_workqueue("events_freezable", WQ_WARN_FLUSH_ATTEMPT | WQ_FREEZABLE, 0);
> +	system_power_efficient_wq =
> +		alloc_workqueue("events_power_efficient",
> +				WQ_WARN_FLUSH_ATTEMPT | WQ_POWER_EFFICIENT, 0);
> +	system_freezable_power_efficient_wq =
> +		alloc_workqueue("events_freezable_power_efficient",
> +				WQ_WARN_FLUSH_ATTEMPT | WQ_FREEZABLE | WQ_POWER_EFFICIENT, 0);
>   	BUG_ON(!system_wq || !system_highpri_wq || !system_long_wq ||
>   	       !system_unbound_wq || !system_freezable_wq ||
>   	       !system_power_efficient_wq ||

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] workqueue: Warn flush attempt using system-wide workqueues
  2022-02-23 21:20                         ` Marek Szyprowski
@ 2022-02-23 21:35                           ` Tejun Heo
  -1 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2022-02-23 21:35 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Tetsuo Handa, linux-rpi-kernel, Bart Van Assche, jgg,
	linux-kernel, Lai Jiangshan, Haakon Bugge, DRI mailing list,
	Nicolas Saenz Julienne

On Wed, Feb 23, 2022 at 10:20:47PM +0100, Marek Szyprowski wrote:
> Hi All,
> 
> On 17.02.2022 12:22, Tetsuo Handa wrote:
> > syzbot found a circular locking dependency which is caused by flushing
> > system_long_wq WQ [1]. Tejun Heo commented that it makes no sense at all
> > to call flush_workqueue() on the shared workqueues as the caller has no
> > idea what it's gonna end up waiting for.
> >
> > Although there is flush_scheduled_work() which flushes system_wq WQ with
> > "Think twice before calling this function! It's very easy to get into
> > trouble if you don't take great care." warning message, it will be too
> > difficult to guarantee that all users safely flush system-wide WQs.
> >
> > Therefore, let's change direction: developers should use their own WQs
> > if flushing is inevitable. To give developers time to update
> > their modules, for now just emit a warning message when flush_workqueue()
> > or flush_work() is called on system-wide WQs. We will eventually convert
> > this warning message into WARN_ON() and kill flush_scheduled_work().
> >
> > Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1]
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> 
> This patch landed in linux next-20220222 as commit 4a6a0ce060e4 
> ("workqueue: Warn flush attempt using system-wide workqueues"). As might 
> be expected, it exposed some calls that flush work. However, it also 
> causes a boot failure on Raspberry Pi 3 and 4 boards (kernel compiled 
> from arm64/defconfig). In the log I see one call from 
> deferred_probe_initcall(), but it isn't critical for the boot process. 
> The deadlock occurs when DRM registers the emulated framebuffer on the 
> RPi4. The RPi3 boots a bit further, to the shell prompt, but then the 
> console freezes. Reverting this patch on top of linux-next 'fixes' the boot.

Tetsuo, can you please revert the patch? The patch is incorrect in that it's
triggering also on work item flushes, not just workqueue flushes.
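
For context, the two operations differ in scope; a minimal hypothetical 
sketch of the distinction:

```c
/* flush_work() waits for one specific work item to finish and is
 * common even on system_wq.  flush_workqueue() (and its system_wq
 * wrapper flush_scheduled_work()) waits for *every* item on the
 * queue, which is what risks unexpected lock dependencies on the
 * shared queues.
 */
static void my_handler(struct work_struct *w)
{
	/* ... */
}
static DECLARE_WORK(my_work, my_handler);

static void example(void)
{
	schedule_work(&my_work);	/* queued on system_wq */
	flush_work(&my_work);		/* waits for this item only */
	flush_scheduled_work();		/* waits for ALL of system_wq */
}
```

Since the v2 patch emits its warning from check_flush_dependency(), which 
both paths reach, it fires even for the flush_work() case.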

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread


* Re: [PATCH v2] workqueue: Warn flush attempt using system-wide workqueues
  2022-02-23 21:35                           ` Tejun Heo
@ 2022-02-23 22:06                             ` Tetsuo Handa
  -1 siblings, 0 replies; 27+ messages in thread
From: Tetsuo Handa @ 2022-02-23 22:06 UTC (permalink / raw)
  To: Tejun Heo, Marek Szyprowski
  Cc: linux-rpi-kernel, Bart Van Assche, jgg, linux-kernel,
	Lai Jiangshan, Haakon Bugge, DRI mailing list,
	Nicolas Saenz Julienne

On 2022/02/24 6:35, Tejun Heo wrote:
> Tetsuo, can you please revert the patch? The patch is incorrect in that it's
> triggering also on work item flushes, not just workqueue flushes.

OK. I removed these patches from my tree.

^ permalink raw reply	[flat|nested] 27+ messages in thread


end of thread, other threads:[~2022-02-23 22:07 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-10 19:27 [syzbot] possible deadlock in worker_thread syzbot
2022-02-11 18:59 ` Bart Van Assche
2022-02-12  5:31   ` Tetsuo Handa
2022-02-12 16:37     ` Bart Van Assche
2022-02-12 17:14       ` Tetsuo Handa
2022-02-13 15:33         ` Leon Romanovsky
2022-02-13 23:06         ` Bart Van Assche
2022-02-14  1:08           ` Tetsuo Handa
2022-02-14  3:44             ` Tejun Heo
2022-02-14 13:36               ` Tetsuo Handa
2022-02-14 17:34                 ` Tejun Heo
2022-02-15 10:26                   ` Tetsuo Handa
2022-02-15 10:43                     ` Haakon Bugge
2022-02-15 12:48                       ` Tetsuo Handa
2022-02-15 17:05                         ` Bart Van Assche
2022-02-15 22:05                           ` Tetsuo Handa
2022-02-22 18:26                           ` Tejun Heo
2022-02-17 11:22                   ` [PATCH v2] workqueue: Warn flush attempt using system-wide workqueues Tetsuo Handa
2022-02-22 18:36                     ` Tejun Heo
     [not found]                     ` <CGME20220223212048eucas1p1fab5e35ff398eff57808a8f1125dd15f@eucas1p1.samsung.com>
2022-02-23 21:20                       ` Marek Szyprowski
2022-02-23 21:20                         ` Marek Szyprowski
2022-02-23 21:35                         ` Tejun Heo
2022-02-23 21:35                           ` Tejun Heo
2022-02-23 22:06                           ` Tetsuo Handa
2022-02-23 22:06                             ` Tetsuo Handa
2022-02-17 12:27               ` [syzbot] possible deadlock in worker_thread Fabio M. De Francesco
2022-02-22 18:30                 ` Tejun Heo
