netdev.vger.kernel.org archive mirror
* [syzbot] possible deadlock in rds_tcp_reset_callbacks
@ 2022-08-21  5:34 syzbot
  2022-09-28 15:25 ` [PATCH] net: rds: don't hold sock lock when cancelling work from rds_tcp_reset_callbacks() Tetsuo Handa
  0 siblings, 1 reply; 3+ messages in thread
From: syzbot @ 2022-08-21  5:34 UTC (permalink / raw)
  To: davem, edumazet, kuba, linux-kernel, linux-rdma, netdev, pabeni,
	rds-devel, santosh.shilimkar, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    8755ae45a9e8 Add linux-next specific files for 20220819
git tree:       linux-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=107cf485080000
kernel config:  https://syzkaller.appspot.com/x/.config?x=ead6107a3bbe3c62
dashboard link: https://syzkaller.appspot.com/bug?extid=78c55c7bc6f66e53dce2
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12e678cb080000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=13f68e5b080000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+78c55c7bc6f66e53dce2@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
6.0.0-rc1-next-20220819-syzkaller #0 Not tainted
------------------------------------------------------
kworker/u4:3/46 is trying to acquire lock:
ffff888027dc40e8 ((work_completion)(&(&cp->cp_send_w)->work)){+.+.}-{0:0}, at: __flush_work+0xdd/0xae0 kernel/workqueue.c:3066

but task is already holding lock:
ffff88807e18da70 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1712 [inline]
ffff88807e18da70 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: rds_tcp_reset_callbacks+0x1bf/0x4d0 net/rds/tcp.c:169

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (k-sk_lock-AF_INET6){+.+.}-{0:0}:
       lock_sock_nested+0x36/0xf0 net/core/sock.c:3391
       lock_sock include/net/sock.h:1712 [inline]
       tcp_sock_set_cork+0x16/0x90 net/ipv4/tcp.c:3328
       rds_send_xmit+0x386/0x2540 net/rds/send.c:194
       rds_send_worker+0x92/0x2e0 net/rds/threads.c:200
       process_one_work+0x991/0x1610 kernel/workqueue.c:2289
       worker_thread+0x665/0x1080 kernel/workqueue.c:2436
       kthread+0x2e4/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306

-> #0 ((work_completion)(&(&cp->cp_send_w)->work)){+.+.}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3095 [inline]
       check_prevs_add kernel/locking/lockdep.c:3214 [inline]
       validate_chain kernel/locking/lockdep.c:3829 [inline]
       __lock_acquire+0x2a43/0x56d0 kernel/locking/lockdep.c:5053
       lock_acquire kernel/locking/lockdep.c:5666 [inline]
       lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5631
       __flush_work+0x105/0xae0 kernel/workqueue.c:3069
       __cancel_work_timer+0x3f9/0x570 kernel/workqueue.c:3160
       rds_tcp_reset_callbacks+0x1cb/0x4d0 net/rds/tcp.c:171
       rds_tcp_accept_one+0x9d5/0xd10 net/rds/tcp_listen.c:203
       rds_tcp_accept_worker+0x55/0x80 net/rds/tcp.c:529
       process_one_work+0x991/0x1610 kernel/workqueue.c:2289
       worker_thread+0x665/0x1080 kernel/workqueue.c:2436
       kthread+0x2e4/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(k-sk_lock-AF_INET6);
                               lock((work_completion)(&(&cp->cp_send_w)->work));
                               lock(k-sk_lock-AF_INET6);
  lock((work_completion)(&(&cp->cp_send_w)->work));

 *** DEADLOCK ***

4 locks held by kworker/u4:3/46:
 #0: ffff8880275f7938 ((wq_completion)krdsd){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff8880275f7938 ((wq_completion)krdsd){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
 #0: ffff8880275f7938 ((wq_completion)krdsd){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1280 [inline]
 #0: ffff8880275f7938 ((wq_completion)krdsd){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:636 [inline]
 #0: ffff8880275f7938 ((wq_completion)krdsd){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:663 [inline]
 #0: ffff8880275f7938 ((wq_completion)krdsd){+.+.}-{0:0}, at: process_one_work+0x87a/0x1610 kernel/workqueue.c:2260
 #1: ffffc90000b77da8 ((work_completion)(&rtn->rds_tcp_accept_w)){+.+.}-{0:0}, at: process_one_work+0x8ae/0x1610 kernel/workqueue.c:2264
 #2: ffff8880733c4088 (&tc->t_conn_path_lock){+.+.}-{3:3}, at: rds_tcp_accept_one+0x892/0xd10 net/rds/tcp_listen.c:195
 #3: ffff88807e18da70 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1712 [inline]
 #3: ffff88807e18da70 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: rds_tcp_reset_callbacks+0x1bf/0x4d0 net/rds/tcp.c:169

stack backtrace:
CPU: 1 PID: 46 Comm: kworker/u4:3 Not tainted 6.0.0-rc1-next-20220819-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
Workqueue: krdsd rds_tcp_accept_worker
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:122 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:140
 check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2175
 check_prev_add kernel/locking/lockdep.c:3095 [inline]
 check_prevs_add kernel/locking/lockdep.c:3214 [inline]
 validate_chain kernel/locking/lockdep.c:3829 [inline]
 __lock_acquire+0x2a43/0x56d0 kernel/locking/lockdep.c:5053
 lock_acquire kernel/locking/lockdep.c:5666 [inline]
 lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5631
 __flush_work+0x105/0xae0 kernel/workqueue.c:3069
 __cancel_work_timer+0x3f9/0x570 kernel/workqueue.c:3160
 rds_tcp_reset_callbacks+0x1cb/0x4d0 net/rds/tcp.c:171
 rds_tcp_accept_one+0x9d5/0xd10 net/rds/tcp_listen.c:203
 rds_tcp_accept_worker+0x55/0x80 net/rds/tcp.c:529
 process_one_work+0x991/0x1610 kernel/workqueue.c:2289
 worker_thread+0x665/0x1080 kernel/workqueue.c:2436
 kthread+0x2e4/0x3a0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches
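
The report above boils down to two acquisition orders that contradict each other: the send worker takes the socket lock while it is running (chain #1), and rds_tcp_reset_callbacks() waits for that worker while holding the socket lock (chain #0). The following is a minimal userspace sketch of that kind of check, not lockdep's actual implementation. The names lock_graph, graph_add_dep, reachable and demo_lock_inversion are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Two lock classes from the report, modelled as graph nodes (illustrative). */
enum { LOCK_SK = 0, LOCK_SEND_WORK = 1, NUM_LOCKS = 2 };

/* dep[a][b] is true when lock b was acquired while lock a was held. */
struct lock_graph {
	bool dep[NUM_LOCKS][NUM_LOCKS];
};

void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: can "to" be reached from "from" via recorded edges? */
bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NUM_LOCKS; next++) {
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record "wanted is taken while held is held".
 * Returns false (and records nothing) if the new edge would close a cycle.
 */
bool graph_add_dep(struct lock_graph *g, int held, int wanted)
{
	bool seen[NUM_LOCKS] = { false };

	if (reachable(g, wanted, held, seen))
		return false;
	g->dep[held][wanted] = true;
	return true;
}

/* Replays the two chains from the report; returns 1 if the cycle is caught. */
int demo_lock_inversion(void)
{
	struct lock_graph g;

	graph_init(&g);
	/* Chain #1: the running send work takes the socket lock. */
	bool first_ok = graph_add_dep(&g, LOCK_SEND_WORK, LOCK_SK);
	/* Chain #0: flushing the send work while holding the socket lock. */
	bool second_ok = graph_add_dep(&g, LOCK_SK, LOCK_SEND_WORK);

	return (first_ok && !second_ok) ? 1 : 0;
}
```

Calling demo_lock_inversion() records the worker's order first and then rejects the reverse order, which mirrors why the warning fires at the cancel call rather than in the worker.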


* [PATCH] net: rds: don't hold sock lock when cancelling work from rds_tcp_reset_callbacks()
  2022-08-21  5:34 [syzbot] possible deadlock in rds_tcp_reset_callbacks syzbot
@ 2022-09-28 15:25 ` Tetsuo Handa
  2022-10-03  7:00   ` patchwork-bot+netdevbpf
  0 siblings, 1 reply; 3+ messages in thread
From: Tetsuo Handa @ 2022-09-28 15:25 UTC (permalink / raw)
  To: Santosh Shilimkar, David S. Miller, Sowmini Varadhan, Hillf Danton
  Cc: syzkaller-bugs, syzbot, Network Development, OFED mailing list

syzbot is reporting a lockdep warning in rds_tcp_reset_callbacks() [1],
because commit ac3615e7f3cffe2a ("RDS: TCP: Reduce code duplication in
rds_tcp_reset_callbacks()") added cancel_delayed_work_sync() to a section
protected by lock_sock() without realizing that rds_send_xmit() might call
lock_sock().

We don't need to protect cancel_delayed_work_sync() with lock_sock(): even
if rds_{send,recv}_worker() re-queues this work while __flush_work() from
cancel_delayed_work_sync() is waiting for it to complete, the retried
rds_{send,recv}_worker() is a no-op because the RDS_CONN_UP bit is absent.

Link: https://syzkaller.appspot.com/bug?extid=78c55c7bc6f66e53dce2 [1]
Reported-by: syzbot <syzbot+78c55c7bc6f66e53dce2@syzkaller.appspotmail.com>
Co-developed-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+78c55c7bc6f66e53dce2@syzkaller.appspotmail.com>
Fixes: ac3615e7f3cffe2a ("RDS: TCP: Reduce code duplication in rds_tcp_reset_callbacks()")
---
Hillf, why don't you submit this as a formal patch now that syzbot has tested it?
Writing it up as a formal patch helps us understand and review your reasoning and
how you arrived at the fix. I feel sorry for stealing the result of your trial and error...

 net/rds/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 73ee2771093d..d0ff413f697c 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -166,10 +166,10 @@ void rds_tcp_reset_callbacks(struct socket *sock,
 	 */
 	atomic_set(&cp->cp_state, RDS_CONN_RESETTING);
 	wait_event(cp->cp_waitq, !test_bit(RDS_IN_XMIT, &cp->cp_flags));
-	lock_sock(osock->sk);
 	/* reset receive side state for rds_tcp_data_recv() for osock  */
 	cancel_delayed_work_sync(&cp->cp_send_w);
 	cancel_delayed_work_sync(&cp->cp_recv_w);
+	lock_sock(osock->sk);
 	if (tc->t_tinc) {
 		rds_inc_put(&tc->t_tinc->ti_inc);
 		tc->t_tinc = NULL;
-- 
2.34.1
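
The reordering in the diff above can be illustrated with a simplified, single-threaded userspace model. This is a sketch under stated assumptions, not the kernel code: conn_model, run_worker, cancel_work_sync_model, reset_old_order, reset_new_order and the demo helpers are hypothetical names, and "blocking forever" is represented by returning false instead of actually hanging.

```c
#include <stdbool.h>

/* Simplified state: who holds the socket lock, and is send work in flight? */
struct conn_model {
	bool sock_locked;   /* socket lock held by the reset path */
	bool work_running;  /* send work is running and will need the socket lock */
};

/*
 * The in-flight worker needs the socket lock to finish.
 * Returns false if it could never make progress.
 */
bool run_worker(struct conn_model *m)
{
	if (m->sock_locked)
		return false;   /* lock owner is waiting for us: stuck */
	m->work_running = false;
	return true;
}

/*
 * Model of a synchronous cancel: it has to wait for in-flight work.
 * Returns false if that wait could never end.
 */
bool cancel_work_sync_model(struct conn_model *m)
{
	if (!m->work_running)
		return true;
	return run_worker(m);
}

/* Order before the patch: take the socket lock, then cancel. */
bool reset_old_order(struct conn_model *m)
{
	m->sock_locked = true;
	bool ok = cancel_work_sync_model(m);
	m->sock_locked = false;
	return ok;
}

/* Order after the patch: cancel first, then take the socket lock. */
bool reset_new_order(struct conn_model *m)
{
	bool ok = cancel_work_sync_model(m);
	m->sock_locked = true;
	/* ... reset receive-side state under the lock ... */
	m->sock_locked = false;
	return ok;
}

/* Returns 1 if the old order gets stuck with work in flight. */
int demo_old_order_stuck(void)
{
	struct conn_model m = { .sock_locked = false, .work_running = true };
	return reset_old_order(&m) ? 0 : 1;
}

/* Returns 1 if the new order completes with work in flight. */
int demo_new_order_completes(void)
{
	struct conn_model m = { .sock_locked = false, .work_running = true };
	return (reset_new_order(&m) && !m.work_running) ? 1 : 0;
}
```

With work in flight, the old order cannot complete in this model because the worker needs the lock that the canceller holds, while the new order lets the worker finish before the lock is taken.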



* Re: [PATCH] net: rds: don't hold sock lock when cancelling work from rds_tcp_reset_callbacks()
  2022-09-28 15:25 ` [PATCH] net: rds: don't hold sock lock when cancelling work from rds_tcp_reset_callbacks() Tetsuo Handa
@ 2022-10-03  7:00   ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 3+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-10-03  7:00 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: santosh.shilimkar, davem, sowmini.varadhan, hdanton,
	syzkaller-bugs, syzbot+78c55c7bc6f66e53dce2, netdev, linux-rdma

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Thu, 29 Sep 2022 00:25:37 +0900 you wrote:
> syzbot is reporting a lockdep warning in rds_tcp_reset_callbacks() [1],
> because commit ac3615e7f3cffe2a ("RDS: TCP: Reduce code duplication in
> rds_tcp_reset_callbacks()") added cancel_delayed_work_sync() to a section
> protected by lock_sock() without realizing that rds_send_xmit() might call
> lock_sock().
> 
> We don't need to protect cancel_delayed_work_sync() with lock_sock(): even
> if rds_{send,recv}_worker() re-queues this work while __flush_work() from
> cancel_delayed_work_sync() is waiting for it to complete, the retried
> rds_{send,recv}_worker() is a no-op because the RDS_CONN_UP bit is absent.
> 
> [...]

Here is the summary with links:
  - net: rds: don't hold sock lock when cancelling work from rds_tcp_reset_callbacks()
    https://git.kernel.org/netdev/net/c/a91b750fd662

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



