All of lore.kernel.org
 help / color / mirror / Atom feed
From: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
To: Hillf Danton <hdanton@sina.com>,
	syzbot <syzbot+7d51f807c81b190a127d@syzkaller.appspotmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	linux-bluetooth@vger.kernel.org, linux-kernel@vger.kernel.org,
	luiz.dentz@gmail.com, netdev@vger.kernel.org,
	syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] INFO: task can't die in __lock_sock
Date: Thu, 2 Sep 2021 15:54:37 -0400	[thread overview]
Message-ID: <a43c3f8f-66b5-164f-c299-53d15c015845@gmail.com> (raw)
In-Reply-To: <20210902031752.2502-1-hdanton@sina.com>

On 1/9/21 11:17 pm, Hillf Danton wrote:
> On Wed, 01 Sep 2021 10:34:21 -0700
>> syzbot has found a reproducer for the following issue on:
>>
>> HEAD commit:    c1b13fe76e95 Add linux-next specific files for 20210901
>> git tree:       linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=12c6034d300000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=e2afff7bc32736e5
>> dashboard link: https://syzkaller.appspot.com/bug?extid=7d51f807c81b190a127d
>> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14d42469300000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1107d815300000
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+7d51f807c81b190a127d@syzkaller.appspotmail.com
>>
>> INFO: task syz-executor157:6562 blocked for more than 143 seconds.
>>        Not tainted 5.14.0-next-20210901-syzkaller #0
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:syz-executor157 state:D stack:26880 pid: 6562 ppid:  6530 flags:0x00004006
>> Call Trace:
>>   context_switch kernel/sched/core.c:4955 [inline]
>>   __schedule+0x940/0x26f0 kernel/sched/core.c:6302
>>   schedule+0xd3/0x270 kernel/sched/core.c:6381
>>   __lock_sock+0x13d/0x260 net/core/sock.c:2644
>>   lock_sock_nested+0xf6/0x120 net/core/sock.c:3185
>>   lock_sock include/net/sock.h:1612 [inline]
> 
> This is due to b7ce436a5d79 ("Bluetooth: switch to lock_sock in RFCOMM").
> 
>>   rfcomm_sk_state_change+0xb4/0x390 net/bluetooth/rfcomm/sock.c:73
>>   __rfcomm_dlc_close+0x1b6/0x8a0 net/bluetooth/rfcomm/core.c:489
>>   rfcomm_dlc_close+0x1ea/0x240 net/bluetooth/rfcomm/core.c:520
>>   __rfcomm_sock_close+0xac/0x260 net/bluetooth/rfcomm/sock.c:220
>>   rfcomm_sock_shutdown+0xe9/0x210 net/bluetooth/rfcomm/sock.c:931
>>   rfcomm_sock_release+0x5f/0x140 net/bluetooth/rfcomm/sock.c:951
>>   __sock_release+0xcd/0x280 net/socket.c:649
>>   sock_close+0x18/0x20 net/socket.c:1314
>>   __fput+0x288/0x9f0 fs/file_table.c:280
>>   task_work_run+0xdd/0x1a0 kernel/task_work.c:164
>>   exit_task_work include/linux/task_work.h:32 [inline]
>>   do_exit+0xbae/0x2a30 kernel/exit.c:825
>>   do_group_exit+0x125/0x310 kernel/exit.c:922
>>   get_signal+0x47f/0x2160 kernel/signal.c:2868
>>   arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:865
>>   handle_signal_work kernel/entry/common.c:148 [inline]
>>   exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
>>   exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:209
>>   __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
>>   syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:302
>>   do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
>>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>> RIP: 0033:0x445fe9
>> RSP: 002b:00007fff85049fe8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
>> RAX: fffffffffffffffc RBX: 0000000000000003 RCX: 0000000000445fe9
>> RDX: 0000000000000080 RSI: 0000000020000000 RDI: 0000000000000004
>> RBP: 0000000000000003 R08: 000000ff00000001 R09: 000000ff00000001
>> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000014112b8
>> R13: 0000000000000072 R14: 00007fff8504a040 R15: 0000000000000003
>>
>> Showing all locks held in the system:
>> 1 lock held by khungtaskd/26:
>>   #0: ffffffff8b97fbe0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
>> 1 lock held by krfcommd/2876:
>>   #0: ffffffff8d31ede8 (rfcomm_mutex){+.+.}-{3:3}, at: rfcomm_process_sessions net/bluetooth/rfcomm/core.c:1979 [inline]
>>   #0: ffffffff8d31ede8 (rfcomm_mutex){+.+.}-{3:3}, at: rfcomm_run+0x2ed/0x4a20 net/bluetooth/rfcomm/core.c:2086
>> 1 lock held by in:imklog/6232:
>> 4 locks held by syz-executor157/6562:
>>   #0: ffff888145e26210 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:786 [inline]
>>   #0: ffff888145e26210 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}, at: __sock_release+0x86/0x280 net/socket.c:648
>>   #1: ffff88801d622120 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline]
>>   #1: ffff88801d622120 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: rfcomm_sock_shutdown+0x54/0x210 net/bluetooth/rfcomm/sock.c:928
> 
> But sk is already locked before b7ce436a5d79.
> 
> What is wierd here is lock_sock() fails to complain about recursive locking
> like this one if syzbot turned lockdep on. Any light on this, Eric?
> 
> Thanks
> Hillf
> 

Sorry, this one was my bad. The patch swapped out spin_lock_bh for 
lock_sock, to provide synchronization with other functions that use 
lock_sock.

Problem is that in one of the call traces, we hit the following deadlock:

   rfcomm_sock_close():
     lock_sock();
     __rfcomm_sock_close():
       rfcomm_dlc_close():
         __rfcomm_dlc_close():
           rfcomm_sk_state_change():
             lock_sock();

But we don't always hold onto the socket lock before calling 
rfcomm_sk_state_change.

I'm still working and testing a fix, but I think one possibility is to 
schedule rfcomm_sk_state_change on a workqueue. This seems to fit with 
the rest of the code, since in rfcomm_sock_shutdown we call 
rfcomm_sock_close then wait for the sk state to change to BT_CLOSED.

>>   #2: ffffffff8d31ede8 (rfcomm_mutex){+.+.}-{3:3}, at: rfcomm_dlc_close+0x34/0x240 net/bluetooth/rfcomm/core.c:507
>>   #3: ffff88807edd9928 (&d->lock){+.+.}-{3:3}, at: __rfcomm_dlc_close+0x162/0x8a0 net/bluetooth/rfcomm/core.c:487
>>
>> =============================================
>>
>> NMI backtrace for cpu 1
>> CPU: 1 PID: 26 Comm: khungtaskd Not tainted 5.14.0-next-20210901-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> Call Trace:
>>   __dump_stack lib/dump_stack.c:88 [inline]
>>   dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
>>   nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:105
>>   nmi_trigger_cpumask_backtrace+0x1ae/0x220 lib/nmi_backtrace.c:62
>>   trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
>>   check_hung_uninterruptible_tasks kernel/hung_task.c:254 [inline]
>>   watchdog+0xcb7/0xed0 kernel/hung_task.c:339
>>   kthread+0x3e5/0x4d0 kernel/kthread.c:319
>>   ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
>> Sending NMI from CPU 1 to CPUs 0:
>> NMI backtrace for cpu 0
>> CPU: 0 PID: 2958 Comm: systemd-journal Not tainted 5.14.0-next-20210901-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> RIP: 0010:check_kcov_mode kernel/kcov.c:163 [inline]
>> RIP: 0010:__sanitizer_cov_trace_pc+0x7/0x60 kernel/kcov.c:197
>> Code: fd ff ff b9 ff ff ff ff ba 08 00 00 00 4d 8b 03 48 0f bd ca 49 8b 45 00 48 63 c9 e9 64 ff ff ff 0f 1f 00 65 8b 05 39 e6 8b 7e <89> c1 48 8b 34 24 81 e1 00 01 00 00 65 48 8b 14 25 40 f0 01 00 a9
>> RSP: 0018:ffffc900014dfde0 EFLAGS: 00000282
>> RAX: 0000000080000000 RBX: ffffc900014dff58 RCX: 1ffff9200029bfc7
>> RDX: dffffc0000000000 RSI: 1ffff9200029bfcd RDI: ffffc900014dfe38
>> RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff8176c71a
>> R10: ffffffff81765c97 R11: 0000000000000002 R12: 0000000000000053
>> R13: 0000000000000002 R14: 0000000000000000 R15: ffffc900014dfe30
>> FS:  00007f43756768c0(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f4372a49000 CR3: 000000001a5d4000 CR4: 00000000001506f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>>   get_current arch/x86/include/asm/current.h:15 [inline]
>>   seccomp_run_filters kernel/seccomp.c:402 [inline]
>>   __seccomp_filter+0x88/0x1040 kernel/seccomp.c:1180
>>   __secure_computing+0xfc/0x360 kernel/seccomp.c:1311
>>   syscall_trace_enter.constprop.0+0x94/0x270 kernel/entry/common.c:68
>>   do_syscall_64+0x16/0xb0 arch/x86/entry/common.c:76
>>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>> RIP: 0033:0x7f4374931687
>> Code: 00 b8 ff ff ff ff c3 0f 1f 40 00 48 8b 05 09 d8 2b 00 64 c7 00 5f 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 d7 2b 00 f7 d8 64 89 01 48
>> RSP: 002b:00007ffc79978938 EFLAGS: 00000293 ORIG_RAX: 0000000000000053
>> RAX: ffffffffffffffda RBX: 00007ffc7997b850 RCX: 00007f4374931687
>> RDX: 00007f43753a2a00 RSI: 00000000000001ed RDI: 00005646c59898a0
>> RBP: 00007ffc79978970 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000069 R11: 0000000000000293 R12: 0000000000000000
>> R13: 0000000000000000 R14: 00007ffc7997b850 R15: 00007ffc79978e60
>> ----------------
>> Code disassembly (best guess), 3 bytes skipped:
>>     0:	b9 ff ff ff ff       	mov    $0xffffffff,%ecx
>>     5:	ba 08 00 00 00       	mov    $0x8,%edx
>>     a:	4d 8b 03             	mov    (%r11),%r8
>>     d:	48 0f bd ca          	bsr    %rdx,%rcx
>>    11:	49 8b 45 00          	mov    0x0(%r13),%rax
>>    15:	48 63 c9             	movslq %ecx,%rcx
>>    18:	e9 64 ff ff ff       	jmpq   0xffffff81
>>    1d:	0f 1f 00             	nopl   (%rax)
>>    20:	65 8b 05 39 e6 8b 7e 	mov    %gs:0x7e8be639(%rip),%eax        # 0x7e8be660
>> * 27:	89 c1                	mov    %eax,%ecx <-- trapping instruction
>>    29:	48 8b 34 24          	mov    (%rsp),%rsi
>>    2d:	81 e1 00 01 00 00    	and    $0x100,%ecx
>>    33:	65 48 8b 14 25 40 f0 	mov    %gs:0x1f040,%rdx
>>    3a:	01 00
>>    3c:	a9                   	.byte 0xa9
>>
>>


  parent reply	other threads:[~2021-09-02 19:54 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-15 16:47 [syzbot] INFO: task can't die in __lock_sock syzbot
2021-09-01 17:34 ` syzbot
2021-09-02  1:34 ` syzbot
     [not found] ` <20210902031752.2502-1-hdanton@sina.com>
2021-09-02 19:54   ` Desmond Cheong Zhi Xi [this message]
     [not found] <20210920142348.4642-1-hdanton@sina.com>
2021-09-20 15:50 ` syzbot
2021-09-22 14:16   ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a43c3f8f-66b5-164f-c299-53d15c015845@gmail.com \
    --to=desmondcheongzx@gmail.com \
    --cc=eric.dumazet@gmail.com \
    --cc=hdanton@sina.com \
    --cc=linux-bluetooth@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luiz.dentz@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=syzbot+7d51f807c81b190a127d@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.