linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* possible deadlock in rfcomm_sk_state_change
@ 2022-08-27 16:19 Jiacheng Xu
  2022-08-28 11:03 ` Jiacheng Xu
  0 siblings, 1 reply; 5+ messages in thread
From: Jiacheng Xu @ 2022-08-27 16:19 UTC (permalink / raw)
  To: linux-kernel, marcel, johan.hedberg; +Cc: linux-bluetooth

Hello,

When using modified Syzkaller to fuzz the Linux kernel-5.19, the
following crash was triggered.
We would appreciate a CVE ID if this is a security issue.

HEAD commit: 3d7cb6b04c3f Linux-5.19
git tree: upstream

console output:
https://drive.google.com/file/d/1NmOGWcfPnY2kSrS0nOwvG1AZ923jFQ3p/view?usp=sharing
kernel config: https://drive.google.com/file/d/1wgIUDwP5ho29AM-K7HhysSTfWFpfXYkG/view?usp=sharing
syz repro: https://drive.google.com/file/d/16hUTEGw4IcPQA9CZvoF7I5la42TlU-Cx/view?usp=sharing
C reproducer: https://drive.google.com/file/d/1YvgzTvV4qaSZPiD4D1IWGL4GuapzHD2w/view?usp=sharing

Environment:
Ubuntu 20.04 on Linux 5.4.0
QEMU 4.2.1:
qemu-system-x86_64 \
  -m 2G \
  -smp 2 \
  -kernel /home/workdir/bzImage \
  -append "console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
  -drive file=/home/workdir/stretch.img,format=raw \
  -net user,host=10.0.2.10,hostfwd=tcp:127.0.0.1:10021-:22 \
  -net nic,model=e1000 \
  -enable-kvm \
  -nographic \
  -pidfile vm.pid \
  2>&1 | tee vm.log

If you fix this issue, please add the following tag to the commit:
Reported-by Jiacheng Xu<578001344xu@gmail.com>

============================================
WARNING: possible recursive locking detected
5.19.0 #1 Not tainted
--------------------------------------------
syz-executor/9064 is trying to acquire lock:
ffff888026b13130 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0},
at: lock_sock include/net/sock.h:1677 [inline]
ffff888026b13130 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0},
at: rfcomm_sk_state_change+0x6e/0x3a0 net/bluetooth/rfcomm/sock.c:73

but task is already holding lock:
ffff888026b13130 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0},
at: lock_sock include/net/sock.h:1677 [inline]
ffff888026b13130 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0},
at: rfcomm_sock_shutdown+0x57/0x220 net/bluetooth/rfcomm/sock.c:902

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM);
  lock(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

4 locks held by syz-executor/9064:
 #0: ffff888110dac410 (&sb->s_type->i_mutex_key#12){+.+.}-{3:3}, at:
inode_lock include/linux/fs.h:741 [inline]
 #0: ffff888110dac410 (&sb->s_type->i_mutex_key#12){+.+.}-{3:3}, at:
__sock_release+0x86/0x280 net/socket.c:649
 #1: ffff888026b13130
(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: lock_sock
include/net/sock.h:1677 [inline]
 #1: ffff888026b13130
(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at:
rfcomm_sock_shutdown+0x57/0x220 net/bluetooth/rfcomm/sock.c:902
 #2: ffffffff8d7d8428 (rfcomm_mutex){+.+.}-{3:3}, at:
rfcomm_dlc_close+0x34/0x240 net/bluetooth/rfcomm/core.c:507
 #3: ffff8880155d2d28 (&d->lock){+.+.}-{3:3}, at:
__rfcomm_dlc_close+0x157/0x710 net/bluetooth/rfcomm/core.c:487

stack backtrace:
CPU: 0 PID: 9064 Comm: syz-executor Not tainted 5.19.0 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_deadlock_bug kernel/locking/lockdep.c:2988 [inline]
 check_deadlock kernel/locking/lockdep.c:3031 [inline]
 validate_chain kernel/locking/lockdep.c:3816 [inline]
 __lock_acquire.cold+0x152/0x3ca kernel/locking/lockdep.c:5053
 lock_acquire kernel/locking/lockdep.c:5665 [inline]
 lock_acquire+0x1ab/0x580 kernel/locking/lockdep.c:5630
 lock_sock_nested+0x36/0xf0 net/core/sock.c:3389
 lock_sock include/net/sock.h:1677 [inline]
 rfcomm_sk_state_change+0x6e/0x3a0 net/bluetooth/rfcomm/sock.c:73
 __rfcomm_dlc_close+0x1ab/0x710 net/bluetooth/rfcomm/core.c:489
 rfcomm_dlc_close+0x1ea/0x240 net/bluetooth/rfcomm/core.c:520
 __rfcomm_sock_close+0xda/0x260 net/bluetooth/rfcomm/sock.c:220
 rfcomm_sock_shutdown+0xf4/0x220 net/bluetooth/rfcomm/sock.c:905
 rfcomm_sock_release+0x5f/0x140 net/bluetooth/rfcomm/sock.c:925
 __sock_release+0xcd/0x280 net/socket.c:650
 sock_close+0x18/0x20 net/socket.c:1365
 __fput+0x277/0x9d0 fs/file_table.c:317
 task_work_run+0xe0/0x1a0 kernel/task_work.c:177
 exit_task_work include/linux/task_work.h:38 [inline]
 do_exit+0xaf5/0x2da0 kernel/exit.c:795
 do_group_exit+0xd2/0x2f0 kernel/exit.c:925
 get_signal+0x2842/0x2870 kernel/signal.c:2857
 arch_do_signal_or_restart+0x82/0x2270 arch/x86/kernel/signal.c:869
 exit_to_user_mode_loop kernel/entry/common.c:166 [inline]
 exit_to_user_mode_prepare+0x174/0x260 kernel/entry/common.c:201
 __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
 syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f26c3295dfd
Code: Unable to access opcode bytes at RIP 0x7f26c3295dd3.
RSP: 002b:00007f26c43fcc58 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: fffffffffffffffc RBX: 00007f26c33bc0a0 RCX: 00007f26c3295dfd
RDX: 0000000000000080 RSI: 0000000020000000 RDI: 0000000000000004
RBP: 00007f26c32ff4c1 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f26c33bc0a0
R13: 00007ffc2c88f2df R14: 00007ffc2c88f480 R15: 00007f26c43fcdc0
 </TASK>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: possible deadlock in rfcomm_sk_state_change
  2022-08-27 16:19 possible deadlock in rfcomm_sk_state_change Jiacheng Xu
@ 2022-08-28 11:03 ` Jiacheng Xu
  2022-08-30  6:48   ` Desmond Cheong Zhi Xi
  0 siblings, 1 reply; 5+ messages in thread
From: Jiacheng Xu @ 2022-08-28 11:03 UTC (permalink / raw)
  To: linux-kernel, marcel, johan.hedberg, desmondcheongzx
  Cc: netdev, linux-bluetooth

Hi,

I believe the deadlock is more than possible but actually real.
I got a poc that could stably trigger the deadlock.

poc: https://drive.google.com/file/d/1PjqvMtHsrrGM1MIRGKl_zJGR-teAMMQy/view?usp=sharing

Description/Root cause:
In rfcomm_sock_shutdown(), lock_sock() is called when releasing and
shutting down socket.
However, lock_sock() has to be called once more when the sk_state is
changed because the
lock is not always held when rfcomm_sk_state_change() is called. One
such call stack is:

  rfcomm_sock_shutdown():
    lock_sock();
    __rfcomm_sock_close():
      rfcomm_dlc_close():
        __rfcomm_dlc_close():
          rfcomm_dlc_lock();
          rfcomm_sk_state_change():
            lock_sock();

Besides the recursive deadlock, there is also an
issue of a lock hierarchy inversion between rfcomm_dlc_lock() and
lock_sock() if the socket is locked in rfcomm_sk_state_change().

Reference: https://lore.kernel.org/all/20211004180734.434511-1-desmondcheongzx@gmail.com/

On Sun, Aug 28, 2022 at 12:19 AM Jiacheng Xu <578001344xu@gmail.com> wrote:
>
> Hello,
>
> When using modified Syzkaller to fuzz the Linux kernel-5.19, the
> following crash was triggered.
> We would appreciate a CVE ID if this is a security issue.
>
> HEAD commit: 3d7cb6b04c3f Linux-5.19
> git tree: upstream
>
> console output:
> https://drive.google.com/file/d/1NmOGWcfPnY2kSrS0nOwvG1AZ923jFQ3p/view?usp=sharing
> kernel config: https://drive.google.com/file/d/1wgIUDwP5ho29AM-K7HhysSTfWFpfXYkG/view?usp=sharing
> syz repro: https://drive.google.com/file/d/16hUTEGw4IcPQA9CZvoF7I5la42TlU-Cx/view?usp=sharing
> C reproducer: https://drive.google.com/file/d/1YvgzTvV4qaSZPiD4D1IWGL4GuapzHD2w/view?usp=sharing
>
> Environment:
> Ubuntu 20.04 on Linux 5.4.0
> QEMU 4.2.1:
> qemu-system-x86_64 \
>   -m 2G \
>   -smp 2 \
>   -kernel /home/workdir/bzImage \
>   -append "console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
>   -drive file=/home/workdir/stretch.img,format=raw \
>   -net user,host=10.0.2.10,hostfwd=tcp:127.0.0.1:10021-:22 \
>   -net nic,model=e1000 \
>   -enable-kvm \
>   -nographic \
>   -pidfile vm.pid \
>   2>&1 | tee vm.log
>
> If you fix this issue, please add the following tag to the commit:
> Reported-by Jiacheng Xu<578001344xu@gmail.com>
>
> ============================================
> WARNING: possible recursive locking detected
> 5.19.0 #1 Not tainted
> --------------------------------------------
> syz-executor/9064 is trying to acquire lock:
> ffff888026b13130 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0},
> at: lock_sock include/net/sock.h:1677 [inline]
> ffff888026b13130 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0},
> at: rfcomm_sk_state_change+0x6e/0x3a0 net/bluetooth/rfcomm/sock.c:73
>
> but task is already holding lock:
> ffff888026b13130 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0},
> at: lock_sock include/net/sock.h:1677 [inline]
> ffff888026b13130 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0},
> at: rfcomm_sock_shutdown+0x57/0x220 net/bluetooth/rfcomm/sock.c:902
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM);
>   lock(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 4 locks held by syz-executor/9064:
>  #0: ffff888110dac410 (&sb->s_type->i_mutex_key#12){+.+.}-{3:3}, at:
> inode_lock include/linux/fs.h:741 [inline]
>  #0: ffff888110dac410 (&sb->s_type->i_mutex_key#12){+.+.}-{3:3}, at:
> __sock_release+0x86/0x280 net/socket.c:649
>  #1: ffff888026b13130
> (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: lock_sock
> include/net/sock.h:1677 [inline]
>  #1: ffff888026b13130
> (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at:
> rfcomm_sock_shutdown+0x57/0x220 net/bluetooth/rfcomm/sock.c:902
>  #2: ffffffff8d7d8428 (rfcomm_mutex){+.+.}-{3:3}, at:
> rfcomm_dlc_close+0x34/0x240 net/bluetooth/rfcomm/core.c:507
>  #3: ffff8880155d2d28 (&d->lock){+.+.}-{3:3}, at:
> __rfcomm_dlc_close+0x157/0x710 net/bluetooth/rfcomm/core.c:487
>
> stack backtrace:
> CPU: 0 PID: 9064 Comm: syz-executor Not tainted 5.19.0 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.13.0-1ubuntu1.1 04/01/2014
> Call Trace:
>  <TASK>
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
>  print_deadlock_bug kernel/locking/lockdep.c:2988 [inline]
>  check_deadlock kernel/locking/lockdep.c:3031 [inline]
>  validate_chain kernel/locking/lockdep.c:3816 [inline]
>  __lock_acquire.cold+0x152/0x3ca kernel/locking/lockdep.c:5053
>  lock_acquire kernel/locking/lockdep.c:5665 [inline]
>  lock_acquire+0x1ab/0x580 kernel/locking/lockdep.c:5630
>  lock_sock_nested+0x36/0xf0 net/core/sock.c:3389
>  lock_sock include/net/sock.h:1677 [inline]
>  rfcomm_sk_state_change+0x6e/0x3a0 net/bluetooth/rfcomm/sock.c:73
>  __rfcomm_dlc_close+0x1ab/0x710 net/bluetooth/rfcomm/core.c:489
>  rfcomm_dlc_close+0x1ea/0x240 net/bluetooth/rfcomm/core.c:520
>  __rfcomm_sock_close+0xda/0x260 net/bluetooth/rfcomm/sock.c:220
>  rfcomm_sock_shutdown+0xf4/0x220 net/bluetooth/rfcomm/sock.c:905
>  rfcomm_sock_release+0x5f/0x140 net/bluetooth/rfcomm/sock.c:925
>  __sock_release+0xcd/0x280 net/socket.c:650
>  sock_close+0x18/0x20 net/socket.c:1365
>  __fput+0x277/0x9d0 fs/file_table.c:317
>  task_work_run+0xe0/0x1a0 kernel/task_work.c:177
>  exit_task_work include/linux/task_work.h:38 [inline]
>  do_exit+0xaf5/0x2da0 kernel/exit.c:795
>  do_group_exit+0xd2/0x2f0 kernel/exit.c:925
>  get_signal+0x2842/0x2870 kernel/signal.c:2857
>  arch_do_signal_or_restart+0x82/0x2270 arch/x86/kernel/signal.c:869
>  exit_to_user_mode_loop kernel/entry/common.c:166 [inline]
>  exit_to_user_mode_prepare+0x174/0x260 kernel/entry/common.c:201
>  __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
>  syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
>  do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f26c3295dfd
> Code: Unable to access opcode bytes at RIP 0x7f26c3295dd3.
> RSP: 002b:00007f26c43fcc58 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
> RAX: fffffffffffffffc RBX: 00007f26c33bc0a0 RCX: 00007f26c3295dfd
> RDX: 0000000000000080 RSI: 0000000020000000 RDI: 0000000000000004
> RBP: 00007f26c32ff4c1 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f26c33bc0a0
> R13: 00007ffc2c88f2df R14: 00007ffc2c88f480 R15: 00007f26c43fcdc0
>  </TASK>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: possible deadlock in rfcomm_sk_state_change
  2022-08-28 11:03 ` Jiacheng Xu
@ 2022-08-30  6:48   ` Desmond Cheong Zhi Xi
  2022-08-30 17:41     ` Luiz Augusto von Dentz
  0 siblings, 1 reply; 5+ messages in thread
From: Desmond Cheong Zhi Xi @ 2022-08-30  6:48 UTC (permalink / raw)
  To: Jiacheng Xu
  Cc: netdev, linux-bluetooth, Jakub Kicinski,
	open list:NETWORKING [GENERAL],
	Eric Dumazet, linux-kernel, marcel, johan.hedberg, David Miller,
	luiz.dentz, Paolo Abeni

+cc Bluetooth and Networking maintainers

Hi Jiacheng,

On 28/8/22 04:03, Jiacheng Xu wrote:
> Hi,
> 
> I believe the deadlock is more than possible but actually real.
> I got a poc that could stably trigger the deadlock.
> 
> poc: https://drive.google.com/file/d/1PjqvMtHsrrGM1MIRGKl_zJGR-teAMMQy/view?usp=sharing
> 
> Description/Root cause:
> In rfcomm_sock_shutdown(), lock_sock() is called when releasing and
> shutting down socket.
> However, lock_sock() has to be called once more when the sk_state is
> changed because the
> lock is not always held when rfcomm_sk_state_change() is called. One
> such call stack is:
> 
>    rfcomm_sock_shutdown():
>      lock_sock();
>      __rfcomm_sock_close():
>        rfcomm_dlc_close():
>          __rfcomm_dlc_close():
>            rfcomm_dlc_lock();
>            rfcomm_sk_state_change():
>              lock_sock();
> 
> Besides the recursive deadlock, there is also an
> issue of a lock hierarchy inversion between rfcomm_dlc_lock() and
> lock_sock() if the socket is locked in rfcomm_sk_state_change().


Thanks for the poc and for following the trail all the way to the root 
cause - this was a known issue and I didn't realize the patch wasn't 
applied.

>  > Reference: 
https://lore.kernel.org/all/20211004180734.434511-1-desmondcheongzx@gmail.com/
> 

Fwiw, I tested the patch again with syzbot. It still applies cleanly to 
the head of bluetooth-next and seems to address the root cause.

Any thoughts from the maintainers on this issue and the proposed fix?

Best,
Desmond

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: possible deadlock in rfcomm_sk_state_change
  2022-08-30  6:48   ` Desmond Cheong Zhi Xi
@ 2022-08-30 17:41     ` Luiz Augusto von Dentz
  2022-08-30 19:23       ` Luiz Augusto von Dentz
  0 siblings, 1 reply; 5+ messages in thread
From: Luiz Augusto von Dentz @ 2022-08-30 17:41 UTC (permalink / raw)
  To: Desmond Cheong Zhi Xi
  Cc: Jiacheng Xu, linux-bluetooth, Jakub Kicinski,
	open list:NETWORKING [GENERAL],
	Eric Dumazet, Linux Kernel Mailing List, Marcel Holtmann,
	Johan Hedberg, David Miller, Paolo Abeni

Hi Desmond,

On Mon, Aug 29, 2022 at 11:48 PM Desmond Cheong Zhi Xi
<desmondcheongzx@gmail.com> wrote:
>
> +cc Bluetooth and Networking maintainers
>
> Hi Jiacheng,
>
> On 28/8/22 04:03, Jiacheng Xu wrote:
> > Hi,
> >
> > I believe the deadlock is more than possible but actually real.
> > I got a poc that could stably trigger the deadlock.
> >
> > poc: https://drive.google.com/file/d/1PjqvMtHsrrGM1MIRGKl_zJGR-teAMMQy/view?usp=sharing
> >
> > Description/Root cause:
> > In rfcomm_sock_shutdown(), lock_sock() is called when releasing and
> > shutting down socket.
> > However, lock_sock() has to be called once more when the sk_state is
> > changed because the
> > lock is not always held when rfcomm_sk_state_change() is called. One
> > such call stack is:
> >
> >    rfcomm_sock_shutdown():
> >      lock_sock();
> >      __rfcomm_sock_close():
> >        rfcomm_dlc_close():
> >          __rfcomm_dlc_close():
> >            rfcomm_dlc_lock();
> >            rfcomm_sk_state_change():
> >              lock_sock();
> >
> > Besides the recursive deadlock, there is also an
> > issue of a lock hierarchy inversion between rfcomm_dlc_lock() and
> > lock_sock() if the socket is locked in rfcomm_sk_state_change().
>
>
> Thanks for the poc and for following the trail all the way to the root
> cause - this was a known issue and I didn't realize the patch wasn't
> applied.
>
> >  > Reference:
> https://lore.kernel.org/all/20211004180734.434511-1-desmondcheongzx@gmail.com/
> >
>
> Fwiw, I tested the patch again with syzbot. It still applies cleanly to
> the head of bluetooth-next and seems to address the root cause.
>
> Any thoughts from the maintainers on this issue and the proposed fix?

We probably need to introduce a test to rfcomm-tester to reproduce
this sort of problem, I also would like to avoid introducing a work
just to trigger a state change since we don't have such problem on the
likes of L2CAP socket so perhaps we need to rework the code a little
bit to avoid the locking problems.

> Best,
> Desmond



-- 
Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: possible deadlock in rfcomm_sk_state_change
  2022-08-30 17:41     ` Luiz Augusto von Dentz
@ 2022-08-30 19:23       ` Luiz Augusto von Dentz
  0 siblings, 0 replies; 5+ messages in thread
From: Luiz Augusto von Dentz @ 2022-08-30 19:23 UTC (permalink / raw)
  To: Desmond Cheong Zhi Xi
  Cc: Jiacheng Xu, linux-bluetooth, Jakub Kicinski,
	open list:NETWORKING [GENERAL],
	Eric Dumazet, Linux Kernel Mailing List, Marcel Holtmann,
	Johan Hedberg, David Miller, Paolo Abeni

Hi Desmond,

On Tue, Aug 30, 2022 at 10:41 AM Luiz Augusto von Dentz
<luiz.dentz@gmail.com> wrote:
>
> Hi Desmond,
>
> On Mon, Aug 29, 2022 at 11:48 PM Desmond Cheong Zhi Xi
> <desmondcheongzx@gmail.com> wrote:
> >
> > +cc Bluetooth and Networking maintainers
> >
> > Hi Jiacheng,
> >
> > On 28/8/22 04:03, Jiacheng Xu wrote:
> > > Hi,
> > >
> > > I believe the deadlock is more than possible but actually real.
> > > I got a poc that could stably trigger the deadlock.
> > >
> > > poc: https://drive.google.com/file/d/1PjqvMtHsrrGM1MIRGKl_zJGR-teAMMQy/view?usp=sharing
> > >
> > > Description/Root cause:
> > > In rfcomm_sock_shutdown(), lock_sock() is called when releasing and
> > > shutting down socket.
> > > However, lock_sock() has to be called once more when the sk_state is
> > > changed because the
> > > lock is not always held when rfcomm_sk_state_change() is called. One
> > > such call stack is:
> > >
> > >    rfcomm_sock_shutdown():
> > >      lock_sock();
> > >      __rfcomm_sock_close():
> > >        rfcomm_dlc_close():
> > >          __rfcomm_dlc_close():
> > >            rfcomm_dlc_lock();
> > >            rfcomm_sk_state_change():
> > >              lock_sock();
> > >
> > > Besides the recursive deadlock, there is also an
> > > issue of a lock hierarchy inversion between rfcomm_dlc_lock() and
> > > lock_sock() if the socket is locked in rfcomm_sk_state_change().
> >
> >
> > Thanks for the poc and for following the trail all the way to the root
> > cause - this was a known issue and I didn't realize the patch wasn't
> > applied.
> >
> > >  > Reference:
> > https://lore.kernel.org/all/20211004180734.434511-1-desmondcheongzx@gmail.com/
> > >
> >
> > Fwiw, I tested the patch again with syzbot. It still applies cleanly to
> > the head of bluetooth-next and seems to address the root cause.
> >
> > Any thoughts from the maintainers on this issue and the proposed fix?
>
> We probably need to introduce a test to rfcomm-tester to reproduce
> this sort of problem, I also would like to avoid introducing a work
> just to trigger a state change since we don't have such problem on the
> likes of L2CAP socket so perhaps we need to rework the code a little
> bit to avoid the locking problems.

It looks like for L2CAP we use lock_sock_nested on teardown, we don't
have the exact same behavior in RFCOMM but I think that might be worth
a try if we can use that instead of introducing yet another work item.

> > Best,
> > Desmond
>
>
>
> --
> Luiz Augusto von Dentz



-- 
Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-08-30 19:25 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-27 16:19 possible deadlock in rfcomm_sk_state_change Jiacheng Xu
2022-08-28 11:03 ` Jiacheng Xu
2022-08-30  6:48   ` Desmond Cheong Zhi Xi
2022-08-30 17:41     ` Luiz Augusto von Dentz
2022-08-30 19:23       ` Luiz Augusto von Dentz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).