netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] Bluetooth: Fix possible deadlock in rfcomm_sk_state_change
@ 2023-01-04 15:07 Ying Hsu
  2023-01-04 22:21 ` Luiz Augusto von Dentz
  0 siblings, 1 reply; 8+ messages in thread
From: Ying Hsu @ 2023-01-04 15:07 UTC (permalink / raw)
  To: linux-bluetooth, marcel, leon
  Cc: chromeos-bluetooth-upstreaming, Ying Hsu, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Johan Hedberg,
	Luiz Augusto von Dentz, Paolo Abeni, linux-kernel, netdev

There's a possible deadlock when two processes are connecting
and closing a RFCOMM socket concurrently. Here's the call trace:

-> #2 (&d->lock){+.+.}-{3:3}:
       __mutex_lock_common kernel/locking/mutex.c:603 [inline]
       __mutex_lock0x12f/0x1360 kernel/locking/mutex.c:747
       __rfcomm_dlc_close+0x15d/0x890 net/bluetooth/rfcomm/core.c:487
       rfcomm_dlc_close+1e9/0x240 net/bluetooth/rfcomm/core.c:520
       __rfcomm_sock_close+0x13c/0x250 net/bluetooth/rfcomm/sock.c:220
       rfcomm_sock_shutdown+0xd8/0x230 net/bluetooth/rfcomm/sock.c:907
       rfcomm_sock_release+0x68/0x140 net/bluetooth/rfcomm/sock.c:928
       __sock_release+0xcd/0x280 net/socket.c:650
       sock_close+0x1c/0x20 net/socket.c:1365
       __fput+0x27c/0xa90 fs/file_table.c:320
       task_work_run+0x16f/0x270 kernel/task_work.c:179
       exit_task_work include/linux/task_work.h:38 [inline]
       do_exit+0xaa8/0x2950 kernel/exit.c:867
       do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
       get_signal+0x21c3/0x2450 kernel/signal.c:2859
       arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
       exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
       exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
       __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
       syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
       do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #1 (rfcomm_mutex){+.+.}-{3:3}:
       __mutex_lock_common kernel/locking/mutex.c:603 [inline]
       __mutex_lock+0x12f/0x1360 kernel/locking/mutex.c:747
       rfcomm_dlc_open+0x93/0xa80 net/bluetooth/rfcomm/core.c:425
       rfcomm_sock_connect+0x329/0x450 net/bluetooth/rfcomm/sock.c:413
       __sys_connect_file+0x153/0x1a0 net/socket.c:1976
       __sys_connect+0x165/0x1a0 net/socket.c:1993
       __do_sys_connect net/socket.c:2003 [inline]
       __se_sys_connect net/socket.c:2000 [inline]
       __x64_sys_connect+0x73/0xb0 net/socket.c:2000
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3097 [inline]
       check_prevs_add kernel/locking/lockdep.c:3216 [inline]
       validate_chain kernel/locking/lockdep.c:3831 [inline]
       __lock_acquire+0x2a43/0x56d0 kernel/locking/lockdep.c:5055
       lock_acquire kernel/locking/lockdep.c:5668 [inline]
       lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
       lock_sock_nested+0x3a/0xf0 net/core/sock.c:3470
       lock_sock include/net/sock.h:1725 [inline]
       rfcomm_sk_state_change+0x6d/0x3a0 net/bluetooth/rfcomm/sock.c:73
       __rfcomm_dlc_close+0x1b1/0x890 net/bluetooth/rfcomm/core.c:489
       rfcomm_dlc_close+0x1e9/0x240 net/bluetooth/rfcomm/core.c:520
       __rfcomm_sock_close+0x13c/0x250 net/bluetooth/rfcomm/sock.c:220
       rfcomm_sock_shutdown+0xd8/0x230 net/bluetooth/rfcomm/sock.c:907
       rfcomm_sock_release+0x68/0x140 net/bluetooth/rfcomm/sock.c:928
       __sock_release+0xcd/0x280 net/socket.c:650
       sock_close+0x1c/0x20 net/socket.c:1365
       __fput+0x27c/0xa90 fs/file_table.c:320
       task_work_run+0x16f/0x270 kernel/task_work.c:179
       exit_task_work include/linux/task_work.h:38 [inline]
       do_exit+0xaa8/0x2950 kernel/exit.c:867
       do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
       get_signal+0x21c3/0x2450 kernel/signal.c:2859
       arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
       exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
       exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
       __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
       syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
       do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

Signed-off-by: Ying Hsu <yinghsu@chromium.org>
---
This commit has been tested with a C reproducer on qemu-x86_64
and a ChromeOS device.

Changes in v2:
- Fix potential use-after-free in rfc_comm_sock_connect.

 net/bluetooth/rfcomm/sock.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index 21e24da4847f..4397e14ff560 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -391,6 +391,7 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a
 	    addr->sa_family != AF_BLUETOOTH)
 		return -EINVAL;
 
+	sock_hold(sk);
 	lock_sock(sk);
 
 	if (sk->sk_state != BT_OPEN && sk->sk_state != BT_BOUND) {
@@ -410,14 +411,18 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a
 	d->sec_level = rfcomm_pi(sk)->sec_level;
 	d->role_switch = rfcomm_pi(sk)->role_switch;
 
+	/* Drop sock lock to avoid potential deadlock with the RFCOMM lock */
+	release_sock(sk);
 	err = rfcomm_dlc_open(d, &rfcomm_pi(sk)->src, &sa->rc_bdaddr,
 			      sa->rc_channel);
-	if (!err)
+	lock_sock(sk);
+	if (!err && !sock_flag(sk, SOCK_ZAPPED))
 		err = bt_sock_wait_state(sk, BT_CONNECTED,
 				sock_sndtimeo(sk, flags & O_NONBLOCK));
 
 done:
 	release_sock(sk);
+	sock_put(sk);
 	return err;
 }
 
-- 
2.39.0.314.g84b9a713c41-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Bluetooth: Fix possible deadlock in rfcomm_sk_state_change
  2023-01-04 15:07 [PATCH v2] Bluetooth: Fix possible deadlock in rfcomm_sk_state_change Ying Hsu
@ 2023-01-04 22:21 ` Luiz Augusto von Dentz
  2023-01-06  1:18   ` Saeed Mahameed
  0 siblings, 1 reply; 8+ messages in thread
From: Luiz Augusto von Dentz @ 2023-01-04 22:21 UTC (permalink / raw)
  To: Ying Hsu
  Cc: linux-bluetooth, marcel, leon, chromeos-bluetooth-upstreaming,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Johan Hedberg,
	Paolo Abeni, linux-kernel, netdev

Hi Ying,

On Wed, Jan 4, 2023 at 7:07 AM Ying Hsu <yinghsu@chromium.org> wrote:
>
> There's a possible deadlock when two processes are connecting
> and closing a RFCOMM socket concurrently. Here's the call trace:

Are you sure it is 2 different processes? Usually that would mean 2
different sockets (sk) then so they wouldn't share the same lock, so
this sounds more like 2 different threads, perhaps it is worth
creating a testing case in our rfcomm-tester so we are able to detect
this sort of thing in the future.

> -> #2 (&d->lock){+.+.}-{3:3}:
>        __mutex_lock_common kernel/locking/mutex.c:603 [inline]
>        __mutex_lock0x12f/0x1360 kernel/locking/mutex.c:747
>        __rfcomm_dlc_close+0x15d/0x890 net/bluetooth/rfcomm/core.c:487
>        rfcomm_dlc_close+1e9/0x240 net/bluetooth/rfcomm/core.c:520
>        __rfcomm_sock_close+0x13c/0x250 net/bluetooth/rfcomm/sock.c:220
>        rfcomm_sock_shutdown+0xd8/0x230 net/bluetooth/rfcomm/sock.c:907
>        rfcomm_sock_release+0x68/0x140 net/bluetooth/rfcomm/sock.c:928
>        __sock_release+0xcd/0x280 net/socket.c:650
>        sock_close+0x1c/0x20 net/socket.c:1365
>        __fput+0x27c/0xa90 fs/file_table.c:320
>        task_work_run+0x16f/0x270 kernel/task_work.c:179
>        exit_task_work include/linux/task_work.h:38 [inline]
>        do_exit+0xaa8/0x2950 kernel/exit.c:867
>        do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
>        get_signal+0x21c3/0x2450 kernel/signal.c:2859
>        arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
>        exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
>        exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
>        __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
>        syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
>        do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> -> #1 (rfcomm_mutex){+.+.}-{3:3}:
>        __mutex_lock_common kernel/locking/mutex.c:603 [inline]
>        __mutex_lock+0x12f/0x1360 kernel/locking/mutex.c:747
>        rfcomm_dlc_open+0x93/0xa80 net/bluetooth/rfcomm/core.c:425
>        rfcomm_sock_connect+0x329/0x450 net/bluetooth/rfcomm/sock.c:413
>        __sys_connect_file+0x153/0x1a0 net/socket.c:1976
>        __sys_connect+0x165/0x1a0 net/socket.c:1993
>        __do_sys_connect net/socket.c:2003 [inline]
>        __se_sys_connect net/socket.c:2000 [inline]
>        __x64_sys_connect+0x73/0xb0 net/socket.c:2000
>        do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>        do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}:
>        check_prev_add kernel/locking/lockdep.c:3097 [inline]
>        check_prevs_add kernel/locking/lockdep.c:3216 [inline]
>        validate_chain kernel/locking/lockdep.c:3831 [inline]
>        __lock_acquire+0x2a43/0x56d0 kernel/locking/lockdep.c:5055
>        lock_acquire kernel/locking/lockdep.c:5668 [inline]
>        lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
>        lock_sock_nested+0x3a/0xf0 net/core/sock.c:3470
>        lock_sock include/net/sock.h:1725 [inline]
>        rfcomm_sk_state_change+0x6d/0x3a0 net/bluetooth/rfcomm/sock.c:73
>        __rfcomm_dlc_close+0x1b1/0x890 net/bluetooth/rfcomm/core.c:489
>        rfcomm_dlc_close+0x1e9/0x240 net/bluetooth/rfcomm/core.c:520
>        __rfcomm_sock_close+0x13c/0x250 net/bluetooth/rfcomm/sock.c:220
>        rfcomm_sock_shutdown+0xd8/0x230 net/bluetooth/rfcomm/sock.c:907
>        rfcomm_sock_release+0x68/0x140 net/bluetooth/rfcomm/sock.c:928
>        __sock_release+0xcd/0x280 net/socket.c:650
>        sock_close+0x1c/0x20 net/socket.c:1365
>        __fput+0x27c/0xa90 fs/file_table.c:320
>        task_work_run+0x16f/0x270 kernel/task_work.c:179
>        exit_task_work include/linux/task_work.h:38 [inline]
>        do_exit+0xaa8/0x2950 kernel/exit.c:867
>        do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
>        get_signal+0x21c3/0x2450 kernel/signal.c:2859
>        arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
>        exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
>        exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
>        __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
>        syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
>        do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> Signed-off-by: Ying Hsu <yinghsu@chromium.org>
> ---
> This commit has been tested with a C reproducer on qemu-x86_64
> and a ChromeOS device.
>
> Changes in v2:
> - Fix potential use-after-free in rfc_comm_sock_connect.
>
>  net/bluetooth/rfcomm/sock.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
> index 21e24da4847f..4397e14ff560 100644
> --- a/net/bluetooth/rfcomm/sock.c
> +++ b/net/bluetooth/rfcomm/sock.c
> @@ -391,6 +391,7 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a
>             addr->sa_family != AF_BLUETOOTH)
>                 return -EINVAL;
>
> +       sock_hold(sk);
>         lock_sock(sk);
>
>         if (sk->sk_state != BT_OPEN && sk->sk_state != BT_BOUND) {
> @@ -410,14 +411,18 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a
>         d->sec_level = rfcomm_pi(sk)->sec_level;
>         d->role_switch = rfcomm_pi(sk)->role_switch;
>
> +       /* Drop sock lock to avoid potential deadlock with the RFCOMM lock */
> +       release_sock(sk);
>         err = rfcomm_dlc_open(d, &rfcomm_pi(sk)->src, &sa->rc_bdaddr,
>                               sa->rc_channel);
> -       if (!err)
> +       lock_sock(sk);
> +       if (!err && !sock_flag(sk, SOCK_ZAPPED))
>                 err = bt_sock_wait_state(sk, BT_CONNECTED,
>                                 sock_sndtimeo(sk, flags & O_NONBLOCK));
>
>  done:
>         release_sock(sk);
> +       sock_put(sk);
>         return err;
>  }

This sounds like a great solution to hold the reference and then
checking if the socket has been zapped when attempting to lock_sock,
so Ive been thinking on generalize this into something like
bt_sock_connect(sock, addr, alen, callback) so we make sure the
callback is done while holding a reference but with the socket
unlocked since typically the underline procedure only needs to access
the pi(sk) information without changing it e.g. rfcomm_dlc_open,
anyway Im fine if you don't want to pursue doing it right now but I'm
afraid these type of locking problem is no restricted to RFCOMM only.

> --
> 2.39.0.314.g84b9a713c41-goog
>


-- 
Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Bluetooth: Fix possible deadlock in rfcomm_sk_state_change
  2023-01-04 22:21 ` Luiz Augusto von Dentz
@ 2023-01-06  1:18   ` Saeed Mahameed
  2023-01-06 19:44     ` Luiz Augusto von Dentz
  0 siblings, 1 reply; 8+ messages in thread
From: Saeed Mahameed @ 2023-01-06  1:18 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Ying Hsu, linux-bluetooth, marcel, leon,
	chromeos-bluetooth-upstreaming, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Johan Hedberg, Paolo Abeni, linux-kernel, netdev

On 04 Jan 14:21, Luiz Augusto von Dentz wrote:
>Hi Ying,
>
>On Wed, Jan 4, 2023 at 7:07 AM Ying Hsu <yinghsu@chromium.org> wrote:
>>
>> There's a possible deadlock when two processes are connecting
>> and closing a RFCOMM socket concurrently. Here's the call trace:
>
>Are you sure it is 2 different processes? Usually that would mean 2
>different sockets (sk) then so they wouldn't share the same lock, so
>this sounds more like 2 different threads, perhaps it is worth
>creating a testing case in our rfcomm-tester so we are able to detect
>this sort of thing in the future.
>
>> -> #2 (&d->lock){+.+.}-{3:3}:
>>        __mutex_lock_common kernel/locking/mutex.c:603 [inline]
>>        __mutex_lock0x12f/0x1360 kernel/locking/mutex.c:747
>>        __rfcomm_dlc_close+0x15d/0x890 net/bluetooth/rfcomm/core.c:487
>>        rfcomm_dlc_close+1e9/0x240 net/bluetooth/rfcomm/core.c:520
>>        __rfcomm_sock_close+0x13c/0x250 net/bluetooth/rfcomm/sock.c:220
>>        rfcomm_sock_shutdown+0xd8/0x230 net/bluetooth/rfcomm/sock.c:907
>>        rfcomm_sock_release+0x68/0x140 net/bluetooth/rfcomm/sock.c:928
>>        __sock_release+0xcd/0x280 net/socket.c:650
>>        sock_close+0x1c/0x20 net/socket.c:1365
>>        __fput+0x27c/0xa90 fs/file_table.c:320
>>        task_work_run+0x16f/0x270 kernel/task_work.c:179
>>        exit_task_work include/linux/task_work.h:38 [inline]
>>        do_exit+0xaa8/0x2950 kernel/exit.c:867
>>        do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
>>        get_signal+0x21c3/0x2450 kernel/signal.c:2859
>>        arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
>>        exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
>>        exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
>>        __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
>>        syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
>>        do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
>>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>
>> -> #1 (rfcomm_mutex){+.+.}-{3:3}:
>>        __mutex_lock_common kernel/locking/mutex.c:603 [inline]
>>        __mutex_lock+0x12f/0x1360 kernel/locking/mutex.c:747
>>        rfcomm_dlc_open+0x93/0xa80 net/bluetooth/rfcomm/core.c:425
>>        rfcomm_sock_connect+0x329/0x450 net/bluetooth/rfcomm/sock.c:413
>>        __sys_connect_file+0x153/0x1a0 net/socket.c:1976
>>        __sys_connect+0x165/0x1a0 net/socket.c:1993
>>        __do_sys_connect net/socket.c:2003 [inline]
>>        __se_sys_connect net/socket.c:2000 [inline]
>>        __x64_sys_connect+0x73/0xb0 net/socket.c:2000
>>        do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>        do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
>>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>
>> -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}:
>>        check_prev_add kernel/locking/lockdep.c:3097 [inline]
>>        check_prevs_add kernel/locking/lockdep.c:3216 [inline]
>>        validate_chain kernel/locking/lockdep.c:3831 [inline]
>>        __lock_acquire+0x2a43/0x56d0 kernel/locking/lockdep.c:5055
>>        lock_acquire kernel/locking/lockdep.c:5668 [inline]
>>        lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
>>        lock_sock_nested+0x3a/0xf0 net/core/sock.c:3470
>>        lock_sock include/net/sock.h:1725 [inline]
>>        rfcomm_sk_state_change+0x6d/0x3a0 net/bluetooth/rfcomm/sock.c:73
>>        __rfcomm_dlc_close+0x1b1/0x890 net/bluetooth/rfcomm/core.c:489
>>        rfcomm_dlc_close+0x1e9/0x240 net/bluetooth/rfcomm/core.c:520
>>        __rfcomm_sock_close+0x13c/0x250 net/bluetooth/rfcomm/sock.c:220
>>        rfcomm_sock_shutdown+0xd8/0x230 net/bluetooth/rfcomm/sock.c:907
>>        rfcomm_sock_release+0x68/0x140 net/bluetooth/rfcomm/sock.c:928
>>        __sock_release+0xcd/0x280 net/socket.c:650
>>        sock_close+0x1c/0x20 net/socket.c:1365
>>        __fput+0x27c/0xa90 fs/file_table.c:320
>>        task_work_run+0x16f/0x270 kernel/task_work.c:179
>>        exit_task_work include/linux/task_work.h:38 [inline]
>>        do_exit+0xaa8/0x2950 kernel/exit.c:867
>>        do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
>>        get_signal+0x21c3/0x2450 kernel/signal.c:2859
>>        arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
>>        exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
>>        exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
>>        __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
>>        syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
>>        do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
>>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>
>> Signed-off-by: Ying Hsu <yinghsu@chromium.org>
>> ---
>> This commit has been tested with a C reproducer on qemu-x86_64
>> and a ChromeOS device.
>>
>> Changes in v2:
>> - Fix potential use-after-free in rfc_comm_sock_connect.
>>
>>  net/bluetooth/rfcomm/sock.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
>> index 21e24da4847f..4397e14ff560 100644
>> --- a/net/bluetooth/rfcomm/sock.c
>> +++ b/net/bluetooth/rfcomm/sock.c
>> @@ -391,6 +391,7 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a
>>             addr->sa_family != AF_BLUETOOTH)
>>                 return -EINVAL;
>>
>> +       sock_hold(sk);
>>         lock_sock(sk);
>>
>>         if (sk->sk_state != BT_OPEN && sk->sk_state != BT_BOUND) {
>> @@ -410,14 +411,18 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a
>>         d->sec_level = rfcomm_pi(sk)->sec_level;
>>         d->role_switch = rfcomm_pi(sk)->role_switch;
>>
>> +       /* Drop sock lock to avoid potential deadlock with the RFCOMM lock */
>> +       release_sock(sk);
>>         err = rfcomm_dlc_open(d, &rfcomm_pi(sk)->src, &sa->rc_bdaddr,
>>                               sa->rc_channel);
>> -       if (!err)
>> +       lock_sock(sk);
>> +       if (!err && !sock_flag(sk, SOCK_ZAPPED))
>>                 err = bt_sock_wait_state(sk, BT_CONNECTED,
>>                                 sock_sndtimeo(sk, flags & O_NONBLOCK));
>>
>>  done:
>>         release_sock(sk);
>> +       sock_put(sk);
>>         return err;
>>  }
>
>This sounds like a great solution to hold the reference and then

Why do you need sock_hold/put in the same proto_ops.callback sock opts ? 
it should be guaranteed by the caller the sk will remain valid 
or if you are paranoid then sock_hold() on your proto_ops.bind() and put()
on your proto_ops.release()

>checking if the socket has been zapped when attempting to lock_sock,
>so Ive been thinking on generalize this into something like
>bt_sock_connect(sock, addr, alen, callback) so we make sure the
>callback is done while holding a reference but with the socket
>unlocked since typically the underline procedure only needs to access
>the pi(sk) information without changing it e.g. rfcomm_dlc_open,
>anyway Im fine if you don't want to pursue doing it right now but I'm
>afraid these type of locking problem is no restricted to RFCOMM only.
>
>> --
>> 2.39.0.314.g84b9a713c41-goog
>>
>
>
>-- 
>Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Bluetooth: Fix possible deadlock in rfcomm_sk_state_change
  2023-01-06  1:18   ` Saeed Mahameed
@ 2023-01-06 19:44     ` Luiz Augusto von Dentz
  2023-01-09  7:06       ` Ying Hsu
                         ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Luiz Augusto von Dentz @ 2023-01-06 19:44 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Ying Hsu, linux-bluetooth, marcel, leon,
	chromeos-bluetooth-upstreaming, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Johan Hedberg, Paolo Abeni, linux-kernel, netdev

Hi Saeed,

On Thu, Jan 5, 2023 at 5:18 PM Saeed Mahameed <saeed@kernel.org> wrote:
>
> On 04 Jan 14:21, Luiz Augusto von Dentz wrote:
> >Hi Ying,
> >
> >On Wed, Jan 4, 2023 at 7:07 AM Ying Hsu <yinghsu@chromium.org> wrote:
> >>
> >> There's a possible deadlock when two processes are connecting
> >> and closing a RFCOMM socket concurrently. Here's the call trace:
> >
> >Are you sure it is 2 different processes? Usually that would mean 2
> >different sockets (sk) then so they wouldn't share the same lock, so
> >this sounds more like 2 different threads, perhaps it is worth
> >creating a testing case in our rfcomm-tester so we are able to detect
> >this sort of thing in the future.
> >
> >> -> #2 (&d->lock){+.+.}-{3:3}:
> >>        __mutex_lock_common kernel/locking/mutex.c:603 [inline]
> >>        __mutex_lock0x12f/0x1360 kernel/locking/mutex.c:747
> >>        __rfcomm_dlc_close+0x15d/0x890 net/bluetooth/rfcomm/core.c:487
> >>        rfcomm_dlc_close+1e9/0x240 net/bluetooth/rfcomm/core.c:520
> >>        __rfcomm_sock_close+0x13c/0x250 net/bluetooth/rfcomm/sock.c:220
> >>        rfcomm_sock_shutdown+0xd8/0x230 net/bluetooth/rfcomm/sock.c:907
> >>        rfcomm_sock_release+0x68/0x140 net/bluetooth/rfcomm/sock.c:928
> >>        __sock_release+0xcd/0x280 net/socket.c:650
> >>        sock_close+0x1c/0x20 net/socket.c:1365
> >>        __fput+0x27c/0xa90 fs/file_table.c:320
> >>        task_work_run+0x16f/0x270 kernel/task_work.c:179
> >>        exit_task_work include/linux/task_work.h:38 [inline]
> >>        do_exit+0xaa8/0x2950 kernel/exit.c:867
> >>        do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
> >>        get_signal+0x21c3/0x2450 kernel/signal.c:2859
> >>        arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
> >>        exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
> >>        exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
> >>        __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
> >>        syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
> >>        do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
> >>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >>
> >> -> #1 (rfcomm_mutex){+.+.}-{3:3}:
> >>        __mutex_lock_common kernel/locking/mutex.c:603 [inline]
> >>        __mutex_lock+0x12f/0x1360 kernel/locking/mutex.c:747
> >>        rfcomm_dlc_open+0x93/0xa80 net/bluetooth/rfcomm/core.c:425
> >>        rfcomm_sock_connect+0x329/0x450 net/bluetooth/rfcomm/sock.c:413
> >>        __sys_connect_file+0x153/0x1a0 net/socket.c:1976
> >>        __sys_connect+0x165/0x1a0 net/socket.c:1993
> >>        __do_sys_connect net/socket.c:2003 [inline]
> >>        __se_sys_connect net/socket.c:2000 [inline]
> >>        __x64_sys_connect+0x73/0xb0 net/socket.c:2000
> >>        do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >>        do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
> >>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >>
> >> -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}:
> >>        check_prev_add kernel/locking/lockdep.c:3097 [inline]
> >>        check_prevs_add kernel/locking/lockdep.c:3216 [inline]
> >>        validate_chain kernel/locking/lockdep.c:3831 [inline]
> >>        __lock_acquire+0x2a43/0x56d0 kernel/locking/lockdep.c:5055
> >>        lock_acquire kernel/locking/lockdep.c:5668 [inline]
> >>        lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
> >>        lock_sock_nested+0x3a/0xf0 net/core/sock.c:3470
> >>        lock_sock include/net/sock.h:1725 [inline]
> >>        rfcomm_sk_state_change+0x6d/0x3a0 net/bluetooth/rfcomm/sock.c:73
> >>        __rfcomm_dlc_close+0x1b1/0x890 net/bluetooth/rfcomm/core.c:489
> >>        rfcomm_dlc_close+0x1e9/0x240 net/bluetooth/rfcomm/core.c:520
> >>        __rfcomm_sock_close+0x13c/0x250 net/bluetooth/rfcomm/sock.c:220
> >>        rfcomm_sock_shutdown+0xd8/0x230 net/bluetooth/rfcomm/sock.c:907
> >>        rfcomm_sock_release+0x68/0x140 net/bluetooth/rfcomm/sock.c:928
> >>        __sock_release+0xcd/0x280 net/socket.c:650
> >>        sock_close+0x1c/0x20 net/socket.c:1365
> >>        __fput+0x27c/0xa90 fs/file_table.c:320
> >>        task_work_run+0x16f/0x270 kernel/task_work.c:179
> >>        exit_task_work include/linux/task_work.h:38 [inline]
> >>        do_exit+0xaa8/0x2950 kernel/exit.c:867
> >>        do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
> >>        get_signal+0x21c3/0x2450 kernel/signal.c:2859
> >>        arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
> >>        exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
> >>        exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
> >>        __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
> >>        syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
> >>        do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
> >>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >>
> >> Signed-off-by: Ying Hsu <yinghsu@chromium.org>
> >> ---
> >> This commit has been tested with a C reproducer on qemu-x86_64
> >> and a ChromeOS device.
> >>
> >> Changes in v2:
> >> - Fix potential use-after-free in rfc_comm_sock_connect.
> >>
> >>  net/bluetooth/rfcomm/sock.c | 7 ++++++-
> >>  1 file changed, 6 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
> >> index 21e24da4847f..4397e14ff560 100644
> >> --- a/net/bluetooth/rfcomm/sock.c
> >> +++ b/net/bluetooth/rfcomm/sock.c
> >> @@ -391,6 +391,7 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a
> >>             addr->sa_family != AF_BLUETOOTH)
> >>                 return -EINVAL;
> >>
> >> +       sock_hold(sk);
> >>         lock_sock(sk);
> >>
> >>         if (sk->sk_state != BT_OPEN && sk->sk_state != BT_BOUND) {
> >> @@ -410,14 +411,18 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a
> >>         d->sec_level = rfcomm_pi(sk)->sec_level;
> >>         d->role_switch = rfcomm_pi(sk)->role_switch;
> >>
> >> +       /* Drop sock lock to avoid potential deadlock with the RFCOMM lock */
> >> +       release_sock(sk);
> >>         err = rfcomm_dlc_open(d, &rfcomm_pi(sk)->src, &sa->rc_bdaddr,
> >>                               sa->rc_channel);
> >> -       if (!err)
> >> +       lock_sock(sk);
> >> +       if (!err && !sock_flag(sk, SOCK_ZAPPED))
> >>                 err = bt_sock_wait_state(sk, BT_CONNECTED,
> >>                                 sock_sndtimeo(sk, flags & O_NONBLOCK));
> >>
> >>  done:
> >>         release_sock(sk);
> >> +       sock_put(sk);
> >>         return err;
> >>  }
> >
> >This sounds like a great solution to hold the reference and then
>
> Why do you need sock_hold/put in the same proto_ops.callback sock opts ?
> it should be guaranteed by the caller the sk will remain valid
> or if you are paranoid then sock_hold() on your proto_ops.bind() and put()
> on your proto_ops.release()

It doesn't looks like there is a sock_hold done in the likes of
__sys_connect/__sys_connect_file so afaik it is possible that the sk
is freed in the meantime if we attempt to release and lock afterward,
and about being paranoid I guess we are past that already since with
the likes of fuzzing testing is already paranoid in itself.

> >checking if the socket has been zapped when attempting to lock_sock,
> >so Ive been thinking on generalize this into something like
> >bt_sock_connect(sock, addr, alen, callback) so we make sure the
> >callback is done while holding a reference but with the socket
> >unlocked since typically the underline procedure only needs to access
> >the pi(sk) information without changing it e.g. rfcomm_dlc_open,
> >anyway Im fine if you don't want to pursue doing it right now but I'm
> >afraid these type of locking problem is no restricted to RFCOMM only.
> >
> >> --
> >> 2.39.0.314.g84b9a713c41-goog
> >>
> >
> >
> >--
> >Luiz Augusto von Dentz



-- 
Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Bluetooth: Fix possible deadlock in rfcomm_sk_state_change
  2023-01-06 19:44     ` Luiz Augusto von Dentz
@ 2023-01-09  7:06       ` Ying Hsu
       [not found]       ` <CAAa9mD3hybWw0cZ9_p2ZWKQFFmPUvsj3efOB_j9VLd_4RgcmPA@mail.gmail.com>
  2023-01-10  8:58       ` Saeed Mahameed
  2 siblings, 0 replies; 8+ messages in thread
From: Ying Hsu @ 2023-01-09  7:06 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Saeed Mahameed, linux-bluetooth, marcel, leon,
	chromeos-bluetooth-upstreaming, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Johan Hedberg, Paolo Abeni, linux-kernel, netdev

Hi Luiz,

Based on the stack trace reported by syzbot, the deadlock happened in
a single process.
I'll revise the commit message in the next revision. Thank you for
catching that.

Generalizing it sounds good.
But if it releases the sk lock as below, the do_something() part might
be different for different proto.
```
bt_sock_connect_v1(..., callback) {
    sock_hold(sk);
    release_sock(sk);
    err = callback(...);
    lock_sock(sk);
    if (!err && !sock_flag(sk, SOCK_ZAPPED)) {
        do_something();
    }
    sock_put(sk);
    return err;
}
```

Another proposal is to have the callback executed with sk lock
acquired, and the callback is almost the same as the original connect
function for each proto,
but they'll have to manage the sk lock and check its ZAPPED state.
What do you think?
```
bt_sock_connect_v2(..., callback) {
    sock_hold(sk);
    lock_sock(sk);
    err = callback(...);
    release_sock(sk);
    sock_put(sk);
    return err;
}

rfcomm_sock_connect(...) {
    return bt_sock_connect_v2(..., __rfcomm_sock_connect);
}
```


On Sat, Jan 7, 2023 at 3:45 AM Luiz Augusto von Dentz
<luiz.dentz@gmail.com> wrote:
>
> Hi Saeed,
>
> On Thu, Jan 5, 2023 at 5:18 PM Saeed Mahameed <saeed@kernel.org> wrote:
> >
> > On 04 Jan 14:21, Luiz Augusto von Dentz wrote:
> > >Hi Ying,
> > >
> > >On Wed, Jan 4, 2023 at 7:07 AM Ying Hsu <yinghsu@chromium.org> wrote:
> > >>
> > >> There's a possible deadlock when two processes are connecting
> > >> and closing a RFCOMM socket concurrently. Here's the call trace:
> > >
> > >Are you sure it is 2 different processes? Usually that would mean 2
> > >different sockets (sk) then so they wouldn't share the same lock, so
> > >this sounds more like 2 different threads, perhaps it is worth
> > >creating a testing case in our rfcomm-tester so we are able to detect
> > >this sort of thing in the future.
> > >
> > >> -> #2 (&d->lock){+.+.}-{3:3}:
> > >>        __mutex_lock_common kernel/locking/mutex.c:603 [inline]
> > >>        __mutex_lock0x12f/0x1360 kernel/locking/mutex.c:747
> > >>        __rfcomm_dlc_close+0x15d/0x890 net/bluetooth/rfcomm/core.c:487
> > >>        rfcomm_dlc_close+1e9/0x240 net/bluetooth/rfcomm/core.c:520
> > >>        __rfcomm_sock_close+0x13c/0x250 net/bluetooth/rfcomm/sock.c:220
> > >>        rfcomm_sock_shutdown+0xd8/0x230 net/bluetooth/rfcomm/sock.c:907
> > >>        rfcomm_sock_release+0x68/0x140 net/bluetooth/rfcomm/sock.c:928
> > >>        __sock_release+0xcd/0x280 net/socket.c:650
> > >>        sock_close+0x1c/0x20 net/socket.c:1365
> > >>        __fput+0x27c/0xa90 fs/file_table.c:320
> > >>        task_work_run+0x16f/0x270 kernel/task_work.c:179
> > >>        exit_task_work include/linux/task_work.h:38 [inline]
> > >>        do_exit+0xaa8/0x2950 kernel/exit.c:867
> > >>        do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
> > >>        get_signal+0x21c3/0x2450 kernel/signal.c:2859
> > >>        arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
> > >>        exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
> > >>        exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
> > >>        __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
> > >>        syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
> > >>        do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
> > >>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > >>
> > >> -> #1 (rfcomm_mutex){+.+.}-{3:3}:
> > >>        __mutex_lock_common kernel/locking/mutex.c:603 [inline]
> > >>        __mutex_lock+0x12f/0x1360 kernel/locking/mutex.c:747
> > >>        rfcomm_dlc_open+0x93/0xa80 net/bluetooth/rfcomm/core.c:425
> > >>        rfcomm_sock_connect+0x329/0x450 net/bluetooth/rfcomm/sock.c:413
> > >>        __sys_connect_file+0x153/0x1a0 net/socket.c:1976
> > >>        __sys_connect+0x165/0x1a0 net/socket.c:1993
> > >>        __do_sys_connect net/socket.c:2003 [inline]
> > >>        __se_sys_connect net/socket.c:2000 [inline]
> > >>        __x64_sys_connect+0x73/0xb0 net/socket.c:2000
> > >>        do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > >>        do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
> > >>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > >>
> > >> -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}:
> > >>        check_prev_add kernel/locking/lockdep.c:3097 [inline]
> > >>        check_prevs_add kernel/locking/lockdep.c:3216 [inline]
> > >>        validate_chain kernel/locking/lockdep.c:3831 [inline]
> > >>        __lock_acquire+0x2a43/0x56d0 kernel/locking/lockdep.c:5055
> > >>        lock_acquire kernel/locking/lockdep.c:5668 [inline]
> > >>        lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
> > >>        lock_sock_nested+0x3a/0xf0 net/core/sock.c:3470
> > >>        lock_sock include/net/sock.h:1725 [inline]
> > >>        rfcomm_sk_state_change+0x6d/0x3a0 net/bluetooth/rfcomm/sock.c:73
> > >>        __rfcomm_dlc_close+0x1b1/0x890 net/bluetooth/rfcomm/core.c:489
> > >>        rfcomm_dlc_close+0x1e9/0x240 net/bluetooth/rfcomm/core.c:520
> > >>        __rfcomm_sock_close+0x13c/0x250 net/bluetooth/rfcomm/sock.c:220
> > >>        rfcomm_sock_shutdown+0xd8/0x230 net/bluetooth/rfcomm/sock.c:907
> > >>        rfcomm_sock_release+0x68/0x140 net/bluetooth/rfcomm/sock.c:928
> > >>        __sock_release+0xcd/0x280 net/socket.c:650
> > >>        sock_close+0x1c/0x20 net/socket.c:1365
> > >>        __fput+0x27c/0xa90 fs/file_table.c:320
> > >>        task_work_run+0x16f/0x270 kernel/task_work.c:179
> > >>        exit_task_work include/linux/task_work.h:38 [inline]
> > >>        do_exit+0xaa8/0x2950 kernel/exit.c:867
> > >>        do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
> > >>        get_signal+0x21c3/0x2450 kernel/signal.c:2859
> > >>        arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
> > >>        exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
> > >>        exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
> > >>        __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
> > >>        syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
> > >>        do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
> > >>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > >>
> > >> Signed-off-by: Ying Hsu <yinghsu@chromium.org>
> > >> ---
> > >> This commit has been tested with a C reproducer on qemu-x86_64
> > >> and a ChromeOS device.
> > >>
> > >> Changes in v2:
> > >> - Fix potential use-after-free in rfc_comm_sock_connect.
> > >>
> > >>  net/bluetooth/rfcomm/sock.c | 7 ++++++-
> > >>  1 file changed, 6 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
> > >> index 21e24da4847f..4397e14ff560 100644
> > >> --- a/net/bluetooth/rfcomm/sock.c
> > >> +++ b/net/bluetooth/rfcomm/sock.c
> > >> @@ -391,6 +391,7 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a
> > >>             addr->sa_family != AF_BLUETOOTH)
> > >>                 return -EINVAL;
> > >>
> > >> +       sock_hold(sk);
> > >>         lock_sock(sk);
> > >>
> > >>         if (sk->sk_state != BT_OPEN && sk->sk_state != BT_BOUND) {
> > >> @@ -410,14 +411,18 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a
> > >>         d->sec_level = rfcomm_pi(sk)->sec_level;
> > >>         d->role_switch = rfcomm_pi(sk)->role_switch;
> > >>
> > >> +       /* Drop sock lock to avoid potential deadlock with the RFCOMM lock */
> > >> +       release_sock(sk);
> > >>         err = rfcomm_dlc_open(d, &rfcomm_pi(sk)->src, &sa->rc_bdaddr,
> > >>                               sa->rc_channel);
> > >> -       if (!err)
> > >> +       lock_sock(sk);
> > >> +       if (!err && !sock_flag(sk, SOCK_ZAPPED))
> > >>                 err = bt_sock_wait_state(sk, BT_CONNECTED,
> > >>                                 sock_sndtimeo(sk, flags & O_NONBLOCK));
> > >>
> > >>  done:
> > >>         release_sock(sk);
> > >> +       sock_put(sk);
> > >>         return err;
> > >>  }
> > >
> > >This sounds like a great solution to hold the reference and then
> >
> > Why do you need sock_hold/put in the same proto_ops.callback sock opts ?
> > it should be guaranteed by the caller the sk will remain valid
> > or if you are paranoid then sock_hold() on your proto_ops.bind() and put()
> > on your proto_ops.release()
>
> It doesn't looks like there is a sock_hold done in the likes of
> __sys_connect/__sys_connect_file so afaik it is possible that the sk
> is freed in the meantime if we attempt to release and lock afterward,
> and about being paranoid I guess we are past that already since with
> the likes of fuzzing testing is already paranoid in itself.
>
> > >checking if the socket has been zapped when attempting to lock_sock,
> > >so Ive been thinking on generalize this into something like
> > >bt_sock_connect(sock, addr, alen, callback) so we make sure the
> > >callback is done while holding a reference but with the socket
> > >unlocked since typically the underline procedure only needs to access
> > >the pi(sk) information without changing it e.g. rfcomm_dlc_open,
> > >anyway Im fine if you don't want to pursue doing it right now but I'm
> > >afraid these type of locking problem is no restricted to RFCOMM only.
> > >
> > >> --
> > >> 2.39.0.314.g84b9a713c41-goog
> > >>
> > >
> > >
> > >--
> > >Luiz Augusto von Dentz
>
>
>
> --
> Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Bluetooth: Fix possible deadlock in rfcomm_sk_state_change
       [not found]       ` <CAAa9mD3hybWw0cZ9_p2ZWKQFFmPUvsj3efOB_j9VLd_4RgcmPA@mail.gmail.com>
@ 2023-01-09 17:19         ` Luiz Augusto von Dentz
  2023-01-10  2:15           ` Ying Hsu
  0 siblings, 1 reply; 8+ messages in thread
From: Luiz Augusto von Dentz @ 2023-01-09 17:19 UTC (permalink / raw)
  To: Ying Hsu
  Cc: Saeed Mahameed, linux-bluetooth, marcel, leon,
	chromeos-bluetooth-upstreaming, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Johan Hedberg, Paolo Abeni, linux-kernel, netdev

Hi Ying,

On Sat, Jan 7, 2023 at 4:35 AM Ying Hsu <yinghsu@chromium.org> wrote:
>
> Hi Luiz,
>
> Based on the stack trace reported by syzbot, the deadlock happened in a single process.
> I'll revise the commit message in the next revision. Thank you for catching that.
>
> Generalizing it sounds good.
> But if it releases the sk lock as below, the do_something() part might be different for different proto.
> ```
> bt_sock_connect_v1(..., callback) {
>     sock_hold(sk);
>     release_sock(sk);
>     err = callback(...);
>     lock_sock(sk);
>     if (!err && !sock_flag(sk, SOCK_ZAPPED)) {
>         do_something();
>     }
>     sock_put(sk);
>     return err;
> }
> ```
>
> Another proposal is to have the callback executed with sk lock acquired, and the callback is almost the same as the original connect function for each proto,
> but they'll have to manage the sk lock and check its ZAPPED state. What do you think?
> ```
> bt_sock_connect_v2(..., callback) {
>     sock_hold(sk);
>     lock_sock(sk);
>     err = callback(...);
>     release_sock(sk);
>     sock_put(sk);
>     return err;
> }
>
> rfcomm_sock_connect(...) {
>     return bt_sock_connect_v2(..., __rfcomm_sock_connect);
> }
> ```

I think it is worth trying to prototype both and see which one we end
up consolidating more code since we might as well do the likes the
likes of bt_sock_wait_state, we could also in theory have a parameter
which indicates if the callback expects the sk to be locked or not.

> On Sat, Jan 7, 2023 at 3:45 AM Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote:
>>
>> Hi Saeed,
>>
>> On Thu, Jan 5, 2023 at 5:18 PM Saeed Mahameed <saeed@kernel.org> wrote:
>> >
>> > On 04 Jan 14:21, Luiz Augusto von Dentz wrote:
>> > >Hi Ying,
>> > >
>> > >On Wed, Jan 4, 2023 at 7:07 AM Ying Hsu <yinghsu@chromium.org> wrote:
>> > >>
>> > >> There's a possible deadlock when two processes are connecting
>> > >> and closing a RFCOMM socket concurrently. Here's the call trace:
>> > >
>> > >Are you sure it is 2 different processes? Usually that would mean 2
>> > >different sockets (sk) then so they wouldn't share the same lock, so
>> > >this sounds more like 2 different threads, perhaps it is worth
>> > >creating a testing case in our rfcomm-tester so we are able to detect
>> > >this sort of thing in the future.
>> > >
>> > >> -> #2 (&d->lock){+.+.}-{3:3}:
>> > >>        __mutex_lock_common kernel/locking/mutex.c:603 [inline]
>> > >>        __mutex_lock0x12f/0x1360 kernel/locking/mutex.c:747
>> > >>        __rfcomm_dlc_close+0x15d/0x890 net/bluetooth/rfcomm/core.c:487
>> > >>        rfcomm_dlc_close+1e9/0x240 net/bluetooth/rfcomm/core.c:520
>> > >>        __rfcomm_sock_close+0x13c/0x250 net/bluetooth/rfcomm/sock.c:220
>> > >>        rfcomm_sock_shutdown+0xd8/0x230 net/bluetooth/rfcomm/sock.c:907
>> > >>        rfcomm_sock_release+0x68/0x140 net/bluetooth/rfcomm/sock.c:928
>> > >>        __sock_release+0xcd/0x280 net/socket.c:650
>> > >>        sock_close+0x1c/0x20 net/socket.c:1365
>> > >>        __fput+0x27c/0xa90 fs/file_table.c:320
>> > >>        task_work_run+0x16f/0x270 kernel/task_work.c:179
>> > >>        exit_task_work include/linux/task_work.h:38 [inline]
>> > >>        do_exit+0xaa8/0x2950 kernel/exit.c:867
>> > >>        do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
>> > >>        get_signal+0x21c3/0x2450 kernel/signal.c:2859
>> > >>        arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
>> > >>        exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
>> > >>        exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
>> > >>        __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
>> > >>        syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
>> > >>        do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
>> > >>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> > >>
>> > >> -> #1 (rfcomm_mutex){+.+.}-{3:3}:
>> > >>        __mutex_lock_common kernel/locking/mutex.c:603 [inline]
>> > >>        __mutex_lock+0x12f/0x1360 kernel/locking/mutex.c:747
>> > >>        rfcomm_dlc_open+0x93/0xa80 net/bluetooth/rfcomm/core.c:425
>> > >>        rfcomm_sock_connect+0x329/0x450 net/bluetooth/rfcomm/sock.c:413
>> > >>        __sys_connect_file+0x153/0x1a0 net/socket.c:1976
>> > >>        __sys_connect+0x165/0x1a0 net/socket.c:1993
>> > >>        __do_sys_connect net/socket.c:2003 [inline]
>> > >>        __se_sys_connect net/socket.c:2000 [inline]
>> > >>        __x64_sys_connect+0x73/0xb0 net/socket.c:2000
>> > >>        do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>> > >>        do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
>> > >>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> > >>
>> > >> -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}:
>> > >>        check_prev_add kernel/locking/lockdep.c:3097 [inline]
>> > >>        check_prevs_add kernel/locking/lockdep.c:3216 [inline]
>> > >>        validate_chain kernel/locking/lockdep.c:3831 [inline]
>> > >>        __lock_acquire+0x2a43/0x56d0 kernel/locking/lockdep.c:5055
>> > >>        lock_acquire kernel/locking/lockdep.c:5668 [inline]
>> > >>        lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
>> > >>        lock_sock_nested+0x3a/0xf0 net/core/sock.c:3470
>> > >>        lock_sock include/net/sock.h:1725 [inline]
>> > >>        rfcomm_sk_state_change+0x6d/0x3a0 net/bluetooth/rfcomm/sock.c:73
>> > >>        __rfcomm_dlc_close+0x1b1/0x890 net/bluetooth/rfcomm/core.c:489
>> > >>        rfcomm_dlc_close+0x1e9/0x240 net/bluetooth/rfcomm/core.c:520
>> > >>        __rfcomm_sock_close+0x13c/0x250 net/bluetooth/rfcomm/sock.c:220
>> > >>        rfcomm_sock_shutdown+0xd8/0x230 net/bluetooth/rfcomm/sock.c:907
>> > >>        rfcomm_sock_release+0x68/0x140 net/bluetooth/rfcomm/sock.c:928
>> > >>        __sock_release+0xcd/0x280 net/socket.c:650
>> > >>        sock_close+0x1c/0x20 net/socket.c:1365
>> > >>        __fput+0x27c/0xa90 fs/file_table.c:320
>> > >>        task_work_run+0x16f/0x270 kernel/task_work.c:179
>> > >>        exit_task_work include/linux/task_work.h:38 [inline]
>> > >>        do_exit+0xaa8/0x2950 kernel/exit.c:867
>> > >>        do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
>> > >>        get_signal+0x21c3/0x2450 kernel/signal.c:2859
>> > >>        arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
>> > >>        exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
>> > >>        exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
>> > >>        __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
>> > >>        syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
>> > >>        do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
>> > >>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> > >>
>> > >> Signed-off-by: Ying Hsu <yinghsu@chromium.org>
>> > >> ---
>> > >> This commit has been tested with a C reproducer on qemu-x86_64
>> > >> and a ChromeOS device.
>> > >>
>> > >> Changes in v2:
>> > >> - Fix potential use-after-free in rfc_comm_sock_connect.
>> > >>
>> > >>  net/bluetooth/rfcomm/sock.c | 7 ++++++-
>> > >>  1 file changed, 6 insertions(+), 1 deletion(-)
>> > >>
>> > >> diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
>> > >> index 21e24da4847f..4397e14ff560 100644
>> > >> --- a/net/bluetooth/rfcomm/sock.c
>> > >> +++ b/net/bluetooth/rfcomm/sock.c
>> > >> @@ -391,6 +391,7 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a
>> > >>             addr->sa_family != AF_BLUETOOTH)
>> > >>                 return -EINVAL;
>> > >>
>> > >> +       sock_hold(sk);
>> > >>         lock_sock(sk);
>> > >>
>> > >>         if (sk->sk_state != BT_OPEN && sk->sk_state != BT_BOUND) {
>> > >> @@ -410,14 +411,18 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a
>> > >>         d->sec_level = rfcomm_pi(sk)->sec_level;
>> > >>         d->role_switch = rfcomm_pi(sk)->role_switch;
>> > >>
>> > >> +       /* Drop sock lock to avoid potential deadlock with the RFCOMM lock */
>> > >> +       release_sock(sk);
>> > >>         err = rfcomm_dlc_open(d, &rfcomm_pi(sk)->src, &sa->rc_bdaddr,
>> > >>                               sa->rc_channel);
>> > >> -       if (!err)
>> > >> +       lock_sock(sk);
>> > >> +       if (!err && !sock_flag(sk, SOCK_ZAPPED))
>> > >>                 err = bt_sock_wait_state(sk, BT_CONNECTED,
>> > >>                                 sock_sndtimeo(sk, flags & O_NONBLOCK));
>> > >>
>> > >>  done:
>> > >>         release_sock(sk);
>> > >> +       sock_put(sk);
>> > >>         return err;
>> > >>  }
>> > >
>> > >This sounds like a great solution to hold the reference and then
>> >
>> > Why do you need sock_hold/put in the same proto_ops.callback sock opts ?
>> > it should be guaranteed by the caller the sk will remain valid
>> > or if you are paranoid then sock_hold() on your proto_ops.bind() and put()
>> > on your proto_ops.release()
>>
>> It doesn't looks like there is a sock_hold done in the likes of
>> __sys_connect/__sys_connect_file so afaik it is possible that the sk
>> is freed in the meantime if we attempt to release and lock afterward,
>> and about being paranoid I guess we are past that already since with
>> the likes of fuzzing testing is already paranoid in itself.
>>
>> > >checking if the socket has been zapped when attempting to lock_sock,
>> > >so Ive been thinking on generalize this into something like
>> > >bt_sock_connect(sock, addr, alen, callback) so we make sure the
>> > >callback is done while holding a reference but with the socket
>> > >unlocked since typically the underline procedure only needs to access
>> > >the pi(sk) information without changing it e.g. rfcomm_dlc_open,
>> > >anyway Im fine if you don't want to pursue doing it right now but I'm
>> > >afraid these type of locking problem is no restricted to RFCOMM only.
>> > >
>> > >> --
>> > >> 2.39.0.314.g84b9a713c41-goog
>> > >>
>> > >
>> > >
>> > >--
>> > >Luiz Augusto von Dentz
>>
>>
>>
>> --
>> Luiz Augusto von Dentz



-- 
Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Bluetooth: Fix possible deadlock in rfcomm_sk_state_change
  2023-01-09 17:19         ` Luiz Augusto von Dentz
@ 2023-01-10  2:15           ` Ying Hsu
  0 siblings, 0 replies; 8+ messages in thread
From: Ying Hsu @ 2023-01-10  2:15 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Saeed Mahameed, linux-bluetooth, marcel, leon,
	chromeos-bluetooth-upstreaming, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Johan Hedberg, Paolo Abeni, linux-kernel, netdev

Sure, let me wrap up the fix for RFCOMM in the next patchset first. We
can follow the generic fix in other patches.

On Tue, Jan 10, 2023 at 1:20 AM Luiz Augusto von Dentz
<luiz.dentz@gmail.com> wrote:
>
> Hi Ying,
>
> On Sat, Jan 7, 2023 at 4:35 AM Ying Hsu <yinghsu@chromium.org> wrote:
> >
> > Hi Luiz,
> >
> > Based on the stack trace reported by syzbot, the deadlock happened in a single process.
> > I'll revise the commit message in the next revision. Thank you for catching that.
> >
> > Generalizing it sounds good.
> > But if it releases the sk lock as below, the do_something() part might be different for different proto.
> > ```
> > bt_sock_connect_v1(..., callback) {
> >     sock_hold(sk);
> >     release_sock(sk);
> >     err = callback(...);
> >     lock_sock(sk);
> >     if (!err && !sock_flag(sk, SOCK_ZAPPED)) {
> >         do_something();
> >     }
> >     sock_put(sk);
> >     return err;
> > }
> > ```
> >
> > Another proposal is to have the callback executed with sk lock acquired, and the callback is almost the same as the original connect function for each proto,
> > but they'll have to manage the sk lock and check its ZAPPED state. What do you think?
> > ```
> > bt_sock_connect_v2(..., callback) {
> >     sock_hold(sk);
> >     lock_sock(sk);
> >     err = callback(...);
> >     release_sock(sk);
> >     sock_put(sk);
> >     return err;
> > }
> >
> > rfcomm_sock_connect(...) {
> >     return bt_sock_connect_v2(..., __rfcomm_sock_connect);
> > }
> > ```
>
> I think it is worth trying to prototype both and see which one we end
> up consolidating more code since we might as well do the likes the
> likes of bt_sock_wait_state, we could also in theory have a parameter
> which indicates if the callback expects the sk to be locked or not.
>
> > On Sat, Jan 7, 2023 at 3:45 AM Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote:
> >>
> >> Hi Saeed,
> >>
> >> On Thu, Jan 5, 2023 at 5:18 PM Saeed Mahameed <saeed@kernel.org> wrote:
> >> >
> >> > On 04 Jan 14:21, Luiz Augusto von Dentz wrote:
> >> > >Hi Ying,
> >> > >
> >> > >On Wed, Jan 4, 2023 at 7:07 AM Ying Hsu <yinghsu@chromium.org> wrote:
> >> > >>
> >> > >> There's a possible deadlock when two processes are connecting
> >> > >> and closing a RFCOMM socket concurrently. Here's the call trace:
> >> > >
> >> > >Are you sure it is 2 different processes? Usually that would mean 2
> >> > >different sockets (sk) then so they wouldn't share the same lock, so
> >> > >this sounds more like 2 different threads, perhaps it is worth
> >> > >creating a testing case in our rfcomm-tester so we are able to detect
> >> > >this sort of thing in the future.
> >> > >
> >> > >> -> #2 (&d->lock){+.+.}-{3:3}:
> >> > >>        __mutex_lock_common kernel/locking/mutex.c:603 [inline]
> >> > >>        __mutex_lock0x12f/0x1360 kernel/locking/mutex.c:747
> >> > >>        __rfcomm_dlc_close+0x15d/0x890 net/bluetooth/rfcomm/core.c:487
> >> > >>        rfcomm_dlc_close+1e9/0x240 net/bluetooth/rfcomm/core.c:520
> >> > >>        __rfcomm_sock_close+0x13c/0x250 net/bluetooth/rfcomm/sock.c:220
> >> > >>        rfcomm_sock_shutdown+0xd8/0x230 net/bluetooth/rfcomm/sock.c:907
> >> > >>        rfcomm_sock_release+0x68/0x140 net/bluetooth/rfcomm/sock.c:928
> >> > >>        __sock_release+0xcd/0x280 net/socket.c:650
> >> > >>        sock_close+0x1c/0x20 net/socket.c:1365
> >> > >>        __fput+0x27c/0xa90 fs/file_table.c:320
> >> > >>        task_work_run+0x16f/0x270 kernel/task_work.c:179
> >> > >>        exit_task_work include/linux/task_work.h:38 [inline]
> >> > >>        do_exit+0xaa8/0x2950 kernel/exit.c:867
> >> > >>        do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
> >> > >>        get_signal+0x21c3/0x2450 kernel/signal.c:2859
> >> > >>        arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
> >> > >>        exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
> >> > >>        exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
> >> > >>        __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
> >> > >>        syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
> >> > >>        do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
> >> > >>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >> > >>
> >> > >> -> #1 (rfcomm_mutex){+.+.}-{3:3}:
> >> > >>        __mutex_lock_common kernel/locking/mutex.c:603 [inline]
> >> > >>        __mutex_lock+0x12f/0x1360 kernel/locking/mutex.c:747
> >> > >>        rfcomm_dlc_open+0x93/0xa80 net/bluetooth/rfcomm/core.c:425
> >> > >>        rfcomm_sock_connect+0x329/0x450 net/bluetooth/rfcomm/sock.c:413
> >> > >>        __sys_connect_file+0x153/0x1a0 net/socket.c:1976
> >> > >>        __sys_connect+0x165/0x1a0 net/socket.c:1993
> >> > >>        __do_sys_connect net/socket.c:2003 [inline]
> >> > >>        __se_sys_connect net/socket.c:2000 [inline]
> >> > >>        __x64_sys_connect+0x73/0xb0 net/socket.c:2000
> >> > >>        do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >> > >>        do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
> >> > >>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >> > >>
> >> > >> -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}:
> >> > >>        check_prev_add kernel/locking/lockdep.c:3097 [inline]
> >> > >>        check_prevs_add kernel/locking/lockdep.c:3216 [inline]
> >> > >>        validate_chain kernel/locking/lockdep.c:3831 [inline]
> >> > >>        __lock_acquire+0x2a43/0x56d0 kernel/locking/lockdep.c:5055
> >> > >>        lock_acquire kernel/locking/lockdep.c:5668 [inline]
> >> > >>        lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
> >> > >>        lock_sock_nested+0x3a/0xf0 net/core/sock.c:3470
> >> > >>        lock_sock include/net/sock.h:1725 [inline]
> >> > >>        rfcomm_sk_state_change+0x6d/0x3a0 net/bluetooth/rfcomm/sock.c:73
> >> > >>        __rfcomm_dlc_close+0x1b1/0x890 net/bluetooth/rfcomm/core.c:489
> >> > >>        rfcomm_dlc_close+0x1e9/0x240 net/bluetooth/rfcomm/core.c:520
> >> > >>        __rfcomm_sock_close+0x13c/0x250 net/bluetooth/rfcomm/sock.c:220
> >> > >>        rfcomm_sock_shutdown+0xd8/0x230 net/bluetooth/rfcomm/sock.c:907
> >> > >>        rfcomm_sock_release+0x68/0x140 net/bluetooth/rfcomm/sock.c:928
> >> > >>        __sock_release+0xcd/0x280 net/socket.c:650
> >> > >>        sock_close+0x1c/0x20 net/socket.c:1365
> >> > >>        __fput+0x27c/0xa90 fs/file_table.c:320
> >> > >>        task_work_run+0x16f/0x270 kernel/task_work.c:179
> >> > >>        exit_task_work include/linux/task_work.h:38 [inline]
> >> > >>        do_exit+0xaa8/0x2950 kernel/exit.c:867
> >> > >>        do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
> >> > >>        get_signal+0x21c3/0x2450 kernel/signal.c:2859
> >> > >>        arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
> >> > >>        exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
> >> > >>        exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
> >> > >>        __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
> >> > >>        syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
> >> > >>        do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
> >> > >>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >> > >>
> >> > >> Signed-off-by: Ying Hsu <yinghsu@chromium.org>
> >> > >> ---
> >> > >> This commit has been tested with a C reproducer on qemu-x86_64
> >> > >> and a ChromeOS device.
> >> > >>
> >> > >> Changes in v2:
> >> > >> - Fix potential use-after-free in rfc_comm_sock_connect.
> >> > >>
> >> > >>  net/bluetooth/rfcomm/sock.c | 7 ++++++-
> >> > >>  1 file changed, 6 insertions(+), 1 deletion(-)
> >> > >>
> >> > >> diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
> >> > >> index 21e24da4847f..4397e14ff560 100644
> >> > >> --- a/net/bluetooth/rfcomm/sock.c
> >> > >> +++ b/net/bluetooth/rfcomm/sock.c
> >> > >> @@ -391,6 +391,7 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a
> >> > >>             addr->sa_family != AF_BLUETOOTH)
> >> > >>                 return -EINVAL;
> >> > >>
> >> > >> +       sock_hold(sk);
> >> > >>         lock_sock(sk);
> >> > >>
> >> > >>         if (sk->sk_state != BT_OPEN && sk->sk_state != BT_BOUND) {
> >> > >> @@ -410,14 +411,18 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a
> >> > >>         d->sec_level = rfcomm_pi(sk)->sec_level;
> >> > >>         d->role_switch = rfcomm_pi(sk)->role_switch;
> >> > >>
> >> > >> +       /* Drop sock lock to avoid potential deadlock with the RFCOMM lock */
> >> > >> +       release_sock(sk);
> >> > >>         err = rfcomm_dlc_open(d, &rfcomm_pi(sk)->src, &sa->rc_bdaddr,
> >> > >>                               sa->rc_channel);
> >> > >> -       if (!err)
> >> > >> +       lock_sock(sk);
> >> > >> +       if (!err && !sock_flag(sk, SOCK_ZAPPED))
> >> > >>                 err = bt_sock_wait_state(sk, BT_CONNECTED,
> >> > >>                                 sock_sndtimeo(sk, flags & O_NONBLOCK));
> >> > >>
> >> > >>  done:
> >> > >>         release_sock(sk);
> >> > >> +       sock_put(sk);
> >> > >>         return err;
> >> > >>  }
> >> > >
> >> > >This sounds like a great solution to hold the reference and then
> >> >
> >> > Why do you need sock_hold/put in the same proto_ops.callback sock opts ?
> >> > it should be guaranteed by the caller the sk will remain valid
> >> > or if you are paranoid then sock_hold() on your proto_ops.bind() and put()
> >> > on your proto_ops.release()
> >>
> >> It doesn't looks like there is a sock_hold done in the likes of
> >> __sys_connect/__sys_connect_file so afaik it is possible that the sk
> >> is freed in the meantime if we attempt to release and lock afterward,
> >> and about being paranoid I guess we are past that already since with
> >> the likes of fuzzing testing is already paranoid in itself.
> >>
> >> > >checking if the socket has been zapped when attempting to lock_sock,
> >> > >so Ive been thinking on generalize this into something like
> >> > >bt_sock_connect(sock, addr, alen, callback) so we make sure the
> >> > >callback is done while holding a reference but with the socket
> >> > >unlocked since typically the underline procedure only needs to access
> >> > >the pi(sk) information without changing it e.g. rfcomm_dlc_open,
> >> > >anyway Im fine if you don't want to pursue doing it right now but I'm
> >> > >afraid these type of locking problem is no restricted to RFCOMM only.
> >> > >
> >> > >> --
> >> > >> 2.39.0.314.g84b9a713c41-goog
> >> > >>
> >> > >
> >> > >
> >> > >--
> >> > >Luiz Augusto von Dentz
> >>
> >>
> >>
> >> --
> >> Luiz Augusto von Dentz
>
>
>
> --
> Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Bluetooth: Fix possible deadlock in rfcomm_sk_state_change
  2023-01-06 19:44     ` Luiz Augusto von Dentz
  2023-01-09  7:06       ` Ying Hsu
       [not found]       ` <CAAa9mD3hybWw0cZ9_p2ZWKQFFmPUvsj3efOB_j9VLd_4RgcmPA@mail.gmail.com>
@ 2023-01-10  8:58       ` Saeed Mahameed
  2 siblings, 0 replies; 8+ messages in thread
From: Saeed Mahameed @ 2023-01-10  8:58 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Ying Hsu, linux-bluetooth, marcel, leon,
	chromeos-bluetooth-upstreaming, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Johan Hedberg, Paolo Abeni, linux-kernel, netdev

On 06 Jan 11:44, Luiz Augusto von Dentz wrote:
>Hi Saeed,
>
>On Thu, Jan 5, 2023 at 5:18 PM Saeed Mahameed <saeed@kernel.org> wrote:
>>
>> On 04 Jan 14:21, Luiz Augusto von Dentz wrote:
>> >Hi Ying,
>> >
>> >On Wed, Jan 4, 2023 at 7:07 AM Ying Hsu <yinghsu@chromium.org> wrote:
>> >>
>> >> There's a possible deadlock when two processes are connecting
>> >> and closing a RFCOMM socket concurrently. Here's the call trace:
>> >
>> >Are you sure it is 2 different processes? Usually that would mean 2
>> >different sockets (sk) then so they wouldn't share the same lock, so
>> >this sounds more like 2 different threads, perhaps it is worth
>> >creating a testing case in our rfcomm-tester so we are able to detect
>> >this sort of thing in the future.
>> >
>> >> -> #2 (&d->lock){+.+.}-{3:3}:
>> >>        __mutex_lock_common kernel/locking/mutex.c:603 [inline]
>> >>        __mutex_lock0x12f/0x1360 kernel/locking/mutex.c:747
>> >>        __rfcomm_dlc_close+0x15d/0x890 net/bluetooth/rfcomm/core.c:487
>> >>        rfcomm_dlc_close+1e9/0x240 net/bluetooth/rfcomm/core.c:520
>> >>        __rfcomm_sock_close+0x13c/0x250 net/bluetooth/rfcomm/sock.c:220
>> >>        rfcomm_sock_shutdown+0xd8/0x230 net/bluetooth/rfcomm/sock.c:907
>> >>        rfcomm_sock_release+0x68/0x140 net/bluetooth/rfcomm/sock.c:928
>> >>        __sock_release+0xcd/0x280 net/socket.c:650
>> >>        sock_close+0x1c/0x20 net/socket.c:1365
>> >>        __fput+0x27c/0xa90 fs/file_table.c:320
>> >>        task_work_run+0x16f/0x270 kernel/task_work.c:179
>> >>        exit_task_work include/linux/task_work.h:38 [inline]
>> >>        do_exit+0xaa8/0x2950 kernel/exit.c:867
>> >>        do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
>> >>        get_signal+0x21c3/0x2450 kernel/signal.c:2859
>> >>        arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
>> >>        exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
>> >>        exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
>> >>        __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
>> >>        syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
>> >>        do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
>> >>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> >>
>> >> -> #1 (rfcomm_mutex){+.+.}-{3:3}:
>> >>        __mutex_lock_common kernel/locking/mutex.c:603 [inline]
>> >>        __mutex_lock+0x12f/0x1360 kernel/locking/mutex.c:747
>> >>        rfcomm_dlc_open+0x93/0xa80 net/bluetooth/rfcomm/core.c:425
>> >>        rfcomm_sock_connect+0x329/0x450 net/bluetooth/rfcomm/sock.c:413
>> >>        __sys_connect_file+0x153/0x1a0 net/socket.c:1976
>> >>        __sys_connect+0x165/0x1a0 net/socket.c:1993
>> >>        __do_sys_connect net/socket.c:2003 [inline]
>> >>        __se_sys_connect net/socket.c:2000 [inline]
>> >>        __x64_sys_connect+0x73/0xb0 net/socket.c:2000
>> >>        do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>> >>        do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
>> >>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> >>
>> >> -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}:
>> >>        check_prev_add kernel/locking/lockdep.c:3097 [inline]
>> >>        check_prevs_add kernel/locking/lockdep.c:3216 [inline]
>> >>        validate_chain kernel/locking/lockdep.c:3831 [inline]
>> >>        __lock_acquire+0x2a43/0x56d0 kernel/locking/lockdep.c:5055
>> >>        lock_acquire kernel/locking/lockdep.c:5668 [inline]
>> >>        lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
>> >>        lock_sock_nested+0x3a/0xf0 net/core/sock.c:3470
>> >>        lock_sock include/net/sock.h:1725 [inline]
>> >>        rfcomm_sk_state_change+0x6d/0x3a0 net/bluetooth/rfcomm/sock.c:73
>> >>        __rfcomm_dlc_close+0x1b1/0x890 net/bluetooth/rfcomm/core.c:489
>> >>        rfcomm_dlc_close+0x1e9/0x240 net/bluetooth/rfcomm/core.c:520
>> >>        __rfcomm_sock_close+0x13c/0x250 net/bluetooth/rfcomm/sock.c:220
>> >>        rfcomm_sock_shutdown+0xd8/0x230 net/bluetooth/rfcomm/sock.c:907
>> >>        rfcomm_sock_release+0x68/0x140 net/bluetooth/rfcomm/sock.c:928
>> >>        __sock_release+0xcd/0x280 net/socket.c:650
>> >>        sock_close+0x1c/0x20 net/socket.c:1365
>> >>        __fput+0x27c/0xa90 fs/file_table.c:320
>> >>        task_work_run+0x16f/0x270 kernel/task_work.c:179
>> >>        exit_task_work include/linux/task_work.h:38 [inline]
>> >>        do_exit+0xaa8/0x2950 kernel/exit.c:867
>> >>        do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
>> >>        get_signal+0x21c3/0x2450 kernel/signal.c:2859
>> >>        arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
>> >>        exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
>> >>        exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
>> >>        __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
>> >>        syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
>> >>        do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
>> >>        entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> >>
>> >> Signed-off-by: Ying Hsu <yinghsu@chromium.org>
>> >> ---
>> >> This commit has been tested with a C reproducer on qemu-x86_64
>> >> and a ChromeOS device.
>> >>
>> >> Changes in v2:
>> >> - Fix potential use-after-free in rfc_comm_sock_connect.
>> >>
>> >>  net/bluetooth/rfcomm/sock.c | 7 ++++++-
>> >>  1 file changed, 6 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
>> >> index 21e24da4847f..4397e14ff560 100644
>> >> --- a/net/bluetooth/rfcomm/sock.c
>> >> +++ b/net/bluetooth/rfcomm/sock.c
>> >> @@ -391,6 +391,7 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a
>> >>             addr->sa_family != AF_BLUETOOTH)
>> >>                 return -EINVAL;
>> >>
>> >> +       sock_hold(sk);
>> >>         lock_sock(sk);
>> >>
>> >>         if (sk->sk_state != BT_OPEN && sk->sk_state != BT_BOUND) {
>> >> @@ -410,14 +411,18 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a
>> >>         d->sec_level = rfcomm_pi(sk)->sec_level;
>> >>         d->role_switch = rfcomm_pi(sk)->role_switch;
>> >>
>> >> +       /* Drop sock lock to avoid potential deadlock with the RFCOMM lock */
>> >> +       release_sock(sk);
>> >>         err = rfcomm_dlc_open(d, &rfcomm_pi(sk)->src, &sa->rc_bdaddr,
>> >>                               sa->rc_channel);
>> >> -       if (!err)
>> >> +       lock_sock(sk);
>> >> +       if (!err && !sock_flag(sk, SOCK_ZAPPED))
>> >>                 err = bt_sock_wait_state(sk, BT_CONNECTED,
>> >>                                 sock_sndtimeo(sk, flags & O_NONBLOCK));
>> >>
>> >>  done:
>> >>         release_sock(sk);
>> >> +       sock_put(sk);
>> >>         return err;
>> >>  }
>> >
>> >This sounds like a great solution to hold the reference and then
>>
>> Why do you need sock_hold/put in the same proto_ops.callback sock opts ?
>> it should be guaranteed by the caller the sk will remain valid
>> or if you are paranoid then sock_hold() on your proto_ops.bind() and put()
>> on your proto_ops.release()
>
>It doesn't looks like there is a sock_hold done in the likes of
>__sys_connect/__sys_connect_file so afaik it is possible that the sk
>is freed in the meantime if we attempt to release and lock afterward,
>and about being paranoid I guess we are past that already since with
>the likes of fuzzing testing is already paranoid in itself.
>

My point is, if you claim that the sk can be freed from another process
after you call release_sock(sk);  this means it also can be free by 
another process before you call lock_sock(sk); so what makes the first
lock_sock(sk); safe in first place ? or after you changed the code to do
sock_hold(sk) what makes sock_hold(sk) safe if another process can free it
before you hold it ? 


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-01-10  9:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-04 15:07 [PATCH v2] Bluetooth: Fix possible deadlock in rfcomm_sk_state_change Ying Hsu
2023-01-04 22:21 ` Luiz Augusto von Dentz
2023-01-06  1:18   ` Saeed Mahameed
2023-01-06 19:44     ` Luiz Augusto von Dentz
2023-01-09  7:06       ` Ying Hsu
     [not found]       ` <CAAa9mD3hybWw0cZ9_p2ZWKQFFmPUvsj3efOB_j9VLd_4RgcmPA@mail.gmail.com>
2023-01-09 17:19         ` Luiz Augusto von Dentz
2023-01-10  2:15           ` Ying Hsu
2023-01-10  8:58       ` Saeed Mahameed

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).