linux-bluetooth.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] Bluetooth: fix inconsistent lock states
@ 2021-07-21  9:38 Desmond Cheong Zhi Xi
  2021-07-21  9:38 ` [PATCH v3 1/2] Bluetooth: fix inconsistent lock state in SCO Desmond Cheong Zhi Xi
  2021-07-21  9:38 ` [PATCH v3 2/2] Bluetooth: fix inconsistent lock state in rfcomm_connect_ind Desmond Cheong Zhi Xi
  0 siblings, 2 replies; 9+ messages in thread
From: Desmond Cheong Zhi Xi @ 2021-07-21  9:38 UTC (permalink / raw)
  To: marcel, johan.hedberg, luiz.dentz, davem, kuba, matthieu.baerts, stefan
  Cc: Desmond Cheong Zhi Xi, linux-bluetooth, netdev, linux-kernel,
	skhan, gregkh, linux-kernel-mentees

Hi,

This series addresses inconsistent lock states first identified by
Syzbot here:
https://syzkaller.appspot.com/bug?extid=2f6d7c28bb4bf7e82060

v2 -> v3:
- Split SCO and RFCOMM code changes, as suggested by Luiz Augusto von
Dentz.
- Simplify local bh disabling in SCO by using local_bh_disable/enable
inside sco_chan_del. The rationale is inside the commit message, but in
summary I initially wanted to avoid nesting local_bh_disable until I
learned that local_bh_disable/enable pairs are reentrant.

v1 -> v2:
- Instead of pulling out the clean-up code out from sco_chan_del and
using it directly in sco_conn_del, disable local softirqs for relevant
sections.
- Disable local softirqs more thoroughly for instances of
bh_lock_sock/bh_lock_sock_nested in the bluetooth subsystem.
Specifically, the calls in af_bluetooth.c and rfcomm/sock.c are now made
with local softirqs disabled as well.

Best wishes,
Desmond

Desmond Cheong Zhi Xi (2):
  Bluetooth: fix inconsistent lock state in SCO
  Bluetooth: fix inconsistent lock state in rfcomm_connect_ind

 net/bluetooth/rfcomm/sock.c |  2 ++
 net/bluetooth/sco.c         | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 1/2] Bluetooth: fix inconsistent lock state in SCO
  2021-07-21  9:38 [PATCH v3 0/2] Bluetooth: fix inconsistent lock states Desmond Cheong Zhi Xi
@ 2021-07-21  9:38 ` Desmond Cheong Zhi Xi
  2021-07-21 11:09   ` Bluetooth: fix inconsistent lock states bluez.test.bot
  2021-07-27  0:30   ` [PATCH v3 1/2] Bluetooth: fix inconsistent lock state in SCO Luiz Augusto von Dentz
  2021-07-21  9:38 ` [PATCH v3 2/2] Bluetooth: fix inconsistent lock state in rfcomm_connect_ind Desmond Cheong Zhi Xi
  1 sibling, 2 replies; 9+ messages in thread
From: Desmond Cheong Zhi Xi @ 2021-07-21  9:38 UTC (permalink / raw)
  To: marcel, johan.hedberg, luiz.dentz, davem, kuba, matthieu.baerts, stefan
  Cc: Desmond Cheong Zhi Xi, linux-bluetooth, netdev, linux-kernel,
	skhan, gregkh, linux-kernel-mentees, syzbot+2f6d7c28bb4bf7e82060

Syzbot reported an inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} lock
usage in sco_conn_del and sco_sock_timeout that could lead to
deadlocks.

This inconsistent lock state can also happen in sco_conn_ready,
rfcomm_connect_ind, and bt_accept_enqueue.

The issue is that these functions take a spin lock on the socket with
interrupts enabled, but sco_sock_timeout takes the lock in an IRQ
context. This could lead to deadlocks:

       CPU0
       ----
  lock(slock-AF_BLUETOOTH-BTPROTO_SCO);
  <Interrupt>
    lock(slock-AF_BLUETOOTH-BTPROTO_SCO);

 *** DEADLOCK ***

We fix this by ensuring that local bh is disabled before calling
bh_lock_sock.

After doing this, we additionally need to protect sco_conn_lock by
disabling local bh.

This is necessary because sco_conn_del makes a call to sco_chan_del
while holding on to the sock lock, and sco_chan_del itself makes a
call to sco_conn_lock. If sco_conn_lock is held elsewhere with
interrupts enabled, there could still be a
slock-AF_BLUETOOTH-BTPROTO_SCO --> &conn->lock#2 lock inversion as
follows:

        CPU0                    CPU1
        ----                    ----
   lock(&conn->lock#2);
                                local_irq_disable();
                                lock(slock-AF_BLUETOOTH-BTPROTO_SCO);
                                lock(&conn->lock#2);
   <Interrupt>
     lock(slock-AF_BLUETOOTH-BTPROTO_SCO);

  *** DEADLOCK ***

Although sco_conn_del disables local bh before calling sco_chan_del,
we can still wrap the calls to sco_conn_lock in sco_chan_del, with
local_bh_disable/enable as this pair of functions are reentrant.

Reported-by: syzbot+2f6d7c28bb4bf7e82060@syzkaller.appspotmail.com
Tested-by: syzbot+2f6d7c28bb4bf7e82060@syzkaller.appspotmail.com
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
---
 net/bluetooth/sco.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index 3bd41563f118..34f3419c3330 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -140,10 +140,12 @@ static void sco_chan_del(struct sock *sk, int err)
 	BT_DBG("sk %p, conn %p, err %d", sk, conn, err);
 
 	if (conn) {
+		local_bh_disable();
 		sco_conn_lock(conn);
 		conn->sk = NULL;
 		sco_pi(sk)->conn = NULL;
 		sco_conn_unlock(conn);
+		local_bh_enable();
 
 		if (conn->hcon)
 			hci_conn_drop(conn->hcon);
@@ -167,16 +169,22 @@ static void sco_conn_del(struct hci_conn *hcon, int err)
 	BT_DBG("hcon %p conn %p, err %d", hcon, conn, err);
 
 	/* Kill socket */
+	local_bh_disable();
 	sco_conn_lock(conn);
 	sk = conn->sk;
 	sco_conn_unlock(conn);
+	local_bh_enable();
 
 	if (sk) {
 		sock_hold(sk);
+
+		local_bh_disable();
 		bh_lock_sock(sk);
 		sco_sock_clear_timer(sk);
 		sco_chan_del(sk, err);
 		bh_unlock_sock(sk);
+		local_bh_enable();
+
 		sco_sock_kill(sk);
 		sock_put(sk);
 	}
@@ -202,6 +210,7 @@ static int sco_chan_add(struct sco_conn *conn, struct sock *sk,
 {
 	int err = 0;
 
+	local_bh_disable();
 	sco_conn_lock(conn);
 	if (conn->sk)
 		err = -EBUSY;
@@ -209,6 +218,7 @@ static int sco_chan_add(struct sco_conn *conn, struct sock *sk,
 		__sco_chan_add(conn, sk, parent);
 
 	sco_conn_unlock(conn);
+	local_bh_enable();
 	return err;
 }
 
@@ -303,9 +313,11 @@ static void sco_recv_frame(struct sco_conn *conn, struct sk_buff *skb)
 {
 	struct sock *sk;
 
+	local_bh_disable();
 	sco_conn_lock(conn);
 	sk = conn->sk;
 	sco_conn_unlock(conn);
+	local_bh_enable();
 
 	if (!sk)
 		goto drop;
@@ -420,10 +432,12 @@ static void __sco_sock_close(struct sock *sk)
 		if (sco_pi(sk)->conn->hcon) {
 			sk->sk_state = BT_DISCONN;
 			sco_sock_set_timer(sk, SCO_DISCONN_TIMEOUT);
+			local_bh_disable();
 			sco_conn_lock(sco_pi(sk)->conn);
 			hci_conn_drop(sco_pi(sk)->conn->hcon);
 			sco_pi(sk)->conn->hcon = NULL;
 			sco_conn_unlock(sco_pi(sk)->conn);
+			local_bh_enable();
 		} else
 			sco_chan_del(sk, ECONNRESET);
 		break;
@@ -1084,21 +1098,26 @@ static void sco_conn_ready(struct sco_conn *conn)
 
 	if (sk) {
 		sco_sock_clear_timer(sk);
+		local_bh_disable();
 		bh_lock_sock(sk);
 		sk->sk_state = BT_CONNECTED;
 		sk->sk_state_change(sk);
 		bh_unlock_sock(sk);
+		local_bh_enable();
 	} else {
+		local_bh_disable();
 		sco_conn_lock(conn);
 
 		if (!conn->hcon) {
 			sco_conn_unlock(conn);
+			local_bh_enable();
 			return;
 		}
 
 		parent = sco_get_sock_listen(&conn->hcon->src);
 		if (!parent) {
 			sco_conn_unlock(conn);
+			local_bh_enable();
 			return;
 		}
 
@@ -1109,6 +1128,7 @@ static void sco_conn_ready(struct sco_conn *conn)
 		if (!sk) {
 			bh_unlock_sock(parent);
 			sco_conn_unlock(conn);
+			local_bh_enable();
 			return;
 		}
 
@@ -1131,6 +1151,7 @@ static void sco_conn_ready(struct sco_conn *conn)
 		bh_unlock_sock(parent);
 
 		sco_conn_unlock(conn);
+		local_bh_enable();
 	}
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 2/2] Bluetooth: fix inconsistent lock state in rfcomm_connect_ind
  2021-07-21  9:38 [PATCH v3 0/2] Bluetooth: fix inconsistent lock states Desmond Cheong Zhi Xi
  2021-07-21  9:38 ` [PATCH v3 1/2] Bluetooth: fix inconsistent lock state in SCO Desmond Cheong Zhi Xi
@ 2021-07-21  9:38 ` Desmond Cheong Zhi Xi
  2021-07-29 19:53   ` Marcel Holtmann
  1 sibling, 1 reply; 9+ messages in thread
From: Desmond Cheong Zhi Xi @ 2021-07-21  9:38 UTC (permalink / raw)
  To: marcel, johan.hedberg, luiz.dentz, davem, kuba, matthieu.baerts, stefan
  Cc: Desmond Cheong Zhi Xi, linux-bluetooth, netdev, linux-kernel,
	skhan, gregkh, linux-kernel-mentees

Commit fad003b6c8e3d ("Bluetooth: Fix inconsistent lock state with
RFCOMM") fixed a lockdep warning due to sk->sk_lock.slock being
acquired without disabling softirq while the lock is also used in
softirq context. This was done by disabling interrupts before calling
bh_lock_sock in rfcomm_sk_state_change.

Later, this was changed in commit e6da0edc24ee ("Bluetooth: Acquire
sk_lock.slock without disabling interrupts") to disable softirqs
only.

However, there is another instance of sk->sk_lock.slock being acquired
without disabling softirq in rfcomm_connect_ind. This patch fixes this
by disabling local bh before the call to bh_lock_sock.

Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
---
 net/bluetooth/rfcomm/sock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index ae6f80730561..d8734abb2df4 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -974,6 +974,7 @@ int rfcomm_connect_ind(struct rfcomm_session *s, u8 channel, struct rfcomm_dlc *
 	if (!parent)
 		return 0;
 
+	local_bh_disable();
 	bh_lock_sock(parent);
 
 	/* Check for backlog size */
@@ -1002,6 +1003,7 @@ int rfcomm_connect_ind(struct rfcomm_session *s, u8 channel, struct rfcomm_dlc *
 
 done:
 	bh_unlock_sock(parent);
+	local_bh_enable();
 
 	if (test_bit(BT_SK_DEFER_SETUP, &bt_sk(parent)->flags))
 		parent->sk_state_change(parent);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* RE: Bluetooth: fix inconsistent lock states
  2021-07-21  9:38 ` [PATCH v3 1/2] Bluetooth: fix inconsistent lock state in SCO Desmond Cheong Zhi Xi
@ 2021-07-21 11:09   ` bluez.test.bot
  2021-07-27  0:30   ` [PATCH v3 1/2] Bluetooth: fix inconsistent lock state in SCO Luiz Augusto von Dentz
  1 sibling, 0 replies; 9+ messages in thread
From: bluez.test.bot @ 2021-07-21 11:09 UTC (permalink / raw)
  To: linux-bluetooth, desmondcheongzx

[-- Attachment #1: Type: text/plain, Size: 3513 bytes --]

This is automated email and please do not reply to this email!

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
This is a CI test results with your patch series:
PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=518815

---Test result---

Test Summary:
CheckPatch                    FAIL      0.90 seconds
GitLint                       PASS      0.24 seconds
BuildKernel                   PASS      598.23 seconds
TestRunner: Setup             PASS      394.48 seconds
TestRunner: l2cap-tester      PASS      2.88 seconds
TestRunner: bnep-tester       PASS      2.10 seconds
TestRunner: mgmt-tester       PASS      31.86 seconds
TestRunner: rfcomm-tester     PASS      2.40 seconds
TestRunner: sco-tester        PASS      2.25 seconds
TestRunner: smp-tester        FAIL      2.28 seconds
TestRunner: userchan-tester   PASS      2.14 seconds

Details
##############################
Test: CheckPatch - FAIL - 0.90 seconds
Run checkpatch.pl script with rule in .checkpatch.conf
Bluetooth: fix inconsistent lock state in rfcomm_connect_ind
WARNING: Unknown commit id 'fad003b6c8e3d', maybe rebased or not pulled?
#6: 
Commit fad003b6c8e3d ("Bluetooth: Fix inconsistent lock state with

WARNING: Unknown commit id 'e6da0edc24ee', maybe rebased or not pulled?
#12: 
Later, this was changed in commit e6da0edc24ee ("Bluetooth: Acquire

total: 0 errors, 2 warnings, 0 checks, 14 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

"[PATCH] Bluetooth: fix inconsistent lock state in rfcomm_connect_ind" has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.


##############################
Test: GitLint - PASS - 0.24 seconds
Run gitlint with rule in .gitlint


##############################
Test: BuildKernel - PASS - 598.23 seconds
Build Kernel with minimal configuration supports Bluetooth


##############################
Test: TestRunner: Setup - PASS - 394.48 seconds
Setup environment for running Test Runner


##############################
Test: TestRunner: l2cap-tester - PASS - 2.88 seconds
Run test-runner with l2cap-tester
Total: 40, Passed: 40 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: bnep-tester - PASS - 2.10 seconds
Run test-runner with bnep-tester
Total: 1, Passed: 1 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: mgmt-tester - PASS - 31.86 seconds
Run test-runner with mgmt-tester
Total: 448, Passed: 445 (99.3%), Failed: 0, Not Run: 3

##############################
Test: TestRunner: rfcomm-tester - PASS - 2.40 seconds
Run test-runner with rfcomm-tester
Total: 9, Passed: 9 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: sco-tester - PASS - 2.25 seconds
Run test-runner with sco-tester
Total: 8, Passed: 8 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: smp-tester - FAIL - 2.28 seconds
Run test-runner with smp-tester
Total: 8, Passed: 7 (87.5%), Failed: 1, Not Run: 0

Failed Test Cases
SMP Client - SC Request 2                            Failed       0.022 seconds

##############################
Test: TestRunner: userchan-tester - PASS - 2.14 seconds
Run test-runner with userchan-tester
Total: 3, Passed: 3 (100.0%), Failed: 0, Not Run: 0



---
Regards,
Linux Bluetooth


[-- Attachment #2: l2cap-tester.log --]
[-- Type: application/octet-stream, Size: 44350 bytes --]

[-- Attachment #3: bnep-tester.log --]
[-- Type: application/octet-stream, Size: 3556 bytes --]

[-- Attachment #4: mgmt-tester.log --]
[-- Type: application/octet-stream, Size: 616827 bytes --]

[-- Attachment #5: rfcomm-tester.log --]
[-- Type: application/octet-stream, Size: 11676 bytes --]

[-- Attachment #6: sco-tester.log --]
[-- Type: application/octet-stream, Size: 9912 bytes --]

[-- Attachment #7: smp-tester.log --]
[-- Type: application/octet-stream, Size: 11705 bytes --]

[-- Attachment #8: userchan-tester.log --]
[-- Type: application/octet-stream, Size: 5453 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] Bluetooth: fix inconsistent lock state in SCO
  2021-07-21  9:38 ` [PATCH v3 1/2] Bluetooth: fix inconsistent lock state in SCO Desmond Cheong Zhi Xi
  2021-07-21 11:09   ` Bluetooth: fix inconsistent lock states bluez.test.bot
@ 2021-07-27  0:30   ` Luiz Augusto von Dentz
  2021-07-27  5:13     ` Desmond Cheong Zhi Xi
  1 sibling, 1 reply; 9+ messages in thread
From: Luiz Augusto von Dentz @ 2021-07-27  0:30 UTC (permalink / raw)
  To: Desmond Cheong Zhi Xi
  Cc: Marcel Holtmann, Johan Hedberg, David Miller, Jakub Kicinski,
	matthieu.baerts, stefan, linux-bluetooth,
	open list:NETWORKING [GENERAL],
	Linux Kernel Mailing List, skhan, Greg Kroah-Hartman,
	linux-kernel-mentees, syzbot+2f6d7c28bb4bf7e82060

Hi Desmond,

On Wed, Jul 21, 2021 at 2:39 AM Desmond Cheong Zhi Xi
<desmondcheongzx@gmail.com> wrote:
>
> Syzbot reported an inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} lock
> usage in sco_conn_del and sco_sock_timeout that could lead to
> deadlocks.
>
> This inconsistent lock state can also happen in sco_conn_ready,
> rfcomm_connect_ind, and bt_accept_enqueue.
>
> The issue is that these functions take a spin lock on the socket with
> interrupts enabled, but sco_sock_timeout takes the lock in an IRQ
> context. This could lead to deadlocks:
>
>        CPU0
>        ----
>   lock(slock-AF_BLUETOOTH-BTPROTO_SCO);
>   <Interrupt>
>     lock(slock-AF_BLUETOOTH-BTPROTO_SCO);

Having a second look at this, it does seem this is due to use of
sk->sk_timer which apparently run its callback on IRQ context, so I
wonder if wouldn't be a better idea to switch to a delayed_work to
avoid having to deal with the likes of local_bh_disable, in fact it
seems we have been misusing it since the documentation says it is for
sock cleanup not for handling things like SNDTIMEO, we don't really
use it for other socket types so I wonder when we start using
delayed_work we forgot about sco.c.

>  *** DEADLOCK ***
>
> We fix this by ensuring that local bh is disabled before calling
> bh_lock_sock.
>
> After doing this, we additionally need to protect sco_conn_lock by
> disabling local bh.
>
> This is necessary because sco_conn_del makes a call to sco_chan_del
> while holding on to the sock lock, and sco_chan_del itself makes a
> call to sco_conn_lock. If sco_conn_lock is held elsewhere with
> interrupts enabled, there could still be a
> slock-AF_BLUETOOTH-BTPROTO_SCO --> &conn->lock#2 lock inversion as
> follows:
>
>         CPU0                    CPU1
>         ----                    ----
>    lock(&conn->lock#2);
>                                 local_irq_disable();
>                                 lock(slock-AF_BLUETOOTH-BTPROTO_SCO);
>                                 lock(&conn->lock#2);
>    <Interrupt>
>      lock(slock-AF_BLUETOOTH-BTPROTO_SCO);
>
>   *** DEADLOCK ***
>
> Although sco_conn_del disables local bh before calling sco_chan_del,
> we can still wrap the calls to sco_conn_lock in sco_chan_del, with
> local_bh_disable/enable as this pair of functions are reentrant.
>
> Reported-by: syzbot+2f6d7c28bb4bf7e82060@syzkaller.appspotmail.com
> Tested-by: syzbot+2f6d7c28bb4bf7e82060@syzkaller.appspotmail.com
> Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
> ---
>  net/bluetooth/sco.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
> index 3bd41563f118..34f3419c3330 100644
> --- a/net/bluetooth/sco.c
> +++ b/net/bluetooth/sco.c
> @@ -140,10 +140,12 @@ static void sco_chan_del(struct sock *sk, int err)
>         BT_DBG("sk %p, conn %p, err %d", sk, conn, err);
>
>         if (conn) {
> +               local_bh_disable();
>                 sco_conn_lock(conn);
>                 conn->sk = NULL;
>                 sco_pi(sk)->conn = NULL;
>                 sco_conn_unlock(conn);
> +               local_bh_enable();
>
>                 if (conn->hcon)
>                         hci_conn_drop(conn->hcon);
> @@ -167,16 +169,22 @@ static void sco_conn_del(struct hci_conn *hcon, int err)
>         BT_DBG("hcon %p conn %p, err %d", hcon, conn, err);
>
>         /* Kill socket */
> +       local_bh_disable();
>         sco_conn_lock(conn);
>         sk = conn->sk;
>         sco_conn_unlock(conn);
> +       local_bh_enable();
>
>         if (sk) {
>                 sock_hold(sk);
> +
> +               local_bh_disable();
>                 bh_lock_sock(sk);
>                 sco_sock_clear_timer(sk);
>                 sco_chan_del(sk, err);
>                 bh_unlock_sock(sk);
> +               local_bh_enable();
> +
>                 sco_sock_kill(sk);
>                 sock_put(sk);
>         }
> @@ -202,6 +210,7 @@ static int sco_chan_add(struct sco_conn *conn, struct sock *sk,
>  {
>         int err = 0;
>
> +       local_bh_disable();
>         sco_conn_lock(conn);
>         if (conn->sk)
>                 err = -EBUSY;
> @@ -209,6 +218,7 @@ static int sco_chan_add(struct sco_conn *conn, struct sock *sk,
>                 __sco_chan_add(conn, sk, parent);
>
>         sco_conn_unlock(conn);
> +       local_bh_enable();
>         return err;
>  }
>
> @@ -303,9 +313,11 @@ static void sco_recv_frame(struct sco_conn *conn, struct sk_buff *skb)
>  {
>         struct sock *sk;
>
> +       local_bh_disable();
>         sco_conn_lock(conn);
>         sk = conn->sk;
>         sco_conn_unlock(conn);
> +       local_bh_enable();
>
>         if (!sk)
>                 goto drop;
> @@ -420,10 +432,12 @@ static void __sco_sock_close(struct sock *sk)
>                 if (sco_pi(sk)->conn->hcon) {
>                         sk->sk_state = BT_DISCONN;
>                         sco_sock_set_timer(sk, SCO_DISCONN_TIMEOUT);
> +                       local_bh_disable();
>                         sco_conn_lock(sco_pi(sk)->conn);
>                         hci_conn_drop(sco_pi(sk)->conn->hcon);
>                         sco_pi(sk)->conn->hcon = NULL;
>                         sco_conn_unlock(sco_pi(sk)->conn);
> +                       local_bh_enable();
>                 } else
>                         sco_chan_del(sk, ECONNRESET);
>                 break;
> @@ -1084,21 +1098,26 @@ static void sco_conn_ready(struct sco_conn *conn)
>
>         if (sk) {
>                 sco_sock_clear_timer(sk);
> +               local_bh_disable();
>                 bh_lock_sock(sk);
>                 sk->sk_state = BT_CONNECTED;
>                 sk->sk_state_change(sk);
>                 bh_unlock_sock(sk);
> +               local_bh_enable();
>         } else {
> +               local_bh_disable();
>                 sco_conn_lock(conn);
>
>                 if (!conn->hcon) {
>                         sco_conn_unlock(conn);
> +                       local_bh_enable();
>                         return;
>                 }
>
>                 parent = sco_get_sock_listen(&conn->hcon->src);
>                 if (!parent) {
>                         sco_conn_unlock(conn);
> +                       local_bh_enable();
>                         return;
>                 }
>
> @@ -1109,6 +1128,7 @@ static void sco_conn_ready(struct sco_conn *conn)
>                 if (!sk) {
>                         bh_unlock_sock(parent);
>                         sco_conn_unlock(conn);
> +                       local_bh_enable();
>                         return;
>                 }
>
> @@ -1131,6 +1151,7 @@ static void sco_conn_ready(struct sco_conn *conn)
>                 bh_unlock_sock(parent);
>
>                 sco_conn_unlock(conn);
> +               local_bh_enable();
>         }
>  }
>
> --
> 2.25.1
>


-- 
Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] Bluetooth: fix inconsistent lock state in SCO
  2021-07-27  0:30   ` [PATCH v3 1/2] Bluetooth: fix inconsistent lock state in SCO Luiz Augusto von Dentz
@ 2021-07-27  5:13     ` Desmond Cheong Zhi Xi
  0 siblings, 0 replies; 9+ messages in thread
From: Desmond Cheong Zhi Xi @ 2021-07-27  5:13 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Marcel Holtmann, Johan Hedberg, David Miller, Jakub Kicinski,
	matthieu.baerts, stefan, linux-bluetooth,
	open list:NETWORKING [GENERAL],
	Linux Kernel Mailing List, skhan, Greg Kroah-Hartman,
	linux-kernel-mentees, syzbot+2f6d7c28bb4bf7e82060

On 27/7/21 8:30 am, Luiz Augusto von Dentz wrote:
> Hi Desmond,
> 
> On Wed, Jul 21, 2021 at 2:39 AM Desmond Cheong Zhi Xi
> <desmondcheongzx@gmail.com> wrote:
>>
>> Syzbot reported an inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} lock
>> usage in sco_conn_del and sco_sock_timeout that could lead to
>> deadlocks.
>>
>> This inconsistent lock state can also happen in sco_conn_ready,
>> rfcomm_connect_ind, and bt_accept_enqueue.
>>
>> The issue is that these functions take a spin lock on the socket with
>> interrupts enabled, but sco_sock_timeout takes the lock in an IRQ
>> context. This could lead to deadlocks:
>>
>>         CPU0
>>         ----
>>    lock(slock-AF_BLUETOOTH-BTPROTO_SCO);
>>    <Interrupt>
>>      lock(slock-AF_BLUETOOTH-BTPROTO_SCO);
> 
> Having a second look at this, it does seem this is due to use of
> sk->sk_timer which apparently run its callback on IRQ context, so I
> wonder if wouldn't be a better idea to switch to a delayed_work to
> avoid having to deal with the likes of local_bh_disable, in fact it
> seems we have been misusing it since the documentation says it is for
> sock cleanup not for handling things like SNDTIMEO, we don't really
> use it for other socket types so I wonder when we start using
> delayed_work we forgot about sco.c.
> 

Hi Luiz,

That makes sense to me. I don't think there's a need for the timeout to 
be run in an IRQ context.

I'll prepare a patch for this, thanks for the suggestion.

>>   *** DEADLOCK ***
>>
>> We fix this by ensuring that local bh is disabled before calling
>> bh_lock_sock.
>>
>> After doing this, we additionally need to protect sco_conn_lock by
>> disabling local bh.
>>
>> This is necessary because sco_conn_del makes a call to sco_chan_del
>> while holding on to the sock lock, and sco_chan_del itself makes a
>> call to sco_conn_lock. If sco_conn_lock is held elsewhere with
>> interrupts enabled, there could still be a
>> slock-AF_BLUETOOTH-BTPROTO_SCO --> &conn->lock#2 lock inversion as
>> follows:
>>
>>          CPU0                    CPU1
>>          ----                    ----
>>     lock(&conn->lock#2);
>>                                  local_irq_disable();
>>                                  lock(slock-AF_BLUETOOTH-BTPROTO_SCO);
>>                                  lock(&conn->lock#2);
>>     <Interrupt>
>>       lock(slock-AF_BLUETOOTH-BTPROTO_SCO);
>>
>>    *** DEADLOCK ***
>>
>> Although sco_conn_del disables local bh before calling sco_chan_del,
>> we can still wrap the calls to sco_conn_lock in sco_chan_del, with
>> local_bh_disable/enable as this pair of functions are reentrant.
>>
>> Reported-by: syzbot+2f6d7c28bb4bf7e82060@syzkaller.appspotmail.com
>> Tested-by: syzbot+2f6d7c28bb4bf7e82060@syzkaller.appspotmail.com
>> Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
>> ---
>>   net/bluetooth/sco.c | 21 +++++++++++++++++++++
>>   1 file changed, 21 insertions(+)
>>
>> diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
>> index 3bd41563f118..34f3419c3330 100644
>> --- a/net/bluetooth/sco.c
>> +++ b/net/bluetooth/sco.c
>> @@ -140,10 +140,12 @@ static void sco_chan_del(struct sock *sk, int err)
>>          BT_DBG("sk %p, conn %p, err %d", sk, conn, err);
>>
>>          if (conn) {
>> +               local_bh_disable();
>>                  sco_conn_lock(conn);
>>                  conn->sk = NULL;
>>                  sco_pi(sk)->conn = NULL;
>>                  sco_conn_unlock(conn);
>> +               local_bh_enable();
>>
>>                  if (conn->hcon)
>>                          hci_conn_drop(conn->hcon);
>> @@ -167,16 +169,22 @@ static void sco_conn_del(struct hci_conn *hcon, int err)
>>          BT_DBG("hcon %p conn %p, err %d", hcon, conn, err);
>>
>>          /* Kill socket */
>> +       local_bh_disable();
>>          sco_conn_lock(conn);
>>          sk = conn->sk;
>>          sco_conn_unlock(conn);
>> +       local_bh_enable();
>>
>>          if (sk) {
>>                  sock_hold(sk);
>> +
>> +               local_bh_disable();
>>                  bh_lock_sock(sk);
>>                  sco_sock_clear_timer(sk);
>>                  sco_chan_del(sk, err);
>>                  bh_unlock_sock(sk);
>> +               local_bh_enable();
>> +
>>                  sco_sock_kill(sk);
>>                  sock_put(sk);
>>          }
>> @@ -202,6 +210,7 @@ static int sco_chan_add(struct sco_conn *conn, struct sock *sk,
>>   {
>>          int err = 0;
>>
>> +       local_bh_disable();
>>          sco_conn_lock(conn);
>>          if (conn->sk)
>>                  err = -EBUSY;
>> @@ -209,6 +218,7 @@ static int sco_chan_add(struct sco_conn *conn, struct sock *sk,
>>                  __sco_chan_add(conn, sk, parent);
>>
>>          sco_conn_unlock(conn);
>> +       local_bh_enable();
>>          return err;
>>   }
>>
>> @@ -303,9 +313,11 @@ static void sco_recv_frame(struct sco_conn *conn, struct sk_buff *skb)
>>   {
>>          struct sock *sk;
>>
>> +       local_bh_disable();
>>          sco_conn_lock(conn);
>>          sk = conn->sk;
>>          sco_conn_unlock(conn);
>> +       local_bh_enable();
>>
>>          if (!sk)
>>                  goto drop;
>> @@ -420,10 +432,12 @@ static void __sco_sock_close(struct sock *sk)
>>                  if (sco_pi(sk)->conn->hcon) {
>>                          sk->sk_state = BT_DISCONN;
>>                          sco_sock_set_timer(sk, SCO_DISCONN_TIMEOUT);
>> +                       local_bh_disable();
>>                          sco_conn_lock(sco_pi(sk)->conn);
>>                          hci_conn_drop(sco_pi(sk)->conn->hcon);
>>                          sco_pi(sk)->conn->hcon = NULL;
>>                          sco_conn_unlock(sco_pi(sk)->conn);
>> +                       local_bh_enable();
>>                  } else
>>                          sco_chan_del(sk, ECONNRESET);
>>                  break;
>> @@ -1084,21 +1098,26 @@ static void sco_conn_ready(struct sco_conn *conn)
>>
>>          if (sk) {
>>                  sco_sock_clear_timer(sk);
>> +               local_bh_disable();
>>                  bh_lock_sock(sk);
>>                  sk->sk_state = BT_CONNECTED;
>>                  sk->sk_state_change(sk);
>>                  bh_unlock_sock(sk);
>> +               local_bh_enable();
>>          } else {
>> +               local_bh_disable();
>>                  sco_conn_lock(conn);
>>
>>                  if (!conn->hcon) {
>>                          sco_conn_unlock(conn);
>> +                       local_bh_enable();
>>                          return;
>>                  }
>>
>>                  parent = sco_get_sock_listen(&conn->hcon->src);
>>                  if (!parent) {
>>                          sco_conn_unlock(conn);
>> +                       local_bh_enable();
>>                          return;
>>                  }
>>
>> @@ -1109,6 +1128,7 @@ static void sco_conn_ready(struct sco_conn *conn)
>>                  if (!sk) {
>>                          bh_unlock_sock(parent);
>>                          sco_conn_unlock(conn);
>> +                       local_bh_enable();
>>                          return;
>>                  }
>>
>> @@ -1131,6 +1151,7 @@ static void sco_conn_ready(struct sco_conn *conn)
>>                  bh_unlock_sock(parent);
>>
>>                  sco_conn_unlock(conn);
>> +               local_bh_enable();
>>          }
>>   }
>>
>> --
>> 2.25.1
>>
> 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/2] Bluetooth: fix inconsistent lock state in rfcomm_connect_ind
  2021-07-21  9:38 ` [PATCH v3 2/2] Bluetooth: fix inconsistent lock state in rfcomm_connect_ind Desmond Cheong Zhi Xi
@ 2021-07-29 19:53   ` Marcel Holtmann
  2021-07-30  9:06     ` Desmond Cheong Zhi Xi
  0 siblings, 1 reply; 9+ messages in thread
From: Marcel Holtmann @ 2021-07-29 19:53 UTC (permalink / raw)
  To: Desmond Cheong Zhi Xi
  Cc: Johan Hedberg, Luiz Augusto von Dentz, David S. Miller,
	Jakub Kicinski, Matthieu Baerts, Stefan Schmidt, linux-bluetooth,
	open list:NETWORKING [GENERAL],
	open list, skhan, Greg Kroah-Hartman, linux-kernel-mentees

Hi Desmond,

> Commit fad003b6c8e3d ("Bluetooth: Fix inconsistent lock state with
> RFCOMM") fixed a lockdep warning due to sk->sk_lock.slock being
> acquired without disabling softirq while the lock is also used in
> softirq context. This was done by disabling interrupts before calling
> bh_lock_sock in rfcomm_sk_state_change.
> 
> Later, this was changed in commit e6da0edc24ee ("Bluetooth: Acquire
> sk_lock.slock without disabling interrupts") to disable softirqs
> only.
> 
> However, there is another instance of sk->sk_lock.slock being acquired
> without disabling softirq in rfcomm_connect_ind. This patch fixes this
> by disabling local bh before the call to bh_lock_sock.

back in the days, the packet processing was done in a tasklet, but these days it is done in a workqueue. So shouldn’t this be just converted into a lock_sock(). Am I missing something?

Regards

Marcel


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/2] Bluetooth: fix inconsistent lock state in rfcomm_connect_ind
  2021-07-29 19:53   ` Marcel Holtmann
@ 2021-07-30  9:06     ` Desmond Cheong Zhi Xi
  2021-07-30 13:40       ` Marcel Holtmann
  0 siblings, 1 reply; 9+ messages in thread
From: Desmond Cheong Zhi Xi @ 2021-07-30  9:06 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Johan Hedberg, Luiz Augusto von Dentz, David S. Miller,
	Jakub Kicinski, Matthieu Baerts, Stefan Schmidt, linux-bluetooth,
	open list:NETWORKING [GENERAL],
	open list, skhan, Greg Kroah-Hartman, linux-kernel-mentees

Hi Marcel,

On 30/7/21 3:53 am, Marcel Holtmann wrote:
> Hi Desmond,
> 
>> Commit fad003b6c8e3d ("Bluetooth: Fix inconsistent lock state with
>> RFCOMM") fixed a lockdep warning due to sk->sk_lock.slock being
>> acquired without disabling softirq while the lock is also used in
>> softirq context. This was done by disabling interrupts before calling
>> bh_lock_sock in rfcomm_sk_state_change.
>>
>> Later, this was changed in commit e6da0edc24ee ("Bluetooth: Acquire
>> sk_lock.slock without disabling interrupts") to disable softirqs
>> only.
>>
>> However, there is another instance of sk->sk_lock.slock being acquired
>> without disabling softirq in rfcomm_connect_ind. This patch fixes this
>> by disabling local bh before the call to bh_lock_sock.
> 
> back in the days, the packet processing was done in a tasklet, but these days it is done in a workqueue. So shouldn’t this be just converted into a lock_sock(). Am I missing something?
> 

Thanks for the info. I think you're right, I just didn't understand very 
much when I wrote this patch.

If I'm understanding correctly, it seems that both the bh_lock_sock in 
rfcomm_connect_ind, and spin_lock_bh in rfcomm_sk_state_change need to 
be changed to lock_sock, otherwise they don't provide any 
synchronization with other functions in RFCOMM that use lock_sock.

If that sounds correct I can prepare the patch for that.

Best wishes,
Desmond

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/2] Bluetooth: fix inconsistent lock state in rfcomm_connect_ind
  2021-07-30  9:06     ` Desmond Cheong Zhi Xi
@ 2021-07-30 13:40       ` Marcel Holtmann
  0 siblings, 0 replies; 9+ messages in thread
From: Marcel Holtmann @ 2021-07-30 13:40 UTC (permalink / raw)
  To: Desmond Cheong Zhi Xi
  Cc: Johan Hedberg, Luiz Augusto von Dentz, David S. Miller,
	Jakub Kicinski, Matthieu Baerts, Stefan Schmidt, linux-bluetooth,
	open list:NETWORKING [GENERAL],
	open list, skhan, Greg Kroah-Hartman, linux-kernel-mentees

Hi Desmond,

>>> Commit fad003b6c8e3d ("Bluetooth: Fix inconsistent lock state with
>>> RFCOMM") fixed a lockdep warning due to sk->sk_lock.slock being
>>> acquired without disabling softirq while the lock is also used in
>>> softirq context. This was done by disabling interrupts before calling
>>> bh_lock_sock in rfcomm_sk_state_change.
>>> 
>>> Later, this was changed in commit e6da0edc24ee ("Bluetooth: Acquire
>>> sk_lock.slock without disabling interrupts") to disable softirqs
>>> only.
>>> 
>>> However, there is another instance of sk->sk_lock.slock being acquired
>>> without disabling softirq in rfcomm_connect_ind. This patch fixes this
>>> by disabling local bh before the call to bh_lock_sock.
>> back in the days, the packet processing was done in a tasklet, but these days it is done in a workqueue. So shouldn’t this be just converted into a lock_sock(). Am I missing something?
> 
> Thanks for the info. I think you're right, I just didn't understand very much when I wrote this patch.
> 
> If I'm understanding correctly, it seems that both the bh_lock_sock in rfcomm_connect_ind, and spin_lock_bh in rfcomm_sk_state_change need to be changed to lock_sock, otherwise they don't provide any synchronization with other functions in RFCOMM that use lock_sock.
> 
> If that sounds correct I can prepare the patch for that.

please do so and re-run the tests. Thanks.

Regards

Marcel


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-07-30 13:40 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-21  9:38 [PATCH v3 0/2] Bluetooth: fix inconsistent lock states Desmond Cheong Zhi Xi
2021-07-21  9:38 ` [PATCH v3 1/2] Bluetooth: fix inconsistent lock state in SCO Desmond Cheong Zhi Xi
2021-07-21 11:09   ` Bluetooth: fix inconsistent lock states bluez.test.bot
2021-07-27  0:30   ` [PATCH v3 1/2] Bluetooth: fix inconsistent lock state in SCO Luiz Augusto von Dentz
2021-07-27  5:13     ` Desmond Cheong Zhi Xi
2021-07-21  9:38 ` [PATCH v3 2/2] Bluetooth: fix inconsistent lock state in rfcomm_connect_ind Desmond Cheong Zhi Xi
2021-07-29 19:53   ` Marcel Holtmann
2021-07-30  9:06     ` Desmond Cheong Zhi Xi
2021-07-30 13:40       ` Marcel Holtmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).