From: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> To: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com>, Marcel Holtmann <marcel@holtmann.org>, Johan Hedberg <johan.hedberg@gmail.com>, David Miller <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, sudipm.mukherjee@gmail.com, "linux-bluetooth@vger.kernel.org" <linux-bluetooth@vger.kernel.org>, "open list:NETWORKING [GENERAL]" <netdev@vger.kernel.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, skhan@linuxfoundation.org, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, linux-kernel-mentees@lists.linuxfoundation.org, syzbot+2f6d7c28bb4bf7e82060@syzkaller.appspotmail.com Subject: Re: [PATCH v6 1/6] Bluetooth: schedule SCO timeouts with delayed_work Date: Thu, 2 Sep 2021 23:17:33 -0400 [thread overview] Message-ID: <4aaa52c1-59dc-59ad-60c6-0fac9ecd5189@gmail.com> (raw) In-Reply-To: <CABBYNZJaPFzU-oXcYkuob0zw16tNcVgoVx8N-_GvL8=nT0Kn3Q@mail.gmail.com> Hi Luiz, On 2/9/21 7:42 pm, Luiz Augusto von Dentz wrote: > Hi Desmond, > > On Thu, Sep 2, 2021 at 4:05 PM Desmond Cheong Zhi Xi > <desmondcheongzx@gmail.com> wrote: >> >> On 2/9/21 6:53 pm, Desmond Cheong Zhi Xi wrote: >>> On 2/9/21 5:41 pm, Eric Dumazet wrote: >>>> >>>> >>>> On 9/2/21 12:32 PM, Desmond Cheong Zhi Xi wrote: >>>>> >>>>> Hi Eric, >>>>> >>>>> This actually seems to be a pre-existing error in sco_sock_connect >>>>> that we now hit in sco_sock_timeout. >>>>> >>>>> Any thoughts on the following patch to address the problem? >>>>> >>>>> Link: >>>>> https://lore.kernel.org/lkml/20210831065601.101185-1-desmondcheongzx@gmail.com/ >>>>> >>>> >>>> >>>> syzbot is still working on finding a repro, this is obviously not >>>> trivial, >>>> because this is a race window. >>>> >>>> I think this can happen even with a single SCO connection. >>>> >>>> This might be triggered more easily forcing a delay in sco_sock_timeout() >>>> >>>> diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c >>>> index >>>> 98a88158651281c9f75c4e0371044251e976e7ef..71ebe0243fab106c676c308724fe3a3f92a62cbd >>>> 100644 >>>> --- a/net/bluetooth/sco.c >>>> +++ b/net/bluetooth/sco.c >>>> @@ -84,8 +84,14 @@ static void sco_sock_timeout(struct work_struct *work) >>>> sco_conn_lock(conn); >>>> sk = conn->sk; >>>> - if (sk) >>>> + if (sk) { >>>> + // lets pretend cpu has been busy (in interrupts) for >>>> 100ms >>>> + int i; >>>> + for (i=0;i<100000;i++) >>>> + udelay(1); >>>> + >>>> sock_hold(sk); >>>> + }> sco_conn_unlock(conn); >>>> if (!sk) >>>> >>>> >>>> Stack trace tells us that sco_sock_timeout() is running after last >>>> reference >>>> on socket has been released. >>>> >>>> __refcount_add include/linux/refcount.h:199 [inline] >>>> __refcount_inc include/linux/refcount.h:250 [inline] >>>> refcount_inc include/linux/refcount.h:267 [inline] >>>> sock_hold include/net/sock.h:702 [inline] >>>> sco_sock_timeout+0x216/0x290 net/bluetooth/sco.c:88 >>>> process_one_work+0x98d/0x1630 kernel/workqueue.c:2276 >>>> worker_thread+0x658/0x11f0 kernel/workqueue.c:2422 >>>> kthread+0x3e5/0x4d0 kernel/kthread.c:319 >>>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 >>>> >>>> This is why I suggested to delay sock_put() to make sure this can not >>>> happen. >>>> >>>> diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c >>>> index >>>> 98a88158651281c9f75c4e0371044251e976e7ef..bd0222e3f05a6bcb40cffe8405c9dfff98d7afde >>>> 100644 >>>> --- a/net/bluetooth/sco.c >>>> +++ b/net/bluetooth/sco.c >>>> @@ -195,10 +195,11 @@ static void sco_conn_del(struct hci_conn *hcon, >>>> int err) >>>> sco_sock_clear_timer(sk); >>>> sco_chan_del(sk, err); >>>> release_sock(sk); >>>> - sock_put(sk); >>>> /* Ensure no more work items will run before freeing >>>> conn. */ >>>> cancel_delayed_work_sync(&conn->timeout_work); >>>> + >>>> + sock_put(sk); >>>> } >>>> hcon->sco_data = NULL; >>>> >>> >>> I see where you're going with this, but once sco_chan_del returns, any >>> instance of sco_sock_timeout that hasn't yet called sock_hold will >>> simply return, because conn->sk is NULL. Adding a delay to the >>> sco_conn_lock critical section in sco_sock_timeout would not affect this >>> because sco_chan_del clears conn->sk while holding onto the lock. >>> >>> The main reason that cancel_delayed_work_sync is run there is to make >>> sure that we don't have a UAF on the SCO connection itself after we free >>> conn. >>> >> >> Now that I think about this, the init and cleanup isn't quite right >> either. The delayed work should be initialized when the connection is >> allocated, and we should always cancel all work before freeing: >> >> diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c >> index ea18e5b56343..bba5cdb4cb4a 100644 >> --- a/net/bluetooth/sco.c >> +++ b/net/bluetooth/sco.c >> @@ -133,6 +133,7 @@ static struct sco_conn *sco_conn_add(struct hci_conn *hcon) >> return NULL; >> >> spin_lock_init(&conn->lock); >> + INIT_DELAYED_WORK(&conn->timeout_work, sco_sock_timeout); >> >> hcon->sco_data = conn; >> conn->hcon = hcon; >> @@ -197,11 +198,11 @@ static void sco_conn_del(struct hci_conn *hcon, int err) >> sco_chan_del(sk, err); >> release_sock(sk); >> sock_put(sk); >> - >> - /* Ensure no more work items will run before freeing conn. */ >> - cancel_delayed_work_sync(&conn->timeout_work); >> } >> >> + /* Ensure no more work items will run before freeing conn. */ >> + cancel_delayed_work_sync(&conn->timeout_work); >> + >> hcon->sco_data = NULL; >> kfree(conn); >> } >> @@ -214,8 +215,6 @@ static void __sco_chan_add(struct sco_conn *conn, struct sock *sk, >> sco_pi(sk)->conn = conn; >> conn->sk = sk; >> >> - INIT_DELAYED_WORK(&conn->timeout_work, sco_sock_timeout); >> - >> if (parent) >> bt_accept_enqueue(parent, sk, true); >> } > > I have come to something similar, do you care to send a proper patch > so we can get this merged. > Sounds good. Just finished running some tests locally, I'll send out the patches now. >>> For a single SCO connection with well-formed channel, I think there >>> can't be a race. Here's the reasoning: >>> >>> - For the timeout to be scheduled, a socket must have a channel with a >>> connection. >>> >>> - When a channel between a socket and connection is established, the >>> socket transitions from BT_OPEN to BT_CONNECTED, BT_CONNECT, or >>> BT_CONNECT2. >>> >>> - For a socket to be released, it has to be zapped. For sockets that >>> have a state of BT_CONNECTED, BT_CONNECT, or BT_CONNECT2, they are >>> zapped only when the channel is deleted. >>> >>> - If the channel is deleted (which is protected by sco_conn_lock), then >>> conn->sk is NULL, and sco_sock_timeout simply exits. If we had entered >>> the critical section in sco_sock_timeout before the channel was deleted, >>> then we increased the reference count on the socket, so it won't be >>> freed until sco_sock_timeout is done. >>> >>> Hence, sco_sock_timeout doesn't race with the release of a socket that >>> has a well-formed channel with a connection. >>> >>> But if multiple connections are allocated and overwritten in >>> sco_sock_connect, then none of the above assumptions hold because the >>> SCO connection can't be cleaned up (i.e. conn->sk cannot be set to NULL) >>> when the associated socket is released. This scenario happens in the >>> syzbot reproducer for the crash here: >>> https://syzkaller.appspot.com/bug?id=bcc246d137428d00ed14b476c2068579515fe2bc >>> >>> >>> That aside, upon taking a closer look, I think there is indeed a race >>> lurking in sco_conn_del, but it's not the one that syzbot is hitting. >>> Our sock_hold simply comes too late, and by the time it's called we >>> might have already have freed the socket. >>> >>> So probably something like this needs to happen: >>> >>> diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c >>> index fa25b07120c9..ea18e5b56343 100644 >>> --- a/net/bluetooth/sco.c >>> +++ b/net/bluetooth/sco.c >>> @@ -187,10 +187,11 @@ static void sco_conn_del(struct hci_conn *hcon, >>> int err) >>> /* Kill socket */ >>> sco_conn_lock(conn); >>> sk = conn->sk; >>> + if (sk) >>> + sock_hold(sk); >>> sco_conn_unlock(conn); >>> >>> if (sk) { >>> - sock_hold(sk); >>> lock_sock(sk); >>> sco_sock_clear_timer(sk); >>> sco_chan_del(sk, err); >> > >
WARNING: multiple messages have this Message-ID (diff)
From: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> To: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Cc: Johan Hedberg <johan.hedberg@gmail.com>, Eric Dumazet <eric.dumazet@gmail.com>, "open list:NETWORKING \[GENERAL\]" <netdev@vger.kernel.org>, Marcel Holtmann <marcel@holtmann.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, David Miller <davem@davemloft.net>, "linux-bluetooth@vger.kernel.org" <linux-bluetooth@vger.kernel.org>, syzbot+2f6d7c28bb4bf7e82060@syzkaller.appspotmail.com, Jakub Kicinski <kuba@kernel.org>, linux-kernel-mentees@lists.linuxfoundation.org, sudipm.mukherjee@gmail.com Subject: Re: [PATCH v6 1/6] Bluetooth: schedule SCO timeouts with delayed_work Date: Thu, 2 Sep 2021 23:17:33 -0400 [thread overview] Message-ID: <4aaa52c1-59dc-59ad-60c6-0fac9ecd5189@gmail.com> (raw) In-Reply-To: <CABBYNZJaPFzU-oXcYkuob0zw16tNcVgoVx8N-_GvL8=nT0Kn3Q@mail.gmail.com> Hi Luiz, On 2/9/21 7:42 pm, Luiz Augusto von Dentz wrote: > Hi Desmond, > > On Thu, Sep 2, 2021 at 4:05 PM Desmond Cheong Zhi Xi > <desmondcheongzx@gmail.com> wrote: >> >> On 2/9/21 6:53 pm, Desmond Cheong Zhi Xi wrote: >>> On 2/9/21 5:41 pm, Eric Dumazet wrote: >>>> >>>> >>>> On 9/2/21 12:32 PM, Desmond Cheong Zhi Xi wrote: >>>>> >>>>> Hi Eric, >>>>> >>>>> This actually seems to be a pre-existing error in sco_sock_connect >>>>> that we now hit in sco_sock_timeout. >>>>> >>>>> Any thoughts on the following patch to address the problem? >>>>> >>>>> Link: >>>>> https://lore.kernel.org/lkml/20210831065601.101185-1-desmondcheongzx@gmail.com/ >>>>> >>>> >>>> >>>> syzbot is still working on finding a repro, this is obviously not >>>> trivial, >>>> because this is a race window. >>>> >>>> I think this can happen even with a single SCO connection. >>>> >>>> This might be triggered more easily forcing a delay in sco_sock_timeout() >>>> >>>> diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c >>>> index >>>> 98a88158651281c9f75c4e0371044251e976e7ef..71ebe0243fab106c676c308724fe3a3f92a62cbd >>>> 100644 >>>> --- a/net/bluetooth/sco.c >>>> +++ b/net/bluetooth/sco.c >>>> @@ -84,8 +84,14 @@ static void sco_sock_timeout(struct work_struct *work) >>>> sco_conn_lock(conn); >>>> sk = conn->sk; >>>> - if (sk) >>>> + if (sk) { >>>> + // lets pretend cpu has been busy (in interrupts) for >>>> 100ms >>>> + int i; >>>> + for (i=0;i<100000;i++) >>>> + udelay(1); >>>> + >>>> sock_hold(sk); >>>> + }> sco_conn_unlock(conn); >>>> if (!sk) >>>> >>>> >>>> Stack trace tells us that sco_sock_timeout() is running after last >>>> reference >>>> on socket has been released. >>>> >>>> __refcount_add include/linux/refcount.h:199 [inline] >>>> __refcount_inc include/linux/refcount.h:250 [inline] >>>> refcount_inc include/linux/refcount.h:267 [inline] >>>> sock_hold include/net/sock.h:702 [inline] >>>> sco_sock_timeout+0x216/0x290 net/bluetooth/sco.c:88 >>>> process_one_work+0x98d/0x1630 kernel/workqueue.c:2276 >>>> worker_thread+0x658/0x11f0 kernel/workqueue.c:2422 >>>> kthread+0x3e5/0x4d0 kernel/kthread.c:319 >>>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 >>>> >>>> This is why I suggested to delay sock_put() to make sure this can not >>>> happen. >>>> >>>> diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c >>>> index >>>> 98a88158651281c9f75c4e0371044251e976e7ef..bd0222e3f05a6bcb40cffe8405c9dfff98d7afde >>>> 100644 >>>> --- a/net/bluetooth/sco.c >>>> +++ b/net/bluetooth/sco.c >>>> @@ -195,10 +195,11 @@ static void sco_conn_del(struct hci_conn *hcon, >>>> int err) >>>> sco_sock_clear_timer(sk); >>>> sco_chan_del(sk, err); >>>> release_sock(sk); >>>> - sock_put(sk); >>>> /* Ensure no more work items will run before freeing >>>> conn. */ >>>> cancel_delayed_work_sync(&conn->timeout_work); >>>> + >>>> + sock_put(sk); >>>> } >>>> hcon->sco_data = NULL; >>>> >>> >>> I see where you're going with this, but once sco_chan_del returns, any >>> instance of sco_sock_timeout that hasn't yet called sock_hold will >>> simply return, because conn->sk is NULL. Adding a delay to the >>> sco_conn_lock critical section in sco_sock_timeout would not affect this >>> because sco_chan_del clears conn->sk while holding onto the lock. >>> >>> The main reason that cancel_delayed_work_sync is run there is to make >>> sure that we don't have a UAF on the SCO connection itself after we free >>> conn. >>> >> >> Now that I think about this, the init and cleanup isn't quite right >> either. The delayed work should be initialized when the connection is >> allocated, and we should always cancel all work before freeing: >> >> diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c >> index ea18e5b56343..bba5cdb4cb4a 100644 >> --- a/net/bluetooth/sco.c >> +++ b/net/bluetooth/sco.c >> @@ -133,6 +133,7 @@ static struct sco_conn *sco_conn_add(struct hci_conn *hcon) >> return NULL; >> >> spin_lock_init(&conn->lock); >> + INIT_DELAYED_WORK(&conn->timeout_work, sco_sock_timeout); >> >> hcon->sco_data = conn; >> conn->hcon = hcon; >> @@ -197,11 +198,11 @@ static void sco_conn_del(struct hci_conn *hcon, int err) >> sco_chan_del(sk, err); >> release_sock(sk); >> sock_put(sk); >> - >> - /* Ensure no more work items will run before freeing conn. */ >> - cancel_delayed_work_sync(&conn->timeout_work); >> } >> >> + /* Ensure no more work items will run before freeing conn. */ >> + cancel_delayed_work_sync(&conn->timeout_work); >> + >> hcon->sco_data = NULL; >> kfree(conn); >> } >> @@ -214,8 +215,6 @@ static void __sco_chan_add(struct sco_conn *conn, struct sock *sk, >> sco_pi(sk)->conn = conn; >> conn->sk = sk; >> >> - INIT_DELAYED_WORK(&conn->timeout_work, sco_sock_timeout); >> - >> if (parent) >> bt_accept_enqueue(parent, sk, true); >> } > > I have come to something similar, do you care to send a proper patch > so we can get this merged. > Sounds good. Just finished running some tests locally, I'll send out the patches now. >>> For a single SCO connection with well-formed channel, I think there >>> can't be a race. Here's the reasoning: >>> >>> - For the timeout to be scheduled, a socket must have a channel with a >>> connection. >>> >>> - When a channel between a socket and connection is established, the >>> socket transitions from BT_OPEN to BT_CONNECTED, BT_CONNECT, or >>> BT_CONNECT2. >>> >>> - For a socket to be released, it has to be zapped. For sockets that >>> have a state of BT_CONNECTED, BT_CONNECT, or BT_CONNECT2, they are >>> zapped only when the channel is deleted. >>> >>> - If the channel is deleted (which is protected by sco_conn_lock), then >>> conn->sk is NULL, and sco_sock_timeout simply exits. If we had entered >>> the critical section in sco_sock_timeout before the channel was deleted, >>> then we increased the reference count on the socket, so it won't be >>> freed until sco_sock_timeout is done. >>> >>> Hence, sco_sock_timeout doesn't race with the release of a socket that >>> has a well-formed channel with a connection. >>> >>> But if multiple connections are allocated and overwritten in >>> sco_sock_connect, then none of the above assumptions hold because the >>> SCO connection can't be cleaned up (i.e. conn->sk cannot be set to NULL) >>> when the associated socket is released. This scenario happens in the >>> syzbot reproducer for the crash here: >>> https://syzkaller.appspot.com/bug?id=bcc246d137428d00ed14b476c2068579515fe2bc >>> >>> >>> That aside, upon taking a closer look, I think there is indeed a race >>> lurking in sco_conn_del, but it's not the one that syzbot is hitting. >>> Our sock_hold simply comes too late, and by the time it's called we >>> might have already have freed the socket. >>> >>> So probably something like this needs to happen: >>> >>> diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c >>> index fa25b07120c9..ea18e5b56343 100644 >>> --- a/net/bluetooth/sco.c >>> +++ b/net/bluetooth/sco.c >>> @@ -187,10 +187,11 @@ static void sco_conn_del(struct hci_conn *hcon, >>> int err) >>> /* Kill socket */ >>> sco_conn_lock(conn); >>> sk = conn->sk; >>> + if (sk) >>> + sock_hold(sk); >>> sco_conn_unlock(conn); >>> >>> if (sk) { >>> - sock_hold(sk); >>> lock_sock(sk); >>> sco_sock_clear_timer(sk); >>> sco_chan_del(sk, err); >> > > _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees
next prev parent reply other threads:[~2021-09-03 3:17 UTC|newest] Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-08-10 4:14 [PATCH v6 0/6] Bluetooth: fix locking and socket killing in SCO and RFCOMM Desmond Cheong Zhi Xi 2021-08-10 4:14 ` Desmond Cheong Zhi Xi 2021-08-10 4:14 ` [PATCH v6 1/6] Bluetooth: schedule SCO timeouts with delayed_work Desmond Cheong Zhi Xi 2021-08-10 4:14 ` Desmond Cheong Zhi Xi 2021-08-10 5:14 ` Bluetooth: fix locking and socket killing in SCO and RFCOMM bluez.test.bot 2021-08-10 17:51 ` Luiz Augusto von Dentz 2021-09-02 19:17 ` [PATCH v6 1/6] Bluetooth: schedule SCO timeouts with delayed_work Eric Dumazet 2021-09-02 19:17 ` Eric Dumazet 2021-09-02 19:32 ` Desmond Cheong Zhi Xi 2021-09-02 19:32 ` Desmond Cheong Zhi Xi 2021-09-02 21:41 ` Eric Dumazet 2021-09-02 21:41 ` Eric Dumazet 2021-09-02 22:53 ` Desmond Cheong Zhi Xi 2021-09-02 22:53 ` Desmond Cheong Zhi Xi 2021-09-02 23:05 ` Desmond Cheong Zhi Xi 2021-09-02 23:05 ` Desmond Cheong Zhi Xi 2021-09-02 23:42 ` Luiz Augusto von Dentz 2021-09-02 23:42 ` Luiz Augusto von Dentz 2021-09-03 3:17 ` Desmond Cheong Zhi Xi [this message] 2021-09-03 3:17 ` Desmond Cheong Zhi Xi 2021-08-10 4:14 ` [PATCH v6 2/6] Bluetooth: avoid circular locks in sco_sock_connect Desmond Cheong Zhi Xi 2021-08-10 4:14 ` Desmond Cheong Zhi Xi 2021-08-10 4:14 ` [PATCH v6 3/6] Bluetooth: switch to lock_sock in SCO Desmond Cheong Zhi Xi 2021-08-10 4:14 ` Desmond Cheong Zhi Xi 2021-08-10 4:14 ` [PATCH v6 4/6] Bluetooth: serialize calls to sco_sock_{set,clear}_timer Desmond Cheong Zhi Xi 2021-08-10 4:14 ` [PATCH v6 4/6] Bluetooth: serialize calls to sco_sock_{set, clear}_timer Desmond Cheong Zhi Xi 2021-08-10 4:14 ` [PATCH v6 5/6] Bluetooth: switch to lock_sock in RFCOMM Desmond Cheong Zhi Xi 2021-08-10 4:14 ` Desmond Cheong Zhi Xi 2021-08-10 4:14 ` [PATCH v6 6/6] Bluetooth: fix repeated calls to sco_sock_kill Desmond Cheong Zhi Xi 2021-08-10 4:14 ` Desmond Cheong Zhi Xi
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=4aaa52c1-59dc-59ad-60c6-0fac9ecd5189@gmail.com \ --to=desmondcheongzx@gmail.com \ --cc=davem@davemloft.net \ --cc=eric.dumazet@gmail.com \ --cc=gregkh@linuxfoundation.org \ --cc=johan.hedberg@gmail.com \ --cc=kuba@kernel.org \ --cc=linux-bluetooth@vger.kernel.org \ --cc=linux-kernel-mentees@lists.linuxfoundation.org \ --cc=linux-kernel@vger.kernel.org \ --cc=luiz.dentz@gmail.com \ --cc=marcel@holtmann.org \ --cc=netdev@vger.kernel.org \ --cc=skhan@linuxfoundation.org \ --cc=sudipm.mukherjee@gmail.com \ --cc=syzbot+2f6d7c28bb4bf7e82060@syzkaller.appspotmail.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.