Re: Re: [PATCH v3] Bluetooth: call lock_sock() outside of spinlock section

From: LinMa <linma@zju.edu.cn>
To: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>
Cc: "Luiz Augusto von Dentz" <luiz.dentz@gmail.com>,
	"Johan Hedberg" <johan.hedberg@gmail.com>,
	"Marcel Holtmann" <marcel@holtmann.org>,
	"linux-bluetooth@vger.kernel.org"
	<linux-bluetooth@vger.kernel.org>,
	"open list:NETWORKING [GENERAL]" <netdev@vger.kernel.org>
Subject: Re: Re: [PATCH v3] Bluetooth: call lock_sock() outside of spinlock section
Date: Thu, 22 Jul 2021 12:47:20 +0800 (GMT+08:00)	[thread overview]
Message-ID: <7e4fafe2.58d70.17acc8a1a0b.Coremail.linma@zju.edu.cn> (raw)
In-Reply-To: <e07c5bbf-115c-6ffa-8492-7b749b9d286b@i-love.sakura.ne.jp>

Hi Tetsuo,

Just find out another interesting function: sock_owned_by_user(). (I am just a noob of kernel locks)

Hence I think the following patch has the same 'effect' as the old patch e305509e678b3 ("Bluetooth: use correct lock to prevent UAF of hdev object")

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index b04a5a02ecf3..0cc4b88daa96 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -762,7 +762,11 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
                /* Detach sockets from device */
                read_lock(&hci_sk_list.lock);
                sk_for_each(sk, &hci_sk_list.head) {
-                       lock_sock(sk);
+                       bh_lock_sock_nested(sk);
+busywait:
+                       if (sock_owned_by_user(sk))
+                               goto busywait;
+
                        if (hci_pi(sk)->hdev == hdev) {
                                hci_pi(sk)->hdev = NULL;
                                sk->sk_err = EPIPE;
@@ -771,7 +775,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)

                                hci_dev_put(hdev);
                        }
-                       release_sock(sk);
+                       bh_unlock_sock(sk);
                }
                read_unlock(&hci_sk_list.lock);
        }

The sad thing is that it seems will cost CPU resource to do meaningless wait...

What do you think? Can this sock_owned_by_user() function do any help?

Thanks
Lin Ma


> From: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>
> Sent Time: 2021-07-16 22:48:13 (Friday)
> To: "Desmond Cheong Zhi Xi" <desmondcheongzx@gmail.com>, LinMa <linma@zju.edu.cn>, "Luiz Augusto von Dentz" <luiz.dentz@gmail.com>, "Johan Hedberg" <johan.hedberg@gmail.com>, "Marcel Holtmann" <marcel@holtmann.org>
> Cc: "linux-bluetooth@vger.kernel.org" <linux-bluetooth@vger.kernel.org>, "David S. Miller" <davem@davemloft.net>, "Jakub Kicinski" <kuba@kernel.org>, "open list:NETWORKING [GENERAL]" <netdev@vger.kernel.org>
> Subject: Re: [PATCH v3] Bluetooth: call lock_sock() outside of spinlock section
> 
> On 2021/07/16 13:11, Desmond Cheong Zhi Xi wrote:
> > On 16/7/21 11:47 am, Desmond Cheong Zhi Xi wrote:
> >> Saw this and thought I'd offer my two cents.
> >> BUG: sleeping function called from invalid context
> >> This is the original problem that Tetsuo's patch was trying to fix.
> 
> Yes.
> 
> >> Under the hood of lock_sock, we call lock_sock_nested which might sleep
> >> because of the mutex_acquire.
> 
> Both lock_sock() and lock_sock_nested() might sleep.
> 
> >> But we shouldn't sleep while holding the rw spinlock.
> 
> Right. In atomic context (e.g. inside interrupt handler, schedulable context
> with interrupts or preemption disabled, schedulable context inside RCU read
> critical section, schedulable context inside a spinlock section), we must not
> call functions (e.g. waiting for a mutex, waiting for a semaphore, waiting for
> a page fault) which are not atomic.
> 
> >> So we either have to acquire a spinlock instead of a mutex as was done before,
> 
> Regarding hci_sock_dev_event(HCI_DEV_UNREG) case, we can't use a spinlock.
> 
> Like LinMa explained, lock_sock() has to be used in order to serialize functions
> (e.g. hci_sock_sendmsg()) which access hci_pi(sk)->hdev between lock_sock(sk) and
> release_sock(sk). And like I explained, we can't defer resetting hci_pi(sk)->hdev
> to NULL, for hci_sock_dev_event(HCI_DEV_UNREG) is responsible for resetting
> hci_pi(sk)->hdev to NULL because the caller of hci_sock_dev_event(HCI_DEV_UNREG)
> immediately destroys resources associated with this hdev.
> 
> >> or we need to move lock_sock out of the rw spinlock critical section as Tetsuo proposes.
> 
> Exactly. Since this is a regression introduced when fixing CVE-2021-3573, Linux
> distributors are waiting for this patch so that they can apply the fix for CVE-2021-3573.
> This patch should be sent to linux.git and stables as soon as possible. But due to little
> attention on this patch, I'm already testing this patch in linux-next.git via my tree.
> I'll drop when Bluetooth maintainers pick this patch up for linux-5.14-rcX. (Or should I
> directly send to Linus?)
> 
> >>
> > 
> > My bad, was thinking more about the problem and noticed your poc was for hci_sock_sendmsg,
> > not hci_sock_dev_event.
> 
> I didn't catch this part. Are you talking about a different poc?
> As far as I'm aware, exp.c in POC.zip was for hci_sock_bound_ioctl(HCIUNBLOCKADDR).
> 
> hci_sock_bound_ioctl(HCIUNBLOCKADDR) (which is called between lock_sock() and release_sock())
> calls copy_from_user() which might cause page fault, and userfaultfd mechanism allows an attacker
> to slowdown page fault handling enough to hci_sock_dev_event(HCI_DEV_UNREG) to return without
> waiting for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock(). This race window
> results in UAF (doesn't it, LinMa?).
> 
> > In this case, it's not clear to me why the atomic context is being violated.
> 
> In atomic context (in hci_sock_dev_event(HCI_DEV_UNREG) case, between
> read_lock(&hci_sk_list.lock) and read_unlock(&hci_sk_list.lock)), we must not call
> lock_sock(sk) which might wait for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock().
> 
> > 
> > Sorry for the noise.
> > 
> >>>
> >>> The patch provided by Desmond adds the local_bh_disable() before the bh_lock_sock() so I also try that in
> >>>
> >>> --- a/net/bluetooth/hci_sock.c
> >>> +++ b/net/bluetooth/hci_sock.c
> >>> @@ -762,6 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> >>>                  /* Detach sockets from device */
> >>>                  read_lock(&hci_sk_list.lock);
> >>>                  sk_for_each(sk, &hci_sk_list.head) {
> >>> +                       local_bh_disable();
> >>>                          bh_lock_sock_nested(sk);
> >>>                          if (hci_pi(sk)->hdev == hdev) {
> >>>                                  hci_pi(sk)->hdev = NULL;
> >>> @@ -772,6 +773,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> >>>                                  hci_dev_put(hdev);
> >>>                          }
> >>>                          bh_unlock_sock(sk);
> >>> +                       local_bh_enable();
> >>>                  }
> >>>                  read_unlock(&hci_sk_list.lock);
> >>>          }
> >>>
> >>> But this is not useful, the UAF still occurs
> >>>
> >>
> >> I might be very mistaken on this, but I believe the UAF still happens because
> >> you can't really mix bh_lock_sock* and lock_sock* to protect the same things.
> 
> Right. https://www.kernel.org/doc/html/v5.13/kernel-hacking/locking.html
> 
> >> The former holds the spinlock &sk->sk_lock.slock and synchronizes between
> >> user contexts and bottom halves,
> 
> serializes access to resources which might be accessed from atomic (i.e. non-schedulable) contexts
> 
> >> while the latter holds a mutex on &sk->sk_lock.dep_map to synchronize between
> >> multiple users.
> 
> serializes access to resources which are accessed from only schedulable (i.e. non-atomic) contexts
> 
> >>
> >> One option I can think of would be to switch instances of lock_sock to bh_lock_sock_nested
> >> for users that might race (such as hci_sock_sendmsg, hci_sock_bound_ioctl, and others as
> >> needed). But I'm not sure if that's quite what we want, plus we would need to ensure that
> >> sleeping functions aren't called between the bh_lock/unlock.
> 
> We can't do it for hci_sock_dev_event(HCI_DEV_UNREG).