Hello there I need to say sorry about the erroneous patch I provided recently to fix the CVE-2021-3573 (https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git/commit/?id=e305509e678b3a4af2b3cfd410f409f7cdaabb52), which will throw a locking bug when the CONFIG_DEBUG_ATOMIC_SLEEP option is enabled. [ 8.234583] BUG: sleeping function called from invalid context at net/core/sock.c:3048 [ 8.235336] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 125, name: exp [ 8.236038] CPU: 0 PID: 125 Comm: exp Not tainted 5.11.11+ #13 [ 8.236542] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 8.237330] Call Trace: [ 8.237605] dump_stack+0x1b9/0x22e [ 8.237946] ? log_buf_vmcoreinfo_setup+0x45d/0x45d [ 8.238453] ? tty_ldisc_hangup+0x4d7/0x6d0 [ 8.238912] ? show_regs_print_info+0x12/0x12 [ 8.239383] ? task_work_run+0x16c/0x210 [ 8.239807] ? syscall_exit_to_user_mode+0x20/0x40 [ 8.240324] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 8.240897] ? _raw_spin_lock+0xa1/0x170 [ 8.241326] ___might_sleep+0x32d/0x420 [ 8.241749] ? stack_trace_snprint+0xe0/0xe0 [ 8.242204] ? __might_sleep+0x100/0x100 [ 8.242636] ? deactivate_slab+0x1ca/0x560 [ 8.243080] lock_sock_nested+0x96/0x360 [ 8.243523] ? hci_sock_dev_event+0xfe/0x5b0 [ 8.244007] ? sock_def_destruct+0x10/0x10 [ 8.244372] ? kasan_set_free_info+0x1f/0x40 [ 8.244738] ? kmem_cache_free+0xca/0x220 [ 8.245093] hci_sock_dev_event+0x2fa/0x5b0 [ 8.245454] hci_unregister_dev+0x3fa/0x1700 [ 8.245820] ? rcu_sync_exit+0xe0/0x1e0 Recently I received several emails informing this. Thank Anand for his detailed analysis, you can find the relevant discussion either in the linux-kernel@vger.kernel.org mail list or the stable@vger.kernel.org mail list. (https://www.spinics.net/lists/stable/msg476038.html for example). --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -755,7 +755,7 @@ void hci_sock_dev_event(struct hci_dev * /* Detach sockets from device */ read_lock(&hci_sk_list.lock); sk_for_each(sk, &hci_sk_list.head) { - bh_lock_sock_nested(sk); + lock_sock(sk); if (hci_pi(sk)->hdev == hdev) { hci_pi(sk)->hdev = NULL; sk->sk_err = EPIPE; @@ -764,7 +764,7 @@ void hci_sock_dev_event(struct hci_dev * hci_dev_put(hdev); } - bh_unlock_sock(sk); + release_sock(sk); } read_unlock(&hci_sk_list.lock); } The core insight for the bug in the previous commit is that: the lock_sock() I replaced can sleep while the read_lock(&hci_sk_list.lock) two lines before is not going to allow the sleep. Okay, the reason I changed the bh_lock_sock_nested() to lock_sock() is that I want this hci_sock_dev_event() function hangs out when other functions, like hci_sock_bound_ioctl() holds the lock. Otherwise bad UAF can happen. Check the OSS mail list for details: https://www.openwall.com/lists/oss-security/2021/06/08/2 However, from the lock principle in the Linux kernel, this lock replacement is not appropriate. I take a lot of time to try with other lock combinations for this case but failed. For example, I tried to replace the rwlock_t in the hci_sk_list with a sleep-able mutex lock. But the CONFIG_DEBUG_ATOMIC_SLEEP tells me that this mutex lock is not expected in the system call context. In short, I have no idea if there is any lock replacing solution for this bug. I need help and suggestions because the lock mechanism is just so difficult. Think about the UAF root cause, I find out another possible patch for this, which is also provided as the attachment. That is, maybe we can use a ref-count method as well as additional checks to prevent the UAF of hdev object. I mainly concentrated on the hci_sock_bound_ioctl() and hci_sock_sendmsg() functions, which are the main targets for me to write the exploit. For now I think these checks are enough and welcome more advice. Thanks Lin Ma