* [PATCH 0/1] Bluetooth: Fix refcount use-after-free issue @ 2020-01-28 18:54 Manish Mandlik 2020-01-28 18:54 ` [PATCH 1/1] " Manish Mandlik 0 siblings, 1 reply; 3+ messages in thread From: Manish Mandlik @ 2020-01-28 18:54 UTC (permalink / raw) To: Marcel Holtmann Cc: Yoni Shavit, linux-bluetooth, Alain Michaud, Abhishek Pandit-Subedi, ChromeOS Bluetooth Upstreaming, Manish Mandlik, David S. Miller, Johan Hedberg, netdev, linux-kernel, Jakub Kicinski Hello Linux-Bluetooth, Sometimes after boot following kernel warning is observed: [ 62.793493] refcount_t: underflow; use-after-free. [ 62.799419] WARNING: CPU: 2 PID: 69 at /mnt/host/source/src/third_party/ kernel/v4.19/lib/refcount.c:187 refcount_sub_and_test_checked+0x80/0x8c [ 62.812298] Modules linked in: hidp rfcomm uinput hci_uart btqca bluetooth ecdh_generic mtk_scp mtk_rpmsg mtk_scp_ipi rpmsg_core bridge snd_seq_dummy stp llc snd_seq snd_seq_device lzo_rle lzo_compress nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp esp6 ah6 xfrm6_mode_tunnel xfrm6_mode_transport xfrm4_mode_tunnel xfrm4_mode_transport ip6t_REJECT ip6t_ipv6header zram ipt_MASQUERADE fuse ath10k_sdio ath10k_core ath mac80211 cfg80211 joydev [ 62.852227] CPU: 2 PID: 69 Comm: kworker/2:1 Tainted: G S 4.19.36 #344 [ 62.860057] Hardware name: MediaTek kukui rev1 board (DT) [ 62.865510] Workqueue: events l2cap_chan_timeout [bluetooth] [ 62.871177] pstate: 60000005 (nZCv daif -PAN -UAO) [ 62.875973] pc : refcount_sub_and_test_checked+0x80/0x8c [ 62.881285] lr : refcount_sub_and_test_checked+0x7c/0x8c [ 62.886594] sp : ffffff8008533cc0 [ 62.889907] x29: ffffff8008533cc0 x28: 0000000000000402 [ 62.895227] x27: ffffffaf37f16000 x26: ffffffe33b342d80 [ 62.900547] x25: ffffffe33aa2c210 x24: 0000000000000000 [ 62.905867] x23: ffffffe3294ef910 x22: ffffffe3294efc68 [ 62.911188] x21: ffffffe3294ef910 x20: ffffffe320fe3238 [ 62.916516] x19: ffffffe3294ed000 x18: 0000464806a32cd4 [ 62.921842] x17: 0000000000000400 x16: 0000000000000001 [ 62.927162] x15: 0000000000000000 x14: 0000000000000001 [ 62.932482] x13: 00000000000c001f x12: 0000000000000000 [ 62.937802] x11: 0000000000000001 x10: 0000000000000007 [ 62.943122] x9 : 97fe39c0a1baee00 x8 : 97fe39c0a1baee00 [ 62.948442] x7 : ffffffaf36af114c x6 : 0000000000000000 [ 62.953762] x5 : 0000000000000080 x4 : 0000000000000001 [ 62.959081] x3 : ffffff8008533868 x2 : 0000000000000006 [ 62.964401] x1 : ffffffe33b3436a0 x0 : 0000000000000000 [ 62.969721] Call trace: [ 62.972176] refcount_sub_and_test_checked+0x80/0x8c [ 62.977142] refcount_dec_and_test_checked+0x14/0x20 [ 62.982153] l2cap_sock_kill+0x40/0x58 [bluetooth] [ 62.986974] l2cap_sock_close_cb+0x1c/0x28 [bluetooth] [ 62.992140] l2cap_chan_timeout+0x94/0xb4 [bluetooth] [ 62.997196] process_one_work+0x330/0x65c [ 63.001206] worker_thread+0x2c8/0x3ec [ 63.004957] kthread+0x124/0x134 [ 63.008197] ret_from_fork+0x10/0x18 [ 63.011784] irq event stamp: 50638 [ 63.015203] hardirqs last enabled at (50637): [<ffffffaf37404fa4>] _raw_spin_unlock_irq+0x34/0x68 [ 63.024165] hardirqs last disabled at (50638): [<ffffffaf36a80e4c>] do_debug_exception+0x44/0x16c [ 63.033036] softirqs last enabled at (50632): [<ffffffaf36a81634>] __do_softirq+0x45c/0x4a4 [ 63.041473] softirqs last disabled at (50613): [<ffffffaf36abcf80>] irq_exit+0xd8/0xf8 [ 63.049386] ---[ end trace 91fdf7b9eddd3bb0 ]--- After analyzing the code, we noticed that there is a race condition between l2cap_chan_timeout() and l2cap_sock_release() while killing the socket. There are few more places in l2cap code where this race condition will occur. Issue is reproducible by writing a test which runs connect/disconnect of a bluetooth device in loop. With the help of this test, issue is consistently reproducible just within a couple of hours. To fix this, protect teardown/sock_kill and orphan/sock_kill by adding hold_lock on l2cap channel to ensure that the socket is killed only after it is marked as zapped and orphan. This change was tested by running the above test overnight and verifying that issue is not observed. There are places in sco and rfcomm code as well where similar race condition may occur. We are planning to send separate patches for them as well. Please let us know if this looks like a good generalized solution. Regards, Manish. Manish Mandlik (1): Bluetooth: Fix refcount use-after-free issue net/bluetooth/l2cap_core.c | 26 +++++++++++++++----------- net/bluetooth/l2cap_sock.c | 16 +++++++++++++--- 2 files changed, 28 insertions(+), 14 deletions(-) -- 2.25.0.341.g760bfbb309-goog ^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH 1/1] Bluetooth: Fix refcount use-after-free issue 2020-01-28 18:54 [PATCH 0/1] Bluetooth: Fix refcount use-after-free issue Manish Mandlik @ 2020-01-28 18:54 ` Manish Mandlik 2020-01-29 3:55 ` Marcel Holtmann 0 siblings, 1 reply; 3+ messages in thread From: Manish Mandlik @ 2020-01-28 18:54 UTC (permalink / raw) To: Marcel Holtmann Cc: Yoni Shavit, linux-bluetooth, Alain Michaud, Abhishek Pandit-Subedi, ChromeOS Bluetooth Upstreaming, Manish Mandlik, David S. Miller, Johan Hedberg, netdev, linux-kernel, Jakub Kicinski There is no lock preventing both l2cap_sock_release() and chan->ops->close() from running at the same time. If we consider Thread A running l2cap_chan_timeout() and Thread B running l2cap_sock_release(), expected behavior is: A::l2cap_chan_timeout()->l2cap_chan_close()->l2cap_sock_teardown_cb() A::l2cap_chan_timeout()->l2cap_sock_close_cb()->l2cap_sock_kill() B::l2cap_sock_release()->sock_orphan() B::l2cap_sock_release()->l2cap_sock_kill() where, sock_orphan() clears "sk->sk_socket" and l2cap_sock_teardown_cb() marks socket as SOCK_ZAPPED. In l2cap_sock_kill(), there is an "if-statement" that checks if both sock_orphan() and sock_teardown() has been run i.e. sk->sk_socket is NULL and socket is marked as SOCK_ZAPPED. Socket is killed if the condition is satisfied. In the race condition, following occurs: A::l2cap_chan_timeout()->l2cap_chan_close()->l2cap_sock_teardown_cb() B::l2cap_sock_release()->sock_orphan() B::l2cap_sock_release()->l2cap_sock_kill() A::l2cap_chan_timeout()->l2cap_sock_close_cb()->l2cap_sock_kill() In this scenario, "if-statement" is true in both B::l2cap_sock_kill() and A::l2cap_sock_kill() and we hit "refcount: underflow; use-after-free" bug. Similar condition occurs at other places where teardown/sock_kill is happening: l2cap_disconnect_rsp()->l2cap_chan_del()->l2cap_sock_teardown_cb() l2cap_disconnect_rsp()->l2cap_sock_close_cb()->l2cap_sock_kill() l2cap_conn_del()->l2cap_chan_del()->l2cap_sock_teardown_cb() l2cap_conn_del()->l2cap_sock_close_cb()->l2cap_sock_kill() l2cap_disconnect_req()->l2cap_chan_del()->l2cap_sock_teardown_cb() l2cap_disconnect_req()->l2cap_sock_close_cb()->l2cap_sock_kill() l2cap_sock_cleanup_listen()->l2cap_chan_close()->l2cap_sock_teardown_cb() l2cap_sock_cleanup_listen()->l2cap_sock_kill() Protect teardown/sock_kill and orphan/sock_kill by adding hold_lock on l2cap channel to ensure that the socket is killed only after marked as zapped and orphan. Signed-off-by: Manish Mandlik <mmandlik@google.com> --- net/bluetooth/l2cap_core.c | 26 +++++++++++++++----------- net/bluetooth/l2cap_sock.c | 16 +++++++++++++--- 2 files changed, 28 insertions(+), 14 deletions(-) diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 195459a1e53e..dd2021270b8a 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -419,6 +419,9 @@ static void l2cap_chan_timeout(struct work_struct *work) BT_DBG("chan %p state %s", chan, state_to_string(chan->state)); mutex_lock(&conn->chan_lock); + /* __set_chan_timer() calls l2cap_chan_hold(chan) while scheduling + * this work. No need to call l2cap_chan_hold(chan) here again. + */ l2cap_chan_lock(chan); if (chan->state == BT_CONNECTED || chan->state == BT_CONFIG) @@ -431,12 +434,12 @@ static void l2cap_chan_timeout(struct work_struct *work) l2cap_chan_close(chan, reason); - l2cap_chan_unlock(chan); - chan->ops->close(chan); - mutex_unlock(&conn->chan_lock); + l2cap_chan_unlock(chan); l2cap_chan_put(chan); + + mutex_unlock(&conn->chan_lock); } struct l2cap_chan *l2cap_chan_create(void) @@ -1737,9 +1740,9 @@ static void l2cap_conn_del(struct hci_conn *hcon, int err) l2cap_chan_del(chan, err); - l2cap_chan_unlock(chan); - chan->ops->close(chan); + + l2cap_chan_unlock(chan); l2cap_chan_put(chan); } @@ -4405,6 +4408,7 @@ static inline int l2cap_disconnect_req(struct l2cap_conn *conn, return 0; } + l2cap_chan_hold(chan); l2cap_chan_lock(chan); rsp.dcid = cpu_to_le16(chan->scid); @@ -4413,12 +4417,11 @@ static inline int l2cap_disconnect_req(struct l2cap_conn *conn, chan->ops->set_shutdown(chan); - l2cap_chan_hold(chan); l2cap_chan_del(chan, ECONNRESET); - l2cap_chan_unlock(chan); - chan->ops->close(chan); + + l2cap_chan_unlock(chan); l2cap_chan_put(chan); mutex_unlock(&conn->chan_lock); @@ -4450,20 +4453,21 @@ static inline int l2cap_disconnect_rsp(struct l2cap_conn *conn, return 0; } + l2cap_chan_hold(chan); l2cap_chan_lock(chan); if (chan->state != BT_DISCONN) { l2cap_chan_unlock(chan); + l2cap_chan_put(chan); mutex_unlock(&conn->chan_lock); return 0; } - l2cap_chan_hold(chan); l2cap_chan_del(chan, 0); - l2cap_chan_unlock(chan); - chan->ops->close(chan); + + l2cap_chan_unlock(chan); l2cap_chan_put(chan); mutex_unlock(&conn->chan_lock); diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c index a7be8b59b3c2..ab65304f3f63 100644 --- a/net/bluetooth/l2cap_sock.c +++ b/net/bluetooth/l2cap_sock.c @@ -1042,7 +1042,7 @@ static int l2cap_sock_recvmsg(struct socket *sock, struct msghdr *msg, } /* Kill socket (only if zapped and orphan) - * Must be called on unlocked socket. + * Must be called on unlocked socket, with l2cap channel lock. */ static void l2cap_sock_kill(struct sock *sk) { @@ -1203,8 +1203,15 @@ static int l2cap_sock_release(struct socket *sock) err = l2cap_sock_shutdown(sock, 2); + l2cap_chan_hold(l2cap_pi(sk)->chan); + l2cap_chan_lock(l2cap_pi(sk)->chan); + sock_orphan(sk); l2cap_sock_kill(sk); + + l2cap_chan_unlock(l2cap_pi(sk)->chan); + l2cap_chan_put(l2cap_pi(sk)->chan); + return err; } @@ -1222,12 +1229,15 @@ static void l2cap_sock_cleanup_listen(struct sock *parent) BT_DBG("child chan %p state %s", chan, state_to_string(chan->state)); + l2cap_chan_hold(chan); l2cap_chan_lock(chan); + __clear_chan_timer(chan); l2cap_chan_close(chan, ECONNRESET); - l2cap_chan_unlock(chan); - l2cap_sock_kill(sk); + + l2cap_chan_unlock(chan); + l2cap_chan_put(chan); } } -- 2.25.0.341.g760bfbb309-goog ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH 1/1] Bluetooth: Fix refcount use-after-free issue 2020-01-28 18:54 ` [PATCH 1/1] " Manish Mandlik @ 2020-01-29 3:55 ` Marcel Holtmann 0 siblings, 0 replies; 3+ messages in thread From: Marcel Holtmann @ 2020-01-29 3:55 UTC (permalink / raw) To: Manish Mandlik Cc: Yoni Shavit, linux-bluetooth, Alain Michaud, Abhishek Pandit-Subedi, ChromeOS Bluetooth Upstreaming, David S. Miller, Johan Hedberg, netdev, linux-kernel, Jakub Kicinski Hi Manish, > There is no lock preventing both l2cap_sock_release() and > chan->ops->close() from running at the same time. > > If we consider Thread A running l2cap_chan_timeout() and Thread B running > l2cap_sock_release(), expected behavior is: > A::l2cap_chan_timeout()->l2cap_chan_close()->l2cap_sock_teardown_cb() > A::l2cap_chan_timeout()->l2cap_sock_close_cb()->l2cap_sock_kill() > B::l2cap_sock_release()->sock_orphan() > B::l2cap_sock_release()->l2cap_sock_kill() > > where, > sock_orphan() clears "sk->sk_socket" and l2cap_sock_teardown_cb() marks > socket as SOCK_ZAPPED. > > In l2cap_sock_kill(), there is an "if-statement" that checks if both > sock_orphan() and sock_teardown() has been run i.e. sk->sk_socket is NULL > and socket is marked as SOCK_ZAPPED. Socket is killed if the condition is > satisfied. > > In the race condition, following occurs: > A::l2cap_chan_timeout()->l2cap_chan_close()->l2cap_sock_teardown_cb() > B::l2cap_sock_release()->sock_orphan() > B::l2cap_sock_release()->l2cap_sock_kill() > A::l2cap_chan_timeout()->l2cap_sock_close_cb()->l2cap_sock_kill() > > In this scenario, "if-statement" is true in both B::l2cap_sock_kill() and > A::l2cap_sock_kill() and we hit "refcount: underflow; use-after-free" bug. > > Similar condition occurs at other places where teardown/sock_kill is > happening: > l2cap_disconnect_rsp()->l2cap_chan_del()->l2cap_sock_teardown_cb() > l2cap_disconnect_rsp()->l2cap_sock_close_cb()->l2cap_sock_kill() > > l2cap_conn_del()->l2cap_chan_del()->l2cap_sock_teardown_cb() > l2cap_conn_del()->l2cap_sock_close_cb()->l2cap_sock_kill() > > l2cap_disconnect_req()->l2cap_chan_del()->l2cap_sock_teardown_cb() > l2cap_disconnect_req()->l2cap_sock_close_cb()->l2cap_sock_kill() > > l2cap_sock_cleanup_listen()->l2cap_chan_close()->l2cap_sock_teardown_cb() > l2cap_sock_cleanup_listen()->l2cap_sock_kill() > > Protect teardown/sock_kill and orphan/sock_kill by adding hold_lock on > l2cap channel to ensure that the socket is killed only after marked as > zapped and orphan. > > Signed-off-by: Manish Mandlik <mmandlik@google.com> > --- > > net/bluetooth/l2cap_core.c | 26 +++++++++++++++----------- > net/bluetooth/l2cap_sock.c | 16 +++++++++++++--- > 2 files changed, 28 insertions(+), 14 deletions(-) patch has been applied to bluetooth-next tree. Regards Marcel ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2020-01-29 3:56 UTC | newest] Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-01-28 18:54 [PATCH 0/1] Bluetooth: Fix refcount use-after-free issue Manish Mandlik 2020-01-28 18:54 ` [PATCH 1/1] " Manish Mandlik 2020-01-29 3:55 ` Marcel Holtmann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).