From: Jiri Slaby <jslaby@suse.cz>
To: stable@vger.kernel.org
Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>,
	"David S . Miller" <davem@davemloft.net>,
	Jiri Slaby <jslaby@suse.cz>
Subject: [patch added to the 3.12 stable tree] unix: avoid use-after-free in ep_remove_wait_queue
Date: Tue,  5 Jan 2016 16:28:06 +0100
Message-ID: <1452007726-3747-5-git-send-email-jslaby@suse.cz>
In-Reply-To: <1452007726-3747-1-git-send-email-jslaby@suse.cz>

From: Rainer Weikusat <rweikusat@mobileactivedefense.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 7d267278a9ece963d77eefec61630223fce08c6c ]

Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of the server socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep even though none of the messages presently enqueued on the
server receive queue were sent by them. In order to ensure that these
will be woken up once space becomes available again, the present
unix_dgram_poll routine does a second sock_poll_wait call with the
peer_wait wait queue of the server socket as queue argument
(unix_dgram_recvmsg does a wake up on this queue after a datagram was
received). This is inherently problematic because the server socket is
only guaranteed to remain alive for as long as the client still holds a
reference to it. In case the connection is dissolved via connect or by
the dead peer detection logic in unix_dgram_sendmsg, the server socket
may be freed even though "the polling mechanism" (in particular, epoll)
still holds a pointer to the corresponding peer_wait queue. There's no
way to forcibly deregister a wait queue with epoll.
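The flow-control condition described above can be observed from
userspace. The following is a minimal sketch, assuming a Linux system: it
relies on AF_UNIX autobind to an abstract address (see unix(7)), and the
helper names make_server and fill_server_queue are hypothetical. A client
datagram socket connects to a server that never connects back and never
reads, then sends non-blocking datagrams until the kernel refuses one.
Which limit is hit first (the server's receive-queue limit, which
sk_max_ack_backlog takes from the net.unix.max_dgram_qlen sysctl, or the
client's own send buffer) depends on the system, so the sketch only
reports the count and the errno.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: autobind a SOCK_DGRAM "server" to an abstract
 * address (Linux, unix(7)) and return that address for connect(). */
static int make_server(struct sockaddr_un *addr, socklen_t *len)
{
	int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	*len = sizeof(*addr);
	/* addrlen == sizeof(sa_family_t) requests autobind. */
	if (bind(fd, (struct sockaddr *)addr, sizeof(sa_family_t)) < 0 ||
	    getsockname(fd, (struct sockaddr *)addr, len) < 0) {
		close(fd);
		return -1;
	}
	assert(*len > sizeof(sa_family_t));	/* an abstract name was assigned */
	return fd;
}

/* n:1 association: the client connects, the server never connects back
 * and never reads. Send 1-byte datagrams without blocking until one is
 * refused. Returns the number accepted; *err gets the refusing errno. */
static int fill_server_queue(int *err)
{
	struct sockaddr_un addr;
	socklen_t len;
	int sent = 0;
	int srv = make_server(&addr, &len);
	int cli = socket(AF_UNIX, SOCK_DGRAM, 0);

	*err = 0;
	if (srv < 0 || cli < 0 ||
	    connect(cli, (struct sockaddr *)&addr, len) < 0) {
		*err = errno;
		goto out;
	}
	/* The bound only guards against an unexpectedly unbounded queue. */
	while (sent < 100000) {
		if (send(cli, "x", 1, MSG_DONTWAIT) < 0) {
			*err = errno;
			break;
		}
		sent++;
	}
out:
	if (cli >= 0)
		close(cli);
	if (srv >= 0)
		close(srv);
	return sent;
}
```

With MSG_DONTWAIT the refusal surfaces as EAGAIN; a blocking send()
would instead put the writer to sleep, which is exactly the situation in
which it needs a later wake up even though it has no datagram of its own
on the server queue.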

Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via a wake function. The connection to the peer wait queue is
dissolved again if a wake up is about to be relayed, the client socket
reconnects, a dead peer is detected, or the client socket itself is
closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.
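The relay can be illustrated outside the kernel with a hypothetical,
single-threaded userspace model. All names below (struct wq_entry,
struct wq_head, struct model_sock, model_relay and the model_* helpers)
are invented for this sketch and are not kernel APIs. The kernel code
uses wait_queue_t, holds peer_wait.lock around the list operations and
wakes sk_sleep(sk) with wake_up_interruptible_poll; the model just counts
the relayed wake ups.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the peer_wake relay; NOT kernel APIs, no locking
 * (the kernel holds peer_wait.lock around these list operations). */
struct wq_entry {
	int (*func)(struct wq_entry *e);	/* wake function */
	void *private;				/* peer we are queued on, or NULL */
	struct wq_entry *next, *prev;
};

struct wq_head {
	struct wq_entry head;			/* sentinel of a circular list */
};

struct model_sock {
	struct wq_head peer_wait;		/* woken when a datagram is received */
	struct wq_entry peer_wake;		/* our entry on the server's peer_wait */
	int wakeups;				/* stands in for waking sk_sleep(sk) */
};

static void wq_init(struct wq_head *h)
{
	h->head.next = h->head.prev = &h->head;
}

static void wq_add(struct wq_head *h, struct wq_entry *e)
{
	e->next = h->head.next;
	e->prev = &h->head;
	h->head.next->prev = e;
	h->head.next = e;
}

static void wq_del(struct wq_entry *e)
{
	assert(e->next && e->prev);		/* must currently be queued */
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = NULL;
}

/* Call every wake function; read ->next first because a wake function
 * may unlink its own entry (as the relay does). */
static void wq_wake_all(struct wq_head *h)
{
	struct wq_entry *e = h->head.next;

	while (e != &h->head) {
		struct wq_entry *next = e->next;

		e->func(e);
		e = next;
	}
}

/* Relay: leave the peer's queue, forget the peer, wake the client. */
static int model_relay(struct wq_entry *e)
{
	struct model_sock *u = (struct model_sock *)
		((char *)e - offsetof(struct model_sock, peer_wake));

	wq_del(e);
	e->private = NULL;
	u->wakeups++;
	return 0;
}

static void model_sock_init(struct model_sock *s)
{
	wq_init(&s->peer_wait);
	s->peer_wake.func = model_relay;
	s->peer_wake.private = NULL;
	s->peer_wake.next = s->peer_wake.prev = NULL;
	s->wakeups = 0;
}

/* Queue sk's entry on other's peer_wait unless it is already queued. */
static int model_peer_wake_connect(struct model_sock *sk, struct model_sock *other)
{
	if (sk->peer_wake.private)
		return 0;
	sk->peer_wake.private = other;
	wq_add(&other->peer_wait, &sk->peer_wake);
	return 1;
}

/* Dequeue sk's entry, but only if it is queued on this particular peer. */
static void model_peer_wake_disconnect(struct model_sock *sk, struct model_sock *other)
{
	if (sk->peer_wake.private == other) {
		wq_del(&sk->peer_wake);
		sk->peer_wake.private = NULL;
	}
}
```

After model_peer_wake_connect(&client, &server), a single
wq_wake_all(&server.peer_wait) increments client.wakeups once and unlinks
the entry, so further wake ups on the server are not relayed until the
client connects again. Because the entry is owned by the client and is
removed on every exit path, nothing outside the client keeps a pointer
into a possibly freed server.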

Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
Reviewed-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 include/net/af_unix.h |   1 +
 net/unix/af_unix.c    | 183 ++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 165 insertions(+), 19 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index dfe4ddfbb43c..e830c3dff61a 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -63,6 +63,7 @@ struct unix_sock {
 #define UNIX_GC_CANDIDATE	0
 #define UNIX_GC_MAYBE_CYCLE	1
 	struct socket_wq	peer_wq;
+	wait_queue_t		peer_wake;
 };
 
 static inline struct unix_sock *unix_sk(struct sock *sk)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 157b3595ef62..9ce79ed792cd 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -314,6 +314,118 @@ found:
 	return s;
 }
 
+/* Support code for asymmetrically connected dgram sockets
+ *
+ * If a datagram socket is connected to a socket not itself connected
+ * to the first socket (eg, /dev/log), clients may only enqueue more
+ * messages if the present receive queue of the server socket is not
+ * "too large". This means there's a second writeability condition
+ * poll and sendmsg need to test. The dgram recv code will do a wake
+ * up on the peer_wait wait queue of a socket upon reception of a
+ * datagram which needs to be propagated to sleeping would-be writers
+ * since these might not have sent anything so far. This can't be
+ * accomplished via poll_wait because the lifetime of the server
+ * socket might be less than that of its clients if these break their
+ * association with it or if the server socket is closed while clients
+ * are still connected to it and there's no way to inform "a polling
+ * implementation" that it should let go of a certain wait queue
+ *
+ * In order to propagate a wake up, a wait_queue_t of the client
+ * socket is enqueued on the peer_wait queue of the server socket
+ * whose wake function does a wake_up on the ordinary client socket
+ * wait queue. This connection is established whenever a write (or
+ * poll for write) hit the flow control condition and broken when the
+ * association to the server socket is dissolved or after a wake up
+ * was relayed.
+ */
+
+static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags,
+				      void *key)
+{
+	struct unix_sock *u;
+	wait_queue_head_t *u_sleep;
+
+	u = container_of(q, struct unix_sock, peer_wake);
+
+	__remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait,
+			    q);
+	u->peer_wake.private = NULL;
+
+	/* relaying can only happen while the wq still exists */
+	u_sleep = sk_sleep(&u->sk);
+	if (u_sleep)
+		wake_up_interruptible_poll(u_sleep, key);
+
+	return 0;
+}
+
+static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other)
+{
+	struct unix_sock *u, *u_other;
+	int rc;
+
+	u = unix_sk(sk);
+	u_other = unix_sk(other);
+	rc = 0;
+	spin_lock(&u_other->peer_wait.lock);
+
+	if (!u->peer_wake.private) {
+		u->peer_wake.private = other;
+		__add_wait_queue(&u_other->peer_wait, &u->peer_wake);
+
+		rc = 1;
+	}
+
+	spin_unlock(&u_other->peer_wait.lock);
+	return rc;
+}
+
+static void unix_dgram_peer_wake_disconnect(struct sock *sk,
+					    struct sock *other)
+{
+	struct unix_sock *u, *u_other;
+
+	u = unix_sk(sk);
+	u_other = unix_sk(other);
+	spin_lock(&u_other->peer_wait.lock);
+
+	if (u->peer_wake.private == other) {
+		__remove_wait_queue(&u_other->peer_wait, &u->peer_wake);
+		u->peer_wake.private = NULL;
+	}
+
+	spin_unlock(&u_other->peer_wait.lock);
+}
+
+static void unix_dgram_peer_wake_disconnect_wakeup(struct sock *sk,
+						   struct sock *other)
+{
+	unix_dgram_peer_wake_disconnect(sk, other);
+	wake_up_interruptible_poll(sk_sleep(sk),
+				   POLLOUT |
+				   POLLWRNORM |
+				   POLLWRBAND);
+}
+
+/* preconditions:
+ *	- unix_peer(sk) == other
+ *	- association is stable
+ */
+static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
+{
+	int connected;
+
+	connected = unix_dgram_peer_wake_connect(sk, other);
+
+	if (unix_recvq_full(other))
+		return 1;
+
+	if (connected)
+		unix_dgram_peer_wake_disconnect(sk, other);
+
+	return 0;
+}
+
 static inline int unix_writable(struct sock *sk)
 {
 	return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
@@ -418,6 +530,8 @@ static void unix_release_sock(struct sock *sk, int embrion)
 			skpair->sk_state_change(skpair);
 			sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP);
 		}
+
+		unix_dgram_peer_wake_disconnect(sk, skpair);
 		sock_put(skpair); /* It may now die */
 		unix_peer(sk) = NULL;
 	}
@@ -651,6 +765,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock)
 	INIT_LIST_HEAD(&u->link);
 	mutex_init(&u->readlock); /* single task reading lock */
 	init_waitqueue_head(&u->peer_wait);
+	init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay);
 	unix_insert_socket(unix_sockets_unbound(sk), sk);
 out:
 	if (sk == NULL)
@@ -1018,6 +1133,8 @@ restart:
 	if (unix_peer(sk)) {
 		struct sock *old_peer = unix_peer(sk);
 		unix_peer(sk) = other;
+		unix_dgram_peer_wake_disconnect_wakeup(sk, old_peer);
+
 		unix_state_double_unlock(sk, other);
 
 		if (other != old_peer)
@@ -1457,6 +1574,7 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	struct scm_cookie tmp_scm;
 	int max_level;
 	int data_len = 0;
+	int sk_locked;
 
 	if (NULL == siocb->scm)
 		siocb->scm = &tmp_scm;
@@ -1534,12 +1652,14 @@ restart:
 		goto out_free;
 	}
 
+	sk_locked = 0;
 	unix_state_lock(other);
+restart_locked:
 	err = -EPERM;
 	if (!unix_may_send(sk, other))
 		goto out_unlock;
 
-	if (sock_flag(other, SOCK_DEAD)) {
+	if (unlikely(sock_flag(other, SOCK_DEAD))) {
 		/*
 		 *	Check with 1003.1g - what should
 		 *	datagram error
@@ -1547,10 +1667,14 @@ restart:
 		unix_state_unlock(other);
 		sock_put(other);
 
+		if (!sk_locked)
+			unix_state_lock(sk);
+
 		err = 0;
-		unix_state_lock(sk);
 		if (unix_peer(sk) == other) {
 			unix_peer(sk) = NULL;
+			unix_dgram_peer_wake_disconnect_wakeup(sk, other);
+
 			unix_state_unlock(sk);
 
 			unix_dgram_disconnected(sk, other);
@@ -1576,21 +1700,38 @@ restart:
 			goto out_unlock;
 	}
 
-	if (unix_peer(other) != sk && unix_recvq_full(other)) {
-		if (!timeo) {
-			err = -EAGAIN;
-			goto out_unlock;
+	if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
+		if (timeo) {
+			timeo = unix_wait_for_peer(other, timeo);
+
+			err = sock_intr_errno(timeo);
+			if (signal_pending(current))
+				goto out_free;
+
+			goto restart;
 		}
 
-		timeo = unix_wait_for_peer(other, timeo);
+		if (!sk_locked) {
+			unix_state_unlock(other);
+			unix_state_double_lock(sk, other);
+		}
 
-		err = sock_intr_errno(timeo);
-		if (signal_pending(current))
-			goto out_free;
+		if (unix_peer(sk) != other ||
+		    unix_dgram_peer_wake_me(sk, other)) {
+			err = -EAGAIN;
+			sk_locked = 1;
+			goto out_unlock;
+		}
 
-		goto restart;
+		if (!sk_locked) {
+			sk_locked = 1;
+			goto restart_locked;
+		}
 	}
 
+	if (unlikely(sk_locked))
+		unix_state_unlock(sk);
+
 	if (sock_flag(other, SOCK_RCVTSTAMP))
 		__net_timestamp(skb);
 	maybe_add_creds(skb, sock, other);
@@ -1604,6 +1745,8 @@ restart:
 	return len;
 
 out_unlock:
+	if (sk_locked)
+		unix_state_unlock(sk);
 	unix_state_unlock(other);
 out_free:
 	kfree_skb(skb);
@@ -2261,14 +2404,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
 		return mask;
 
 	writable = unix_writable(sk);
-	other = unix_peer_get(sk);
-	if (other) {
-		if (unix_peer(other) != sk) {
-			sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
-			if (unix_recvq_full(other))
-				writable = 0;
-		}
-		sock_put(other);
+	if (writable) {
+		unix_state_lock(sk);
+
+		other = unix_peer(sk);
+		if (other && unix_peer(other) != sk &&
+		    unix_recvq_full(other) &&
+		    unix_dgram_peer_wake_me(sk, other))
+			writable = 0;
+
+		unix_state_unlock(sk);
 	}
 
 	if (writable)
-- 
2.6.4
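The unix_dgram_poll hunk changes how a poller is kept informed, not what
poll reports. A minimal userspace check, assuming a Linux system with
AF_UNIX autobind (see unix(7)); the helper names are hypothetical. It
fills the server queue from an n:1 client until send() fails with EAGAIN,
samples POLLOUT with a zero timeout, drains the server and samples again.
Depending on net.unix.max_dgram_qlen and the socket buffer sizes, the
refusal may come from the receive-queue limit or from the client's send
buffer; in either case POLLOUT is expected to be absent until the server
has read the backlog. With a zero timeout nothing sleeps, so this checks
the writability test rather than the relayed wake up itself.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: autobind a SOCK_DGRAM "server" (Linux, unix(7)). */
static int make_server(struct sockaddr_un *addr, socklen_t *len)
{
	int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	*len = sizeof(*addr);
	if (bind(fd, (struct sockaddr *)addr, sizeof(sa_family_t)) < 0 ||
	    getsockname(fd, (struct sockaddr *)addr, len) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Sample poll() for POLLOUT without sleeping (timeout 0). */
static int pollout_now(int fd)
{
	struct pollfd p = { .fd = fd, .events = POLLOUT };
	int rc = poll(&p, 1, 0);

	assert(rc >= 0);
	return rc > 0 && (p.revents & POLLOUT);
}

/* Fill the server's queue from an n:1 client, sample POLLOUT, drain the
 * server, sample POLLOUT again. Returns 0 on success, -1 otherwise. */
static int pollout_before_after_drain(int *before, int *after)
{
	struct sockaddr_un addr;
	socklen_t len;
	char buf[16];
	int i, rc = -1;
	int srv = make_server(&addr, &len);
	int cli = socket(AF_UNIX, SOCK_DGRAM, 0);

	if (srv < 0 || cli < 0 ||
	    connect(cli, (struct sockaddr *)&addr, len) < 0)
		goto out;

	/* Send until the kernel refuses; the bound only guards the demo. */
	for (i = 0; i < 100000; i++)
		if (send(cli, "x", 1, MSG_DONTWAIT) < 0)
			break;
	if (i == 100000 || errno != EAGAIN)
		goto out;
	*before = pollout_now(cli);

	/* Receive everything the server has queued. */
	while (recv(srv, buf, sizeof(buf), MSG_DONTWAIT) > 0)
		;
	*after = pollout_now(cli);
	rc = 0;
out:
	if (cli >= 0)
		close(cli);
	if (srv >= 0)
		close(srv);
	return rc;
}
```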

