netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/6] MPTCP: improve fallback to TCP
@ 2020-06-29 20:26 Davide Caratti
  2020-06-29 20:26 ` [PATCH net-next 1/6] net: mptcp: " Davide Caratti
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Davide Caratti @ 2020-06-29 20:26 UTC (permalink / raw)
  To: Mat Martineau, Matthieu Baerts, David S. Miller, Jakub Kicinski,
	netdev, mptcp

there are situations where MPTCP sockets should fall-back to regular TCP:
this series reworks the fallback code to pursue the following goals:

1) cleanup the non fallback code, removing most of 'if (<fallback>)' in
   the data path
2) improve performance for non-fallback sockets, avoiding locks in poll()

further work will also leverage on this changes to achieve:

a) more consistent behavior of gestockopt()/setsockopt() on passive sockets
   after fallback
b) support for "infinite maps" as per RFC8684, section 3.7

the series is made of the following items:

- patch 1 lets sendmsg() / recvmsg() / poll() use the main socket also
  after fallback
- patch 2 fixes 'simultaneous connect' scenario after fallback. The
  problem was present also before the rework, but the fix is much easier
  to implement after patch 1
- patch 3, 4, 5 are clean-ups for code that is no more needed after the
  fallback rework
- patch 6 fixes a race condition between close() and poll(). The problem
  was theoretically present before the rework, but it became almost
  systematic after patch 1

Davide Caratti (2):
  net: mptcp: improve fallback to TCP
  mptcp: fallback in case of simultaneous connect

Paolo Abeni (4):
  mptcp: check for plain TCP sock at accept time
  mptcp: create first subflow at msk creation time
  mptcp: __mptcp_tcp_fallback() returns a struct sock
  mptcp: close poll() races

 net/mptcp/options.c  |   9 +-
 net/mptcp/protocol.c | 267 ++++++++++++++-----------------------------
 net/mptcp/protocol.h |  43 +++++++
 net/mptcp/subflow.c  |  57 ++++++---
 4 files changed, 175 insertions(+), 201 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net-next 1/6] net: mptcp: improve fallback to TCP
  2020-06-29 20:26 [PATCH net-next 0/6] MPTCP: improve fallback to TCP Davide Caratti
@ 2020-06-29 20:26 ` Davide Caratti
  2020-06-29 23:29   ` Mat Martineau
  2020-06-29 20:26 ` [PATCH net-next 2/6] mptcp: fallback in case of simultaneous connect Davide Caratti
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Davide Caratti @ 2020-06-29 20:26 UTC (permalink / raw)
  To: Mat Martineau, Matthieu Baerts, David S. Miller, Jakub Kicinski,
	netdev, mptcp

Keep using MPTCP sockets and a use "dummy mapping" in case of fallback
to regular TCP. When fallback is triggered, skip addition of the MPTCP
option on send.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/11
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/22
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/mptcp/options.c  |  9 +++-
 net/mptcp/protocol.c | 98 ++++++++++++--------------------------------
 net/mptcp/protocol.h | 33 +++++++++++++++
 net/mptcp/subflow.c  | 47 +++++++++++++--------
 4 files changed, 98 insertions(+), 89 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index df9a51425c6f..b96d3660562f 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -624,6 +624,9 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
 
 	opts->suboptions = 0;
 
+	if (unlikely(mptcp_check_fallback(sk)))
+		return false;
+
 	if (mptcp_established_options_mp(sk, skb, &opt_size, remaining, opts))
 		ret = true;
 	else if (mptcp_established_options_dss(sk, skb, &opt_size, remaining,
@@ -714,7 +717,8 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *sk,
 	 */
 	if (!mp_opt->mp_capable) {
 		subflow->mp_capable = 0;
-		tcp_sk(sk)->is_mptcp = 0;
+		pr_fallback(msk);
+		__mptcp_do_fallback(msk);
 		return false;
 	}
 
@@ -814,6 +818,9 @@ void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb,
 	struct mptcp_options_received mp_opt;
 	struct mptcp_ext *mpext;
 
+	if (__mptcp_check_fallback(msk))
+		return;
+
 	mptcp_get_options(skb, &mp_opt);
 	if (!check_fully_established(msk, sk, subflow, skb, &mp_opt))
 		return;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index be09fd525f8f..84ae96be9837 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -52,11 +52,6 @@ static struct socket *__mptcp_nmpc_socket(const struct mptcp_sock *msk)
 	return msk->subflow;
 }
 
-static bool __mptcp_needs_tcp_fallback(const struct mptcp_sock *msk)
-{
-	return msk->first && !sk_is_mptcp(msk->first);
-}
-
 static struct socket *mptcp_is_tcpsk(struct sock *sk)
 {
 	struct socket *sock = sk->sk_socket;
@@ -94,7 +89,7 @@ static struct socket *__mptcp_tcp_fallback(struct mptcp_sock *msk)
 	if (unlikely(sock))
 		return sock;
 
-	if (likely(!__mptcp_needs_tcp_fallback(msk)))
+	if (likely(!__mptcp_check_fallback(msk)))
 		return NULL;
 
 	return msk->subflow;
@@ -133,6 +128,11 @@ static struct socket *__mptcp_socket_create(struct mptcp_sock *msk, int state)
 	list_add(&subflow->node, &msk->conn_list);
 	subflow->request_mptcp = 1;
 
+	/* accept() will wait on first subflow sk_wq, and we always wakes up
+	 * via msk->sk_socket
+	 */
+	RCU_INIT_POINTER(msk->first->sk_wq, &sk->sk_socket->wq);
+
 set_state:
 	if (state != MPTCP_SAME_STATE)
 		inet_sk_state_store(sk, state);
@@ -229,6 +229,15 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 		if (!skb)
 			break;
 
+		if (__mptcp_check_fallback(msk)) {
+			/* if we are running under the workqueue, TCP could have
+			 * collapsed skbs between dummy map creation and now
+			 * be sure to adjust the size
+			 */
+			map_remaining = skb->len;
+			subflow->map_data_len = skb->len;
+		}
+
 		offset = seq - TCP_SKB_CB(skb)->seq;
 		fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN;
 		if (fin) {
@@ -466,8 +475,15 @@ static void mptcp_clean_una(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct mptcp_data_frag *dtmp, *dfrag;
-	u64 snd_una = atomic64_read(&msk->snd_una);
 	bool cleaned = false;
+	u64 snd_una;
+
+	/* on fallback we just need to ignore snd_una, as this is really
+	 * plain TCP
+	 */
+	if (__mptcp_check_fallback(msk))
+		atomic64_set(&msk->snd_una, msk->write_seq);
+	snd_una = atomic64_read(&msk->snd_una);
 
 	list_for_each_entry_safe(dfrag, dtmp, &msk->rtx_queue, list) {
 		if (after64(dfrag->data_seq + dfrag->data_len, snd_una))
@@ -740,7 +756,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	int mss_now = 0, size_goal = 0, ret = 0;
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct page_frag *pfrag;
-	struct socket *ssock;
 	size_t copied = 0;
 	struct sock *ssk;
 	bool tx_ok;
@@ -759,15 +774,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 			goto out;
 	}
 
-fallback:
-	ssock = __mptcp_tcp_fallback(msk);
-	if (unlikely(ssock)) {
-		release_sock(sk);
-		pr_debug("fallback passthrough");
-		ret = sock_sendmsg(ssock, msg);
-		return ret >= 0 ? ret + copied : (copied ? copied : ret);
-	}
-
 	pfrag = sk_page_frag(sk);
 restart:
 	mptcp_clean_una(sk);
@@ -819,17 +825,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 			}
 			break;
 		}
-		if (ret == 0 && unlikely(__mptcp_needs_tcp_fallback(msk))) {
-			/* Can happen for passive sockets:
-			 * 3WHS negotiated MPTCP, but first packet after is
-			 * plain TCP (e.g. due to middlebox filtering unknown
-			 * options).
-			 *
-			 * Fall back to TCP.
-			 */
-			release_sock(ssk);
-			goto fallback;
-		}
 
 		copied += ret;
 
@@ -972,7 +967,6 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 			 int nonblock, int flags, int *addr_len)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	struct socket *ssock;
 	int copied = 0;
 	int target;
 	long timeo;
@@ -981,16 +975,6 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		return -EOPNOTSUPP;
 
 	lock_sock(sk);
-	ssock = __mptcp_tcp_fallback(msk);
-	if (unlikely(ssock)) {
-fallback:
-		release_sock(sk);
-		pr_debug("fallback-read subflow=%p",
-			 mptcp_subflow_ctx(ssock->sk));
-		copied = sock_recvmsg(ssock, msg, flags);
-		return copied;
-	}
-
 	timeo = sock_rcvtimeo(sk, nonblock);
 
 	len = min_t(size_t, len, INT_MAX);
@@ -1056,9 +1040,6 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 
 		pr_debug("block timeout %ld", timeo);
 		mptcp_wait_data(sk, &timeo);
-		ssock = __mptcp_tcp_fallback(msk);
-		if (unlikely(ssock))
-			goto fallback;
 	}
 
 	if (skb_queue_empty(&sk->sk_receive_queue)) {
@@ -1335,8 +1316,6 @@ static void mptcp_subflow_shutdown(struct sock *ssk, int how,
 		break;
 	}
 
-	/* Wake up anyone sleeping in poll. */
-	ssk->sk_state_change(ssk);
 	release_sock(ssk);
 }
 
@@ -1660,12 +1639,6 @@ void mptcp_finish_connect(struct sock *ssk)
 	sk = subflow->conn;
 	msk = mptcp_sk(sk);
 
-	if (!subflow->mp_capable) {
-		MPTCP_INC_STATS(sock_net(sk),
-				MPTCP_MIB_MPCAPABLEACTIVEFALLBACK);
-		return;
-	}
-
 	pr_debug("msk=%p, token=%u", sk, subflow->token);
 
 	mptcp_crypto_key_sha(subflow->remote_key, NULL, &ack_seq);
@@ -1971,23 +1944,10 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock,
 {
 	struct sock *sk = sock->sk;
 	struct mptcp_sock *msk;
-	struct socket *ssock;
 	__poll_t mask = 0;
 
 	msk = mptcp_sk(sk);
-	lock_sock(sk);
-	ssock = __mptcp_tcp_fallback(msk);
-	if (!ssock)
-		ssock = __mptcp_nmpc_socket(msk);
-	if (ssock) {
-		mask = ssock->ops->poll(file, ssock, wait);
-		release_sock(sk);
-		return mask;
-	}
-
-	release_sock(sk);
 	sock_poll_wait(file, sock, wait);
-	lock_sock(sk);
 
 	if (test_bit(MPTCP_DATA_READY, &msk->flags))
 		mask = EPOLLIN | EPOLLRDNORM;
@@ -1997,8 +1957,6 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock,
 	if (sk->sk_shutdown & RCV_SHUTDOWN)
 		mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
 
-	release_sock(sk);
-
 	return mask;
 }
 
@@ -2006,18 +1964,11 @@ static int mptcp_shutdown(struct socket *sock, int how)
 {
 	struct mptcp_sock *msk = mptcp_sk(sock->sk);
 	struct mptcp_subflow_context *subflow;
-	struct socket *ssock;
 	int ret = 0;
 
 	pr_debug("sk=%p, how=%d", msk, how);
 
 	lock_sock(sock->sk);
-	ssock = __mptcp_tcp_fallback(msk);
-	if (ssock) {
-		release_sock(sock->sk);
-		return inet_shutdown(ssock, how);
-	}
-
 	if (how == SHUT_WR || how == SHUT_RDWR)
 		inet_sk_state_store(sock->sk, TCP_FIN_WAIT1);
 
@@ -2043,6 +1994,9 @@ static int mptcp_shutdown(struct socket *sock, int how)
 		mptcp_subflow_shutdown(tcp_sk, how, 1, msk->write_seq);
 	}
 
+	/* Wake up anyone sleeping in poll. */
+	sock->sk->sk_state_change(sock->sk);
+
 out_unlock:
 	release_sock(sock->sk);
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index c05552e5fa23..a709df659ae0 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -89,6 +89,7 @@
 #define MPTCP_SEND_SPACE	1
 #define MPTCP_WORK_RTX		2
 #define MPTCP_WORK_EOF		3
+#define MPTCP_FALLBACK_DONE	4
 
 struct mptcp_options_received {
 	u64	sndr_key;
@@ -457,4 +458,36 @@ static inline bool before64(__u64 seq1, __u64 seq2)
 
 void mptcp_diag_subflow_init(struct tcp_ulp_ops *ops);
 
+static inline bool __mptcp_check_fallback(struct mptcp_sock *msk)
+{
+	return test_bit(MPTCP_FALLBACK_DONE, &msk->flags);
+}
+
+static inline bool mptcp_check_fallback(struct sock *sk)
+{
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
+	struct mptcp_sock *msk = mptcp_sk(subflow->conn);
+
+	return __mptcp_check_fallback(msk);
+}
+
+static inline void __mptcp_do_fallback(struct mptcp_sock *msk)
+{
+	if (test_bit(MPTCP_FALLBACK_DONE, &msk->flags)) {
+		pr_debug("TCP fallback already done (msk=%p)", msk);
+		return;
+	}
+	set_bit(MPTCP_FALLBACK_DONE, &msk->flags);
+}
+
+static inline void mptcp_do_fallback(struct sock *sk)
+{
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
+	struct mptcp_sock *msk = mptcp_sk(subflow->conn);
+
+	__mptcp_do_fallback(msk);
+}
+
+#define pr_fallback(a) pr_debug("%s:fallback to TCP (msk=%p)", __func__, a)
+
 #endif /* __MPTCP_PROTOCOL_H */
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 102db8c88e97..cb8a42ff4646 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -216,7 +216,6 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
 	struct mptcp_options_received mp_opt;
 	struct sock *parent = subflow->conn;
-	struct tcp_sock *tp = tcp_sk(sk);
 
 	subflow->icsk_af_ops->sk_rx_dst_set(sk, skb);
 
@@ -230,6 +229,8 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
 		return;
 
 	subflow->conn_finished = 1;
+	subflow->ssn_offset = TCP_SKB_CB(skb)->seq;
+	pr_debug("subflow=%p synack seq=%x", subflow, subflow->ssn_offset);
 
 	mptcp_get_options(skb, &mp_opt);
 	if (subflow->request_mptcp && mp_opt.mp_capable) {
@@ -245,21 +246,20 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
 		pr_debug("subflow=%p, thmac=%llu, remote_nonce=%u", subflow,
 			 subflow->thmac, subflow->remote_nonce);
 	} else {
-		tp->is_mptcp = 0;
+		if (subflow->request_mptcp)
+			MPTCP_INC_STATS(sock_net(sk),
+					MPTCP_MIB_MPCAPABLEACTIVEFALLBACK);
+		mptcp_do_fallback(sk);
+		pr_fallback(mptcp_sk(subflow->conn));
 	}
 
-	if (!tp->is_mptcp)
+	if (mptcp_check_fallback(sk))
 		return;
 
 	if (subflow->mp_capable) {
 		pr_debug("subflow=%p, remote_key=%llu", mptcp_subflow_ctx(sk),
 			 subflow->remote_key);
 		mptcp_finish_connect(sk);
-
-		if (skb) {
-			pr_debug("synack seq=%u", TCP_SKB_CB(skb)->seq);
-			subflow->ssn_offset = TCP_SKB_CB(skb)->seq;
-		}
 	} else if (subflow->mp_join) {
 		u8 hmac[SHA256_DIGEST_SIZE];
 
@@ -279,9 +279,6 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
 
 		memcpy(subflow->hmac, hmac, MPTCPOPT_HMAC_LEN);
 
-		if (skb)
-			subflow->ssn_offset = TCP_SKB_CB(skb)->seq;
-
 		if (!mptcp_finish_join(sk))
 			goto do_reset;
 
@@ -557,7 +554,8 @@ enum mapping_status {
 	MAPPING_OK,
 	MAPPING_INVALID,
 	MAPPING_EMPTY,
-	MAPPING_DATA_FIN
+	MAPPING_DATA_FIN,
+	MAPPING_DUMMY
 };
 
 static u64 expand_seq(u64 old_seq, u16 old_data_len, u64 seq)
@@ -621,6 +619,9 @@ static enum mapping_status get_mapping_status(struct sock *ssk)
 	if (!skb)
 		return MAPPING_EMPTY;
 
+	if (mptcp_check_fallback(ssk))
+		return MAPPING_DUMMY;
+
 	mpext = mptcp_get_ext(skb);
 	if (!mpext || !mpext->use_map) {
 		if (!subflow->map_valid && !skb->len) {
@@ -762,6 +763,16 @@ static bool subflow_check_data_avail(struct sock *ssk)
 			ssk->sk_err = EBADMSG;
 			goto fatal;
 		}
+		if (status == MAPPING_DUMMY) {
+			__mptcp_do_fallback(msk);
+			skb = skb_peek(&ssk->sk_receive_queue);
+			subflow->map_valid = 1;
+			subflow->map_seq = READ_ONCE(msk->ack_seq);
+			subflow->map_data_len = skb->len;
+			subflow->map_subflow_seq = tcp_sk(ssk)->copied_seq -
+						   subflow->ssn_offset;
+			return true;
+		}
 
 		if (status != MAPPING_OK)
 			return false;
@@ -885,14 +896,18 @@ static void subflow_data_ready(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
 	struct sock *parent = subflow->conn;
+	struct mptcp_sock *msk;
 
-	if (!subflow->mp_capable && !subflow->mp_join) {
-		subflow->tcp_data_ready(sk);
-
+	msk = mptcp_sk(parent);
+	if (inet_sk_state_load(sk) == TCP_LISTEN) {
+		set_bit(MPTCP_DATA_READY, &msk->flags);
 		parent->sk_data_ready(parent);
 		return;
 	}
 
+	WARN_ON_ONCE(!__mptcp_check_fallback(msk) && !subflow->mp_capable &&
+		     !subflow->mp_join);
+
 	if (mptcp_subflow_data_available(sk))
 		mptcp_data_ready(parent, sk);
 }
@@ -1117,7 +1132,7 @@ static void subflow_state_change(struct sock *sk)
 	 * a fin packet carrying a DSS can be unnoticed if we don't trigger
 	 * the data available machinery here.
 	 */
-	if (subflow->mp_capable && mptcp_subflow_data_available(sk))
+	if (mptcp_subflow_data_available(sk))
 		mptcp_data_ready(parent, sk);
 
 	if (!(parent->sk_shutdown & RCV_SHUTDOWN) &&
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 2/6] mptcp: fallback in case of simultaneous connect
  2020-06-29 20:26 [PATCH net-next 0/6] MPTCP: improve fallback to TCP Davide Caratti
  2020-06-29 20:26 ` [PATCH net-next 1/6] net: mptcp: " Davide Caratti
@ 2020-06-29 20:26 ` Davide Caratti
  2020-06-29 23:29   ` Mat Martineau
  2020-06-29 20:26 ` [PATCH net-next 3/6] mptcp: check for plain TCP sock at accept time Davide Caratti
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Davide Caratti @ 2020-06-29 20:26 UTC (permalink / raw)
  To: Mat Martineau, Matthieu Baerts, David S. Miller, Jakub Kicinski,
	netdev, mptcp

when a MPTCP client tries to connect to itself, tcp_finish_connect() is
never reached. Because of this, depending on the socket current state,
multiple faulty behaviours can be observed:

1) a WARN_ON() in subflow_data_ready() is hit
 WARNING: CPU: 2 PID: 882 at net/mptcp/subflow.c:911 subflow_data_ready+0x18b/0x230
 [...]
 CPU: 2 PID: 882 Comm: gh35 Not tainted 5.7.0+ #187
 [...]
 RIP: 0010:subflow_data_ready+0x18b/0x230
 [...]
 Call Trace:
  tcp_data_queue+0xd2f/0x4250
  tcp_rcv_state_process+0xb1c/0x49d3
  tcp_v4_do_rcv+0x2bc/0x790
  __release_sock+0x153/0x2d0
  release_sock+0x4f/0x170
  mptcp_shutdown+0x167/0x4e0
  __sys_shutdown+0xe6/0x180
  __x64_sys_shutdown+0x50/0x70
  do_syscall_64+0x9a/0x370
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

2) client is stuck forever in mptcp_sendmsg() because the socket is not
   TCP_ESTABLISHED

 crash> bt 4847
 PID: 4847   TASK: ffff88814b2fb100  CPU: 1   COMMAND: "gh35"
  #0 [ffff8881376ff680] __schedule at ffffffff97248da4
  #1 [ffff8881376ff778] schedule at ffffffff9724a34f
  #2 [ffff8881376ff7a0] schedule_timeout at ffffffff97252ba0
  #3 [ffff8881376ff8a8] wait_woken at ffffffff958ab4ba
  #4 [ffff8881376ff940] sk_stream_wait_connect at ffffffff96c2d859
  #5 [ffff8881376ffa28] mptcp_sendmsg at ffffffff97207fca
  #6 [ffff8881376ffbc0] sock_sendmsg at ffffffff96be1b5b
  #7 [ffff8881376ffbe8] sock_write_iter at ffffffff96be1daa
  #8 [ffff8881376ffce8] new_sync_write at ffffffff95e5cb52
  #9 [ffff8881376ffe50] vfs_write at ffffffff95e6547f
 #10 [ffff8881376ffe90] ksys_write at ffffffff95e65d26
 #11 [ffff8881376fff28] do_syscall_64 at ffffffff956088ba
 #12 [ffff8881376fff50] entry_SYSCALL_64_after_hwframe at ffffffff9740008c
     RIP: 00007f126f6956ed  RSP: 00007ffc2a320278  RFLAGS: 00000217
     RAX: ffffffffffffffda  RBX: 0000000020000044  RCX: 00007f126f6956ed
     RDX: 0000000000000004  RSI: 00000000004007b8  RDI: 0000000000000003
     RBP: 00007ffc2a3202a0   R8: 0000000000400720   R9: 0000000000400720
     R10: 0000000000400720  R11: 0000000000000217  R12: 00000000004004b0
     R13: 00007ffc2a320380  R14: 0000000000000000  R15: 0000000000000000
     ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

3) tcpdump captures show that DSS is exchanged even when MP_CAPABLE handshake
   didn't complete.

 $ tcpdump -tnnr bad.pcap
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S], seq 3208913911, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291694721,nop,wscale 7,mptcp capable v1], length 0
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S.], seq 3208913911, ack 3208913912, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291706876,nop,wscale 7,mptcp capable v1], length 0
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 1, win 512, options [nop,nop,TS val 3291706876 ecr 3291706876], length 0
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [F.], seq 1, ack 1, win 512, options [nop,nop,TS val 3291707876 ecr 3291706876,mptcp dss fin seq 0 subseq 0 len 1,nop,nop], length 0
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 2, win 512, options [nop,nop,TS val 3291707876 ecr 3291707876], length 0

force a fallback to TCP in these cases, and adjust the main socket
state to avoid hanging in mptcp_sendmsg().

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/35
Reported-by: Christoph Paasch <cpaasch@apple.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/mptcp/protocol.h | 10 ++++++++++
 net/mptcp/subflow.c  | 10 ++++++++++
 2 files changed, 20 insertions(+)

diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index a709df659ae0..1d05d9841b5c 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -490,4 +490,14 @@ static inline void mptcp_do_fallback(struct sock *sk)
 
 #define pr_fallback(a) pr_debug("%s:fallback to TCP (msk=%p)", __func__, a)
 
+static inline bool subflow_simultaneous_connect(struct sock *sk)
+{
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
+	struct sock *parent = subflow->conn;
+
+	return sk->sk_state == TCP_ESTABLISHED &&
+	       !mptcp_sk(parent)->pm.server_side &&
+	       !subflow->conn_finished;
+}
+
 #endif /* __MPTCP_PROTOCOL_H */
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index cb8a42ff4646..548f9e347ff5 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1128,6 +1128,16 @@ static void subflow_state_change(struct sock *sk)
 
 	__subflow_state_change(sk);
 
+	if (subflow_simultaneous_connect(sk)) {
+		mptcp_do_fallback(sk);
+		pr_fallback(mptcp_sk(parent));
+		subflow->conn_finished = 1;
+		if (inet_sk_state_load(parent) == TCP_SYN_SENT) {
+			inet_sk_state_store(parent, TCP_ESTABLISHED);
+			parent->sk_state_change(parent);
+		}
+	}
+
 	/* as recvmsg() does not acquire the subflow socket for ssk selection
 	 * a fin packet carrying a DSS can be unnoticed if we don't trigger
 	 * the data available machinery here.
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 3/6] mptcp: check for plain TCP sock at accept time
  2020-06-29 20:26 [PATCH net-next 0/6] MPTCP: improve fallback to TCP Davide Caratti
  2020-06-29 20:26 ` [PATCH net-next 1/6] net: mptcp: " Davide Caratti
  2020-06-29 20:26 ` [PATCH net-next 2/6] mptcp: fallback in case of simultaneous connect Davide Caratti
@ 2020-06-29 20:26 ` Davide Caratti
  2020-06-29 23:30   ` Mat Martineau
  2020-06-29 20:26 ` [PATCH net-next 4/6] mptcp: create first subflow at msk creation time Davide Caratti
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Davide Caratti @ 2020-06-29 20:26 UTC (permalink / raw)
  To: Mat Martineau, Matthieu Baerts, David S. Miller, Jakub Kicinski,
	netdev, mptcp

From: Paolo Abeni <pabeni@redhat.com>

This cleanup the code a bit and avoid corrupted states
on weird syscall sequence (accept(), connect()).

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/mptcp/protocol.c | 69 +++++---------------------------------------
 1 file changed, 7 insertions(+), 62 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 84ae96be9837..dbeb6fe374f5 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -52,13 +52,10 @@ static struct socket *__mptcp_nmpc_socket(const struct mptcp_sock *msk)
 	return msk->subflow;
 }
 
-static struct socket *mptcp_is_tcpsk(struct sock *sk)
+static bool mptcp_is_tcpsk(struct sock *sk)
 {
 	struct socket *sock = sk->sk_socket;
 
-	if (sock->sk != sk)
-		return NULL;
-
 	if (unlikely(sk->sk_prot == &tcp_prot)) {
 		/* we are being invoked after mptcp_accept() has
 		 * accepted a non-mp-capable flow: sk is a tcp_sk,
@@ -68,27 +65,21 @@ static struct socket *mptcp_is_tcpsk(struct sock *sk)
 		 * bypass mptcp.
 		 */
 		sock->ops = &inet_stream_ops;
-		return sock;
+		return true;
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
 	} else if (unlikely(sk->sk_prot == &tcpv6_prot)) {
 		sock->ops = &inet6_stream_ops;
-		return sock;
+		return true;
 #endif
 	}
 
-	return NULL;
+	return false;
 }
 
 static struct socket *__mptcp_tcp_fallback(struct mptcp_sock *msk)
 {
-	struct socket *sock;
-
 	sock_owned_by_me((const struct sock *)msk);
 
-	sock = mptcp_is_tcpsk((struct sock *)msk);
-	if (unlikely(sock))
-		return sock;
-
 	if (likely(!__mptcp_check_fallback(msk)))
 		return NULL;
 
@@ -1466,7 +1457,6 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
 		return NULL;
 
 	pr_debug("msk=%p, subflow is mptcp=%d", msk, sk_is_mptcp(newsk));
-
 	if (sk_is_mptcp(newsk)) {
 		struct mptcp_subflow_context *subflow;
 		struct sock *new_mptcp_sock;
@@ -1821,42 +1811,6 @@ static int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	return err;
 }
 
-static int mptcp_v4_getname(struct socket *sock, struct sockaddr *uaddr,
-			    int peer)
-{
-	if (sock->sk->sk_prot == &tcp_prot) {
-		/* we are being invoked from __sys_accept4, after
-		 * mptcp_accept() has just accepted a non-mp-capable
-		 * flow: sk is a tcp_sk, not an mptcp one.
-		 *
-		 * Hand the socket over to tcp so all further socket ops
-		 * bypass mptcp.
-		 */
-		sock->ops = &inet_stream_ops;
-	}
-
-	return inet_getname(sock, uaddr, peer);
-}
-
-#if IS_ENABLED(CONFIG_MPTCP_IPV6)
-static int mptcp_v6_getname(struct socket *sock, struct sockaddr *uaddr,
-			    int peer)
-{
-	if (sock->sk->sk_prot == &tcpv6_prot) {
-		/* we are being invoked from __sys_accept4 after
-		 * mptcp_accept() has accepted a non-mp-capable
-		 * subflow: sk is a tcp_sk, not mptcp.
-		 *
-		 * Hand the socket over to tcp so all further
-		 * socket ops bypass mptcp.
-		 */
-		sock->ops = &inet6_stream_ops;
-	}
-
-	return inet6_getname(sock, uaddr, peer);
-}
-#endif
-
 static int mptcp_listen(struct socket *sock, int backlog)
 {
 	struct mptcp_sock *msk = mptcp_sk(sock->sk);
@@ -1885,15 +1839,6 @@ static int mptcp_listen(struct socket *sock, int backlog)
 	return err;
 }
 
-static bool is_tcp_proto(const struct proto *p)
-{
-#if IS_ENABLED(CONFIG_MPTCP_IPV6)
-	return p == &tcp_prot || p == &tcpv6_prot;
-#else
-	return p == &tcp_prot;
-#endif
-}
-
 static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 			       int flags, bool kern)
 {
@@ -1915,7 +1860,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 	release_sock(sock->sk);
 
 	err = ssock->ops->accept(sock, newsock, flags, kern);
-	if (err == 0 && !is_tcp_proto(newsock->sk->sk_prot)) {
+	if (err == 0 && !mptcp_is_tcpsk(newsock->sk)) {
 		struct mptcp_sock *msk = mptcp_sk(newsock->sk);
 		struct mptcp_subflow_context *subflow;
 
@@ -2011,7 +1956,7 @@ static const struct proto_ops mptcp_stream_ops = {
 	.connect	   = mptcp_stream_connect,
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = mptcp_stream_accept,
-	.getname	   = mptcp_v4_getname,
+	.getname	   = inet_getname,
 	.poll		   = mptcp_poll,
 	.ioctl		   = inet_ioctl,
 	.gettstamp	   = sock_gettstamp,
@@ -2065,7 +2010,7 @@ static const struct proto_ops mptcp_v6_stream_ops = {
 	.connect	   = mptcp_stream_connect,
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = mptcp_stream_accept,
-	.getname	   = mptcp_v6_getname,
+	.getname	   = inet6_getname,
 	.poll		   = mptcp_poll,
 	.ioctl		   = inet6_ioctl,
 	.gettstamp	   = sock_gettstamp,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 4/6] mptcp: create first subflow at msk creation time
  2020-06-29 20:26 [PATCH net-next 0/6] MPTCP: improve fallback to TCP Davide Caratti
                   ` (2 preceding siblings ...)
  2020-06-29 20:26 ` [PATCH net-next 3/6] mptcp: check for plain TCP sock at accept time Davide Caratti
@ 2020-06-29 20:26 ` Davide Caratti
  2020-06-29 23:30   ` Mat Martineau
  2020-06-29 20:26 ` [PATCH net-next 5/6] mptcp: __mptcp_tcp_fallback() returns a struct sock Davide Caratti
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Davide Caratti @ 2020-06-29 20:26 UTC (permalink / raw)
  To: Mat Martineau, Matthieu Baerts, David S. Miller, Jakub Kicinski,
	netdev, mptcp

From: Paolo Abeni <pabeni@redhat.com>

This cleans the code a bit and makes the behavior more consistent.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/mptcp/protocol.c | 53 +++++++++++++++++---------------------------
 1 file changed, 20 insertions(+), 33 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index dbeb6fe374f5..ad619bda71cc 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -86,32 +86,16 @@ static struct socket *__mptcp_tcp_fallback(struct mptcp_sock *msk)
 	return msk->subflow;
 }
 
-static bool __mptcp_can_create_subflow(const struct mptcp_sock *msk)
-{
-	return !msk->first;
-}
-
-static struct socket *__mptcp_socket_create(struct mptcp_sock *msk, int state)
+static int __mptcp_socket_create(struct mptcp_sock *msk)
 {
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
 	struct socket *ssock;
 	int err;
 
-	ssock = __mptcp_tcp_fallback(msk);
-	if (unlikely(ssock))
-		return ssock;
-
-	ssock = __mptcp_nmpc_socket(msk);
-	if (ssock)
-		goto set_state;
-
-	if (!__mptcp_can_create_subflow(msk))
-		return ERR_PTR(-EINVAL);
-
 	err = mptcp_subflow_create_socket(sk, &ssock);
 	if (err)
-		return ERR_PTR(err);
+		return err;
 
 	msk->first = ssock->sk;
 	msk->subflow = ssock;
@@ -124,10 +108,7 @@ static struct socket *__mptcp_socket_create(struct mptcp_sock *msk, int state)
 	 */
 	RCU_INIT_POINTER(msk->first->sk_wq, &sk->sk_socket->wq);
 
-set_state:
-	if (state != MPTCP_SAME_STATE)
-		inet_sk_state_store(sk, state);
-	return ssock;
+	return 0;
 }
 
 static void __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
@@ -1255,6 +1236,10 @@ static int mptcp_init_sock(struct sock *sk)
 	if (ret)
 		return ret;
 
+	ret = __mptcp_socket_create(mptcp_sk(sk));
+	if (ret)
+		return ret;
+
 	sk_sockets_allocated_inc(sk);
 	sk->sk_sndbuf = sock_net(sk)->ipv4.sysctl_tcp_wmem[2];
 
@@ -1744,9 +1729,9 @@ static int mptcp_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	int err;
 
 	lock_sock(sock->sk);
-	ssock = __mptcp_socket_create(msk, MPTCP_SAME_STATE);
-	if (IS_ERR(ssock)) {
-		err = PTR_ERR(ssock);
+	ssock = __mptcp_nmpc_socket(msk);
+	if (!ssock) {
+		err = -EINVAL;
 		goto unlock;
 	}
 
@@ -1776,13 +1761,14 @@ static int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 		goto do_connect;
 	}
 
-	mptcp_token_destroy(msk);
-	ssock = __mptcp_socket_create(msk, TCP_SYN_SENT);
-	if (IS_ERR(ssock)) {
-		err = PTR_ERR(ssock);
+	ssock = __mptcp_nmpc_socket(msk);
+	if (!ssock) {
+		err = -EINVAL;
 		goto unlock;
 	}
 
+	mptcp_token_destroy(msk);
+	inet_sk_state_store(sock->sk, TCP_SYN_SENT);
 	subflow = mptcp_subflow_ctx(ssock->sk);
 #ifdef CONFIG_TCP_MD5SIG
 	/* no MPTCP if MD5SIG is enabled on this socket or we may run out of
@@ -1820,13 +1806,14 @@ static int mptcp_listen(struct socket *sock, int backlog)
 	pr_debug("msk=%p", msk);
 
 	lock_sock(sock->sk);
-	mptcp_token_destroy(msk);
-	ssock = __mptcp_socket_create(msk, TCP_LISTEN);
-	if (IS_ERR(ssock)) {
-		err = PTR_ERR(ssock);
+	ssock = __mptcp_nmpc_socket(msk);
+	if (!ssock) {
+		err = -EINVAL;
 		goto unlock;
 	}
 
+	mptcp_token_destroy(msk);
+	inet_sk_state_store(sock->sk, TCP_LISTEN);
 	sock_set_flag(sock->sk, SOCK_RCU_FREE);
 
 	err = ssock->ops->listen(ssock, backlog);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 5/6] mptcp: __mptcp_tcp_fallback() returns a struct sock
  2020-06-29 20:26 [PATCH net-next 0/6] MPTCP: improve fallback to TCP Davide Caratti
                   ` (3 preceding siblings ...)
  2020-06-29 20:26 ` [PATCH net-next 4/6] mptcp: create first subflow at msk creation time Davide Caratti
@ 2020-06-29 20:26 ` Davide Caratti
  2020-06-29 23:30   ` Mat Martineau
  2020-06-29 20:26 ` [PATCH net-next 6/6] mptcp: close poll() races Davide Caratti
  2020-06-30  0:29 ` [PATCH net-next 0/6] MPTCP: improve fallback to TCP David Miller
  6 siblings, 1 reply; 14+ messages in thread
From: Davide Caratti @ 2020-06-29 20:26 UTC (permalink / raw)
  To: Mat Martineau, Matthieu Baerts, David S. Miller, Jakub Kicinski,
	netdev, mptcp

From: Paolo Abeni <pabeni@redhat.com>

Currently __mptcp_tcp_fallback() always return NULL
on incoming connections, because MPTCP does not create
the additional socket for the first subflow.
Since the previous commit no __mptcp_tcp_fallback()
caller needs a struct socket, so let __mptcp_tcp_fallback()
return the first subflow sock and cope correctly even with
incoming connections.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/mptcp/protocol.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index ad619bda71cc..f2b2bd37e371 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -76,14 +76,14 @@ static bool mptcp_is_tcpsk(struct sock *sk)
 	return false;
 }
 
-static struct socket *__mptcp_tcp_fallback(struct mptcp_sock *msk)
+static struct sock *__mptcp_tcp_fallback(struct mptcp_sock *msk)
 {
 	sock_owned_by_me((const struct sock *)msk);
 
 	if (likely(!__mptcp_check_fallback(msk)))
 		return NULL;
 
-	return msk->subflow;
+	return msk->first;
 }
 
 static int __mptcp_socket_create(struct mptcp_sock *msk)
@@ -1498,7 +1498,7 @@ static int mptcp_setsockopt(struct sock *sk, int level, int optname,
 			    char __user *optval, unsigned int optlen)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	struct socket *ssock;
+	struct sock *ssk;
 
 	pr_debug("msk=%p", msk);
 
@@ -1509,11 +1509,10 @@ static int mptcp_setsockopt(struct sock *sk, int level, int optname,
 	 * to the one remaining subflow.
 	 */
 	lock_sock(sk);
-	ssock = __mptcp_tcp_fallback(msk);
+	ssk = __mptcp_tcp_fallback(msk);
 	release_sock(sk);
-	if (ssock)
-		return tcp_setsockopt(ssock->sk, level, optname, optval,
-				      optlen);
+	if (ssk)
+		return tcp_setsockopt(ssk, level, optname, optval, optlen);
 
 	return -EOPNOTSUPP;
 }
@@ -1522,7 +1521,7 @@ static int mptcp_getsockopt(struct sock *sk, int level, int optname,
 			    char __user *optval, int __user *option)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	struct socket *ssock;
+	struct sock *ssk;
 
 	pr_debug("msk=%p", msk);
 
@@ -1533,11 +1532,10 @@ static int mptcp_getsockopt(struct sock *sk, int level, int optname,
 	 * to the one remaining subflow.
 	 */
 	lock_sock(sk);
-	ssock = __mptcp_tcp_fallback(msk);
+	ssk = __mptcp_tcp_fallback(msk);
 	release_sock(sk);
-	if (ssock)
-		return tcp_getsockopt(ssock->sk, level, optname, optval,
-				      option);
+	if (ssk)
+		return tcp_getsockopt(ssk, level, optname, optval, option);
 
 	return -EOPNOTSUPP;
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 6/6] mptcp: close poll() races
  2020-06-29 20:26 [PATCH net-next 0/6] MPTCP: improve fallback to TCP Davide Caratti
                   ` (4 preceding siblings ...)
  2020-06-29 20:26 ` [PATCH net-next 5/6] mptcp: __mptcp_tcp_fallback() returns a struct sock Davide Caratti
@ 2020-06-29 20:26 ` Davide Caratti
  2020-06-29 23:31   ` Mat Martineau
  2020-06-30  0:29 ` [PATCH net-next 0/6] MPTCP: improve fallback to TCP David Miller
  6 siblings, 1 reply; 14+ messages in thread
From: Davide Caratti @ 2020-06-29 20:26 UTC (permalink / raw)
  To: Mat Martineau, Matthieu Baerts, David S. Miller, Jakub Kicinski,
	netdev, mptcp

From: Paolo Abeni <pabeni@redhat.com>

mptcp_poll always return POLLOUT for unblocking
connect(), ensure that the socket is a suitable
state.
The MPTCP_DATA_READY bit is never cleared on accept:
ensure we don't leave mptcp_accept() with an empty
accept queue and such bit set.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/mptcp/protocol.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f2b2bd37e371..28ec26d97f96 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1841,6 +1841,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 	if (!ssock)
 		goto unlock_fail;
 
+	clear_bit(MPTCP_DATA_READY, &msk->flags);
 	sock_hold(ssock->sk);
 	release_sock(sock->sk);
 
@@ -1861,6 +1862,8 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 		}
 	}
 
+	if (inet_csk_listen_poll(ssock->sk))
+		set_bit(MPTCP_DATA_READY, &msk->flags);
 	sock_put(ssock->sk);
 	return err;
 
@@ -1869,21 +1872,33 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 	return -EINVAL;
 }
 
+static __poll_t mptcp_check_readable(struct mptcp_sock *msk)
+{
+	return test_bit(MPTCP_DATA_READY, &msk->flags) ? EPOLLIN | EPOLLRDNORM :
+	       0;
+}
+
 static __poll_t mptcp_poll(struct file *file, struct socket *sock,
 			   struct poll_table_struct *wait)
 {
 	struct sock *sk = sock->sk;
 	struct mptcp_sock *msk;
 	__poll_t mask = 0;
+	int state;
 
 	msk = mptcp_sk(sk);
 	sock_poll_wait(file, sock, wait);
 
-	if (test_bit(MPTCP_DATA_READY, &msk->flags))
-		mask = EPOLLIN | EPOLLRDNORM;
-	if (sk_stream_is_writeable(sk) &&
-	    test_bit(MPTCP_SEND_SPACE, &msk->flags))
-		mask |= EPOLLOUT | EPOLLWRNORM;
+	state = inet_sk_state_load(sk);
+	if (state == TCP_LISTEN)
+		return mptcp_check_readable(msk);
+
+	if (state != TCP_SYN_SENT && state != TCP_SYN_RECV) {
+		mask |= mptcp_check_readable(msk);
+		if (sk_stream_is_writeable(sk) &&
+		    test_bit(MPTCP_SEND_SPACE, &msk->flags))
+			mask |= EPOLLOUT | EPOLLWRNORM;
+	}
 	if (sk->sk_shutdown & RCV_SHUTDOWN)
 		mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 1/6] net: mptcp: improve fallback to TCP
  2020-06-29 20:26 ` [PATCH net-next 1/6] net: mptcp: " Davide Caratti
@ 2020-06-29 23:29   ` Mat Martineau
  0 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2020-06-29 23:29 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Matthieu Baerts, David S. Miller, Jakub Kicinski, netdev, mptcp

On Mon, 29 Jun 2020, Davide Caratti wrote:

> Keep using MPTCP sockets and a use "dummy mapping" in case of fallback
> to regular TCP. When fallback is triggered, skip addition of the MPTCP
> option on send.
>
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/11
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/22
> Co-developed-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> ---
> net/mptcp/options.c  |  9 +++-
> net/mptcp/protocol.c | 98 ++++++++++++--------------------------------
> net/mptcp/protocol.h | 33 +++++++++++++++
> net/mptcp/subflow.c  | 47 +++++++++++++--------
> 4 files changed, 98 insertions(+), 89 deletions(-)
>

Thanks Davide and Paolo! Appreciate the simplification of the fallback 
code.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/6] mptcp: fallback in case of simultaneous connect
  2020-06-29 20:26 ` [PATCH net-next 2/6] mptcp: fallback in case of simultaneous connect Davide Caratti
@ 2020-06-29 23:29   ` Mat Martineau
  0 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2020-06-29 23:29 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Matthieu Baerts, David S. Miller, Jakub Kicinski, netdev, mptcp

On Mon, 29 Jun 2020, Davide Caratti wrote:

> when a MPTCP client tries to connect to itself, tcp_finish_connect() is
> never reached. Because of this, depending on the socket current state,
> multiple faulty behaviours can be observed:
>
> 1) a WARN_ON() in subflow_data_ready() is hit
> WARNING: CPU: 2 PID: 882 at net/mptcp/subflow.c:911 subflow_data_ready+0x18b/0x230
> [...]
> CPU: 2 PID: 882 Comm: gh35 Not tainted 5.7.0+ #187
> [...]
> RIP: 0010:subflow_data_ready+0x18b/0x230
> [...]
> Call Trace:
>  tcp_data_queue+0xd2f/0x4250
>  tcp_rcv_state_process+0xb1c/0x49d3
>  tcp_v4_do_rcv+0x2bc/0x790
>  __release_sock+0x153/0x2d0
>  release_sock+0x4f/0x170
>  mptcp_shutdown+0x167/0x4e0
>  __sys_shutdown+0xe6/0x180
>  __x64_sys_shutdown+0x50/0x70
>  do_syscall_64+0x9a/0x370
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> 2) client is stuck forever in mptcp_sendmsg() because the socket is not
>   TCP_ESTABLISHED
>
> crash> bt 4847
> PID: 4847   TASK: ffff88814b2fb100  CPU: 1   COMMAND: "gh35"
>  #0 [ffff8881376ff680] __schedule at ffffffff97248da4
>  #1 [ffff8881376ff778] schedule at ffffffff9724a34f
>  #2 [ffff8881376ff7a0] schedule_timeout at ffffffff97252ba0
>  #3 [ffff8881376ff8a8] wait_woken at ffffffff958ab4ba
>  #4 [ffff8881376ff940] sk_stream_wait_connect at ffffffff96c2d859
>  #5 [ffff8881376ffa28] mptcp_sendmsg at ffffffff97207fca
>  #6 [ffff8881376ffbc0] sock_sendmsg at ffffffff96be1b5b
>  #7 [ffff8881376ffbe8] sock_write_iter at ffffffff96be1daa
>  #8 [ffff8881376ffce8] new_sync_write at ffffffff95e5cb52
>  #9 [ffff8881376ffe50] vfs_write at ffffffff95e6547f
> #10 [ffff8881376ffe90] ksys_write at ffffffff95e65d26
> #11 [ffff8881376fff28] do_syscall_64 at ffffffff956088ba
> #12 [ffff8881376fff50] entry_SYSCALL_64_after_hwframe at ffffffff9740008c
>     RIP: 00007f126f6956ed  RSP: 00007ffc2a320278  RFLAGS: 00000217
>     RAX: ffffffffffffffda  RBX: 0000000020000044  RCX: 00007f126f6956ed
>     RDX: 0000000000000004  RSI: 00000000004007b8  RDI: 0000000000000003
>     RBP: 00007ffc2a3202a0   R8: 0000000000400720   R9: 0000000000400720
>     R10: 0000000000400720  R11: 0000000000000217  R12: 00000000004004b0
>     R13: 00007ffc2a320380  R14: 0000000000000000  R15: 0000000000000000
>     ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
>
> 3) tcpdump captures show that DSS is exchanged even when MP_CAPABLE handshake
>   didn't complete.
>
> $ tcpdump -tnnr bad.pcap
> IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S], seq 3208913911, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291694721,nop,wscale 7,mptcp capable v1], length 0
> IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S.], seq 3208913911, ack 3208913912, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291706876,nop,wscale 7,mptcp capable v1], length 0
> IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 1, win 512, options [nop,nop,TS val 3291706876 ecr 3291706876], length 0
> IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [F.], seq 1, ack 1, win 512, options [nop,nop,TS val 3291707876 ecr 3291706876,mptcp dss fin seq 0 subseq 0 len 1,nop,nop], length 0
> IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 2, win 512, options [nop,nop,TS val 3291707876 ecr 3291707876], length 0
>
> force a fallback to TCP in these cases, and adjust the main socket
> state to avoid hanging in mptcp_sendmsg().
>
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/35
> Reported-by: Christoph Paasch <cpaasch@apple.com>
> Suggested-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> ---
> net/mptcp/protocol.h | 10 ++++++++++
> net/mptcp/subflow.c  | 10 ++++++++++
> 2 files changed, 20 insertions(+)

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 3/6] mptcp: check for plain TCP sock at accept time
  2020-06-29 20:26 ` [PATCH net-next 3/6] mptcp: check for plain TCP sock at accept time Davide Caratti
@ 2020-06-29 23:30   ` Mat Martineau
  0 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2020-06-29 23:30 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Matthieu Baerts, David S. Miller, Jakub Kicinski, netdev, mptcp

On Mon, 29 Jun 2020, Davide Caratti wrote:

> From: Paolo Abeni <pabeni@redhat.com>
>
> This cleanup the code a bit and avoid corrupted states
> on weird syscall sequence (accept(), connect()).
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> ---
> net/mptcp/protocol.c | 69 +++++---------------------------------------
> 1 file changed, 7 insertions(+), 62 deletions(-)
>

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 4/6] mptcp: create first subflow at msk creation time
  2020-06-29 20:26 ` [PATCH net-next 4/6] mptcp: create first subflow at msk creation time Davide Caratti
@ 2020-06-29 23:30   ` Mat Martineau
  0 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2020-06-29 23:30 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Matthieu Baerts, David S. Miller, Jakub Kicinski, netdev, mptcp

On Mon, 29 Jun 2020, Davide Caratti wrote:

> From: Paolo Abeni <pabeni@redhat.com>
>
> This cleans the code a bit and makes the behavior more consistent.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> ---
> net/mptcp/protocol.c | 53 +++++++++++++++++---------------------------
> 1 file changed, 20 insertions(+), 33 deletions(-)
>

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 5/6] mptcp: __mptcp_tcp_fallback() returns a struct sock
  2020-06-29 20:26 ` [PATCH net-next 5/6] mptcp: __mptcp_tcp_fallback() returns a struct sock Davide Caratti
@ 2020-06-29 23:30   ` Mat Martineau
  0 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2020-06-29 23:30 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Matthieu Baerts, David S. Miller, Jakub Kicinski, netdev, mptcp

On Mon, 29 Jun 2020, Davide Caratti wrote:

> From: Paolo Abeni <pabeni@redhat.com>
>
> Currently __mptcp_tcp_fallback() always return NULL
> on incoming connections, because MPTCP does not create
> the additional socket for the first subflow.
> Since the previous commit no __mptcp_tcp_fallback()
> caller needs a struct socket, so let __mptcp_tcp_fallback()
> return the first subflow sock and cope correctly even with
> incoming connections.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> ---
> net/mptcp/protocol.c | 22 ++++++++++------------
> 1 file changed, 10 insertions(+), 12 deletions(-)
>

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 6/6] mptcp: close poll() races
  2020-06-29 20:26 ` [PATCH net-next 6/6] mptcp: close poll() races Davide Caratti
@ 2020-06-29 23:31   ` Mat Martineau
  0 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2020-06-29 23:31 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Matthieu Baerts, David S. Miller, Jakub Kicinski, netdev, mptcp

On Mon, 29 Jun 2020, Davide Caratti wrote:

> From: Paolo Abeni <pabeni@redhat.com>
>
> mptcp_poll always return POLLOUT for unblocking
> connect(), ensure that the socket is a suitable
> state.
> The MPTCP_DATA_READY bit is never cleared on accept:
> ensure we don't leave mptcp_accept() with an empty
> accept queue and such bit set.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> ---
> net/mptcp/protocol.c | 25 ++++++++++++++++++++-----
> 1 file changed, 20 insertions(+), 5 deletions(-)
>

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 0/6] MPTCP: improve fallback to TCP
  2020-06-29 20:26 [PATCH net-next 0/6] MPTCP: improve fallback to TCP Davide Caratti
                   ` (5 preceding siblings ...)
  2020-06-29 20:26 ` [PATCH net-next 6/6] mptcp: close poll() races Davide Caratti
@ 2020-06-30  0:29 ` David Miller
  6 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2020-06-30  0:29 UTC (permalink / raw)
  To: dcaratti; +Cc: mathew.j.martineau, matthieu.baerts, kuba, netdev, mptcp

From: Davide Caratti <dcaratti@redhat.com>
Date: Mon, 29 Jun 2020 22:26:19 +0200

> there are situations where MPTCP sockets should fall-back to regular TCP:
> this series reworks the fallback code to pursue the following goals:
> 
> 1) cleanup the non fallback code, removing most of 'if (<fallback>)' in
>    the data path
> 2) improve performance for non-fallback sockets, avoiding locks in poll()
> 
> further work will also leverage on this changes to achieve:
> 
> a) more consistent behavior of gestockopt()/setsockopt() on passive sockets
>    after fallback
> b) support for "infinite maps" as per RFC8684, section 3.7
> 
> the series is made of the following items:
> 
> - patch 1 lets sendmsg() / recvmsg() / poll() use the main socket also
>   after fallback
> - patch 2 fixes 'simultaneous connect' scenario after fallback. The
>   problem was present also before the rework, but the fix is much easier
>   to implement after patch 1
> - patch 3, 4, 5 are clean-ups for code that is no more needed after the
>   fallback rework
> - patch 6 fixes a race condition between close() and poll(). The problem
>   was theoretically present before the rework, but it became almost
>   systematic after patch 1

Series applied to net-next, thank you.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-06-30  0:29 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-29 20:26 [PATCH net-next 0/6] MPTCP: improve fallback to TCP Davide Caratti
2020-06-29 20:26 ` [PATCH net-next 1/6] net: mptcp: " Davide Caratti
2020-06-29 23:29   ` Mat Martineau
2020-06-29 20:26 ` [PATCH net-next 2/6] mptcp: fallback in case of simultaneous connect Davide Caratti
2020-06-29 23:29   ` Mat Martineau
2020-06-29 20:26 ` [PATCH net-next 3/6] mptcp: check for plain TCP sock at accept time Davide Caratti
2020-06-29 23:30   ` Mat Martineau
2020-06-29 20:26 ` [PATCH net-next 4/6] mptcp: create first subflow at msk creation time Davide Caratti
2020-06-29 23:30   ` Mat Martineau
2020-06-29 20:26 ` [PATCH net-next 5/6] mptcp: __mptcp_tcp_fallback() returns a struct sock Davide Caratti
2020-06-29 23:30   ` Mat Martineau
2020-06-29 20:26 ` [PATCH net-next 6/6] mptcp: close poll() races Davide Caratti
2020-06-29 23:31   ` Mat Martineau
2020-06-30  0:29 ` [PATCH net-next 0/6] MPTCP: improve fallback to TCP David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).