All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 mptcp-next 0/4] mptcp: improve subflow creation on errors
@ 2021-11-26 12:19 Paolo Abeni
  2021-11-26 12:19 ` [PATCH v2 mptcp-next 1/4] mptcp: clean-up MPJ option writing Paolo Abeni
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Paolo Abeni @ 2021-11-26 12:19 UTC (permalink / raw)
  To: mptcp

This iteration just addresses a compile warning spotted by the
CI.

Currently when a subflow connection fails - either the TCP
connection is reset by the peer, or the MPJ handshake never
completes - the in kernel PM don't perform any further action.
Notably no additional subflow creation is attempted.

This series is aimed at improving the in-kernel path manager
behavior in the above scenario:
- each endpoint is used only once per connection (patch 2/4)
- on failure we try to move to the next subflow, if any
  (patch 3/4)

The first patch is minor cleanup, and the last patch adds
specific self-tests.

This should address/close:
https://github.com/multipath-tcp/mptcp_net-next/issues/235
https://github.com/multipath-tcp/mptcp_net-next/issues/242

Paolo Abeni (4):
  mptcp: clean-up MPJ option writing.
  mptcp: keep track of used local endpoint
  mptcp: do not block subflows creation on errors
  selftests: mptcp: add tests for subflow creation failure

 net/mptcp/options.c                           | 65 ++++++++++++-------
 net/mptcp/pm.c                                | 28 +++++++-
 net/mptcp/pm_netlink.c                        |  8 ++-
 net/mptcp/protocol.c                          |  6 ++
 net/mptcp/protocol.h                          |  5 +-
 net/mptcp/subflow.c                           |  5 +-
 tools/testing/selftests/net/mptcp/config      |  1 +
 .../testing/selftests/net/mptcp/mptcp_join.sh | 65 ++++++++++++++++++-
 8 files changed, 153 insertions(+), 30 deletions(-)

-- 
2.33.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 mptcp-next 1/4] mptcp: clean-up MPJ option writing.
  2021-11-26 12:19 [PATCH v2 mptcp-next 0/4] mptcp: improve subflow creation on errors Paolo Abeni
@ 2021-11-26 12:19 ` Paolo Abeni
  2021-11-26 12:19 ` [PATCH v2 mptcp-next 2/4] mptcp: keep track of used local endpoint Paolo Abeni
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2021-11-26 12:19 UTC (permalink / raw)
  To: mptcp

Check for all MPJ variant at once, this reduces the number
of conditionals traversed on average and will simplify the
next patch.

No functional change intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/options.c | 44 +++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 8a1020e4285c..7a6a39b71633 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1381,27 +1381,29 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
 
 		/* MPC is additionally mutually exclusive with MP_PRIO */
 		goto mp_capable_done;
-	} else if (OPTION_MPTCP_MPJ_SYN & opts->suboptions) {
-		*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
-				      TCPOLEN_MPTCP_MPJ_SYN,
-				      opts->backup, opts->join_id);
-		put_unaligned_be32(opts->token, ptr);
-		ptr += 1;
-		put_unaligned_be32(opts->nonce, ptr);
-		ptr += 1;
-	} else if (OPTION_MPTCP_MPJ_SYNACK & opts->suboptions) {
-		*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
-				      TCPOLEN_MPTCP_MPJ_SYNACK,
-				      opts->backup, opts->join_id);
-		put_unaligned_be64(opts->thmac, ptr);
-		ptr += 2;
-		put_unaligned_be32(opts->nonce, ptr);
-		ptr += 1;
-	} else if (OPTION_MPTCP_MPJ_ACK & opts->suboptions) {
-		*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
-				      TCPOLEN_MPTCP_MPJ_ACK, 0, 0);
-		memcpy(ptr, opts->hmac, MPTCPOPT_HMAC_LEN);
-		ptr += 5;
+	} else if (OPTIONS_MPTCP_MPJ & opts->suboptions) {
+		if (OPTION_MPTCP_MPJ_SYN & opts->suboptions) {
+			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
+					      TCPOLEN_MPTCP_MPJ_SYN,
+					      opts->backup, opts->join_id);
+			put_unaligned_be32(opts->token, ptr);
+			ptr += 1;
+			put_unaligned_be32(opts->nonce, ptr);
+			ptr += 1;
+		} else if (OPTION_MPTCP_MPJ_SYNACK & opts->suboptions) {
+			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
+					      TCPOLEN_MPTCP_MPJ_SYNACK,
+					      opts->backup, opts->join_id);
+			put_unaligned_be64(opts->thmac, ptr);
+			ptr += 2;
+			put_unaligned_be32(opts->nonce, ptr);
+			ptr += 1;
+		} else {
+			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
+					      TCPOLEN_MPTCP_MPJ_ACK, 0, 0);
+			memcpy(ptr, opts->hmac, MPTCPOPT_HMAC_LEN);
+			ptr += 5;
+		}
 	} else if (OPTION_MPTCP_ADD_ADDR & opts->suboptions) {
 		u8 len = TCPOLEN_MPTCP_ADD_ADDR_BASE;
 		u8 echo = MPTCP_ADDR_ECHO;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 mptcp-next 2/4] mptcp: keep track of used local endpoint
  2021-11-26 12:19 [PATCH v2 mptcp-next 0/4] mptcp: improve subflow creation on errors Paolo Abeni
  2021-11-26 12:19 ` [PATCH v2 mptcp-next 1/4] mptcp: clean-up MPJ option writing Paolo Abeni
@ 2021-11-26 12:19 ` Paolo Abeni
  2021-11-26 17:53   ` Matthieu Baerts
  2021-11-26 12:19 ` [PATCH v2 mptcp-next 3/4] mptcp: do not block subflows creation on errors Paolo Abeni
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2021-11-26 12:19 UTC (permalink / raw)
  To: mptcp

So that we can skip already used endpoint at local address
selection time.
This allows for fair local endpoints usage in case of subflow
failure.

As a side effect, this patch also enforces that each endpoint
is used at most once for each mptcp connection.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/pm.c         | 1 +
 net/mptcp/pm_netlink.c | 8 +++++++-
 net/mptcp/protocol.h   | 1 +
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 761995a34124..1ff856dd92c4 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -376,6 +376,7 @@ void mptcp_pm_data_reset(struct mptcp_sock *msk)
 	WRITE_ONCE(msk->pm.accept_subflow, false);
 	WRITE_ONCE(msk->pm.remote_deny_join_id0, false);
 	msk->pm.status = 0;
+	bitmap_zero(msk->pm.endpoint_usage_mask, 256);
 
 	mptcp_pm_nl_data_init(msk);
 }
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 3186d33b5208..78b68a8216b6 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -173,6 +173,9 @@ select_local_address(const struct pm_nl_pernet *pernet,
 		if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW))
 			continue;
 
+		if (test_bit(entry->addr.id, msk->pm.endpoint_usage_mask))
+			continue;
+
 		if (entry->addr.family != sk->sk_family) {
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
 			if ((entry->addr.family == AF_INET &&
@@ -512,8 +515,10 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 			check_work_pending(msk);
 			nr = fill_remote_addresses_vec(msk, fullmesh, addrs);
 			spin_unlock_bh(&msk->pm.lock);
-			for (i = 0; i < nr; i++)
+			for (i = 0; i < nr; i++) {
+				set_bit(local->addr.id, msk->pm.endpoint_usage_mask);
 				__mptcp_subflow_connect(sk, &local->addr, &addrs[i]);
+			}
 			spin_lock_bh(&msk->pm.lock);
 			return;
 		}
@@ -1282,6 +1287,7 @@ static int mptcp_nl_remove_subflow_and_signal_addr(struct net *net,
 		mptcp_pm_remove_anno_addr(msk, addr, remove_subflow);
 		if (remove_subflow)
 			mptcp_pm_remove_subflow(msk, &list);
+		clear_bit(addr->id, msk->pm.endpoint_usage_mask);
 		release_sock(sk);
 
 next:
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index a6a4bd7de5b4..d384f285b6c1 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -190,6 +190,7 @@ struct mptcp_pm_data {
 
 	spinlock_t	lock;		/*protects the whole PM data */
 
+	DECLARE_BITMAP(endpoint_usage_mask, 256);
 	u8		addr_signal;
 	bool		server_side;
 	bool		work_pending;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 mptcp-next 3/4] mptcp: do not block subflows creation on errors
  2021-11-26 12:19 [PATCH v2 mptcp-next 0/4] mptcp: improve subflow creation on errors Paolo Abeni
  2021-11-26 12:19 ` [PATCH v2 mptcp-next 1/4] mptcp: clean-up MPJ option writing Paolo Abeni
  2021-11-26 12:19 ` [PATCH v2 mptcp-next 2/4] mptcp: keep track of used local endpoint Paolo Abeni
@ 2021-11-26 12:19 ` Paolo Abeni
  2021-11-26 17:53   ` Matthieu Baerts
  2021-11-26 12:19 ` [PATCH v2 mptcp-next 4/4] selftests: mptcp: add tests for subflow creation failure Paolo Abeni
  2021-11-26 15:16 ` [PATCH v2 mptcp-next 0/4] mptcp: improve subflow creation on errors Paolo Abeni
  4 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2021-11-26 12:19 UTC (permalink / raw)
  To: mptcp

If the MPTCP configuration allows for multiple subflows
creation, and the first additional subflows never reach
the fully established status - e.g. due to packets drop or
reset - the in kernel path manager do not move to the
next subflow.

This patch introduces a new PM helper to cope with MPJ
subflow creation failure and delay and hook it where appropriate.

Such helper triggers additional subflow creation, as needed
and updates the PM subflow counter, if the current one is
closing.

Note that we don't need an additional timer to catch timeout
and/or long delay in connection completion: we just need to
measure the time elapsed since the subflow creation every
time we emit an MPJ sub-option.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v2 -> v3:
 - fix compile warning (CI)

v1 -> v2:
 - explicitly hook on subflow close instead of error notification
 - fix checkpatch issue (Mat)
 - introduce and use a new pm helper to cope with subflow close
 - update pm subflow counters on close
---
 net/mptcp/options.c  | 21 ++++++++++++++++++---
 net/mptcp/pm.c       | 27 +++++++++++++++++++++++++--
 net/mptcp/protocol.c |  6 ++++++
 net/mptcp/protocol.h |  4 +++-
 net/mptcp/subflow.c  |  5 ++++-
 5 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 7a6a39b71633..00a8697addcd 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1257,10 +1257,10 @@ static u16 mptcp_make_csum(const struct mptcp_ext *mpext)
 void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
 			 struct mptcp_out_options *opts)
 {
-	if (unlikely(OPTION_MPTCP_FAIL & opts->suboptions)) {
-		const struct sock *ssk = (const struct sock *)tp;
-		struct mptcp_subflow_context *subflow;
+	const struct sock *ssk = (const struct sock *)tp;
+	struct mptcp_subflow_context *subflow;
 
+	if (unlikely(OPTION_MPTCP_FAIL & opts->suboptions)) {
 		subflow = mptcp_subflow_ctx(ssk);
 		subflow->send_mp_fail = 0;
 
@@ -1382,6 +1382,21 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
 		/* MPC is additionally mutually exclusive with MP_PRIO */
 		goto mp_capable_done;
 	} else if (OPTIONS_MPTCP_MPJ & opts->suboptions) {
+		if (ssk) {
+			/* we are still in the MPJ handshake and "a lot" of time passed
+			 * e.g. due to syn retranmissions. We can attempt next
+			 * subflow creation
+			 */
+			subflow = mptcp_subflow_ctx(ssk);
+			if (subflow->start_stamp &&
+			    unlikely(after(tcp_jiffies32, subflow->start_stamp + HZ / 10))) {
+				mptcp_pm_subflow_check_next(mptcp_sk(subflow->conn), ssk, subflow);
+
+				/* avoid triggering the PM multiple times due to timeout */
+				subflow->start_stamp = 0;
+			}
+		}
+
 		if (OPTION_MPTCP_MPJ_SYN & opts->suboptions) {
 			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
 					      TCPOLEN_MPTCP_MPJ_SYN,
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 1ff856dd92c4..df8a596cd3ff 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -172,9 +172,32 @@ void mptcp_pm_subflow_established(struct mptcp_sock *msk)
 	spin_unlock_bh(&pm->lock);
 }
 
-void mptcp_pm_subflow_closed(struct mptcp_sock *msk, u8 id)
+void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct sock *ssk,
+				 const struct mptcp_subflow_context *subflow)
 {
-	pr_debug("msk=%p", msk);
+	struct mptcp_pm_data *pm = &msk->pm;
+	bool closed;
+
+	closed = ssk->sk_state == TCP_CLOSE;
+	if (subflow->fully_established && !closed)
+		return;
+
+	spin_lock_bh(&pm->lock);
+	if (closed) {
+		pm->local_addr_used--;
+		pm->subflows--;
+		/* do not enable the pm worker: we don't want to pick again
+		 * the just closed subflow
+		 */
+	}
+
+	/* Even if this subflow is not really established, tell the PM to try
+	 * to pick the next one, if possible.
+	 */
+	if (pm->work_pending)
+		mptcp_pm_schedule_work(msk, MPTCP_PM_SUBFLOW_ESTABLISHED);
+
+	spin_unlock_bh(&pm->lock);
 }
 
 void mptcp_pm_add_addr_received(struct mptcp_sock *msk,
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index ef2125798e64..2ee2fe97c553 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2351,6 +2351,12 @@ void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 {
 	if (sk->sk_state == TCP_ESTABLISHED)
 		mptcp_event(MPTCP_EVENT_SUB_CLOSED, mptcp_sk(sk), ssk, GFP_KERNEL);
+
+	/* subflow aborted before reaching the fully_established status
+	 * attempt the creation of the next subflow
+	 */
+	mptcp_pm_subflow_check_next(mptcp_sk(sk), ssk, subflow);
+
 	__mptcp_close_ssk(sk, ssk, subflow, MPTCP_CF_PUSH);
 }
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index d384f285b6c1..4da1ef964b88 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -433,6 +433,7 @@ struct mptcp_subflow_context {
 		stale : 1;	    /* unable to snd/rcv data, do not use for xmit */
 	enum mptcp_data_avail data_avail;
 	u32	remote_nonce;
+	u32	start_stamp;
 	u64	thmac;
 	u32	local_nonce;
 	u32	remote_token;
@@ -733,7 +734,8 @@ void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk,
 bool mptcp_pm_allow_new_subflow(struct mptcp_sock *msk);
 void mptcp_pm_connection_closed(struct mptcp_sock *msk);
 void mptcp_pm_subflow_established(struct mptcp_sock *msk);
-void mptcp_pm_subflow_closed(struct mptcp_sock *msk, u8 id);
+void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct sock *ssk,
+				 const struct mptcp_subflow_context *subflow);
 void mptcp_pm_add_addr_received(struct mptcp_sock *msk,
 				const struct mptcp_addr_info *addr);
 void mptcp_pm_add_addr_echoed(struct mptcp_sock *msk,
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 0f90bd61de01..76556743e952 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1273,7 +1273,8 @@ void __mptcp_error_report(struct sock *sk)
 
 static void subflow_error_report(struct sock *ssk)
 {
-	struct sock *sk = mptcp_subflow_ctx(ssk)->conn;
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	struct sock *sk = subflow->conn;
 
 	mptcp_data_lock(sk);
 	if (!sock_owned_by_user(sk))
@@ -1444,6 +1445,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
 	subflow->remote_id = remote_id;
 	subflow->request_join = 1;
 	subflow->request_bkup = !!(flags & MPTCP_PM_ADDR_FLAG_BACKUP);
+	subflow->start_stamp = tcp_jiffies32;
 	mptcp_info2sockaddr(remote, &addr, ssk->sk_family);
 
 	mptcp_add_pending_subflow(msk, subflow);
@@ -1461,6 +1463,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
 	spin_lock_bh(&msk->join_list_lock);
 	list_del(&subflow->node);
 	spin_unlock_bh(&msk->join_list_lock);
+	mptcp_pm_subflow_check_next(msk, ssk, subflow);
 	sock_put(mptcp_subflow_tcp_sock(subflow));
 
 failed:
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 mptcp-next 4/4] selftests: mptcp: add tests for subflow creation failure
  2021-11-26 12:19 [PATCH v2 mptcp-next 0/4] mptcp: improve subflow creation on errors Paolo Abeni
                   ` (2 preceding siblings ...)
  2021-11-26 12:19 ` [PATCH v2 mptcp-next 3/4] mptcp: do not block subflows creation on errors Paolo Abeni
@ 2021-11-26 12:19 ` Paolo Abeni
  2021-11-26 17:57   ` Matthieu Baerts
  2021-11-26 15:16 ` [PATCH v2 mptcp-next 0/4] mptcp: improve subflow creation on errors Paolo Abeni
  4 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2021-11-26 12:19 UTC (permalink / raw)
  To: mptcp

Verify that, when multiple endpoints are available, subflows
creation proceed even when the first additional subflow creation
fails - due to packet drop on the relevant link

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
 - add missing NF kconfig
 - add more test-cases (drop, later subflow creation)
---
 tools/testing/selftests/net/mptcp/config      |  1 +
 .../testing/selftests/net/mptcp/mptcp_join.sh | 65 ++++++++++++++++++-
 2 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/config b/tools/testing/selftests/net/mptcp/config
index 419e71560fd1..41e94e004bdc 100644
--- a/tools/testing/selftests/net/mptcp/config
+++ b/tools/testing/selftests/net/mptcp/config
@@ -18,4 +18,5 @@ CONFIG_NFT_TPROXY=m
 CONFIG_NFT_SOCKET=m
 CONFIG_IP_ADVANCED_ROUTER=y
 CONFIG_IP_MULTIPLE_TABLES=y
+CONFIG_IP_NF_TARGET_REJECT=m
 CONFIG_IPV6_MULTIPLE_TABLES=y
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 2684ef9c0d42..6d342f9985b8 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -975,6 +975,22 @@ chk_link_usage()
 	fi
 }
 
+wait_for_tw()
+{
+	local timeout_ms=$((timeout_poll * 1000))
+	local time=0
+	local ns=$1
+
+	while [ $time -lt $timeout_ms ]; do
+		local cnt=`ip netns exec $ns ss -t state time-wait |wc -l`
+
+		[ "$cnt" = 1 ] && return 1
+		time=$((time + 100))
+		sleep 0.1
+	done
+	return 1
+}
+
 subflows_tests()
 {
 	reset
@@ -1032,6 +1048,48 @@ subflows_tests()
 	chk_join_nr "single subflow, dev" 1 1 1
 }
 
+subflows_error_tests()
+{
+	# multiple subflows, with subflow creation error
+	reset
+	ip netns exec $ns1 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns2 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
+	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j REJECT
+	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
+	chk_join_nr "multiple subflows, with failing subflow" 1 1 1
+
+	# multiple subflows, with subflow timeout on MPJ
+	reset
+	ip netns exec $ns1 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns2 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
+	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j DROP
+	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
+	chk_join_nr "multiple subflows, with subflow timeout" 1 1 1
+
+	# multiple subflows, check that the endpoint corresponding to
+	# closed subflow (due to reset) is not reused if additional
+	# subflows are added later
+	reset
+	ip netns exec $ns1 ./pm_nl_ctl limits 0 1
+	ip netns exec $ns2 ./pm_nl_ctl limits 0 1
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
+	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j REJECT
+	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow &
+
+	# mpj subflow will be in TW after the reset
+	wait_for_tw $ns2
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
+	wait
+
+	# additional subflow could be created only if the PM select
+	# the later endpoint, skipping the already used one
+	chk_join_nr "multiple subflows, fair usage on close" 1 1 1
+}
+
 signal_address_tests()
 {
 	# add_address, unused
@@ -1840,6 +1898,7 @@ fullmesh_tests()
 all_tests()
 {
 	subflows_tests
+	subflows_error_tests
 	signal_address_tests
 	link_failure_tests
 	add_addr_timeout_tests
@@ -1859,6 +1918,7 @@ usage()
 {
 	echo "mptcp_join usage:"
 	echo "  -f subflows_tests"
+	echo "  -f subflows_error_tests"
 	echo "  -s signal_address_tests"
 	echo "  -l link_failure_tests"
 	echo "  -t add_addr_timeout_tests"
@@ -1907,11 +1967,14 @@ if [ $do_all_tests -eq 1 ]; then
 	exit $ret
 fi
 
-while getopts 'fsltra64bpkdmchCS' opt; do
+while getopts 'fesltra64bpkdmchCS' opt; do
 	case $opt in
 		f)
 			subflows_tests
 			;;
+		e)
+			subflows_error_tests
+			;;
 		s)
 			signal_address_tests
 			;;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 mptcp-next 0/4] mptcp: improve subflow creation on errors
  2021-11-26 12:19 [PATCH v2 mptcp-next 0/4] mptcp: improve subflow creation on errors Paolo Abeni
                   ` (3 preceding siblings ...)
  2021-11-26 12:19 ` [PATCH v2 mptcp-next 4/4] selftests: mptcp: add tests for subflow creation failure Paolo Abeni
@ 2021-11-26 15:16 ` Paolo Abeni
  4 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2021-11-26 15:16 UTC (permalink / raw)
  To: mptcp

On Fri, 2021-11-26 at 13:19 +0100, Paolo Abeni wrote:
> This iteration just addresses a compile warning spotted by the
> CI.

I'm sorry for the confusion: this is really a v3, not v2. Hopefully
patchwork is still happy?

I could not even say ENOCOFFEE because I just had one before posting.

Cheers,

Paolo



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 mptcp-next 2/4] mptcp: keep track of used local endpoint
  2021-11-26 12:19 ` [PATCH v2 mptcp-next 2/4] mptcp: keep track of used local endpoint Paolo Abeni
@ 2021-11-26 17:53   ` Matthieu Baerts
  0 siblings, 0 replies; 10+ messages in thread
From: Matthieu Baerts @ 2021-11-26 17:53 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

On 26/11/2021 13:19, Paolo Abeni wrote:
> So that we can skip already used endpoint at local address
> selection time.
> This allows for fair local endpoints usage in case of subflow
> failure.
> 
> As a side effect, this patch also enforces that each endpoint
> is used at most once for each mptcp connection.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Good idea!

Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>


Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 mptcp-next 3/4] mptcp: do not block subflows creation on errors
  2021-11-26 12:19 ` [PATCH v2 mptcp-next 3/4] mptcp: do not block subflows creation on errors Paolo Abeni
@ 2021-11-26 17:53   ` Matthieu Baerts
  2021-11-29 14:24     ` Paolo Abeni
  0 siblings, 1 reply; 10+ messages in thread
From: Matthieu Baerts @ 2021-11-26 17:53 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

On 26/11/2021 13:19, Paolo Abeni wrote:
> If the MPTCP configuration allows for multiple subflows
> creation, and the first additional subflows never reach
> the fully established status - e.g. due to packets drop or
> reset - the in kernel path manager do not move to the
> next subflow.
> 
> This patch introduces a new PM helper to cope with MPJ
> subflow creation failure and delay and hook it where appropriate.
> 
> Such helper triggers additional subflow creation, as needed
> and updates the PM subflow counter, if the current one is
> closing.
> 
> Note that we don't need an additional timer to catch timeout
> and/or long delay in connection completion: we just need to
> measure the time elapsed since the subflow creation every
> time we emit an MPJ sub-option.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v2 -> v3:
>  - fix compile warning (CI)
> 
> v1 -> v2:
>  - explicitly hook on subflow close instead of error notification
>  - fix checkpatch issue (Mat)

(+t :-P )

>  - introduce and use a new pm helper to cope with subflow close
>  - update pm subflow counters on close

Thank you for the new version!

> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> index 7a6a39b71633..00a8697addcd 100644
> --- a/net/mptcp/options.c
> +++ b/net/mptcp/options.c
> @@ -1382,6 +1382,21 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
>  		/* MPC is additionally mutually exclusive with MP_PRIO */
>  		goto mp_capable_done;
>  	} else if (OPTIONS_MPTCP_MPJ & opts->suboptions) {
> +		if (ssk) {
> +			/* we are still in the MPJ handshake and "a lot" of time passed
> +			 * e.g. due to syn retranmissions. We can attempt next
> +			 * subflow creation
> +			 */
> +			subflow = mptcp_subflow_ctx(ssk);
> +			if (subflow->start_stamp &&
> +			    unlikely(after(tcp_jiffies32, subflow->start_stamp + HZ / 10))) {

As discussed at the last meeting, we will reach this code path when
retransmitting the SYN, one second after having sent the initial one. In
this case, we might not need to keep 'start_stamp'.

But we can improve that in a new version after.

> +				mptcp_pm_subflow_check_next(mptcp_sk(subflow->conn), ssk, subflow);
> +
> +				/* avoid triggering the PM multiple times due to timeout */
> +				subflow->start_stamp = 0;
> +			}
> +		}
> +
>  		if (OPTION_MPTCP_MPJ_SYN & opts->suboptions) {
>  			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
>  					      TCPOLEN_MPTCP_MPJ_SYN,
> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
> index 1ff856dd92c4..df8a596cd3ff 100644
> --- a/net/mptcp/pm.c
> +++ b/net/mptcp/pm.c
> @@ -172,9 +172,32 @@ void mptcp_pm_subflow_established(struct mptcp_sock *msk)
>  	spin_unlock_bh(&pm->lock);
>  }
>  
> -void mptcp_pm_subflow_closed(struct mptcp_sock *msk, u8 id)
> +void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct sock *ssk,
> +				 const struct mptcp_subflow_context *subflow)

Detail: do we need to pass the 'msk'? Can we not get it from 'subflow'?

>  {
> -	pr_debug("msk=%p", msk);
> +	struct mptcp_pm_data *pm = &msk->pm;
> +	bool closed;
> +
> +	closed = ssk->sk_state == TCP_CLOSE;
> +	if (subflow->fully_established && !closed)
> +		return;
> +
> +	spin_lock_bh(&pm->lock);
> +	if (closed) {
> +		pm->local_addr_used--;
> +		pm->subflows--;
> +		/* do not enable the pm worker: we don't want to pick again
> +		 * the just closed subflow
> +		 */
> +	}
> +
> +	/* Even if this subflow is not really established, tell the PM to try
> +	 * to pick the next one, if possible.
> +	 */
> +	if (pm->work_pending)

Does it not need to be "protected" with a READ_ONCE?

> +		mptcp_pm_schedule_work(msk, MPTCP_PM_SUBFLOW_ESTABLISHED);
> +
> +	spin_unlock_bh(&pm->lock);
>  }
>  
>  void mptcp_pm_add_addr_received(struct mptcp_sock *msk,

(not related to this line)

Did you see that the "fastclose" packetdrill tests are failing? Is it
due to these modifications?

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 mptcp-next 4/4] selftests: mptcp: add tests for subflow creation failure
  2021-11-26 12:19 ` [PATCH v2 mptcp-next 4/4] selftests: mptcp: add tests for subflow creation failure Paolo Abeni
@ 2021-11-26 17:57   ` Matthieu Baerts
  0 siblings, 0 replies; 10+ messages in thread
From: Matthieu Baerts @ 2021-11-26 17:57 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

On 26/11/2021 13:19, Paolo Abeni wrote:
> Verify that, when multiple endpoints are available, subflows
> creation proceed even when the first additional subflow creation
> fails - due to packet drop on the relevant link
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v1 -> v2:
>  - add missing NF kconfig
>  - add more test-cases (drop, later subflow creation)

Thank you for the modifications!

> diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
> index 2684ef9c0d42..6d342f9985b8 100755
> --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
> +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
> @@ -975,6 +975,22 @@ chk_link_usage()
>  	fi
>  }
>  
> +wait_for_tw()
> +{
> +	local timeout_ms=$((timeout_poll * 1000))
> +	local time=0
> +	local ns=$1
> +
> +	while [ $time -lt $timeout_ms ]; do
> +		local cnt=`ip netns exec $ns ss -t state time-wait |wc -l`

(no need to change but it is often recommended to use $(...) instead of
`...` because it supports nested commands and is more "visible")

(...)

> @@ -1859,6 +1918,7 @@ usage()
>  {
>  	echo "mptcp_join usage:"
>  	echo "  -f subflows_tests"
> +	echo "  -f subflows_error_tests"

Detail: s/-f/-e/

I can fix it before applying if you didn't already plan to send a new v4 :)

Cheers,
Matt

PS: for the lagging a bit with the reviews!
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 mptcp-next 3/4] mptcp: do not block subflows creation on errors
  2021-11-26 17:53   ` Matthieu Baerts
@ 2021-11-29 14:24     ` Paolo Abeni
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2021-11-29 14:24 UTC (permalink / raw)
  To: Matthieu Baerts; +Cc: mptcp

On Fri, 2021-11-26 at 18:53 +0100, Matthieu Baerts wrote:
> Hi Paolo,
> 
> On 26/11/2021 13:19, Paolo Abeni wrote:
> > If the MPTCP configuration allows for multiple subflows
> > creation, and the first additional subflows never reach
> > the fully established status - e.g. due to packets drop or
> > reset - the in kernel path manager do not move to the
> > next subflow.
> > 
> > This patch introduces a new PM helper to cope with MPJ
> > subflow creation failure and delay and hook it where appropriate.
> > 
> > Such helper triggers additional subflow creation, as needed
> > and updates the PM subflow counter, if the current one is
> > closing.
> > 
> > Note that we don't need an additional timer to catch timeout
> > and/or long delay in connection completion: we just need to
> > measure the time elapsed since the subflow creation every
> > time we emit an MPJ sub-option.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > v2 -> v3:
> >  - fix compile warning (CI)
> > 
> > v1 -> v2:
> >  - explicitly hook on subflow close instead of error notification
> >  - fix checkpatch issue (Mat)
> 
> (+t :-P )
> 
> >  - introduce and use a new pm helper to cope with subflow close
> >  - update pm subflow counters on close
> 
> Thank you for the new version!
> 
> > diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> > index 7a6a39b71633..00a8697addcd 100644
> > --- a/net/mptcp/options.c
> > +++ b/net/mptcp/options.c
> > @@ -1382,6 +1382,21 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
> >  		/* MPC is additionally mutually exclusive with MP_PRIO */
> >  		goto mp_capable_done;
> >  	} else if (OPTIONS_MPTCP_MPJ & opts->suboptions) {
> > +		if (ssk) {
> > +			/* we are still in the MPJ handshake and "a lot" of time passed
> > +			 * e.g. due to syn retranmissions. We can attempt next
> > +			 * subflow creation
> > +			 */
> > +			subflow = mptcp_subflow_ctx(ssk);
> > +			if (subflow->start_stamp &&
> > +			    unlikely(after(tcp_jiffies32, subflow->start_stamp + HZ / 10))) {
> 
> As discussed at the last meeting, we will reach this code path when
> retransmitting the SYN, one second after having sent the initial one. In
> this case, we might not need to keep 'start_stamp'.
> 
> But we can improve that in a new version after.
> 
> > +				mptcp_pm_subflow_check_next(mptcp_sk(subflow->conn), ssk, subflow);
> > +
> > +				/* avoid triggering the PM multiple times due to timeout */
> > +				subflow->start_stamp = 0;
> > +			}
> > +		}
> > +
> >  		if (OPTION_MPTCP_MPJ_SYN & opts->suboptions) {
> >  			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
> >  					      TCPOLEN_MPTCP_MPJ_SYN,
> > diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
> > index 1ff856dd92c4..df8a596cd3ff 100644
> > --- a/net/mptcp/pm.c
> > +++ b/net/mptcp/pm.c
> > @@ -172,9 +172,32 @@ void mptcp_pm_subflow_established(struct mptcp_sock *msk)
> >  	spin_unlock_bh(&pm->lock);
> >  }
> >  
> > -void mptcp_pm_subflow_closed(struct mptcp_sock *msk, u8 id)
> > +void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct sock *ssk,
> > +				 const struct mptcp_subflow_context *subflow)
> 
> Detail: do we need to pass the 'msk'? Can we not get it from 'subflow'?

The caler has the msk handy, passing it to the helper keeps the args
list similar to other releated helpers and avoid some more dup code.

> >  {
> > -	pr_debug("msk=%p", msk);
> > +	struct mptcp_pm_data *pm = &msk->pm;
> > +	bool closed;
> > +
> > +	closed = ssk->sk_state == TCP_CLOSE;
> > +	if (subflow->fully_established && !closed)
> > +		return;
> > +
> > +	spin_lock_bh(&pm->lock);
> > +	if (closed) {
> > +		pm->local_addr_used--;
> > +		pm->subflows--;
> > +		/* do not enable the pm worker: we don't want to pick again
> > +		 * the just closed subflow
> > +		 */
> > +	}
> > +
> > +	/* Even if this subflow is not really established, tell the PM to try
> > +	 * to pick the next one, if possible.
> > +	 */
> > +	if (pm->work_pending)
> 
> Does it not need to be "protected" with a READ_ONCE?

AFAIK not needed: this is the only read issued on such field in this
scope and and is done under the relevant lock (all write operation
occours under the pm lock).

> > +		mptcp_pm_schedule_work(msk, MPTCP_PM_SUBFLOW_ESTABLISHED);
> > +
> > +	spin_unlock_bh(&pm->lock);
> >  }
> >  
> >  void mptcp_pm_add_addr_received(struct mptcp_sock *msk,
> 
> (not related to this line)
> 
> Did you see that the "fastclose" packetdrill tests are failing? Is it
> due to these modifications?

uhm... I haven't check them. I'll have a look soon.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-11-29 14:24 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-26 12:19 [PATCH v2 mptcp-next 0/4] mptcp: improve subflow creation on errors Paolo Abeni
2021-11-26 12:19 ` [PATCH v2 mptcp-next 1/4] mptcp: clean-up MPJ option writing Paolo Abeni
2021-11-26 12:19 ` [PATCH v2 mptcp-next 2/4] mptcp: keep track of used local endpoint Paolo Abeni
2021-11-26 17:53   ` Matthieu Baerts
2021-11-26 12:19 ` [PATCH v2 mptcp-next 3/4] mptcp: do not block subflows creation on errors Paolo Abeni
2021-11-26 17:53   ` Matthieu Baerts
2021-11-29 14:24     ` Paolo Abeni
2021-11-26 12:19 ` [PATCH v2 mptcp-next 4/4] selftests: mptcp: add tests for subflow creation failure Paolo Abeni
2021-11-26 17:57   ` Matthieu Baerts
2021-11-26 15:16 ` [PATCH v2 mptcp-next 0/4] mptcp: improve subflow creation on errors Paolo Abeni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.