All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH mptcp-next v5 0/5] mptcp: improve subflow creation on errors
@ 2021-12-09 17:30 Paolo Abeni
  2021-12-09 17:30 ` [PATCH mptcp-next v5 1/5] mptcp: fix per socket endpoint accounting Paolo Abeni
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Paolo Abeni @ 2021-12-09 17:30 UTC (permalink / raw)
  To: mptcp

This iteration adds tracking for the endpoint used by the MPC subflow,
updated the self-tests accordingly (1 signal tests-case is affected)
and introduces a new test case for the relevant scenario.

Currently when a subflow connection fails - either the TCP
connection is reset by the peer, or the MPJ handshake never
completes - the in kernel PM don't perform any further action.
Notably no additional subflow creation is attempted.

This series is aimed at improving the in-kernel path manager
behavior in the above scenario:
- each endpoint is used only once per connection (patch 2/4)
- on failure we try to move to the next subflow, if any
  (patch 3/4)

The first patch is minor cleanup, and the last patch adds
specific self-tests.

This should address/close:
https://github.com/multipath-tcp/mptcp_net-next/issues/235
https://github.com/multipath-tcp/mptcp_net-next/issues/242

Paolo Abeni (5):
  mptcp: fix per socket endpoint accounting.
  mptcp: clean-up MPJ option writing.
  mptcp: keep track of local endpoint still available for each msk
  mptcp: do not block subflows creation on errors
  selftests: mptcp: add tests for subflow creation failure

 net/mptcp/options.c                           |  65 +++++---
 net/mptcp/pm.c                                |  24 ++-
 net/mptcp/pm_netlink.c                        | 142 +++++++++++-------
 net/mptcp/protocol.c                          |   9 +-
 net/mptcp/protocol.h                          |  16 +-
 net/mptcp/subflow.c                           |   5 +-
 tools/testing/selftests/net/mptcp/config      |   1 +
 .../testing/selftests/net/mptcp/mptcp_join.sh |  79 +++++++++-
 8 files changed, 254 insertions(+), 87 deletions(-)

-- 
2.33.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH mptcp-next v5 1/5] mptcp: fix per socket endpoint accounting.
  2021-12-09 17:30 [PATCH mptcp-next v5 0/5] mptcp: improve subflow creation on errors Paolo Abeni
@ 2021-12-09 17:30 ` Paolo Abeni
  2021-12-09 17:30 ` [PATCH mptcp-next v5 2/5] mptcp: clean-up MPJ option writing Paolo Abeni
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2021-12-09 17:30 UTC (permalink / raw)
  To: mptcp

Since full-mesh endpoint support, the reception of a single ADD_ADDR
option can cause multiple subflows creation. When such option is
accepted we increment 'add_addr_accepted' by one. When we received
a paired RM_ADDR option, we deleted all the relevant subflows,
decrementing 'add_addr_accepted' by one for each of them.

We have a similar issue for 'local_addr_used'

Fix them moving the pm endpoint accounting outside the subflow
traversal.

Fixes: 1a0d6136c5f0 ("mptcp: local addresses fullmesh")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/pm_netlink.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 3186d33b5208..d18b13e3e74c 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -710,6 +710,8 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
 		return;
 
 	for (i = 0; i < rm_list->nr; i++) {
+		bool removed = false;
+
 		list_for_each_entry_safe(subflow, tmp, &msk->conn_list, node) {
 			struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 			int how = RCV_SHUTDOWN | SEND_SHUTDOWN;
@@ -729,15 +731,19 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
 			mptcp_close_ssk(sk, ssk, subflow);
 			spin_lock_bh(&msk->pm.lock);
 
-			if (rm_type == MPTCP_MIB_RMADDR) {
-				msk->pm.add_addr_accepted--;
-				WRITE_ONCE(msk->pm.accept_addr, true);
-			} else if (rm_type == MPTCP_MIB_RMSUBFLOW) {
-				msk->pm.local_addr_used--;
-			}
+			removed = true;
 			msk->pm.subflows--;
 			__MPTCP_INC_STATS(sock_net(sk), rm_type);
 		}
+		if (!removed)
+			continue;
+
+		if (rm_type == MPTCP_MIB_RMADDR) {
+			msk->pm.add_addr_accepted--;
+			WRITE_ONCE(msk->pm.accept_addr, true);
+		} else if (rm_type == MPTCP_MIB_RMSUBFLOW) {
+			msk->pm.local_addr_used--;
+		}
 	}
 }
 
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH mptcp-next v5 2/5] mptcp: clean-up MPJ option writing.
  2021-12-09 17:30 [PATCH mptcp-next v5 0/5] mptcp: improve subflow creation on errors Paolo Abeni
  2021-12-09 17:30 ` [PATCH mptcp-next v5 1/5] mptcp: fix per socket endpoint accounting Paolo Abeni
@ 2021-12-09 17:30 ` Paolo Abeni
  2021-12-09 17:30 ` [PATCH mptcp-next v5 3/5] mptcp: keep track of local endpoint still available for each msk Paolo Abeni
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2021-12-09 17:30 UTC (permalink / raw)
  To: mptcp

Check for all MPJ variant at once, this reduces the number
of conditionals traversed on average and will simplify the
next patch.

No functional change intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/options.c | 44 +++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 8a1020e4285c..7a6a39b71633 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1381,27 +1381,29 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
 
 		/* MPC is additionally mutually exclusive with MP_PRIO */
 		goto mp_capable_done;
-	} else if (OPTION_MPTCP_MPJ_SYN & opts->suboptions) {
-		*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
-				      TCPOLEN_MPTCP_MPJ_SYN,
-				      opts->backup, opts->join_id);
-		put_unaligned_be32(opts->token, ptr);
-		ptr += 1;
-		put_unaligned_be32(opts->nonce, ptr);
-		ptr += 1;
-	} else if (OPTION_MPTCP_MPJ_SYNACK & opts->suboptions) {
-		*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
-				      TCPOLEN_MPTCP_MPJ_SYNACK,
-				      opts->backup, opts->join_id);
-		put_unaligned_be64(opts->thmac, ptr);
-		ptr += 2;
-		put_unaligned_be32(opts->nonce, ptr);
-		ptr += 1;
-	} else if (OPTION_MPTCP_MPJ_ACK & opts->suboptions) {
-		*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
-				      TCPOLEN_MPTCP_MPJ_ACK, 0, 0);
-		memcpy(ptr, opts->hmac, MPTCPOPT_HMAC_LEN);
-		ptr += 5;
+	} else if (OPTIONS_MPTCP_MPJ & opts->suboptions) {
+		if (OPTION_MPTCP_MPJ_SYN & opts->suboptions) {
+			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
+					      TCPOLEN_MPTCP_MPJ_SYN,
+					      opts->backup, opts->join_id);
+			put_unaligned_be32(opts->token, ptr);
+			ptr += 1;
+			put_unaligned_be32(opts->nonce, ptr);
+			ptr += 1;
+		} else if (OPTION_MPTCP_MPJ_SYNACK & opts->suboptions) {
+			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
+					      TCPOLEN_MPTCP_MPJ_SYNACK,
+					      opts->backup, opts->join_id);
+			put_unaligned_be64(opts->thmac, ptr);
+			ptr += 2;
+			put_unaligned_be32(opts->nonce, ptr);
+			ptr += 1;
+		} else {
+			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
+					      TCPOLEN_MPTCP_MPJ_ACK, 0, 0);
+			memcpy(ptr, opts->hmac, MPTCPOPT_HMAC_LEN);
+			ptr += 5;
+		}
 	} else if (OPTION_MPTCP_ADD_ADDR & opts->suboptions) {
 		u8 len = TCPOLEN_MPTCP_ADD_ADDR_BASE;
 		u8 echo = MPTCP_ADDR_ECHO;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH mptcp-next v5 3/5] mptcp: keep track of local endpoint still available for each msk
  2021-12-09 17:30 [PATCH mptcp-next v5 0/5] mptcp: improve subflow creation on errors Paolo Abeni
  2021-12-09 17:30 ` [PATCH mptcp-next v5 1/5] mptcp: fix per socket endpoint accounting Paolo Abeni
  2021-12-09 17:30 ` [PATCH mptcp-next v5 2/5] mptcp: clean-up MPJ option writing Paolo Abeni
@ 2021-12-09 17:30 ` Paolo Abeni
  2021-12-09 17:30 ` [PATCH mptcp-next v5 4/5] mptcp: do not block subflows creation on errors Paolo Abeni
  2021-12-09 17:30 ` [PATCH mptcp-next v5 5/5] selftests: mptcp: add tests for subflow creation failure Paolo Abeni
  4 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2021-12-09 17:30 UTC (permalink / raw)
  To: mptcp

Include into the path manager status a bitmap tracking the list
of local endpoints still available - not yet used - for the
relavant mptcp socket.
Keep such map updated at endpoint creation/deleteion time, so
that we can easily skip already used endpoint at local address
selection time.
The endpoint used by the initial subflow is lazyly accounted at
subflow creation time: the usage bitmap is be up2date before
endpoint selection and we avoid such unneeded task in some relevant
scenarions - e.g. busy servers accepting incoming subflows but
not creating any additional ones nor annuncing additional addresses.

Overall this allows for fair local endpoints usage in case of
subflow failure.

As a side effect, this patch also enforces that each endpoint
is used at most once for each mptcp connection.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v4 -> v5:
 - track MPC subflow, too
 - update self-tests accordingly

v3 -> v4:
 - track the available (not yet used) id instead of the used ones
 - track both
 - cleanup duplication with id_bitmap
---
 net/mptcp/pm.c                                |   1 +
 net/mptcp/pm_netlink.c                        | 116 +++++++++++-------
 net/mptcp/protocol.c                          |   3 +-
 net/mptcp/protocol.h                          |  11 +-
 .../testing/selftests/net/mptcp/mptcp_join.sh |   5 +-
 5 files changed, 86 insertions(+), 50 deletions(-)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 761995a34124..d6d22f18c418 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -376,6 +376,7 @@ void mptcp_pm_data_reset(struct mptcp_sock *msk)
 	WRITE_ONCE(msk->pm.accept_subflow, false);
 	WRITE_ONCE(msk->pm.remote_deny_join_id0, false);
 	msk->pm.status = 0;
+	bitmap_fill(msk->pm.id_avail_bitmap, MAX_ADDR_ID + 1);
 
 	mptcp_pm_nl_data_init(msk);
 }
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index d18b13e3e74c..1be7f92e9fc8 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -38,9 +38,6 @@ struct mptcp_pm_add_entry {
 	u8			retrans_times;
 };
 
-#define MAX_ADDR_ID		255
-#define BITMAP_SZ DIV_ROUND_UP(MAX_ADDR_ID + 1, BITS_PER_LONG)
-
 struct pm_nl_pernet {
 	/* protects pernet updates */
 	spinlock_t		lock;
@@ -52,14 +49,14 @@ struct pm_nl_pernet {
 	unsigned int		local_addr_max;
 	unsigned int		subflows_max;
 	unsigned int		next_id;
-	unsigned long		id_bitmap[BITMAP_SZ];
+	DECLARE_BITMAP(id_bitmap, MAX_ADDR_ID + 1);
 };
 
 #define MPTCP_PM_ADDR_MAX	8
 #define ADD_ADDR_RETRANS_MAX	3
 
 static bool addresses_equal(const struct mptcp_addr_info *a,
-			    struct mptcp_addr_info *b, bool use_port)
+			    const struct mptcp_addr_info *b, bool use_port)
 {
 	bool addr_equals = false;
 
@@ -173,6 +170,9 @@ select_local_address(const struct pm_nl_pernet *pernet,
 		if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW))
 			continue;
 
+		if (!test_bit(entry->addr.id, msk->pm.id_avail_bitmap))
+			continue;
+
 		if (entry->addr.family != sk->sk_family) {
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
 			if ((entry->addr.family == AF_INET &&
@@ -183,23 +183,17 @@ select_local_address(const struct pm_nl_pernet *pernet,
 				continue;
 		}
 
-		/* avoid any address already in use by subflows and
-		 * pending join
-		 */
-		if (!lookup_subflow_by_saddr(&msk->conn_list, &entry->addr)) {
-			ret = entry;
-			break;
-		}
+		ret = entry;
+		break;
 	}
 	rcu_read_unlock();
 	return ret;
 }
 
 static struct mptcp_pm_addr_entry *
-select_signal_address(struct pm_nl_pernet *pernet, unsigned int pos)
+select_signal_address(struct pm_nl_pernet *pernet, struct mptcp_sock *msk)
 {
 	struct mptcp_pm_addr_entry *entry, *ret = NULL;
-	int i = 0;
 
 	rcu_read_lock();
 	/* do not keep any additional per socket state, just signal
@@ -208,12 +202,14 @@ select_signal_address(struct pm_nl_pernet *pernet, unsigned int pos)
 	 * can lead to additional addresses not being announced.
 	 */
 	list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) {
+		if (!test_bit(entry->addr.id, msk->pm.id_avail_bitmap))
+			continue;
+
 		if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL))
 			continue;
-		if (i++ == pos) {
-			ret = entry;
-			break;
-		}
+
+		ret = entry;
+		break;
 	}
 	rcu_read_unlock();
 	return ret;
@@ -257,9 +253,11 @@ EXPORT_SYMBOL_GPL(mptcp_pm_get_local_addr_max);
 
 static void check_work_pending(struct mptcp_sock *msk)
 {
-	if (msk->pm.add_addr_signaled == mptcp_pm_get_add_addr_signal_max(msk) &&
-	    (msk->pm.local_addr_used == mptcp_pm_get_local_addr_max(msk) ||
-	     msk->pm.subflows == mptcp_pm_get_subflows_max(msk)))
+	struct pm_nl_pernet *pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
+
+	if (msk->pm.subflows == mptcp_pm_get_subflows_max(msk) ||
+	    (find_next_and_bit(pernet->id_bitmap, msk->pm.id_avail_bitmap, MAX_ADDR_ID + 1, 0) ==
+	     MAX_ADDR_ID + 1))
 		WRITE_ONCE(msk->pm.work_pending, false);
 }
 
@@ -459,6 +457,35 @@ static unsigned int fill_remote_addresses_vec(struct mptcp_sock *msk, bool fullm
 	return i;
 }
 
+static struct mptcp_pm_addr_entry *
+__lookup_addr_by_id(struct pm_nl_pernet *pernet, unsigned int id)
+{
+	struct mptcp_pm_addr_entry *entry;
+
+	list_for_each_entry(entry, &pernet->local_addr_list, list) {
+		if (entry->addr.id == id)
+			return entry;
+	}
+	return NULL;
+}
+
+static int
+lookup_id_by_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *addr)
+{
+	struct mptcp_pm_addr_entry *entry;
+	int ret = -1;
+
+	rcu_read_lock();
+	list_for_each_entry(entry, &pernet->local_addr_list, list) {
+		if (addresses_equal(&entry->addr, addr, entry->addr.port)) {
+			ret = entry->addr.id;
+			break;
+		}
+	}
+	rcu_read_unlock();
+	return ret;
+}
+
 static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 {
 	struct sock *sk = (struct sock *)msk;
@@ -474,6 +501,19 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 	local_addr_max = mptcp_pm_get_local_addr_max(msk);
 	subflows_max = mptcp_pm_get_subflows_max(msk);
 
+	/* do lazy endpoint usage accounting for the MPC subflows */
+	if (unlikely(!(msk->pm.status & BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED))) && msk->first) {
+		struct mptcp_addr_info local;
+		int mpc_id;
+
+		local_address((struct sock_common *)msk->first, &local);
+		mpc_id = lookup_id_by_addr(pernet, &local);
+		if (mpc_id < 0)
+			__clear_bit(mpc_id, msk->pm.id_avail_bitmap);
+
+		msk->pm.status |= BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED);
+	}
+
 	pr_debug("local %d:%d signal %d:%d subflows %d:%d\n",
 		 msk->pm.local_addr_used, local_addr_max,
 		 msk->pm.add_addr_signaled, add_addr_signal_max,
@@ -481,21 +521,16 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 
 	/* check first for announce */
 	if (msk->pm.add_addr_signaled < add_addr_signal_max) {
-		local = select_signal_address(pernet,
-					      msk->pm.add_addr_signaled);
+		local = select_signal_address(pernet, msk);
 
 		if (local) {
 			if (mptcp_pm_alloc_anno_list(msk, local)) {
+				__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
 				msk->pm.add_addr_signaled++;
 				mptcp_pm_announce_addr(msk, &local->addr, false);
 				mptcp_pm_nl_addr_send_ack(msk);
 			}
-		} else {
-			/* pick failed, avoid fourther attempts later */
-			msk->pm.local_addr_used = add_addr_signal_max;
 		}
-
-		check_work_pending(msk);
 	}
 
 	/* check if should create a new subflow */
@@ -509,19 +544,16 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 			int i, nr;
 
 			msk->pm.local_addr_used++;
-			check_work_pending(msk);
 			nr = fill_remote_addresses_vec(msk, fullmesh, addrs);
+			if (nr)
+				__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
 			spin_unlock_bh(&msk->pm.lock);
 			for (i = 0; i < nr; i++)
 				__mptcp_subflow_connect(sk, &local->addr, &addrs[i]);
 			spin_lock_bh(&msk->pm.lock);
-			return;
 		}
-
-		/* lookup failed, avoid fourther attempts later */
-		msk->pm.local_addr_used = local_addr_max;
-		check_work_pending(msk);
 	}
+	check_work_pending(msk);
 }
 
 static void mptcp_pm_nl_fully_established(struct mptcp_sock *msk)
@@ -735,6 +767,7 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
 			msk->pm.subflows--;
 			__MPTCP_INC_STATS(sock_net(sk), rm_type);
 		}
+		__set_bit(rm_list->ids[1], msk->pm.id_avail_bitmap);
 		if (!removed)
 			continue;
 
@@ -764,6 +797,9 @@ void mptcp_pm_nl_work(struct mptcp_sock *msk)
 
 	msk_owned_by_me(msk);
 
+	if (!(pm->status & MPTCP_PM_WORK_MASK))
+		return;
+
 	spin_lock_bh(&msk->pm.lock);
 
 	pr_debug("msk=%p status=%x", msk, pm->status);
@@ -1197,18 +1233,6 @@ static int mptcp_nl_cmd_add_addr(struct sk_buff *skb, struct genl_info *info)
 	return 0;
 }
 
-static struct mptcp_pm_addr_entry *
-__lookup_addr_by_id(struct pm_nl_pernet *pernet, unsigned int id)
-{
-	struct mptcp_pm_addr_entry *entry;
-
-	list_for_each_entry(entry, &pernet->local_addr_list, list) {
-		if (entry->addr.id == id)
-			return entry;
-	}
-	return NULL;
-}
-
 int mptcp_pm_get_flags_and_ifindex_by_id(struct net *net, unsigned int id,
 					 u8 *flags, int *ifindex)
 {
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 4a8f2476cc75..63602c68f03d 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2508,8 +2508,7 @@ static void mptcp_worker(struct work_struct *work)
 
 	mptcp_check_fastclose(msk);
 
-	if (msk->pm.status)
-		mptcp_pm_nl_work(msk);
+	mptcp_pm_nl_work(msk);
 
 	if (test_and_clear_bit(MPTCP_WORK_EOF, &msk->flags))
 		mptcp_check_for_eof(msk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 47d24478763c..e63fe60f70b8 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -173,16 +173,24 @@ enum mptcp_pm_status {
 	MPTCP_PM_ADD_ADDR_SEND_ACK,
 	MPTCP_PM_RM_ADDR_RECEIVED,
 	MPTCP_PM_ESTABLISHED,
-	MPTCP_PM_ALREADY_ESTABLISHED,	/* persistent status, set after ESTABLISHED event */
 	MPTCP_PM_SUBFLOW_ESTABLISHED,
+	MPTCP_PM_ALREADY_ESTABLISHED,	/* persistent status, set after ESTABLISHED event */
+	MPTCP_PM_MPC_ENDPOINT_ACCOUNTED /* persistent status, set after MPC local address is
+					 * accounted int id_avail_bitmap
+					 */
 };
 
+/* Status bits below MPTCP_PM_ALREADY_ESTABLISHED need pm worker actions */
+#define MPTCP_PM_WORK_MASK ((1 << MPTCP_PM_ALREADY_ESTABLISHED) - 1)
+
 enum mptcp_addr_signal_status {
 	MPTCP_ADD_ADDR_SIGNAL,
 	MPTCP_ADD_ADDR_ECHO,
 	MPTCP_RM_ADDR_SIGNAL,
 };
 
+#define MAX_ADDR_ID		255
+
 struct mptcp_pm_data {
 	struct mptcp_addr_info local;
 	struct mptcp_addr_info remote;
@@ -201,6 +209,7 @@ struct mptcp_pm_data {
 	u8		local_addr_used;
 	u8		subflows;
 	u8		status;
+	DECLARE_BITMAP(id_avail_bitmap, MAX_ADDR_ID + 1);
 	struct mptcp_rm_list rm_list_tx;
 	struct mptcp_rm_list rm_list_rx;
 };
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 2684ef9c0d42..526b05771d08 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -1109,7 +1109,10 @@ signal_address_tests()
 	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags signal
 	ip netns exec $ns2 ./pm_nl_ctl add 10.0.4.2 flags signal
 	run_tests $ns1 $ns2 10.0.1.1
-	chk_add_nr 4 4
+
+	# the server will not signal the address teminating
+	# the MPC subflow
+	chk_add_nr 3 3
 }
 
 link_failure_tests()
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH mptcp-next v5 4/5] mptcp: do not block subflows creation on errors
  2021-12-09 17:30 [PATCH mptcp-next v5 0/5] mptcp: improve subflow creation on errors Paolo Abeni
                   ` (2 preceding siblings ...)
  2021-12-09 17:30 ` [PATCH mptcp-next v5 3/5] mptcp: keep track of local endpoint still available for each msk Paolo Abeni
@ 2021-12-09 17:30 ` Paolo Abeni
  2021-12-13 23:10   ` Mat Martineau
  2021-12-09 17:30 ` [PATCH mptcp-next v5 5/5] selftests: mptcp: add tests for subflow creation failure Paolo Abeni
  4 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2021-12-09 17:30 UTC (permalink / raw)
  To: mptcp

If the MPTCP configuration allows for multiple subflows
creation, and the first additional subflows never reach
the fully established status - e.g. due to packets drop or
reset - the in kernel path manager do not move to the
next subflow.

This patch introduces a new PM helper to cope with MPJ
subflow creation failure and delay and hook it where appropriate.

Such helper triggers additional subflow creation, as needed
and updates the PM subflow counter, if the current one is
closing.

Note that we don't need an additional timer to catch timeout
and/or long delay in connection completion: we just need to
measure the time elapsed since the subflow creation every
time we emit an MPJ sub-option.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v3 -> v4:
 - fix 'subflows' underflow

v2 -> v3:
 - fix compile warning (CI)

v1 -> v2:
 - explicitly hook on subflow close instead of error notification
 - fix checkpatch issue (Mat)
 - introduce and use a new pm helper to cope with subflow close
 - update pm subflow counters on close
---
 net/mptcp/options.c    | 21 ++++++++++++++++++---
 net/mptcp/pm.c         | 23 +++++++++++++++++++++--
 net/mptcp/pm_netlink.c | 20 ++++++++++++--------
 net/mptcp/protocol.c   |  6 ++++++
 net/mptcp/protocol.h   |  5 ++++-
 net/mptcp/subflow.c    |  5 ++++-
 6 files changed, 65 insertions(+), 15 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 7a6a39b71633..00a8697addcd 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1257,10 +1257,10 @@ static u16 mptcp_make_csum(const struct mptcp_ext *mpext)
 void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
 			 struct mptcp_out_options *opts)
 {
-	if (unlikely(OPTION_MPTCP_FAIL & opts->suboptions)) {
-		const struct sock *ssk = (const struct sock *)tp;
-		struct mptcp_subflow_context *subflow;
+	const struct sock *ssk = (const struct sock *)tp;
+	struct mptcp_subflow_context *subflow;
 
+	if (unlikely(OPTION_MPTCP_FAIL & opts->suboptions)) {
 		subflow = mptcp_subflow_ctx(ssk);
 		subflow->send_mp_fail = 0;
 
@@ -1382,6 +1382,21 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
 		/* MPC is additionally mutually exclusive with MP_PRIO */
 		goto mp_capable_done;
 	} else if (OPTIONS_MPTCP_MPJ & opts->suboptions) {
+		if (ssk) {
+			/* we are still in the MPJ handshake and "a lot" of time passed
+			 * e.g. due to syn retranmissions. We can attempt next
+			 * subflow creation
+			 */
+			subflow = mptcp_subflow_ctx(ssk);
+			if (subflow->start_stamp &&
+			    unlikely(after(tcp_jiffies32, subflow->start_stamp + HZ / 10))) {
+				mptcp_pm_subflow_check_next(mptcp_sk(subflow->conn), ssk, subflow);
+
+				/* avoid triggering the PM multiple times due to timeout */
+				subflow->start_stamp = 0;
+			}
+		}
+
 		if (OPTION_MPTCP_MPJ_SYN & opts->suboptions) {
 			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
 					      TCPOLEN_MPTCP_MPJ_SYN,
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index d6d22f18c418..7e2c76924492 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -172,9 +172,28 @@ void mptcp_pm_subflow_established(struct mptcp_sock *msk)
 	spin_unlock_bh(&pm->lock);
 }
 
-void mptcp_pm_subflow_closed(struct mptcp_sock *msk, u8 id)
+void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct sock *ssk,
+				 const struct mptcp_subflow_context *subflow)
 {
-	pr_debug("msk=%p", msk);
+	struct mptcp_pm_data *pm = &msk->pm;
+	bool update_subflows;
+
+	update_subflows = (ssk->sk_state == TCP_CLOSE) &&
+			  (subflow->request_join || subflow->mp_join);
+	if (!READ_ONCE(pm->work_pending) && !update_subflows)
+		return;
+
+	spin_lock_bh(&pm->lock);
+	if (update_subflows)
+		pm->subflows--;
+
+	/* Even if this subflow is not really established, tell the PM to try
+	 * to pick the next one, if possible.
+	 */
+	if (mptcp_pm_nl_check_work_pending(msk))
+		mptcp_pm_schedule_work(msk, MPTCP_PM_SUBFLOW_ESTABLISHED);
+
+	spin_unlock_bh(&pm->lock);
 }
 
 void mptcp_pm_add_addr_received(struct mptcp_sock *msk,
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 1be7f92e9fc8..d908c0a0477b 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -251,14 +251,17 @@ unsigned int mptcp_pm_get_local_addr_max(struct mptcp_sock *msk)
 }
 EXPORT_SYMBOL_GPL(mptcp_pm_get_local_addr_max);
 
-static void check_work_pending(struct mptcp_sock *msk)
+bool mptcp_pm_nl_check_work_pending(struct mptcp_sock *msk)
 {
 	struct pm_nl_pernet *pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
 
 	if (msk->pm.subflows == mptcp_pm_get_subflows_max(msk) ||
 	    (find_next_and_bit(pernet->id_bitmap, msk->pm.id_avail_bitmap, MAX_ADDR_ID + 1, 0) ==
-	     MAX_ADDR_ID + 1))
+	     MAX_ADDR_ID + 1)) {
 		WRITE_ONCE(msk->pm.work_pending, false);
+		return false;
+	}
+	return true;
 }
 
 struct mptcp_pm_add_entry *
@@ -503,12 +506,12 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 
 	/* do lazy endpoint usage accounting for the MPC subflows */
 	if (unlikely(!(msk->pm.status & BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED))) && msk->first) {
-		struct mptcp_addr_info local;
+		struct mptcp_addr_info mpc_addr;
 		int mpc_id;
 
-		local_address((struct sock_common *)msk->first, &local);
-		mpc_id = lookup_id_by_addr(pernet, &local);
-		if (mpc_id < 0)
+		local_address((struct sock_common *)msk->first, &mpc_addr);
+		mpc_id = lookup_id_by_addr(pernet, &mpc_addr);
+		if (mpc_id >= 0)
 			__clear_bit(mpc_id, msk->pm.id_avail_bitmap);
 
 		msk->pm.status |= BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED);
@@ -553,7 +556,7 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 			spin_lock_bh(&msk->pm.lock);
 		}
 	}
-	check_work_pending(msk);
+	mptcp_pm_nl_check_work_pending(msk);
 }
 
 static void mptcp_pm_nl_fully_established(struct mptcp_sock *msk)
@@ -760,11 +763,12 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
 				 i, rm_list->ids[i], subflow->local_id, subflow->remote_id);
 			spin_unlock_bh(&msk->pm.lock);
 			mptcp_subflow_shutdown(sk, ssk, how);
+
+			/* the following takes care of updating the subflows counter */
 			mptcp_close_ssk(sk, ssk, subflow);
 			spin_lock_bh(&msk->pm.lock);
 
 			removed = true;
-			msk->pm.subflows--;
 			__MPTCP_INC_STATS(sock_net(sk), rm_type);
 		}
 		__set_bit(rm_list->ids[1], msk->pm.id_avail_bitmap);
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 63602c68f03d..aa0a1fe749d9 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2351,6 +2351,12 @@ void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 {
 	if (sk->sk_state == TCP_ESTABLISHED)
 		mptcp_event(MPTCP_EVENT_SUB_CLOSED, mptcp_sk(sk), ssk, GFP_KERNEL);
+
+	/* subflow aborted before reaching the fully_established status
+	 * attempt the creation of the next subflow
+	 */
+	mptcp_pm_subflow_check_next(mptcp_sk(sk), ssk, subflow);
+
 	__mptcp_close_ssk(sk, ssk, subflow, MPTCP_CF_PUSH);
 }
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index e63fe60f70b8..bd1165189233 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -443,6 +443,7 @@ struct mptcp_subflow_context {
 		stale : 1;	    /* unable to snd/rcv data, do not use for xmit */
 	enum mptcp_data_avail data_avail;
 	u32	remote_nonce;
+	u32	start_stamp;
 	u64	thmac;
 	u32	local_nonce;
 	u32	remote_token;
@@ -744,7 +745,9 @@ void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk,
 bool mptcp_pm_allow_new_subflow(struct mptcp_sock *msk);
 void mptcp_pm_connection_closed(struct mptcp_sock *msk);
 void mptcp_pm_subflow_established(struct mptcp_sock *msk);
-void mptcp_pm_subflow_closed(struct mptcp_sock *msk, u8 id);
+bool mptcp_pm_nl_check_work_pending(struct mptcp_sock *msk);
+void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct sock *ssk,
+				 const struct mptcp_subflow_context *subflow);
 void mptcp_pm_add_addr_received(struct mptcp_sock *msk,
 				const struct mptcp_addr_info *addr);
 void mptcp_pm_add_addr_echoed(struct mptcp_sock *msk,
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 0f90bd61de01..76556743e952 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1273,7 +1273,8 @@ void __mptcp_error_report(struct sock *sk)
 
 static void subflow_error_report(struct sock *ssk)
 {
-	struct sock *sk = mptcp_subflow_ctx(ssk)->conn;
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	struct sock *sk = subflow->conn;
 
 	mptcp_data_lock(sk);
 	if (!sock_owned_by_user(sk))
@@ -1444,6 +1445,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
 	subflow->remote_id = remote_id;
 	subflow->request_join = 1;
 	subflow->request_bkup = !!(flags & MPTCP_PM_ADDR_FLAG_BACKUP);
+	subflow->start_stamp = tcp_jiffies32;
 	mptcp_info2sockaddr(remote, &addr, ssk->sk_family);
 
 	mptcp_add_pending_subflow(msk, subflow);
@@ -1461,6 +1463,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
 	spin_lock_bh(&msk->join_list_lock);
 	list_del(&subflow->node);
 	spin_unlock_bh(&msk->join_list_lock);
+	mptcp_pm_subflow_check_next(msk, ssk, subflow);
 	sock_put(mptcp_subflow_tcp_sock(subflow));
 
 failed:
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH mptcp-next v5 5/5] selftests: mptcp: add tests for subflow creation failure
  2021-12-09 17:30 [PATCH mptcp-next v5 0/5] mptcp: improve subflow creation on errors Paolo Abeni
                   ` (3 preceding siblings ...)
  2021-12-09 17:30 ` [PATCH mptcp-next v5 4/5] mptcp: do not block subflows creation on errors Paolo Abeni
@ 2021-12-09 17:30 ` Paolo Abeni
  2021-12-13 23:12   ` Mat Martineau
  4 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2021-12-09 17:30 UTC (permalink / raw)
  To: mptcp

Verify that, when multiple endpoints are available, subflows
creation proceed even when the first additional subflow creation
fails - due to packet drop on the relevant link

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v4 -> v5:
 - new test checking for wrong MPC reuse

v3 -> v4:
 - use $() where applicable - Matt
 - fix typo in help() - Matt

v1 -> v2:
 - add missing NF kconfig
 - add more test-cases (drop, later subflow creation)
---
 tools/testing/selftests/net/mptcp/config      |  1 +
 .../testing/selftests/net/mptcp/mptcp_join.sh | 74 ++++++++++++++++++-
 2 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/config b/tools/testing/selftests/net/mptcp/config
index 419e71560fd1..41e94e004bdc 100644
--- a/tools/testing/selftests/net/mptcp/config
+++ b/tools/testing/selftests/net/mptcp/config
@@ -18,4 +18,5 @@ CONFIG_NFT_TPROXY=m
 CONFIG_NFT_SOCKET=m
 CONFIG_IP_ADVANCED_ROUTER=y
 CONFIG_IP_MULTIPLE_TABLES=y
+CONFIG_IP_NF_TARGET_REJECT=m
 CONFIG_IPV6_MULTIPLE_TABLES=y
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 526b05771d08..8f49bbd4a201 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -975,6 +975,22 @@ chk_link_usage()
 	fi
 }
 
+wait_for_tw()
+{
+	local timeout_ms=$((timeout_poll * 1000))
+	local time=0
+	local ns=$1
+
+	while [ $time -lt $timeout_ms ]; do
+		local cnt=$(ip netns exec $ns ss -t state time-wait |wc -l)
+
+		[ "$cnt" = 1 ] && return 1
+		time=$((time + 100))
+		sleep 0.1
+	done
+	return 1
+}
+
 subflows_tests()
 {
 	reset
@@ -1032,6 +1048,57 @@ subflows_tests()
 	chk_join_nr "single subflow, dev" 1 1 1
 }
 
+subflows_error_tests()
+{
+	# If a single subflow is configured, and matches the MPC src
+	# address, no additional subflow should be created
+	reset
+	ip netns exec $ns1 ./pm_nl_ctl limits 0 1
+	ip netns exec $ns2 ./pm_nl_ctl limits 0 1
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.1.2 flags subflow
+	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
+	chk_join_nr "no MPC address reuse with single endpoint" 0 0 0
+
+	# multiple subflows, with subflow creation error
+	reset
+	ip netns exec $ns1 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns2 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
+	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j REJECT
+	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
+	chk_join_nr "multiple subflows, with failing subflow" 1 1 1
+
+	# multiple subflows, with subflow timeout on MPJ
+	reset
+	ip netns exec $ns1 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns2 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
+	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j DROP
+	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
+	chk_join_nr "multiple subflows, with subflow timeout" 1 1 1
+
+	# multiple subflows, check that the endpoint corresponding to
+	# closed subflow (due to reset) is not reused if additional
+	# subflows are added later
+	reset
+	ip netns exec $ns1 ./pm_nl_ctl limits 0 1
+	ip netns exec $ns2 ./pm_nl_ctl limits 0 1
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
+	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j REJECT
+	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow &
+
+	# mpj subflow will be in TW after the reset
+	wait_for_tw $ns2
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
+	wait
+
+	# additional subflow could be created only if the PM select
+	# the later endpoint, skipping the already used one
+	chk_join_nr "multiple subflows, fair usage on close" 1 1 1
+}
+
 signal_address_tests()
 {
 	# add_address, unused
@@ -1843,6 +1910,7 @@ fullmesh_tests()
 all_tests()
 {
 	subflows_tests
+	subflows_error_tests
 	signal_address_tests
 	link_failure_tests
 	add_addr_timeout_tests
@@ -1862,6 +1930,7 @@ usage()
 {
 	echo "mptcp_join usage:"
 	echo "  -f subflows_tests"
+	echo "  -e subflows_error_tests"
 	echo "  -s signal_address_tests"
 	echo "  -l link_failure_tests"
 	echo "  -t add_addr_timeout_tests"
@@ -1910,11 +1979,14 @@ if [ $do_all_tests -eq 1 ]; then
 	exit $ret
 fi
 
-while getopts 'fsltra64bpkdmchCS' opt; do
+while getopts 'fesltra64bpkdmchCS' opt; do
 	case $opt in
 		f)
 			subflows_tests
 			;;
+		e)
+			subflows_error_tests
+			;;
 		s)
 			signal_address_tests
 			;;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH mptcp-next v5 4/5] mptcp: do not block subflows creation on errors
  2021-12-09 17:30 ` [PATCH mptcp-next v5 4/5] mptcp: do not block subflows creation on errors Paolo Abeni
@ 2021-12-13 23:10   ` Mat Martineau
  2021-12-14 15:37     ` Paolo Abeni
  0 siblings, 1 reply; 10+ messages in thread
From: Mat Martineau @ 2021-12-13 23:10 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Thu, 9 Dec 2021, Paolo Abeni wrote:

> If the MPTCP configuration allows for multiple subflows
> creation, and the first additional subflows never reach
> the fully established status - e.g. due to packets drop or
> reset - the in kernel path manager do not move to the
> next subflow.
>
> This patch introduces a new PM helper to cope with MPJ
> subflow creation failure and delay and hook it where appropriate.
>
> Such helper triggers additional subflow creation, as needed
> and updates the PM subflow counter, if the current one is
> closing.
>
> Note that we don't need an additional timer to catch timeout
> and/or long delay in connection completion: we just need to
> measure the time elapsed since the subflow creation every
> time we emit an MPJ sub-option.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v3 -> v4:
> - fix 'subflows' underflow
>
> v2 -> v3:
> - fix compile warning (CI)
>
> v1 -> v2:
> - explicitly hook on subflow close instead of error notification
> - fix checkpatch issue (Mat)
> - introduce and use a new pm helper to cope with subflow close
> - update pm subflow counters on close
> ---
> net/mptcp/options.c    | 21 ++++++++++++++++++---
> net/mptcp/pm.c         | 23 +++++++++++++++++++++--
> net/mptcp/pm_netlink.c | 20 ++++++++++++--------
> net/mptcp/protocol.c   |  6 ++++++
> net/mptcp/protocol.h   |  5 ++++-
> net/mptcp/subflow.c    |  5 ++++-
> 6 files changed, 65 insertions(+), 15 deletions(-)
>
> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> index 7a6a39b71633..00a8697addcd 100644
> --- a/net/mptcp/options.c
> +++ b/net/mptcp/options.c
> @@ -1257,10 +1257,10 @@ static u16 mptcp_make_csum(const struct mptcp_ext *mpext)
> void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
> 			 struct mptcp_out_options *opts)
> {
> -	if (unlikely(OPTION_MPTCP_FAIL & opts->suboptions)) {
> -		const struct sock *ssk = (const struct sock *)tp;
> -		struct mptcp_subflow_context *subflow;
> +	const struct sock *ssk = (const struct sock *)tp;
> +	struct mptcp_subflow_context *subflow;
>
> +	if (unlikely(OPTION_MPTCP_FAIL & opts->suboptions)) {
> 		subflow = mptcp_subflow_ctx(ssk);
> 		subflow->send_mp_fail = 0;
>
> @@ -1382,6 +1382,21 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
> 		/* MPC is additionally mutually exclusive with MP_PRIO */
> 		goto mp_capable_done;
> 	} else if (OPTIONS_MPTCP_MPJ & opts->suboptions) {
> +		if (ssk) {
> +			/* we are still in the MPJ handshake and "a lot" of time passed
> +			 * e.g. due to syn retranmissions. We can attempt next
> +			 * subflow creation
> +			 */
> +			subflow = mptcp_subflow_ctx(ssk);
> +			if (subflow->start_stamp &&
> +			    unlikely(after(tcp_jiffies32, subflow->start_stamp + HZ / 10))) {
> +				mptcp_pm_subflow_check_next(mptcp_sk(subflow->conn), ssk, subflow);
> +
> +				/* avoid triggering the PM multiple times due to timeout */
> +				subflow->start_stamp = 0;
> +			}
> +		}
> +

tcp_options_write() and the various helpers it calls are (so far) free 
from side effects and locking, and I don't think we want to introduce 
those now (__tcp_transmit_skb() seems like a sensitive code path to tinker 
with, even if it is only in this narrow case for packets with MPJ 
suboptions).

The msk sk_timer seems well-suited for this? I think we're only using it 
at disconnect time so there shouldn't be conflicts. What do you think?


Mat


> 		if (OPTION_MPTCP_MPJ_SYN & opts->suboptions) {
> 			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
> 					      TCPOLEN_MPTCP_MPJ_SYN,
> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
> index d6d22f18c418..7e2c76924492 100644
> --- a/net/mptcp/pm.c
> +++ b/net/mptcp/pm.c
> @@ -172,9 +172,28 @@ void mptcp_pm_subflow_established(struct mptcp_sock *msk)
> 	spin_unlock_bh(&pm->lock);
> }
>
> -void mptcp_pm_subflow_closed(struct mptcp_sock *msk, u8 id)
> +void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct sock *ssk,
> +				 const struct mptcp_subflow_context *subflow)
> {
> -	pr_debug("msk=%p", msk);
> +	struct mptcp_pm_data *pm = &msk->pm;
> +	bool update_subflows;
> +
> +	update_subflows = (ssk->sk_state == TCP_CLOSE) &&
> +			  (subflow->request_join || subflow->mp_join);
> +	if (!READ_ONCE(pm->work_pending) && !update_subflows)
> +		return;
> +
> +	spin_lock_bh(&pm->lock);
> +	if (update_subflows)
> +		pm->subflows--;
> +
> +	/* Even if this subflow is not really established, tell the PM to try
> +	 * to pick the next one, if possible.
> +	 */
> +	if (mptcp_pm_nl_check_work_pending(msk))
> +		mptcp_pm_schedule_work(msk, MPTCP_PM_SUBFLOW_ESTABLISHED);
> +
> +	spin_unlock_bh(&pm->lock);
> }
>
> void mptcp_pm_add_addr_received(struct mptcp_sock *msk,
> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
> index 1be7f92e9fc8..d908c0a0477b 100644
> --- a/net/mptcp/pm_netlink.c
> +++ b/net/mptcp/pm_netlink.c
> @@ -251,14 +251,17 @@ unsigned int mptcp_pm_get_local_addr_max(struct mptcp_sock *msk)
> }
> EXPORT_SYMBOL_GPL(mptcp_pm_get_local_addr_max);
>
> -static void check_work_pending(struct mptcp_sock *msk)
> +bool mptcp_pm_nl_check_work_pending(struct mptcp_sock *msk)
> {
> 	struct pm_nl_pernet *pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
>
> 	if (msk->pm.subflows == mptcp_pm_get_subflows_max(msk) ||
> 	    (find_next_and_bit(pernet->id_bitmap, msk->pm.id_avail_bitmap, MAX_ADDR_ID + 1, 0) ==
> -	     MAX_ADDR_ID + 1))
> +	     MAX_ADDR_ID + 1)) {
> 		WRITE_ONCE(msk->pm.work_pending, false);
> +		return false;
> +	}
> +	return true;
> }
>
> struct mptcp_pm_add_entry *
> @@ -503,12 +506,12 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
>
> 	/* do lazy endpoint usage accounting for the MPC subflows */
> 	if (unlikely(!(msk->pm.status & BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED))) && msk->first) {
> -		struct mptcp_addr_info local;
> +		struct mptcp_addr_info mpc_addr;
> 		int mpc_id;
>
> -		local_address((struct sock_common *)msk->first, &local);
> -		mpc_id = lookup_id_by_addr(pernet, &local);
> -		if (mpc_id < 0)
> +		local_address((struct sock_common *)msk->first, &mpc_addr);
> +		mpc_id = lookup_id_by_addr(pernet, &mpc_addr);
> +		if (mpc_id >= 0)
> 			__clear_bit(mpc_id, msk->pm.id_avail_bitmap);
>
> 		msk->pm.status |= BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED);
> @@ -553,7 +556,7 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
> 			spin_lock_bh(&msk->pm.lock);
> 		}
> 	}
> -	check_work_pending(msk);
> +	mptcp_pm_nl_check_work_pending(msk);
> }
>
> static void mptcp_pm_nl_fully_established(struct mptcp_sock *msk)
> @@ -760,11 +763,12 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
> 				 i, rm_list->ids[i], subflow->local_id, subflow->remote_id);
> 			spin_unlock_bh(&msk->pm.lock);
> 			mptcp_subflow_shutdown(sk, ssk, how);
> +
> +			/* the following takes care of updating the subflows counter */
> 			mptcp_close_ssk(sk, ssk, subflow);
> 			spin_lock_bh(&msk->pm.lock);
>
> 			removed = true;
> -			msk->pm.subflows--;
> 			__MPTCP_INC_STATS(sock_net(sk), rm_type);
> 		}
> 		__set_bit(rm_list->ids[1], msk->pm.id_avail_bitmap);
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 63602c68f03d..aa0a1fe749d9 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -2351,6 +2351,12 @@ void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
> {
> 	if (sk->sk_state == TCP_ESTABLISHED)
> 		mptcp_event(MPTCP_EVENT_SUB_CLOSED, mptcp_sk(sk), ssk, GFP_KERNEL);
> +
> +	/* subflow aborted before reaching the fully_established status
> +	 * attempt the creation of the next subflow
> +	 */
> +	mptcp_pm_subflow_check_next(mptcp_sk(sk), ssk, subflow);
> +
> 	__mptcp_close_ssk(sk, ssk, subflow, MPTCP_CF_PUSH);
> }
>
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index e63fe60f70b8..bd1165189233 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -443,6 +443,7 @@ struct mptcp_subflow_context {
> 		stale : 1;	    /* unable to snd/rcv data, do not use for xmit */
> 	enum mptcp_data_avail data_avail;
> 	u32	remote_nonce;
> +	u32	start_stamp;
> 	u64	thmac;
> 	u32	local_nonce;
> 	u32	remote_token;
> @@ -744,7 +745,9 @@ void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk,
> bool mptcp_pm_allow_new_subflow(struct mptcp_sock *msk);
> void mptcp_pm_connection_closed(struct mptcp_sock *msk);
> void mptcp_pm_subflow_established(struct mptcp_sock *msk);
> -void mptcp_pm_subflow_closed(struct mptcp_sock *msk, u8 id);
> +bool mptcp_pm_nl_check_work_pending(struct mptcp_sock *msk);
> +void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct sock *ssk,
> +				 const struct mptcp_subflow_context *subflow);
> void mptcp_pm_add_addr_received(struct mptcp_sock *msk,
> 				const struct mptcp_addr_info *addr);
> void mptcp_pm_add_addr_echoed(struct mptcp_sock *msk,
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index 0f90bd61de01..76556743e952 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1273,7 +1273,8 @@ void __mptcp_error_report(struct sock *sk)
>
> static void subflow_error_report(struct sock *ssk)
> {
> -	struct sock *sk = mptcp_subflow_ctx(ssk)->conn;
> +	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> +	struct sock *sk = subflow->conn;
>
> 	mptcp_data_lock(sk);
> 	if (!sock_owned_by_user(sk))
> @@ -1444,6 +1445,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
> 	subflow->remote_id = remote_id;
> 	subflow->request_join = 1;
> 	subflow->request_bkup = !!(flags & MPTCP_PM_ADDR_FLAG_BACKUP);
> +	subflow->start_stamp = tcp_jiffies32;
> 	mptcp_info2sockaddr(remote, &addr, ssk->sk_family);
>
> 	mptcp_add_pending_subflow(msk, subflow);
> @@ -1461,6 +1463,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
> 	spin_lock_bh(&msk->join_list_lock);
> 	list_del(&subflow->node);
> 	spin_unlock_bh(&msk->join_list_lock);
> +	mptcp_pm_subflow_check_next(msk, ssk, subflow);
> 	sock_put(mptcp_subflow_tcp_sock(subflow));
>
> failed:
> -- 
> 2.33.1
>
>
>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH mptcp-next v5 5/5] selftests: mptcp: add tests for subflow creation failure
  2021-12-09 17:30 ` [PATCH mptcp-next v5 5/5] selftests: mptcp: add tests for subflow creation failure Paolo Abeni
@ 2021-12-13 23:12   ` Mat Martineau
  2021-12-14 11:56     ` Paolo Abeni
  0 siblings, 1 reply; 10+ messages in thread
From: Mat Martineau @ 2021-12-13 23:12 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Thu, 9 Dec 2021, Paolo Abeni wrote:

> Verify that, when multiple endpoints are available, subflows
> creation proceed even when the first additional subflow creation
> fails - due to packet drop on the relevant link
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v4 -> v5:
> - new test checking for wrong MPC reuse
>
> v3 -> v4:
> - use $() where applicable - Matt
> - fix typo in help() - Matt
>
> v1 -> v2:
> - add missing NF kconfig
> - add more test-cases (drop, later subflow creation)
> ---
> tools/testing/selftests/net/mptcp/config      |  1 +
> .../testing/selftests/net/mptcp/mptcp_join.sh | 74 ++++++++++++++++++-
> 2 files changed, 74 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/net/mptcp/config b/tools/testing/selftests/net/mptcp/config
> index 419e71560fd1..41e94e004bdc 100644
> --- a/tools/testing/selftests/net/mptcp/config
> +++ b/tools/testing/selftests/net/mptcp/config
> @@ -18,4 +18,5 @@ CONFIG_NFT_TPROXY=m
> CONFIG_NFT_SOCKET=m
> CONFIG_IP_ADVANCED_ROUTER=y
> CONFIG_IP_MULTIPLE_TABLES=y
> +CONFIG_IP_NF_TARGET_REJECT=m
> CONFIG_IPV6_MULTIPLE_TABLES=y
> diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
> index 526b05771d08..8f49bbd4a201 100755
> --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
> +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
> @@ -975,6 +975,22 @@ chk_link_usage()
> 	fi
> }
>
> +wait_for_tw()
> +{
> +	local timeout_ms=$((timeout_poll * 1000))
> +	local time=0
> +	local ns=$1
> +
> +	while [ $time -lt $timeout_ms ]; do
> +		local cnt=$(ip netns exec $ns ss -t state time-wait |wc -l)
> +
> +		[ "$cnt" = 1 ] && return 1
> +		time=$((time + 100))
> +		sleep 0.1
> +	done
> +	return 1
> +}
> +
> subflows_tests()
> {
> 	reset
> @@ -1032,6 +1048,57 @@ subflows_tests()
> 	chk_join_nr "single subflow, dev" 1 1 1
> }
>
> +subflows_error_tests()
> +{
> +	# If a single subflow is configured, and matches the MPC src
> +	# address, no additional subflow should be created
> +	reset
> +	ip netns exec $ns1 ./pm_nl_ctl limits 0 1
> +	ip netns exec $ns2 ./pm_nl_ctl limits 0 1
> +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.1.2 flags subflow
> +	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
> +	chk_join_nr "no MPC address reuse with single endpoint" 0 0 0
> +
> +	# multiple subflows, with subflow creation error
> +	reset
> +	ip netns exec $ns1 ./pm_nl_ctl limits 0 2
> +	ip netns exec $ns2 ./pm_nl_ctl limits 0 2
> +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
> +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
> +	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j REJECT
> +	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
> +	chk_join_nr "multiple subflows, with failing subflow" 1 1 1
> +
> +	# multiple subflows, with subflow timeout on MPJ
> +	reset
> +	ip netns exec $ns1 ./pm_nl_ctl limits 0 2
> +	ip netns exec $ns2 ./pm_nl_ctl limits 0 2
> +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
> +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
> +	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j DROP
> +	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
> +	chk_join_nr "multiple subflows, with subflow timeout" 1 1 1
> +
> +	# multiple subflows, check that the endpoint corresponding to
> +	# closed subflow (due to reset) is not reused if additional
> +	# subflows are added later
> +	reset
> +	ip netns exec $ns1 ./pm_nl_ctl limits 0 1
> +	ip netns exec $ns2 ./pm_nl_ctl limits 0 1
> +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
> +	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j REJECT
> +	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow &

Still seeing the problem I reported in v4 where the test results are not 
correctly numbered. I think $TEST_COUNT needs to be bumped here so the 
next results are numbered correctly.

Mat


> +
> +	# mpj subflow will be in TW after the reset
> +	wait_for_tw $ns2
> +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
> +	wait
> +
> +	# additional subflow could be created only if the PM select
> +	# the later endpoint, skipping the already used one
> +	chk_join_nr "multiple subflows, fair usage on close" 1 1 1
> +}
> +
> signal_address_tests()
> {
> 	# add_address, unused
> @@ -1843,6 +1910,7 @@ fullmesh_tests()
> all_tests()
> {
> 	subflows_tests
> +	subflows_error_tests
> 	signal_address_tests
> 	link_failure_tests
> 	add_addr_timeout_tests
> @@ -1862,6 +1930,7 @@ usage()
> {
> 	echo "mptcp_join usage:"
> 	echo "  -f subflows_tests"
> +	echo "  -e subflows_error_tests"
> 	echo "  -s signal_address_tests"
> 	echo "  -l link_failure_tests"
> 	echo "  -t add_addr_timeout_tests"
> @@ -1910,11 +1979,14 @@ if [ $do_all_tests -eq 1 ]; then
> 	exit $ret
> fi
>
> -while getopts 'fsltra64bpkdmchCS' opt; do
> +while getopts 'fesltra64bpkdmchCS' opt; do
> 	case $opt in
> 		f)
> 			subflows_tests
> 			;;
> +		e)
> +			subflows_error_tests
> +			;;
> 		s)
> 			signal_address_tests
> 			;;
> -- 
> 2.33.1
>
>
>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH mptcp-next v5 5/5] selftests: mptcp: add tests for subflow creation failure
  2021-12-13 23:12   ` Mat Martineau
@ 2021-12-14 11:56     ` Paolo Abeni
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2021-12-14 11:56 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Mon, 2021-12-13 at 15:12 -0800, Mat Martineau wrote:
> On Thu, 9 Dec 2021, Paolo Abeni wrote:
> 
> > Verify that, when multiple endpoints are available, subflows
> > creation proceed even when the first additional subflow creation
> > fails - due to packet drop on the relevant link
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > v4 -> v5:
> > - new test checking for wrong MPC reuse
> > 
> > v3 -> v4:
> > - use $() where applicable - Matt
> > - fix typo in help() - Matt
> > 
> > v1 -> v2:
> > - add missing NF kconfig
> > - add more test-cases (drop, later subflow creation)
> > ---
> > tools/testing/selftests/net/mptcp/config      |  1 +
> > .../testing/selftests/net/mptcp/mptcp_join.sh | 74 ++++++++++++++++++-
> > 2 files changed, 74 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/testing/selftests/net/mptcp/config b/tools/testing/selftests/net/mptcp/config
> > index 419e71560fd1..41e94e004bdc 100644
> > --- a/tools/testing/selftests/net/mptcp/config
> > +++ b/tools/testing/selftests/net/mptcp/config
> > @@ -18,4 +18,5 @@ CONFIG_NFT_TPROXY=m
> > CONFIG_NFT_SOCKET=m
> > CONFIG_IP_ADVANCED_ROUTER=y
> > CONFIG_IP_MULTIPLE_TABLES=y
> > +CONFIG_IP_NF_TARGET_REJECT=m
> > CONFIG_IPV6_MULTIPLE_TABLES=y
> > diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
> > index 526b05771d08..8f49bbd4a201 100755
> > --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
> > +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
> > @@ -975,6 +975,22 @@ chk_link_usage()
> > 	fi
> > }
> > 
> > +wait_for_tw()
> > +{
> > +	local timeout_ms=$((timeout_poll * 1000))
> > +	local time=0
> > +	local ns=$1
> > +
> > +	while [ $time -lt $timeout_ms ]; do
> > +		local cnt=$(ip netns exec $ns ss -t state time-wait |wc -l)
> > +
> > +		[ "$cnt" = 1 ] && return 1
> > +		time=$((time + 100))
> > +		sleep 0.1
> > +	done
> > +	return 1
> > +}
> > +
> > subflows_tests()
> > {
> > 	reset
> > @@ -1032,6 +1048,57 @@ subflows_tests()
> > 	chk_join_nr "single subflow, dev" 1 1 1
> > }
> > 
> > +subflows_error_tests()
> > +{
> > +	# If a single subflow is configured, and matches the MPC src
> > +	# address, no additional subflow should be created
> > +	reset
> > +	ip netns exec $ns1 ./pm_nl_ctl limits 0 1
> > +	ip netns exec $ns2 ./pm_nl_ctl limits 0 1
> > +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.1.2 flags subflow
> > +	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
> > +	chk_join_nr "no MPC address reuse with single endpoint" 0 0 0
> > +
> > +	# multiple subflows, with subflow creation error
> > +	reset
> > +	ip netns exec $ns1 ./pm_nl_ctl limits 0 2
> > +	ip netns exec $ns2 ./pm_nl_ctl limits 0 2
> > +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
> > +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
> > +	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j REJECT
> > +	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
> > +	chk_join_nr "multiple subflows, with failing subflow" 1 1 1
> > +
> > +	# multiple subflows, with subflow timeout on MPJ
> > +	reset
> > +	ip netns exec $ns1 ./pm_nl_ctl limits 0 2
> > +	ip netns exec $ns2 ./pm_nl_ctl limits 0 2
> > +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
> > +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
> > +	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j DROP
> > +	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
> > +	chk_join_nr "multiple subflows, with subflow timeout" 1 1 1
> > +
> > +	# multiple subflows, check that the endpoint corresponding to
> > +	# closed subflow (due to reset) is not reused if additional
> > +	# subflows are added later
> > +	reset
> > +	ip netns exec $ns1 ./pm_nl_ctl limits 0 1
> > +	ip netns exec $ns2 ./pm_nl_ctl limits 0 1
> > +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
> > +	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j REJECT
> > +	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow &
> 
> Still seeing the problem I reported in v4 where the test results are not 
> correctly numbered. I think $TEST_COUNT needs to be bumped here so the 
> next results are numbered correctly.

Oh, sorry, I missed that part. I'll cover in the next iteration.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH mptcp-next v5 4/5] mptcp: do not block subflows creation on errors
  2021-12-13 23:10   ` Mat Martineau
@ 2021-12-14 15:37     ` Paolo Abeni
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2021-12-14 15:37 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Mon, 2021-12-13 at 15:10 -0800, Mat Martineau wrote:
> On Thu, 9 Dec 2021, Paolo Abeni wrote:
> 
> > If the MPTCP configuration allows for multiple subflows
> > creation, and the first additional subflows never reach
> > the fully established status - e.g. due to packets drop or
> > reset - the in kernel path manager do not move to the
> > next subflow.
> > 
> > This patch introduces a new PM helper to cope with MPJ
> > subflow creation failure and delay and hook it where appropriate.
> > 
> > Such helper triggers additional subflow creation, as needed
> > and updates the PM subflow counter, if the current one is
> > closing.
> > 
> > Note that we don't need an additional timer to catch timeout
> > and/or long delay in connection completion: we just need to
> > measure the time elapsed since the subflow creation every
> > time we emit an MPJ sub-option.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > v3 -> v4:
> > - fix 'subflows' underflow
> > 
> > v2 -> v3:
> > - fix compile warning (CI)
> > 
> > v1 -> v2:
> > - explicitly hook on subflow close instead of error notification
> > - fix checkpatch issue (Mat)
> > - introduce and use a new pm helper to cope with subflow close
> > - update pm subflow counters on close
> > ---
> > net/mptcp/options.c    | 21 ++++++++++++++++++---
> > net/mptcp/pm.c         | 23 +++++++++++++++++++++--
> > net/mptcp/pm_netlink.c | 20 ++++++++++++--------
> > net/mptcp/protocol.c   |  6 ++++++
> > net/mptcp/protocol.h   |  5 ++++-
> > net/mptcp/subflow.c    |  5 ++++-
> > 6 files changed, 65 insertions(+), 15 deletions(-)
> > 
> > diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> > index 7a6a39b71633..00a8697addcd 100644
> > --- a/net/mptcp/options.c
> > +++ b/net/mptcp/options.c
> > @@ -1257,10 +1257,10 @@ static u16 mptcp_make_csum(const struct mptcp_ext *mpext)
> > void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
> > 			 struct mptcp_out_options *opts)
> > {
> > -	if (unlikely(OPTION_MPTCP_FAIL & opts->suboptions)) {
> > -		const struct sock *ssk = (const struct sock *)tp;
> > -		struct mptcp_subflow_context *subflow;
> > +	const struct sock *ssk = (const struct sock *)tp;
> > +	struct mptcp_subflow_context *subflow;
> > 
> > +	if (unlikely(OPTION_MPTCP_FAIL & opts->suboptions)) {
> > 		subflow = mptcp_subflow_ctx(ssk);
> > 		subflow->send_mp_fail = 0;
> > 
> > @@ -1382,6 +1382,21 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
> > 		/* MPC is additionally mutually exclusive with MP_PRIO */
> > 		goto mp_capable_done;
> > 	} else if (OPTIONS_MPTCP_MPJ & opts->suboptions) {
> > +		if (ssk) {
> > +			/* we are still in the MPJ handshake and "a lot" of time passed
> > +			 * e.g. due to syn retranmissions. We can attempt next
> > +			 * subflow creation
> > +			 */
> > +			subflow = mptcp_subflow_ctx(ssk);
> > +			if (subflow->start_stamp &&
> > +			    unlikely(after(tcp_jiffies32, subflow->start_stamp + HZ / 10))) {
> > +				mptcp_pm_subflow_check_next(mptcp_sk(subflow->conn), ssk, subflow);
> > +
> > +				/* avoid triggering the PM multiple times due to timeout */
> > +				subflow->start_stamp = 0;
> > +			}
> > +		}
> > +
> 
> tcp_options_write() and the various helpers it calls are (so far) free 
> from side effects and locking, and I don't think we want to introduce 
> those now (__tcp_transmit_skb() seems like a sensitive code path to tinker 
> with, even if it is only in this narrow case for packets with MPJ 
> suboptions).
> 
> The msk sk_timer seems well-suited for this? I think we're only using it 
> at disconnect time so there shouldn't be conflicts. What do you think?

I really prefer to avoid using another timer.

I'll try a different approach: creating all the subflows at once - as
per Matt suggestion - so that we will not need such hook at all.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-12-14 15:37 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-09 17:30 [PATCH mptcp-next v5 0/5] mptcp: improve subflow creation on errors Paolo Abeni
2021-12-09 17:30 ` [PATCH mptcp-next v5 1/5] mptcp: fix per socket endpoint accounting Paolo Abeni
2021-12-09 17:30 ` [PATCH mptcp-next v5 2/5] mptcp: clean-up MPJ option writing Paolo Abeni
2021-12-09 17:30 ` [PATCH mptcp-next v5 3/5] mptcp: keep track of local endpoint still available for each msk Paolo Abeni
2021-12-09 17:30 ` [PATCH mptcp-next v5 4/5] mptcp: do not block subflows creation on errors Paolo Abeni
2021-12-13 23:10   ` Mat Martineau
2021-12-14 15:37     ` Paolo Abeni
2021-12-09 17:30 ` [PATCH mptcp-next v5 5/5] selftests: mptcp: add tests for subflow creation failure Paolo Abeni
2021-12-13 23:12   ` Mat Martineau
2021-12-14 11:56     ` Paolo Abeni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.