* [PATCH mptcp-next v4 0/5] mptcp: improve subflow creation on errors
@ 2021-12-03 15:19 Paolo Abeni
  2021-12-03 15:19 ` [PATCH mptcp-next v4 1/5] mptcp: fix per socket endpoint accounting Paolo Abeni
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Paolo Abeni @ 2021-12-03 15:19 UTC (permalink / raw)
  To: mptcp

This iteration addresses the issue caught by the packetdrill
test-cases - namely an underflow of the pm->subflows counter.
It additionally includes a related fix for the pm counters accounting.

Currently, when a subflow connection fails - either the TCP
connection is reset by the peer, or the MPJ handshake never
completes - the in-kernel PM doesn't perform any further action.
Notably, no additional subflow creation is attempted.

This series is aimed at improving the in-kernel path manager
behavior in the above scenario:
- each endpoint is used only once per connection (patch 3/5)
- on failure we try to move to the next subflow, if any
  (patch 4/5)

The first patch is the above-mentioned fix, the second is a minor
clean-up, and the last patch adds specific self-tests.

This should address/close:
https://github.com/multipath-tcp/mptcp_net-next/issues/235
https://github.com/multipath-tcp/mptcp_net-next/issues/242

Paolo Abeni (5):
  mptcp: fix per socket endpoint accounting.
  mptcp: clean-up MPJ option writing.
  mptcp: keep track of local endpoint still available for each msk
  mptcp: do not block subflows creation on errors
  selftests: mptcp: add tests for subflow creation failure

 net/mptcp/options.c                           | 65 +++++++++------
 net/mptcp/pm.c                                | 24 +++++-
 net/mptcp/pm_netlink.c                        | 83 ++++++++++---------
 net/mptcp/protocol.c                          |  6 ++
 net/mptcp/protocol.h                          |  8 +-
 net/mptcp/subflow.c                           |  5 +-
 tools/testing/selftests/net/mptcp/config      |  1 +
 .../testing/selftests/net/mptcp/mptcp_join.sh | 65 ++++++++++++++-
 8 files changed, 187 insertions(+), 70 deletions(-)

-- 
2.33.1



* [PATCH mptcp-next v4 1/5] mptcp: fix per socket endpoint accounting.
  2021-12-03 15:19 [PATCH mptcp-next v4 0/5] mptcp: improve subflow creation on errors Paolo Abeni
@ 2021-12-03 15:19 ` Paolo Abeni
  2021-12-03 15:19 ` [PATCH mptcp-next v4 2/5] mptcp: clean-up MPJ option writing Paolo Abeni
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2021-12-03 15:19 UTC (permalink / raw)
  To: mptcp

Since the introduction of full-mesh endpoint support, the reception
of a single ADD_ADDR option can cause the creation of multiple
subflows. When such an option is accepted we increment
'add_addr_accepted' by one. When we receive a paired RM_ADDR option,
we delete all the relevant subflows, decrementing 'add_addr_accepted'
by one for each of them.

We have a similar issue for 'local_addr_used'.

Fix both by moving the pm endpoint accounting outside the subflow
traversal.
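
For illustration, the pre-patch counter flow in a nutshell - a sketch
only, not actual kernel code, and the two-subflow fullmesh case is
hypothetical:

	static void accounting_sketch(void)
	{
		unsigned int add_addr_accepted = 0;
		int i;

		add_addr_accepted++;		/* ADD_ADDR accepted: counter = 1 */

		/* a paired RM_ADDR removes both subflows; the old code
		 * decremented once per removed subflow
		 */
		for (i = 0; i < 2; i++)
			add_addr_accepted--;	/* wraps to UINT_MAX: the underflow */
	}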

Fixes: 1a0d6136c5f0 ("mptcp: local addresses fullmesh")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/pm_netlink.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 3186d33b5208..d18b13e3e74c 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -710,6 +710,8 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
 		return;
 
 	for (i = 0; i < rm_list->nr; i++) {
+		bool removed = false;
+
 		list_for_each_entry_safe(subflow, tmp, &msk->conn_list, node) {
 			struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 			int how = RCV_SHUTDOWN | SEND_SHUTDOWN;
@@ -729,15 +731,19 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
 			mptcp_close_ssk(sk, ssk, subflow);
 			spin_lock_bh(&msk->pm.lock);
 
-			if (rm_type == MPTCP_MIB_RMADDR) {
-				msk->pm.add_addr_accepted--;
-				WRITE_ONCE(msk->pm.accept_addr, true);
-			} else if (rm_type == MPTCP_MIB_RMSUBFLOW) {
-				msk->pm.local_addr_used--;
-			}
+			removed = true;
 			msk->pm.subflows--;
 			__MPTCP_INC_STATS(sock_net(sk), rm_type);
 		}
+		if (!removed)
+			continue;
+
+		if (rm_type == MPTCP_MIB_RMADDR) {
+			msk->pm.add_addr_accepted--;
+			WRITE_ONCE(msk->pm.accept_addr, true);
+		} else if (rm_type == MPTCP_MIB_RMSUBFLOW) {
+			msk->pm.local_addr_used--;
+		}
 	}
 }
 
-- 
2.33.1



* [PATCH mptcp-next v4 2/5] mptcp: clean-up MPJ option writing.
  2021-12-03 15:19 [PATCH mptcp-next v4 0/5] mptcp: improve subflow creation on errors Paolo Abeni
  2021-12-03 15:19 ` [PATCH mptcp-next v4 1/5] mptcp: fix per socket endpoint accounting Paolo Abeni
@ 2021-12-03 15:19 ` Paolo Abeni
  2021-12-03 15:19 ` [PATCH mptcp-next v4 3/5] mptcp: keep track of local endpoint still available for each msk Paolo Abeni
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2021-12-03 15:19 UTC (permalink / raw)
  To: mptcp

Check for all MPJ variants at once; this reduces the number
of conditionals traversed on average and will simplify the
next patch.

No functional change intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/options.c | 44 +++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 8a1020e4285c..7a6a39b71633 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1381,27 +1381,29 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
 
 		/* MPC is additionally mutually exclusive with MP_PRIO */
 		goto mp_capable_done;
-	} else if (OPTION_MPTCP_MPJ_SYN & opts->suboptions) {
-		*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
-				      TCPOLEN_MPTCP_MPJ_SYN,
-				      opts->backup, opts->join_id);
-		put_unaligned_be32(opts->token, ptr);
-		ptr += 1;
-		put_unaligned_be32(opts->nonce, ptr);
-		ptr += 1;
-	} else if (OPTION_MPTCP_MPJ_SYNACK & opts->suboptions) {
-		*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
-				      TCPOLEN_MPTCP_MPJ_SYNACK,
-				      opts->backup, opts->join_id);
-		put_unaligned_be64(opts->thmac, ptr);
-		ptr += 2;
-		put_unaligned_be32(opts->nonce, ptr);
-		ptr += 1;
-	} else if (OPTION_MPTCP_MPJ_ACK & opts->suboptions) {
-		*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
-				      TCPOLEN_MPTCP_MPJ_ACK, 0, 0);
-		memcpy(ptr, opts->hmac, MPTCPOPT_HMAC_LEN);
-		ptr += 5;
+	} else if (OPTIONS_MPTCP_MPJ & opts->suboptions) {
+		if (OPTION_MPTCP_MPJ_SYN & opts->suboptions) {
+			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
+					      TCPOLEN_MPTCP_MPJ_SYN,
+					      opts->backup, opts->join_id);
+			put_unaligned_be32(opts->token, ptr);
+			ptr += 1;
+			put_unaligned_be32(opts->nonce, ptr);
+			ptr += 1;
+		} else if (OPTION_MPTCP_MPJ_SYNACK & opts->suboptions) {
+			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
+					      TCPOLEN_MPTCP_MPJ_SYNACK,
+					      opts->backup, opts->join_id);
+			put_unaligned_be64(opts->thmac, ptr);
+			ptr += 2;
+			put_unaligned_be32(opts->nonce, ptr);
+			ptr += 1;
+		} else {
+			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
+					      TCPOLEN_MPTCP_MPJ_ACK, 0, 0);
+			memcpy(ptr, opts->hmac, MPTCPOPT_HMAC_LEN);
+			ptr += 5;
+		}
 	} else if (OPTION_MPTCP_ADD_ADDR & opts->suboptions) {
 		u8 len = TCPOLEN_MPTCP_ADD_ADDR_BASE;
 		u8 echo = MPTCP_ADDR_ECHO;
-- 
2.33.1



* [PATCH mptcp-next v4 3/5] mptcp: keep track of local endpoint still available for each msk
  2021-12-03 15:19 [PATCH mptcp-next v4 0/5] mptcp: improve subflow creation on errors Paolo Abeni
  2021-12-03 15:19 ` [PATCH mptcp-next v4 1/5] mptcp: fix per socket endpoint accounting Paolo Abeni
  2021-12-03 15:19 ` [PATCH mptcp-next v4 2/5] mptcp: clean-up MPJ option writing Paolo Abeni
@ 2021-12-03 15:19 ` Paolo Abeni
  2021-12-04  1:05   ` Mat Martineau
  2021-12-03 15:19 ` [PATCH mptcp-next v4 4/5] mptcp: do not block subflows creation on errors Paolo Abeni
  2021-12-03 15:19 ` [PATCH mptcp-next v4 5/5] selftests: mptcp: add tests for subflow creation failure Paolo Abeni
  4 siblings, 1 reply; 13+ messages in thread
From: Paolo Abeni @ 2021-12-03 15:19 UTC (permalink / raw)
  To: mptcp

Include in the path manager status a bitmap tracking the list
of local endpoints still available - not yet used - for the
relevant mptcp socket.
Keep such map updated at endpoint creation/deletion time, so
that we can easily skip already used endpoints at local address
selection time.
This allows for fair local endpoint usage in case of subflow
failure.

As a side effect, this patch also enforces that each endpoint
is used at most once for each mptcp connection.
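
Conceptually, the intended lifecycle of the new bitmap is the
following - a sketch only, simplified from the real call sites in the
diff below, with 'id' standing for the endpoint address id:

	/* at PM (re)init: every endpoint id is available for this msk */
	bitmap_fill(msk->pm.id_avail_bitmap, MAX_ADDR_ID + 1);

	/* at local address selection: skip ids already used by this msk */
	if (!test_bit(id, msk->pm.id_avail_bitmap))
		continue;

	/* on announce/subflow creation: consume the endpoint */
	__clear_bit(id, msk->pm.id_avail_bitmap);

	/* on address/subflow removal: make it available again */
	__set_bit(id, msk->pm.id_avail_bitmap);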

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v3 -> v4:
 - track the available (not yet used) id instead of the used ones
 - track both
 - cleanup duplication with id_bitmap
---
 net/mptcp/pm.c         |  1 +
 net/mptcp/pm_netlink.c | 57 ++++++++++++++++++------------------------
 net/mptcp/protocol.h   |  3 +++
 3 files changed, 28 insertions(+), 33 deletions(-)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 761995a34124..d6d22f18c418 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -376,6 +376,7 @@ void mptcp_pm_data_reset(struct mptcp_sock *msk)
 	WRITE_ONCE(msk->pm.accept_subflow, false);
 	WRITE_ONCE(msk->pm.remote_deny_join_id0, false);
 	msk->pm.status = 0;
+	bitmap_fill(msk->pm.id_avail_bitmap, MAX_ADDR_ID + 1);
 
 	mptcp_pm_nl_data_init(msk);
 }
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index d18b13e3e74c..b3dc19fd1d37 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -38,9 +38,6 @@ struct mptcp_pm_add_entry {
 	u8			retrans_times;
 };
 
-#define MAX_ADDR_ID		255
-#define BITMAP_SZ DIV_ROUND_UP(MAX_ADDR_ID + 1, BITS_PER_LONG)
-
 struct pm_nl_pernet {
 	/* protects pernet updates */
 	spinlock_t		lock;
@@ -52,7 +49,7 @@ struct pm_nl_pernet {
 	unsigned int		local_addr_max;
 	unsigned int		subflows_max;
 	unsigned int		next_id;
-	unsigned long		id_bitmap[BITMAP_SZ];
+	DECLARE_BITMAP(id_bitmap, MAX_ADDR_ID + 1);
 };
 
 #define MPTCP_PM_ADDR_MAX	8
@@ -173,6 +170,9 @@ select_local_address(const struct pm_nl_pernet *pernet,
 		if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW))
 			continue;
 
+		if (!test_bit(entry->addr.id, msk->pm.id_avail_bitmap))
+			continue;
+
 		if (entry->addr.family != sk->sk_family) {
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
 			if ((entry->addr.family == AF_INET &&
@@ -183,23 +183,17 @@ select_local_address(const struct pm_nl_pernet *pernet,
 				continue;
 		}
 
-		/* avoid any address already in use by subflows and
-		 * pending join
-		 */
-		if (!lookup_subflow_by_saddr(&msk->conn_list, &entry->addr)) {
-			ret = entry;
-			break;
-		}
+		ret = entry;
+		break;
 	}
 	rcu_read_unlock();
 	return ret;
 }
 
 static struct mptcp_pm_addr_entry *
-select_signal_address(struct pm_nl_pernet *pernet, unsigned int pos)
+select_signal_address(struct pm_nl_pernet *pernet, struct mptcp_sock *msk)
 {
 	struct mptcp_pm_addr_entry *entry, *ret = NULL;
-	int i = 0;
 
 	rcu_read_lock();
 	/* do not keep any additional per socket state, just signal
@@ -208,12 +202,14 @@ select_signal_address(struct pm_nl_pernet *pernet, unsigned int pos)
 	 * can lead to additional addresses not being announced.
 	 */
 	list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) {
+		if (!test_bit(entry->addr.id, msk->pm.id_avail_bitmap))
+			continue;
+
 		if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL))
 			continue;
-		if (i++ == pos) {
-			ret = entry;
-			break;
-		}
+
+		ret = entry;
+		break;
 	}
 	rcu_read_unlock();
 	return ret;
@@ -257,9 +253,11 @@ EXPORT_SYMBOL_GPL(mptcp_pm_get_local_addr_max);
 
 static void check_work_pending(struct mptcp_sock *msk)
 {
-	if (msk->pm.add_addr_signaled == mptcp_pm_get_add_addr_signal_max(msk) &&
-	    (msk->pm.local_addr_used == mptcp_pm_get_local_addr_max(msk) ||
-	     msk->pm.subflows == mptcp_pm_get_subflows_max(msk)))
+	struct pm_nl_pernet *pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
+
+	if (msk->pm.subflows == mptcp_pm_get_subflows_max(msk) ||
+	    (find_next_and_bit(pernet->id_bitmap, msk->pm.id_avail_bitmap, MAX_ADDR_ID + 1, 0) ==
+	     MAX_ADDR_ID + 1))
 		WRITE_ONCE(msk->pm.work_pending, false);
 }
 
@@ -481,21 +479,16 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 
 	/* check first for announce */
 	if (msk->pm.add_addr_signaled < add_addr_signal_max) {
-		local = select_signal_address(pernet,
-					      msk->pm.add_addr_signaled);
+		local = select_signal_address(pernet, msk);
 
 		if (local) {
 			if (mptcp_pm_alloc_anno_list(msk, local)) {
+				__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
 				msk->pm.add_addr_signaled++;
 				mptcp_pm_announce_addr(msk, &local->addr, false);
 				mptcp_pm_nl_addr_send_ack(msk);
 			}
-		} else {
-			/* pick failed, avoid fourther attempts later */
-			msk->pm.local_addr_used = add_addr_signal_max;
 		}
-
-		check_work_pending(msk);
 	}
 
 	/* check if should create a new subflow */
@@ -509,19 +502,16 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 			int i, nr;
 
 			msk->pm.local_addr_used++;
-			check_work_pending(msk);
 			nr = fill_remote_addresses_vec(msk, fullmesh, addrs);
+			if (nr)
+				__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
 			spin_unlock_bh(&msk->pm.lock);
 			for (i = 0; i < nr; i++)
 				__mptcp_subflow_connect(sk, &local->addr, &addrs[i]);
 			spin_lock_bh(&msk->pm.lock);
-			return;
 		}
-
-		/* lookup failed, avoid fourther attempts later */
-		msk->pm.local_addr_used = local_addr_max;
-		check_work_pending(msk);
 	}
+	check_work_pending(msk);
 }
 
 static void mptcp_pm_nl_fully_established(struct mptcp_sock *msk)
@@ -735,6 +725,7 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
 			msk->pm.subflows--;
 			__MPTCP_INC_STATS(sock_net(sk), rm_type);
 		}
+		__set_bit(rm_list->ids[1], msk->pm.id_avail_bitmap);
 		if (!removed)
 			continue;
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 47d24478763c..b1ab476520aa 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -183,6 +183,8 @@ enum mptcp_addr_signal_status {
 	MPTCP_RM_ADDR_SIGNAL,
 };
 
+#define MAX_ADDR_ID		255
+
 struct mptcp_pm_data {
 	struct mptcp_addr_info local;
 	struct mptcp_addr_info remote;
@@ -190,6 +192,7 @@ struct mptcp_pm_data {
 
 	spinlock_t	lock;		/*protects the whole PM data */
 
+	DECLARE_BITMAP(id_avail_bitmap, MAX_ADDR_ID + 1);
 	u8		addr_signal;
 	bool		server_side;
 	bool		work_pending;
-- 
2.33.1



* [PATCH mptcp-next v4 4/5] mptcp: do not block subflows creation on errors
  2021-12-03 15:19 [PATCH mptcp-next v4 0/5] mptcp: improve subflow creation on errors Paolo Abeni
                   ` (2 preceding siblings ...)
  2021-12-03 15:19 ` [PATCH mptcp-next v4 3/5] mptcp: keep track of local endpoint still available for each msk Paolo Abeni
@ 2021-12-03 15:19 ` Paolo Abeni
  2021-12-03 15:19 ` [PATCH mptcp-next v4 5/5] selftests: mptcp: add tests for subflow creation failure Paolo Abeni
  4 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2021-12-03 15:19 UTC (permalink / raw)
  To: mptcp

If the MPTCP configuration allows for multiple subflow
creation, and the first additional subflow never reaches
the fully established status - e.g. due to packet drops or
resets - the in-kernel path manager does not move on to the
next subflow.

This patch introduces a new PM helper to cope with MPJ
subflow creation failure and delay, and hooks it where appropriate.

Such helper triggers additional subflow creation as needed
and updates the PM subflow counter if the current one is
closing.

Note that we don't need an additional timer to catch timeouts
and/or long delays in connection completion: we just need to
measure the time elapsed since the subflow creation every
time we emit an MPJ sub-option.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v3 -> v4:
 - fix 'subflows' underflow

v2 -> v3:
 - fix compile warning (CI)

v1 -> v2:
 - explicitly hook on subflow close instead of error notification
 - fix checkpatch issue (Mat)
 - introduce and use a new pm helper to cope with subflow close
 - update pm subflow counters on close
---
 net/mptcp/options.c    | 21 ++++++++++++++++++---
 net/mptcp/pm.c         | 23 +++++++++++++++++++++--
 net/mptcp/pm_netlink.c | 12 ++++++++----
 net/mptcp/protocol.c   |  6 ++++++
 net/mptcp/protocol.h   |  5 ++++-
 net/mptcp/subflow.c    |  5 ++++-
 6 files changed, 61 insertions(+), 11 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 7a6a39b71633..00a8697addcd 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1257,10 +1257,10 @@ static u16 mptcp_make_csum(const struct mptcp_ext *mpext)
 void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
 			 struct mptcp_out_options *opts)
 {
-	if (unlikely(OPTION_MPTCP_FAIL & opts->suboptions)) {
-		const struct sock *ssk = (const struct sock *)tp;
-		struct mptcp_subflow_context *subflow;
+	const struct sock *ssk = (const struct sock *)tp;
+	struct mptcp_subflow_context *subflow;
 
+	if (unlikely(OPTION_MPTCP_FAIL & opts->suboptions)) {
 		subflow = mptcp_subflow_ctx(ssk);
 		subflow->send_mp_fail = 0;
 
@@ -1382,6 +1382,21 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
 		/* MPC is additionally mutually exclusive with MP_PRIO */
 		goto mp_capable_done;
 	} else if (OPTIONS_MPTCP_MPJ & opts->suboptions) {
+		if (ssk) {
+			/* we are still in the MPJ handshake and "a lot" of time passed
+			 * e.g. due to syn retranmissions. We can attempt next
+			 * subflow creation
+			 */
+			subflow = mptcp_subflow_ctx(ssk);
+			if (subflow->start_stamp &&
+			    unlikely(after(tcp_jiffies32, subflow->start_stamp + HZ / 10))) {
+				mptcp_pm_subflow_check_next(mptcp_sk(subflow->conn), ssk, subflow);
+
+				/* avoid triggering the PM multiple times due to timeout */
+				subflow->start_stamp = 0;
+			}
+		}
+
 		if (OPTION_MPTCP_MPJ_SYN & opts->suboptions) {
 			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
 					      TCPOLEN_MPTCP_MPJ_SYN,
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index d6d22f18c418..7e2c76924492 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -172,9 +172,28 @@ void mptcp_pm_subflow_established(struct mptcp_sock *msk)
 	spin_unlock_bh(&pm->lock);
 }
 
-void mptcp_pm_subflow_closed(struct mptcp_sock *msk, u8 id)
+void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct sock *ssk,
+				 const struct mptcp_subflow_context *subflow)
 {
-	pr_debug("msk=%p", msk);
+	struct mptcp_pm_data *pm = &msk->pm;
+	bool update_subflows;
+
+	update_subflows = (ssk->sk_state == TCP_CLOSE) &&
+			  (subflow->request_join || subflow->mp_join);
+	if (!READ_ONCE(pm->work_pending) && !update_subflows)
+		return;
+
+	spin_lock_bh(&pm->lock);
+	if (update_subflows)
+		pm->subflows--;
+
+	/* Even if this subflow is not really established, tell the PM to try
+	 * to pick the next one, if possible.
+	 */
+	if (mptcp_pm_nl_check_work_pending(msk))
+		mptcp_pm_schedule_work(msk, MPTCP_PM_SUBFLOW_ESTABLISHED);
+
+	spin_unlock_bh(&pm->lock);
 }
 
 void mptcp_pm_add_addr_received(struct mptcp_sock *msk,
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index b3dc19fd1d37..48963ae4c610 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -251,14 +251,17 @@ unsigned int mptcp_pm_get_local_addr_max(struct mptcp_sock *msk)
 }
 EXPORT_SYMBOL_GPL(mptcp_pm_get_local_addr_max);
 
-static void check_work_pending(struct mptcp_sock *msk)
+bool mptcp_pm_nl_check_work_pending(struct mptcp_sock *msk)
 {
 	struct pm_nl_pernet *pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
 
 	if (msk->pm.subflows == mptcp_pm_get_subflows_max(msk) ||
 	    (find_next_and_bit(pernet->id_bitmap, msk->pm.id_avail_bitmap, MAX_ADDR_ID + 1, 0) ==
-	     MAX_ADDR_ID + 1))
+	     MAX_ADDR_ID + 1)) {
 		WRITE_ONCE(msk->pm.work_pending, false);
+		return false;
+	}
+	return true;
 }
 
 struct mptcp_pm_add_entry *
@@ -511,7 +514,7 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 			spin_lock_bh(&msk->pm.lock);
 		}
 	}
-	check_work_pending(msk);
+	mptcp_pm_nl_check_work_pending(msk);
 }
 
 static void mptcp_pm_nl_fully_established(struct mptcp_sock *msk)
@@ -718,11 +721,12 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
 				 i, rm_list->ids[i], subflow->local_id, subflow->remote_id);
 			spin_unlock_bh(&msk->pm.lock);
 			mptcp_subflow_shutdown(sk, ssk, how);
+
+			/* the following takes care of updating the subflows counter */
 			mptcp_close_ssk(sk, ssk, subflow);
 			spin_lock_bh(&msk->pm.lock);
 
 			removed = true;
-			msk->pm.subflows--;
 			__MPTCP_INC_STATS(sock_net(sk), rm_type);
 		}
 		__set_bit(rm_list->ids[1], msk->pm.id_avail_bitmap);
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 8b49866bcc25..c9f302169145 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2351,6 +2351,12 @@ void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 {
 	if (sk->sk_state == TCP_ESTABLISHED)
 		mptcp_event(MPTCP_EVENT_SUB_CLOSED, mptcp_sk(sk), ssk, GFP_KERNEL);
+
+	/* subflow aborted before reaching the fully_established status
+	 * attempt the creation of the next subflow
+	 */
+	mptcp_pm_subflow_check_next(mptcp_sk(sk), ssk, subflow);
+
 	__mptcp_close_ssk(sk, ssk, subflow, MPTCP_CF_PUSH);
 }
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index b1ab476520aa..3e7e541a013f 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -437,6 +437,7 @@ struct mptcp_subflow_context {
 		stale : 1;	    /* unable to snd/rcv data, do not use for xmit */
 	enum mptcp_data_avail data_avail;
 	u32	remote_nonce;
+	u32	start_stamp;
 	u64	thmac;
 	u32	local_nonce;
 	u32	remote_token;
@@ -738,7 +739,9 @@ void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk,
 bool mptcp_pm_allow_new_subflow(struct mptcp_sock *msk);
 void mptcp_pm_connection_closed(struct mptcp_sock *msk);
 void mptcp_pm_subflow_established(struct mptcp_sock *msk);
-void mptcp_pm_subflow_closed(struct mptcp_sock *msk, u8 id);
+bool mptcp_pm_nl_check_work_pending(struct mptcp_sock *msk);
+void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct sock *ssk,
+				 const struct mptcp_subflow_context *subflow);
 void mptcp_pm_add_addr_received(struct mptcp_sock *msk,
 				const struct mptcp_addr_info *addr);
 void mptcp_pm_add_addr_echoed(struct mptcp_sock *msk,
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 0f90bd61de01..76556743e952 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1273,7 +1273,8 @@ void __mptcp_error_report(struct sock *sk)
 
 static void subflow_error_report(struct sock *ssk)
 {
-	struct sock *sk = mptcp_subflow_ctx(ssk)->conn;
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	struct sock *sk = subflow->conn;
 
 	mptcp_data_lock(sk);
 	if (!sock_owned_by_user(sk))
@@ -1444,6 +1445,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
 	subflow->remote_id = remote_id;
 	subflow->request_join = 1;
 	subflow->request_bkup = !!(flags & MPTCP_PM_ADDR_FLAG_BACKUP);
+	subflow->start_stamp = tcp_jiffies32;
 	mptcp_info2sockaddr(remote, &addr, ssk->sk_family);
 
 	mptcp_add_pending_subflow(msk, subflow);
@@ -1461,6 +1463,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
 	spin_lock_bh(&msk->join_list_lock);
 	list_del(&subflow->node);
 	spin_unlock_bh(&msk->join_list_lock);
+	mptcp_pm_subflow_check_next(msk, ssk, subflow);
 	sock_put(mptcp_subflow_tcp_sock(subflow));
 
 failed:
-- 
2.33.1



* [PATCH mptcp-next v4 5/5] selftests: mptcp: add tests for subflow creation failure
  2021-12-03 15:19 [PATCH mptcp-next v4 0/5] mptcp: improve subflow creation on errors Paolo Abeni
                   ` (3 preceding siblings ...)
  2021-12-03 15:19 ` [PATCH mptcp-next v4 4/5] mptcp: do not block subflows creation on errors Paolo Abeni
@ 2021-12-03 15:19 ` Paolo Abeni
  2021-12-04  1:13   ` Mat Martineau
  4 siblings, 1 reply; 13+ messages in thread
From: Paolo Abeni @ 2021-12-03 15:19 UTC (permalink / raw)
  To: mptcp

Verify that, when multiple endpoints are available, subflow
creation proceeds even when the first additional subflow creation
fails - e.g. due to packet drops on the relevant link.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v3 -> v4:
 - use $() where applicable - Matt
 - fix typo in help() - Matt

v1 -> v2:
 - add missing NF kconfig
 - add more test-cases (drop, later subflow creation)
---
 tools/testing/selftests/net/mptcp/config      |  1 +
 .../testing/selftests/net/mptcp/mptcp_join.sh | 65 ++++++++++++++++++-
 2 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/config b/tools/testing/selftests/net/mptcp/config
index 419e71560fd1..41e94e004bdc 100644
--- a/tools/testing/selftests/net/mptcp/config
+++ b/tools/testing/selftests/net/mptcp/config
@@ -18,4 +18,5 @@ CONFIG_NFT_TPROXY=m
 CONFIG_NFT_SOCKET=m
 CONFIG_IP_ADVANCED_ROUTER=y
 CONFIG_IP_MULTIPLE_TABLES=y
+CONFIG_IP_NF_TARGET_REJECT=m
 CONFIG_IPV6_MULTIPLE_TABLES=y
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 2684ef9c0d42..54a826ae984b 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -975,6 +975,22 @@ chk_link_usage()
 	fi
 }
 
+wait_for_tw()
+{
+	local timeout_ms=$((timeout_poll * 1000))
+	local time=0
+	local ns=$1
+
+	while [ $time -lt $timeout_ms ]; do
+		local cnt=$(ip netns exec $ns ss -t state time-wait |wc -l)
+
+		[ "$cnt" = 1 ] && return 1
+		time=$((time + 100))
+		sleep 0.1
+	done
+	return 1
+}
+
 subflows_tests()
 {
 	reset
@@ -1032,6 +1048,48 @@ subflows_tests()
 	chk_join_nr "single subflow, dev" 1 1 1
 }
 
+subflows_error_tests()
+{
+	# multiple subflows, with subflow creation error
+	reset
+	ip netns exec $ns1 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns2 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
+	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j REJECT
+	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
+	chk_join_nr "multiple subflows, with failing subflow" 1 1 1
+
+	# multiple subflows, with subflow timeout on MPJ
+	reset
+	ip netns exec $ns1 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns2 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
+	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j DROP
+	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
+	chk_join_nr "multiple subflows, with subflow timeout" 1 1 1
+
+	# multiple subflows, check that the endpoint corresponding to
+	# closed subflow (due to reset) is not reused if additional
+	# subflows are added later
+	reset
+	ip netns exec $ns1 ./pm_nl_ctl limits 0 1
+	ip netns exec $ns2 ./pm_nl_ctl limits 0 1
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
+	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j REJECT
+	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow &
+
+	# mpj subflow will be in TW after the reset
+	wait_for_tw $ns2
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
+	wait
+
+	# additional subflow could be created only if the PM select
+	# the later endpoint, skipping the already used one
+	chk_join_nr "multiple subflows, fair usage on close" 1 1 1
+}
+
 signal_address_tests()
 {
 	# add_address, unused
@@ -1840,6 +1898,7 @@ fullmesh_tests()
 all_tests()
 {
 	subflows_tests
+	subflows_error_tests
 	signal_address_tests
 	link_failure_tests
 	add_addr_timeout_tests
@@ -1859,6 +1918,7 @@ usage()
 {
 	echo "mptcp_join usage:"
 	echo "  -f subflows_tests"
+	echo "  -e subflows_error_tests"
 	echo "  -s signal_address_tests"
 	echo "  -l link_failure_tests"
 	echo "  -t add_addr_timeout_tests"
@@ -1907,11 +1967,14 @@ if [ $do_all_tests -eq 1 ]; then
 	exit $ret
 fi
 
-while getopts 'fsltra64bpkdmchCS' opt; do
+while getopts 'fesltra64bpkdmchCS' opt; do
 	case $opt in
 		f)
 			subflows_tests
 			;;
+		e)
+			subflows_error_tests
+			;;
 		s)
 			signal_address_tests
 			;;
-- 
2.33.1



* Re: [PATCH mptcp-next v4 3/5] mptcp: keep track of local endpoint still available for each msk
  2021-12-03 15:19 ` [PATCH mptcp-next v4 3/5] mptcp: keep track of local endpoint still available for each msk Paolo Abeni
@ 2021-12-04  1:05   ` Mat Martineau
  2021-12-06 10:18     ` Paolo Abeni
  0 siblings, 1 reply; 13+ messages in thread
From: Mat Martineau @ 2021-12-04  1:05 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Fri, 3 Dec 2021, Paolo Abeni wrote:

> Include into the path manager status a bitmap tracking the list
> of local endpoints still available - not yet used - for the
> relavant mptcp socket.
> Keep such map updated at endpoint creation/deleteion time, so
> that we can easily skip already used endpoint at local address
> selection time.
> This allows for fair local endpoints usage in case of subflow
> failure.
>
> As a side effect, this patch also enforces that each endpoint
> is used at most once for each mptcp connection.

This seems like a significant change, possibly with unexpected 
consequences (but maybe I've misunderstood the new constraint?). 
Apparently the topologies in our self tests happen to work fine!

If you have peer A with one interface and peer B with two, it seems like 
this should be allowed:

Peer A connects to Peer B endpoint 0
Peer B announces endpoint 1
Peer A sends an MP_JOIN to Peer B interface 1

From the description above, it sounds like peer A would not be able to 
create that second subflow. I think there's a quirk where the subflow 0 
local interface gets to be used twice since bit 0 is not cleared 
until the second subflow is established.

My suggestion is that the max number of subflows should be determined by 
the existing limit values, not the number of local interfaces.

Could the change in behavior (fair use of local endpoints) also have 
unexpected effects if it causes backup endpoints to be used?


-Mat

>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v3 -> v4:
> - track the available (not yet used) id instead of the used ones
> - track both
> - cleanup duplication with id_bitmap
> ---
> net/mptcp/pm.c         |  1 +
> net/mptcp/pm_netlink.c | 57 ++++++++++++++++++------------------------
> net/mptcp/protocol.h   |  3 +++
> 3 files changed, 28 insertions(+), 33 deletions(-)
>
> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
> index 761995a34124..d6d22f18c418 100644
> --- a/net/mptcp/pm.c
> +++ b/net/mptcp/pm.c
> @@ -376,6 +376,7 @@ void mptcp_pm_data_reset(struct mptcp_sock *msk)
> 	WRITE_ONCE(msk->pm.accept_subflow, false);
> 	WRITE_ONCE(msk->pm.remote_deny_join_id0, false);
> 	msk->pm.status = 0;
> +	bitmap_fill(msk->pm.id_avail_bitmap, MAX_ADDR_ID + 1);
>
> 	mptcp_pm_nl_data_init(msk);
> }
> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
> index d18b13e3e74c..b3dc19fd1d37 100644
> --- a/net/mptcp/pm_netlink.c
> +++ b/net/mptcp/pm_netlink.c
> @@ -38,9 +38,6 @@ struct mptcp_pm_add_entry {
> 	u8			retrans_times;
> };
>
> -#define MAX_ADDR_ID		255
> -#define BITMAP_SZ DIV_ROUND_UP(MAX_ADDR_ID + 1, BITS_PER_LONG)
> -
> struct pm_nl_pernet {
> 	/* protects pernet updates */
> 	spinlock_t		lock;
> @@ -52,7 +49,7 @@ struct pm_nl_pernet {
> 	unsigned int		local_addr_max;
> 	unsigned int		subflows_max;
> 	unsigned int		next_id;
> -	unsigned long		id_bitmap[BITMAP_SZ];
> +	DECLARE_BITMAP(id_bitmap, MAX_ADDR_ID + 1);
> };
>
> #define MPTCP_PM_ADDR_MAX	8
> @@ -173,6 +170,9 @@ select_local_address(const struct pm_nl_pernet *pernet,
> 		if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW))
> 			continue;
>
> +		if (!test_bit(entry->addr.id, msk->pm.id_avail_bitmap))
> +			continue;
> +
> 		if (entry->addr.family != sk->sk_family) {
> #if IS_ENABLED(CONFIG_MPTCP_IPV6)
> 			if ((entry->addr.family == AF_INET &&
> @@ -183,23 +183,17 @@ select_local_address(const struct pm_nl_pernet *pernet,
> 				continue;
> 		}
>
> -		/* avoid any address already in use by subflows and
> -		 * pending join
> -		 */
> -		if (!lookup_subflow_by_saddr(&msk->conn_list, &entry->addr)) {
> -			ret = entry;
> -			break;
> -		}
> +		ret = entry;
> +		break;
> 	}
> 	rcu_read_unlock();
> 	return ret;
> }
>
> static struct mptcp_pm_addr_entry *
> -select_signal_address(struct pm_nl_pernet *pernet, unsigned int pos)
> +select_signal_address(struct pm_nl_pernet *pernet, struct mptcp_sock *msk)
> {
> 	struct mptcp_pm_addr_entry *entry, *ret = NULL;
> -	int i = 0;
>
> 	rcu_read_lock();
> 	/* do not keep any additional per socket state, just signal
> @@ -208,12 +202,14 @@ select_signal_address(struct pm_nl_pernet *pernet, unsigned int pos)
> 	 * can lead to additional addresses not being announced.
> 	 */
> 	list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) {
> +		if (!test_bit(entry->addr.id, msk->pm.id_avail_bitmap))
> +			continue;
> +
> 		if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL))
> 			continue;
> -		if (i++ == pos) {
> -			ret = entry;
> -			break;
> -		}
> +
> +		ret = entry;
> +		break;
> 	}
> 	rcu_read_unlock();
> 	return ret;
> @@ -257,9 +253,11 @@ EXPORT_SYMBOL_GPL(mptcp_pm_get_local_addr_max);
>
> static void check_work_pending(struct mptcp_sock *msk)
> {
> -	if (msk->pm.add_addr_signaled == mptcp_pm_get_add_addr_signal_max(msk) &&
> -	    (msk->pm.local_addr_used == mptcp_pm_get_local_addr_max(msk) ||
> -	     msk->pm.subflows == mptcp_pm_get_subflows_max(msk)))
> +	struct pm_nl_pernet *pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
> +
> +	if (msk->pm.subflows == mptcp_pm_get_subflows_max(msk) ||
> +	    (find_next_and_bit(pernet->id_bitmap, msk->pm.id_avail_bitmap, MAX_ADDR_ID + 1, 0) ==
> +	     MAX_ADDR_ID + 1))
> 		WRITE_ONCE(msk->pm.work_pending, false);
> }
>
> @@ -481,21 +479,16 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
>
> 	/* check first for announce */
> 	if (msk->pm.add_addr_signaled < add_addr_signal_max) {
> -		local = select_signal_address(pernet,
> -					      msk->pm.add_addr_signaled);
> +		local = select_signal_address(pernet, msk);
>
> 		if (local) {
> 			if (mptcp_pm_alloc_anno_list(msk, local)) {
> +				__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
> 				msk->pm.add_addr_signaled++;
> 				mptcp_pm_announce_addr(msk, &local->addr, false);
> 				mptcp_pm_nl_addr_send_ack(msk);
> 			}
> -		} else {
> -			/* pick failed, avoid fourther attempts later */
> -			msk->pm.local_addr_used = add_addr_signal_max;
> 		}
> -
> -		check_work_pending(msk);
> 	}
>
> 	/* check if should create a new subflow */
> @@ -509,19 +502,16 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
> 			int i, nr;
>
> 			msk->pm.local_addr_used++;
> -			check_work_pending(msk);
> 			nr = fill_remote_addresses_vec(msk, fullmesh, addrs);
> +			if (nr)
> +				__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
> 			spin_unlock_bh(&msk->pm.lock);
> 			for (i = 0; i < nr; i++)
> 				__mptcp_subflow_connect(sk, &local->addr, &addrs[i]);
> 			spin_lock_bh(&msk->pm.lock);
> -			return;
> 		}
> -
> -		/* lookup failed, avoid fourther attempts later */
> -		msk->pm.local_addr_used = local_addr_max;
> -		check_work_pending(msk);
> 	}
> +	check_work_pending(msk);
> }
>
> static void mptcp_pm_nl_fully_established(struct mptcp_sock *msk)
> @@ -735,6 +725,7 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
> 			msk->pm.subflows--;
> 			__MPTCP_INC_STATS(sock_net(sk), rm_type);
> 		}
> +		__set_bit(rm_list->ids[1], msk->pm.id_avail_bitmap);
> 		if (!removed)
> 			continue;
>
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 47d24478763c..b1ab476520aa 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -183,6 +183,8 @@ enum mptcp_addr_signal_status {
> 	MPTCP_RM_ADDR_SIGNAL,
> };
>
> +#define MAX_ADDR_ID		255
> +
> struct mptcp_pm_data {
> 	struct mptcp_addr_info local;
> 	struct mptcp_addr_info remote;
> @@ -190,6 +192,7 @@ struct mptcp_pm_data {
>
> 	spinlock_t	lock;		/*protects the whole PM data */
>
> +	DECLARE_BITMAP(id_avail_bitmap, MAX_ADDR_ID + 1);
> 	u8		addr_signal;
> 	bool		server_side;
> 	bool		work_pending;
> -- 
> 2.33.1
>
>
>

--
Mat Martineau
Intel


* Re: [PATCH mptcp-next v4 5/5] selftests: mptcp: add tests for subflow creation failure
  2021-12-03 15:19 ` [PATCH mptcp-next v4 5/5] selftests: mptcp: add tests for subflow creation failure Paolo Abeni
@ 2021-12-04  1:13   ` Mat Martineau
  0 siblings, 0 replies; 13+ messages in thread
From: Mat Martineau @ 2021-12-04  1:13 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Fri, 3 Dec 2021, Paolo Abeni wrote:

> Verify that, when multiple endpoints are available, subflows
> creation proceed even when the first additional subflow creation
> fails - due to packet drop on the relevant link
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v3 -> v4:
> - use $() where applicable - Matt
> - fix typo in help() - Matt
>
> v1 -> v2:
> - add missing NF kconfig
> - add more test-cases (drop, later subflow creation)
> ---
> tools/testing/selftests/net/mptcp/config      |  1 +
> .../testing/selftests/net/mptcp/mptcp_join.sh | 65 ++++++++++++++++++-
> 2 files changed, 65 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/net/mptcp/config b/tools/testing/selftests/net/mptcp/config
> index 419e71560fd1..41e94e004bdc 100644
> --- a/tools/testing/selftests/net/mptcp/config
> +++ b/tools/testing/selftests/net/mptcp/config
> @@ -18,4 +18,5 @@ CONFIG_NFT_TPROXY=m
> CONFIG_NFT_SOCKET=m
> CONFIG_IP_ADVANCED_ROUTER=y
> CONFIG_IP_MULTIPLE_TABLES=y
> +CONFIG_IP_NF_TARGET_REJECT=m
> CONFIG_IPV6_MULTIPLE_TABLES=y
> diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
> index 2684ef9c0d42..54a826ae984b 100755
> --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
> +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
> @@ -975,6 +975,22 @@ chk_link_usage()
> 	fi
> }
>
> +wait_for_tw()
> +{
> +	local timeout_ms=$((timeout_poll * 1000))
> +	local time=0
> +	local ns=$1
> +
> +	while [ $time -lt $timeout_ms ]; do
> +		local cnt=$(ip netns exec $ns ss -t state time-wait |wc -l)
> +
> +		[ "$cnt" = 1 ] && return 1
> +		time=$((time + 100))
> +		sleep 0.1
> +	done
> +	return 1
> +}
> +
> subflows_tests()
> {
> 	reset
> @@ -1032,6 +1048,48 @@ subflows_tests()
> 	chk_join_nr "single subflow, dev" 1 1 1
> }
>
> +subflows_error_tests()
> +{
> +	# multiple subflows, with subflow creation error
> +	reset
> +	ip netns exec $ns1 ./pm_nl_ctl limits 0 2
> +	ip netns exec $ns2 ./pm_nl_ctl limits 0 2
> +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
> +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
> +	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j REJECT
> +	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
> +	chk_join_nr "multiple subflows, with failing subflow" 1 1 1
> +
> +	# multiple subflows, with subflow timeout on MPJ
> +	reset
> +	ip netns exec $ns1 ./pm_nl_ctl limits 0 2
> +	ip netns exec $ns2 ./pm_nl_ctl limits 0 2
> +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
> +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
> +	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j DROP
> +	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
> +	chk_join_nr "multiple subflows, with subflow timeout" 1 1 1
> +
> +	# multiple subflows, check that the endpoint corresponding to
> +	# closed subflow (due to reset) is not reused if additional
> +	# subflows are added later
> +	reset
> +	ip netns exec $ns1 ./pm_nl_ctl limits 0 1
> +	ip netns exec $ns2 ./pm_nl_ctl limits 0 1
> +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
> +	ip netns exec $ns1 iptables -A INPUT -s 10.0.3.2 -p tcp -j REJECT
> +	run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow &
> +
> +	# mpj subflow will be in TW after the reset
> +	wait_for_tw $ns2
> +	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
> +	wait
> +
> +	# additional subflow could be created only if the PM select
> +	# the later endpoint, skipping the already used one
> +	chk_join_nr "multiple subflows, fair usage on close" 1 1 1
> +}

The results are printed like this for me:

01 multiple subflows, with failing subflow syn[ ok ] - synack[ ok ] - ack[ ok ]
02 multiple subflows, with subflow timeout syn[ ok ] - synack[ ok ] - ack[ ok ]
02 multiple subflows, fair usage on close syn[ ok ] - synack[ ok ] - ack[ ok ]
^^ not 03

With test 3 running in the background, I guess $TEST_COUNT doesn't get 
updated. Hopefully there aren't other globals doing unexpected things...
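
A minimal sketch of what I suspect is going on (plain shell, not the
actual selftest code, names made up):

	#!/bin/sh
	TEST_COUNT=0
	run_one() { TEST_COUNT=$((TEST_COUNT + 1)); }
	report()  { printf '%02d some test\n' "$TEST_COUNT"; }

	run_one; report		# prints "01 some test"
	run_one; report		# prints "02 some test"
	run_one &		# the bump happens in a subshell and is lost
	wait
	report			# prints "02 some test" again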


-Mat

> +
> signal_address_tests()
> {
> 	# add_address, unused
> @@ -1840,6 +1898,7 @@ fullmesh_tests()
> all_tests()
> {
> 	subflows_tests
> +	subflows_error_tests
> 	signal_address_tests
> 	link_failure_tests
> 	add_addr_timeout_tests
> @@ -1859,6 +1918,7 @@ usage()
> {
> 	echo "mptcp_join usage:"
> 	echo "  -f subflows_tests"
> +	echo "  -e subflows_error_tests"
> 	echo "  -s signal_address_tests"
> 	echo "  -l link_failure_tests"
> 	echo "  -t add_addr_timeout_tests"
> @@ -1907,11 +1967,14 @@ if [ $do_all_tests -eq 1 ]; then
> 	exit $ret
> fi
>
> -while getopts 'fsltra64bpkdmchCS' opt; do
> +while getopts 'fesltra64bpkdmchCS' opt; do
> 	case $opt in
> 		f)
> 			subflows_tests
> 			;;
> +		e)
> +			subflows_error_tests
> +			;;
> 		s)
> 			signal_address_tests
> 			;;
> -- 
> 2.33.1
>
>
>

--
Mat Martineau
Intel


* Re: [PATCH mptcp-next v4 3/5] mptcp: keep track of local endpoint still available for each msk
  2021-12-04  1:05   ` Mat Martineau
@ 2021-12-06 10:18     ` Paolo Abeni
  2021-12-07 11:28       ` Paolo Abeni
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Abeni @ 2021-12-06 10:18 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Fri, 2021-12-03 at 17:05 -0800, Mat Martineau wrote:
> On Fri, 3 Dec 2021, Paolo Abeni wrote:
> 
> > Include into the path manager status a bitmap tracking the list
> > of local endpoints still available - not yet used - for the
> > relavant mptcp socket.
> > Keep such map updated at endpoint creation/deleteion time, so
> > that we can easily skip already used endpoint at local address
> > selection time.
> > This allows for fair local endpoints usage in case of subflow
> > failure.
> > 
> > As a side effect, this patch also enforces that each endpoint
> > is used at most once for each mptcp connection.
> 
> This seems like a significant change, possibly with unexpected 
> consequences (but maybe I've misunderstood the new constraint?). 
> Apparently the topoligies in our self tests happen to work fine!

Uhmmm... probably less than it may look like. We already respected the
above constraint in almost every scenario. AFAICS, the only exception
prior to this patch was the following scenario:

1) a subflow is established from endpoint A

2) the peer closes the additional subflow

3) a new local endpoint B is created

4) the PM will create the 2nd subflow using endpoint A again

which is arguably unexpected, as reported by issues/242.
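
Roughly the same sequence is exercised by the new "fair usage on
close" self-test in patch 5/5 - as a sketch, addresses and limits are
just illustrative:

	ip netns exec $ns2 ./pm_nl_ctl limits 0 1
	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow	# endpoint A
	# ... the peer resets/closes the subflow created from endpoint A ...
	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow	# endpoint B, added later
	# before this series the PM could pick endpoint A again instead of B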

> If you have peer A with one interface and peer B with two, it seems like 
> this should be allowed:
> 
> Peer A connects to Peer B endpoint 0
> Peer B announces endpoint 1
> Peer A sends an MP_JOIN to Peer B interface 1
> 
> From the description above, it sounds like peer A would not be able to 
> create that second subflow.

Well, AFAICS peer A will actually create the subflow, if the limits
allow that, both with and without this patch.
Specifically:

1) no endpoint configured, add_addr_accepted >= 1, subflows >= 1 on peer
A -> the subflow to peer B interface 1 will be created (with or
without this patch) because we don't have any additional
constraint beyond the limits.

2) an endpoint configured on peer A matching the MPC subflow source
address -> again the subflow towards peer B endpoint 1 will be created,
because that will be the first time the PM uses that peer A
endpoint.

>  I think there's a quirk where the subflow 0 
> local interface gets to be used twice since bit 0 is not cleared 
> until the second subflow is established.

Uhmm? The 'endpoint available' bit is cleared before attempting to
create the subflow, and is set (making the endpoint available again)
after mptcp_close_ssk() is invoked on the relevant subflow.

> My suggestion is that the max number of subflows should be determined by 
> the existing limit values, not the number of local interfaces.

Sorry, I do not follow?!? The max number of subflows is determined
by the limits; the interfaces are never taken into account (with or
without this patch).

> Could the change in behavior (fair use of local endpoints) also have 
> unexpected effects if it causes backup endpoints to be used?

I can't think of any critical scenario, but that is probably lack of
imagination on my side...

/P



* Re: [PATCH mptcp-next v4 3/5] mptcp: keep track of local endpoint still available for each msk
  2021-12-06 10:18     ` Paolo Abeni
@ 2021-12-07 11:28       ` Paolo Abeni
  2021-12-07 18:08         ` Paolo Abeni
  2021-12-07 18:18         ` Mat Martineau
  0 siblings, 2 replies; 13+ messages in thread
From: Paolo Abeni @ 2021-12-07 11:28 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Mon, 2021-12-06 at 11:18 +0100, Paolo Abeni wrote:
> On Fri, 2021-12-03 at 17:05 -0800, Mat Martineau wrote:
> > On Fri, 3 Dec 2021, Paolo Abeni wrote:
> > 
> > > Include into the path manager status a bitmap tracking the list
> > > of local endpoints still available - not yet used - for the
> > > relavant mptcp socket.
> > > Keep such map updated at endpoint creation/deleteion time, so
> > > that we can easily skip already used endpoint at local address
> > > selection time.
> > > This allows for fair local endpoints usage in case of subflow
> > > failure.
> > > 
> > > As a side effect, this patch also enforces that each endpoint
> > > is used at most once for each mptcp connection.
> > 
> > This seems like a significant change, possibly with unexpected 
> > consequences (but maybe I've misunderstood the new constraint?). 
> > Apparently the topoligies in our self tests happen to work fine!
> 
> Uhmmm... probably less then it may looks like. We already respected the
> above constraint in almost every scenarios. AFAICS, the only exception
> prior to this patch was following scenario:
> 
> 1) subflow is established endpoint A
> 
> 2) the peer closes the additional subflow
> 
> 2) a new local endpoint B is created
> 
> 3) the PM will create the 2nd subflow using again endpoint A. 
> 
> which is arguibly unexpected, as reported by issues/242. 
> 
> > If you have peer A with one interface and peer B with two, it seems like 
> > this should be allowed:
> > 
> > Peer A connects to Peer B endpoint 0
> > Peer B announces endpoint 1
> > Peer A sends an MP_JOIN to Peer B interface 1
> > 
> > From the description above, it sounds like peer A would not be able to 
> > create that second subflow.
> 
> Well, AFAICS the peer A will actually create the subflow, if the limits
> allow that, both with or without this patch.
> Specifically:
> 
> 1) no endpoint configured, add_addr_accepted >=1, subflows >=1 on peer
> A -> the subflow to peer be interface 1 will be created (with or
> without this patch) because we don't have any other additional
> constraint beyond the limits. 
> 
> 2) endpoint configured on peer A matching the MPC subflow source
> address -> again the subflow towards peer B endpoint 1 will be created,
> because that will be the first time the PM will use the peer A
> endpoint.

Thinking again about this last scenario, I think there is something to
improve:

if peer A connects to peer B using the local endpoint address, it can
later use such address again for additional subflows, as the MPC
subflow is not accounted for by the current version of this patch. I'll
cook a v5 adding such accounting, too - and possibly a related self-test.

Cheers,

/P



* Re: [PATCH mptcp-next v4 3/5] mptcp: keep track of local endpoint still available for each msk
  2021-12-07 11:28       ` Paolo Abeni
@ 2021-12-07 18:08         ` Paolo Abeni
  2021-12-07 18:18         ` Mat Martineau
  1 sibling, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2021-12-07 18:08 UTC (permalink / raw)
  To: Mat Martineau, Florian Westphal, Davide Caratti; +Cc: mptcp

On Tue, 2021-12-07 at 12:28 +0100, Paolo Abeni wrote:
> On Mon, 2021-12-06 at 11:18 +0100, Paolo Abeni wrote:
> > On Fri, 2021-12-03 at 17:05 -0800, Mat Martineau wrote:
> > 
> > > If you have peer A with one interface and peer B with two, it seems like 
> > > this should be allowed:
> > > 
> > > Peer A connects to Peer B endpoint 0
> > > Peer B announces endpoint 1
> > > Peer A sends an MP_JOIN to Peer B interface 1
> > > 
> > > From the description above, it sounds like peer A would not be able to 
> > > create that second subflow.
> > 
> > Well, AFAICS the peer A will actually create the subflow, if the limits
> > allow that, both with or without this patch.
> > Specifically:
> > 
> > 1) no endpoint configured, add_addr_accepted >=1, subflows >=1 on peer
> > A -> the subflow to peer be interface 1 will be created (with or
> > without this patch) because we don't have any other additional
> > constraint beyond the limits. 
> > 
> > 2) endpoint configured on peer A matching the MPC subflow source
> > address -> again the subflow towards peer B endpoint 1 will be created,
> > because that will be the first time the PM will use the peer A
> > endpoint.
> 
> Thinking again about this last scenario, I think there is something to
> improve:
> 
> if peer A connects to peer B using the local endpoint address, it can
> later use again such address for additional sublflows, as the MPC
> subflow is no accounted by the current version of this patch. I'll cook
> a v5 adding such account, too - and possibly a related self-tests

While working on the above I stumbled upon the following:

if we create a subflow towards the peer using a local address with no
local endpoint defined on top of it - which can happen for a multi-homed
system in response to an incoming ADD_ADDR - the current code picks
join_id == 0 for such a subflow, see:

https://elixir.bootlin.com/linux/latest/source/net/mptcp/pm_netlink.c#L923

That choice is wrong: it violates the RFC and confuses the endpoint
accounting.

I don't see a good way to address the above, except a quite radical
change: refusing to create such subflow. That would require moving the
join_id lookup from mptcp_subflow_connect() to 
inet_csk(ssk)->icsk_af_ops->rebuild_header(), but it should address
some 'wrong address id == 0' issues that Davide noted in the past.

Comments more than welcome!

/P



* Re: [PATCH mptcp-next v4 3/5] mptcp: keep track of local endpoint still available for each msk
  2021-12-07 11:28       ` Paolo Abeni
  2021-12-07 18:08         ` Paolo Abeni
@ 2021-12-07 18:18         ` Mat Martineau
  2021-12-09 10:13           ` Paolo Abeni
  1 sibling, 1 reply; 13+ messages in thread
From: Mat Martineau @ 2021-12-07 18:18 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Tue, 7 Dec 2021, Paolo Abeni wrote:

> On Mon, 2021-12-06 at 11:18 +0100, Paolo Abeni wrote:
>> On Fri, 2021-12-03 at 17:05 -0800, Mat Martineau wrote:
>>> On Fri, 3 Dec 2021, Paolo Abeni wrote:
>>>
>>>> Include into the path manager status a bitmap tracking the list
>>>> of local endpoints still available - not yet used - for the
>>>> relavant mptcp socket.
>>>> Keep such map updated at endpoint creation/deleteion time, so
>>>> that we can easily skip already used endpoint at local address
>>>> selection time.
>>>> This allows for fair local endpoints usage in case of subflow
>>>> failure.
>>>>
>>>> As a side effect, this patch also enforces that each endpoint
>>>> is used at most once for each mptcp connection.
>>>
>>> This seems like a significant change, possibly with unexpected
>>> consequences (but maybe I've misunderstood the new constraint?).
>>> Apparently the topoligies in our self tests happen to work fine!
>>
>> Uhmmm... probably less then it may looks like. We already respected the
>> above constraint in almost every scenarios. AFAICS, the only exception
>> prior to this patch was following scenario:
>>
>> 1) subflow is established endpoint A
>>
>> 2) the peer closes the additional subflow
>>
>> 2) a new local endpoint B is created
>>
>> 3) the PM will create the 2nd subflow using again endpoint A.
>>
>> which is arguibly unexpected, as reported by issues/242.
>>
>>> If you have peer A with one interface and peer B with two, it seems like
>>> this should be allowed:
>>>
>>> Peer A connects to Peer B endpoint 0
>>> Peer B announces endpoint 1
>>> Peer A sends an MP_JOIN to Peer B interface 1
>>>
>>> From the description above, it sounds like peer A would not be able to
>>> create that second subflow.
>>
>> Well, AFAICS the peer A will actually create the subflow, if the limits
>> allow that, both with or without this patch.
>> Specifically:
>>
>> 1) no endpoint configured, add_addr_accepted >=1, subflows >=1 on peer
>> A -> the subflow to peer be interface 1 will be created (with or
>> without this patch) because we don't have any other additional
>> constraint beyond the limits.
>>
>> 2) endpoint configured on peer A matching the MPC subflow source
>> address -> again the subflow towards peer B endpoint 1 will be created,
>> because that will be the first time the PM will use the peer A
>> endpoint.
>
> Thinking again about this last scenario, I think there is something to
> improve:
>
> if peer A connects to peer B using the local endpoint address, it can
> later use again such address for additional sublflows, as the MPC
> subflow is no accounted by the current version of this patch. I'll cook
> a v5 adding such account, too - and possibly a related self-tests
>

Thanks Paolo, that will help.

My main concern is what happens when the bitmap fills up. While 
__mptcp_subflow_connect() can still be called from 
mptcp_pm_nl_add_addr_received() (so, you're right, we can still create 
multiple outgoing connections this way), it won't be called from 
mptcp_pm_create_subflow_or_signal_addr() because select_local_address() 
will now return NULL. The latter case doesn't seem like expected/desired 
behavior.


--
Mat Martineau
Intel


* Re: [PATCH mptcp-next v4 3/5] mptcp: keep track of local endpoint still available for each msk
  2021-12-07 18:18         ` Mat Martineau
@ 2021-12-09 10:13           ` Paolo Abeni
  0 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2021-12-09 10:13 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Tue, 2021-12-07 at 10:18 -0800, Mat Martineau wrote:
> On Tue, 7 Dec 2021, Paolo Abeni wrote:
> 
> > On Mon, 2021-12-06 at 11:18 +0100, Paolo Abeni wrote:
> > > On Fri, 2021-12-03 at 17:05 -0800, Mat Martineau wrote:
> > > > On Fri, 3 Dec 2021, Paolo Abeni wrote:
> > > > 
> > > > > Include into the path manager status a bitmap tracking the list
> > > > > of local endpoints still available - not yet used - for the
> > > > > relavant mptcp socket.
> > > > > Keep such map updated at endpoint creation/deleteion time, so
> > > > > that we can easily skip already used endpoint at local address
> > > > > selection time.
> > > > > This allows for fair local endpoints usage in case of subflow
> > > > > failure.
> > > > > 
> > > > > As a side effect, this patch also enforces that each endpoint
> > > > > is used at most once for each mptcp connection.
> > > > 
> > > > This seems like a significant change, possibly with unexpected
> > > > consequences (but maybe I've misunderstood the new constraint?).
> > > > Apparently the topoligies in our self tests happen to work fine!
> > > 
> > > Uhmmm... probably less then it may looks like. We already respected the
> > > above constraint in almost every scenarios. AFAICS, the only exception
> > > prior to this patch was following scenario:
> > > 
> > > 1) subflow is established endpoint A
> > > 
> > > 2) the peer closes the additional subflow
> > > 
> > > 2) a new local endpoint B is created
> > > 
> > > 3) the PM will create the 2nd subflow using again endpoint A.
> > > 
> > > which is arguibly unexpected, as reported by issues/242.
> > > 
> > > > If you have peer A with one interface and peer B with two, it seems like
> > > > this should be allowed:
> > > > 
> > > > Peer A connects to Peer B endpoint 0
> > > > Peer B announces endpoint 1
> > > > Peer A sends an MP_JOIN to Peer B interface 1
> > > > 
> > > > From the description above, it sounds like peer A would not be able to
> > > > create that second subflow.
> > > 
> > > Well, AFAICS the peer A will actually create the subflow, if the limits
> > > allow that, both with or without this patch.
> > > Specifically:
> > > 
> > > 1) no endpoint configured, add_addr_accepted >=1, subflows >=1 on peer
> > > A -> the subflow to peer be interface 1 will be created (with or
> > > without this patch) because we don't have any other additional
> > > constraint beyond the limits.
> > > 
> > > 2) endpoint configured on peer A matching the MPC subflow source
> > > address -> again the subflow towards peer B endpoint 1 will be created,
> > > because that will be the first time the PM will use the peer A
> > > endpoint.
> > 
> > Thinking again about this last scenario, I think there is something to
> > improve:
> > 
> > if peer A connects to peer B using the local endpoint address, it can
> > later use again such address for additional sublflows, as the MPC
> > subflow is no accounted by the current version of this patch. I'll cook
> > a v5 adding such account, too - and possibly a related self-tests
> > 
> 
> Thanks Paolo, that will help.

Still working on that, I hit a few additional corner-cases
> 
> My main concern is what happens when the bitmap fills up. While 
> __mptcp_subflow_connect() can still be called from 
> mptcp_pm_nl_add_addr_received() (so, you're right, we can still create 
> multiple outgoing connections this way), it won't be called from 
> mptcp_pm_create_subflow_or_signal_addr() because select_local_address() 
> will now return NULL. The latter case doesn't seem like expected/desired 
> behavior.

The scenario you describe already happens with the current code (prior
to this series): when all local endpoints have been used
(local_addr_used == local_addr_max),
mptcp_pm_create_subflow_or_signal_addr() no longer invokes
__mptcp_subflow_connect(), even if subflows < subflows_max.

The problem with relying only on 'local_addr_used' is that it allows
duplicate endpoint usage (as opposed to fair endpoint usage), as
described by issues/242.

