* [PATCH v4 mptcp-next 0/3] mptcp: more self-tests improvements
@ 2022-02-14 15:38 Paolo Abeni
  2022-02-14 15:38 ` [PATCH v4 mptcp-next 1/3] Squash-to: "mptcp: constify a bunch of helpers" Paolo Abeni
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Paolo Abeni @ 2022-02-14 15:38 UTC (permalink / raw)
  To: mptcp

This iteration tries to address the feedback from Mat on v3.

patch 1/3 is a somewhat related cleanup (to be squashed into an existing
patch)
patch 2/3 should address the self-test failure
patch 3/3 avoids the RFC breakage reported in the previous iteration,
leveraging the previous one

Patch 2/3 would probably deserve some additional self-tests. My plan is
to add them once the existing ones are stable enough.

Paolo Abeni (3):
  Squash-to: "mptcp: constify a bunch of helpers"
  mptcp: more careful RM_ADDR generation
  mptcp: strict local address ID selection.

 include/uapi/linux/mptcp.h |  1 +
 net/mptcp/pm_netlink.c     | 59 ++++++++++++++++++---------------
 net/mptcp/protocol.c       |  3 ++
 net/mptcp/protocol.h       |  3 +-
 net/mptcp/subflow.c        | 67 +++++++++++++++++++++++++++++++++-----
 5 files changed, 97 insertions(+), 36 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 mptcp-next 1/3] Squash-to: "mptcp: constify a bunch of helpers"
  2022-02-14 15:38 [PATCH v4 mptcp-next 0/3] mptcp: more self-tests improvements Paolo Abeni
@ 2022-02-14 15:38 ` Paolo Abeni
  2022-02-15  9:48   ` Matthieu Baerts
  2022-02-14 15:38 ` [PATCH v4 mptcp-next 2/3] mptcp: more careful RM_ADDR generation Paolo Abeni
  2022-02-14 15:38 ` [PATCH v4 mptcp-next 3/3] mptcp: strict local address ID selection Paolo Abeni
  2 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2022-02-14 15:38 UTC (permalink / raw)
  To: mptcp

There are a few more helpers that can be const-ified that I missed
in the previous patch; just squash these changes in there.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/pm_netlink.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 46346f009485..56f5603c10f2 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1289,7 +1289,7 @@ int mptcp_pm_get_flags_and_ifindex_by_id(struct net *net, unsigned int id,
 }
 
 static bool remove_anno_list_by_saddr(struct mptcp_sock *msk,
-				      struct mptcp_addr_info *addr)
+				      const struct mptcp_addr_info *addr)
 {
 	struct mptcp_pm_add_entry *entry;
 
@@ -1304,7 +1304,7 @@ static bool remove_anno_list_by_saddr(struct mptcp_sock *msk,
 }
 
 static bool mptcp_pm_remove_anno_addr(struct mptcp_sock *msk,
-				      struct mptcp_addr_info *addr,
+				      const struct mptcp_addr_info *addr,
 				      bool force)
 {
 	struct mptcp_rm_list list = { .nr = 0 };
-- 
2.34.1



* [PATCH v4 mptcp-next 2/3] mptcp: more careful RM_ADDR generation
  2022-02-14 15:38 [PATCH v4 mptcp-next 0/3] mptcp: more self-tests improvements Paolo Abeni
  2022-02-14 15:38 ` [PATCH v4 mptcp-next 1/3] Squash-to: "mptcp: constify a bunch of helpers" Paolo Abeni
@ 2022-02-14 15:38 ` Paolo Abeni
  2022-02-14 18:41   ` Paolo Abeni
  2022-02-14 15:38 ` [PATCH v4 mptcp-next 3/3] mptcp: strict local address ID selection Paolo Abeni
  2 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2022-02-14 15:38 UTC (permalink / raw)
  To: mptcp

In some edge scenarios, an MPTCP subflow can use a local address
mapped by an "implicit" endpoint created by the in-kernel path manager.

When such an endpoint is deleted, the in-kernel PM sends an RM_ADDR MPTCP
suboption. That is somewhat unexpected, as an MPTCP listener will keep
accepting incoming subflows targeting such an address, and the unexpected
options can confuse some self-tests.

Be more conservative about RM_ADDR generation: explicitly track the
implicit endpoints with an appropriate flag and exclude them from
RM_ADDR generation.

Additionally, allow user-space to replace an implicit endpoint with
user-provided data at endpoint creation time.
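
For reviewers, the intended replacement rule can be sketched with a toy
model (Python for illustration only, not kernel code; the helper name and
argument layout are invented, the flag value mirrors the new uapi bit):

```python
# Toy model of the endpoint replacement rule described above.
MPTCP_PM_ADDR_FLAG_IMPLICIT = 1 << 4

def can_replace(existing_flags, requested_id):
    """A user-supplied endpoint may replace an existing one only when
    the existing endpoint is implicit and no explicit id was requested."""
    if not (existing_flags & MPTCP_PM_ADDR_FLAG_IMPLICIT):
        return False  # never overwrite an explicitly configured endpoint
    if requested_id != 0:
        return False  # a user-provided id keeps the old behavior: reject
    return True

print(can_replace(MPTCP_PM_ADDR_FLAG_IMPLICIT, 0))  # True
print(can_replace(MPTCP_PM_ADDR_FLAG_IMPLICIT, 5))  # False
print(can_replace(0, 0))                            # False
```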

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/uapi/linux/mptcp.h |  1 +
 net/mptcp/pm_netlink.c     | 42 ++++++++++++++++++++++++++++----------
 2 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
index f106a3941cdf..9690efedb5fa 100644
--- a/include/uapi/linux/mptcp.h
+++ b/include/uapi/linux/mptcp.h
@@ -81,6 +81,7 @@ enum {
 #define MPTCP_PM_ADDR_FLAG_SUBFLOW			(1 << 1)
 #define MPTCP_PM_ADDR_FLAG_BACKUP			(1 << 2)
 #define MPTCP_PM_ADDR_FLAG_FULLMESH			(1 << 3)
+#define MPTCP_PM_ADDR_FLAG_IMPLICIT			(1 << 4)
 
 enum {
 	MPTCP_PM_CMD_UNSPEC,
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 56f5603c10f2..66cda3a425c4 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -901,8 +901,19 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
 	list_for_each_entry(cur, &pernet->local_addr_list, list) {
 		if (addresses_equal(&cur->addr, &entry->addr,
 				    address_use_port(entry) &&
-				    address_use_port(cur)))
-			goto out;
+				    address_use_port(cur))) {
+			/* allow replacing the existing endpoint only if such
+			 * endpoint is an implicit one and the user-space
+			 * did not provide an endpoint id
+			 */
+			if (!(cur->flags & MPTCP_PM_ADDR_FLAG_IMPLICIT))
+				goto out;
+			if (entry->addr.id)
+				goto out;
+
+			pernet->addrs--;
+			list_del_rcu(&entry->list);
+		}
 	}
 
 	if (!entry->addr.id) {
@@ -1036,7 +1047,7 @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct sock_common *skc)
 	entry->addr.id = 0;
 	entry->addr.port = 0;
 	entry->ifindex = 0;
-	entry->flags = 0;
+	entry->flags = MPTCP_PM_ADDR_FLAG_IMPLICIT;
 	entry->lsk = NULL;
 	ret = mptcp_pm_nl_append_new_local_addr(pernet, entry);
 	if (ret < 0)
@@ -1238,6 +1249,11 @@ static int mptcp_nl_cmd_add_addr(struct sk_buff *skb, struct genl_info *info)
 		return -EINVAL;
 	}
 
+	if (addr.flags & MPTCP_PM_ADDR_FLAG_IMPLICIT) {
+		GENL_SET_ERR_MSG(info, "can't create IMPLICIT endpoint");
+		return -EINVAL;
+	}
+
 	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
 	if (!entry) {
 		GENL_SET_ERR_MSG(info, "can't allocate addr");
@@ -1322,11 +1338,12 @@ static bool mptcp_pm_remove_anno_addr(struct mptcp_sock *msk,
 }
 
 static int mptcp_nl_remove_subflow_and_signal_addr(struct net *net,
-						   struct mptcp_addr_info *addr)
+						   const struct mptcp_pm_addr_entry *entry)
 {
-	struct mptcp_sock *msk;
-	long s_slot = 0, s_num = 0;
+	const struct mptcp_addr_info *addr = &entry->addr;
 	struct mptcp_rm_list list = { .nr = 0 };
+	long s_slot = 0, s_num = 0;
+	struct mptcp_sock *msk;
 
 	pr_debug("remove_id=%d", addr->id);
 
@@ -1346,7 +1363,8 @@ static int mptcp_nl_remove_subflow_and_signal_addr(struct net *net,
 
 		lock_sock(sk);
 		remove_subflow = lookup_subflow_by_saddr(&msk->conn_list, addr);
-		mptcp_pm_remove_anno_addr(msk, addr, remove_subflow);
+		mptcp_pm_remove_anno_addr(msk, addr, remove_subflow &&
+					  !(entry->flags & MPTCP_PM_ADDR_FLAG_IMPLICIT));
 		if (remove_subflow)
 			mptcp_pm_remove_subflow(msk, &list);
 		release_sock(sk);
@@ -1443,7 +1461,7 @@ static int mptcp_nl_cmd_del_addr(struct sk_buff *skb, struct genl_info *info)
 	__clear_bit(entry->addr.id, pernet->id_bitmap);
 	spin_unlock_bh(&pernet->lock);
 
-	mptcp_nl_remove_subflow_and_signal_addr(sock_net(skb->sk), &entry->addr);
+	mptcp_nl_remove_subflow_and_signal_addr(sock_net(skb->sk), entry);
 	synchronize_rcu();
 	__mptcp_pm_release_addr_entry(entry);
 
@@ -1458,9 +1476,11 @@ static void mptcp_pm_remove_addrs_and_subflows(struct mptcp_sock *msk,
 
 	list_for_each_entry(entry, rm_list, list) {
 		if (lookup_subflow_by_saddr(&msk->conn_list, &entry->addr) &&
-		    alist.nr < MPTCP_RM_IDS_MAX &&
 		    slist.nr < MPTCP_RM_IDS_MAX) {
-			alist.ids[alist.nr++] = entry->addr.id;
+			/* skip RM_ADDR for dummy endpoints */
+			if (!(entry->flags & MPTCP_PM_ADDR_FLAG_IMPLICIT) &&
+			    alist.nr < MPTCP_RM_IDS_MAX)
+				alist.ids[alist.nr++] = entry->addr.id;
 			slist.ids[slist.nr++] = entry->addr.id;
 		} else if (remove_anno_list_by_saddr(msk, &entry->addr) &&
 			 alist.nr < MPTCP_RM_IDS_MAX) {
@@ -1811,7 +1831,7 @@ static int mptcp_nl_cmd_set_flags(struct sk_buff *skb, struct genl_info *info)
 
 	spin_lock_bh(&pernet->lock);
 	entry = __lookup_addr(pernet, &addr.addr, lookup_by_id);
-	if (!entry) {
+	if (!entry || (entry->flags & MPTCP_PM_ADDR_FLAG_IMPLICIT)) {
 		spin_unlock_bh(&pernet->lock);
 		return -EINVAL;
 	}
-- 
2.34.1



* [PATCH v4 mptcp-next 3/3] mptcp: strict local address ID selection.
  2022-02-14 15:38 [PATCH v4 mptcp-next 0/3] mptcp: more self-tests improvements Paolo Abeni
  2022-02-14 15:38 ` [PATCH v4 mptcp-next 1/3] Squash-to: "mptcp: constify a bunch of helpers" Paolo Abeni
  2022-02-14 15:38 ` [PATCH v4 mptcp-next 2/3] mptcp: more careful RM_ADDR generation Paolo Abeni
@ 2022-02-14 15:38 ` Paolo Abeni
  2 siblings, 0 replies; 11+ messages in thread
From: Paolo Abeni @ 2022-02-14 15:38 UTC (permalink / raw)
  To: mptcp

The address ID selection for MPJ subflows created in response
to an incoming ADD_ADDR option is currently unreliable: it happens
at MPJ socket creation time, when the local address could be
unknown.

Additionally, if no local endpoint is available for the local
address, a new dummy endpoint is created, confusing user-land.

This change refactors the code, moving the address ID selection
inside the rebuild_header() helper, where the local address
eventually selected by the route lookup is finally known. If the
address in use is not mapped by any endpoint - and thus can't be
advertised/removed - pick id 0 instead of allocating a new endpoint.
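
The resulting lazy lookup can be summarized with a toy model (a
hypothetical Python sketch, not kernel code; all names and the endpoint
map are invented for illustration):

```python
class Subflow:
    """Minimal stand-in for mptcp_subflow_context: just the two fields
    touched by the lazy ID selection."""
    def __init__(self):
        self.local_id = 0
        self.local_id_valid = False

def get_local_id(endpoints, saddr):
    # If the local address is mapped by a known endpoint use its id,
    # otherwise fall back to id 0 instead of allocating a dummy endpoint.
    return endpoints.get(saddr, 0)

def chk_local_id(subflow, endpoints, saddr):
    # Invoked from the rebuild_header() path, once the route lookup has
    # settled the local address.
    if subflow.local_id_valid:
        return
    subflow.local_id = get_local_id(endpoints, saddr)
    subflow.local_id_valid = True

sf = Subflow()
chk_local_id(sf, {"10.0.1.2": 3}, "10.0.1.2")
print(sf.local_id)  # 3: address mapped by an endpoint

sf = Subflow()
chk_local_id(sf, {"10.0.1.2": 3}, "10.0.9.9")
print(sf.local_id)  # 0: unknown address, no new endpoint created
```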

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 v3 -> v4:
- really create endpoints in mptcp_pm_get_local_id() - Mat

 v2 -> v3:
- keep creating dummy endpoint

 v1 -> v2:
- hopefully fix build issue with ipv6 disabled
- avoid looking up the local_id multiple times for req sockets
- factor out a helper for local_id initialization

RFC -> v1:
- don't bail if ID lookup fails, use 0 instead
---
 net/mptcp/pm_netlink.c | 13 --------
 net/mptcp/protocol.c   |  3 ++
 net/mptcp/protocol.h   |  3 +-
 net/mptcp/subflow.c    | 67 ++++++++++++++++++++++++++++++++++++------
 4 files changed, 63 insertions(+), 23 deletions(-)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 66cda3a425c4..3686d4c803b6 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -83,16 +83,6 @@ static bool addresses_equal(const struct mptcp_addr_info *a,
 	return a->port == b->port;
 }
 
-static bool address_zero(const struct mptcp_addr_info *addr)
-{
-	struct mptcp_addr_info zero;
-
-	memset(&zero, 0, sizeof(zero));
-	zero.family = addr->family;
-
-	return addresses_equal(addr, &zero, true);
-}
-
 static void local_address(const struct sock_common *skc,
 			  struct mptcp_addr_info *addr)
 {
@@ -1022,9 +1012,6 @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct sock_common *skc)
 	if (addresses_equal(&msk_local, &skc_local, false))
 		return 0;
 
-	if (address_zero(&skc_local))
-		return 0;
-
 	pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
 
 	rcu_read_lock();
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3324e1c61576..57caf470e500 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -117,6 +117,9 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
 	list_add(&subflow->node, &msk->conn_list);
 	sock_hold(ssock->sk);
 	subflow->request_mptcp = 1;
+
+	/* This is the first subflow, always with id 0 */
+	subflow->local_id_valid = 1;
 	mptcp_sock_graft(msk->first, sk->sk_socket);
 
 	return 0;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index a1ce1fd005ab..663b8d83154e 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -453,7 +453,8 @@ struct mptcp_subflow_context {
 		rx_eof : 1,
 		can_ack : 1,        /* only after processing the remote a key */
 		disposable : 1,	    /* ctx can be free at ulp release time */
-		stale : 1;	    /* unable to snd/rcv data, do not use for xmit */
+		stale : 1,	    /* unable to snd/rcv data, do not use for xmit */
+		local_id_valid : 1; /* local_id is correctly initialized */
 	enum mptcp_data_avail data_avail;
 	u32	remote_nonce;
 	u64	thmac;
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index b53b392dd280..283e5d57e003 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -481,6 +481,51 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
 	mptcp_subflow_reset(sk);
 }
 
+static void subflow_set_local_id(struct mptcp_subflow_context *subflow, int local_id)
+{
+	subflow->local_id = local_id;
+	subflow->local_id_valid = 1;
+}
+
+static int subflow_chk_local_id(struct sock *sk)
+{
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
+	struct mptcp_sock *msk = mptcp_sk(subflow->conn);
+	int err;
+
+	if (likely(subflow->local_id_valid))
+		return 0;
+
+	err = mptcp_pm_get_local_id(msk, (struct sock_common *)sk);
+	if (err < 0)
+		return err;
+
+	subflow_set_local_id(subflow, err);
+	return 0;
+}
+
+static int subflow_rebuild_header(struct sock *sk)
+{
+	int err = subflow_chk_local_id(sk);
+
+	if (unlikely(err < 0))
+		return err;
+
+	return inet_sk_rebuild_header(sk);
+}
+
+#if IS_ENABLED(CONFIG_MPTCP_IPV6)
+static int subflow_v6_rebuild_header(struct sock *sk)
+{
+	int err = subflow_chk_local_id(sk);
+
+	if (unlikely(err < 0))
+		return err;
+
+	return inet6_sk_rebuild_header(sk);
+}
+#endif
+
 struct request_sock_ops mptcp_subflow_request_sock_ops;
 static struct tcp_request_sock_ops subflow_request_sock_ipv4_ops __ro_after_init;
 
@@ -1403,13 +1448,8 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
 		get_random_bytes(&subflow->local_nonce, sizeof(u32));
 	} while (!subflow->local_nonce);
 
-	if (!local_id) {
-		err = mptcp_pm_get_local_id(msk, (struct sock_common *)ssk);
-		if (err < 0)
-			goto failed;
-
-		local_id = err;
-	}
+	if (local_id)
+		subflow_set_local_id(subflow, local_id);
 
 	mptcp_pm_get_flags_and_ifindex_by_id(sock_net(sk), local_id,
 					     &flags, &ifindex);
@@ -1434,7 +1474,6 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
 	pr_debug("msk=%p remote_token=%u local_id=%d remote_id=%d", msk,
 		 remote_token, local_id, remote_id);
 	subflow->remote_token = remote_token;
-	subflow->local_id = local_id;
 	subflow->remote_id = remote_id;
 	subflow->request_join = 1;
 	subflow->request_bkup = !!(flags & MPTCP_PM_ADDR_FLAG_BACKUP);
@@ -1734,15 +1773,22 @@ static void subflow_ulp_clone(const struct request_sock *req,
 		new_ctx->token = subflow_req->token;
 		new_ctx->ssn_offset = subflow_req->ssn_offset;
 		new_ctx->idsn = subflow_req->idsn;
+
+		/* this is the first subflow, id is always 0 */
+		new_ctx->local_id_valid = 1;
 	} else if (subflow_req->mp_join) {
 		new_ctx->ssn_offset = subflow_req->ssn_offset;
 		new_ctx->mp_join = 1;
 		new_ctx->fully_established = 1;
 		new_ctx->backup = subflow_req->backup;
-		new_ctx->local_id = subflow_req->local_id;
 		new_ctx->remote_id = subflow_req->remote_id;
 		new_ctx->token = subflow_req->token;
 		new_ctx->thmac = subflow_req->thmac;
+
+		/* the subflow req id is valid, fetched via subflow_check_req()
+		 * and subflow_token_join_request()
+		 */
+		subflow_set_local_id(new_ctx, subflow_req->local_id);
 	}
 }
 
@@ -1795,6 +1841,7 @@ void __init mptcp_subflow_init(void)
 	subflow_specific.conn_request = subflow_v4_conn_request;
 	subflow_specific.syn_recv_sock = subflow_syn_recv_sock;
 	subflow_specific.sk_rx_dst_set = subflow_finish_connect;
+	subflow_specific.rebuild_header = subflow_rebuild_header;
 
 	tcp_prot_override = tcp_prot;
 	tcp_prot_override.release_cb = tcp_release_cb_override;
@@ -1807,6 +1854,7 @@ void __init mptcp_subflow_init(void)
 	subflow_v6_specific.conn_request = subflow_v6_conn_request;
 	subflow_v6_specific.syn_recv_sock = subflow_syn_recv_sock;
 	subflow_v6_specific.sk_rx_dst_set = subflow_finish_connect;
+	subflow_v6_specific.rebuild_header = subflow_v6_rebuild_header;
 
 	subflow_v6m_specific = subflow_v6_specific;
 	subflow_v6m_specific.queue_xmit = ipv4_specific.queue_xmit;
@@ -1814,6 +1862,7 @@ void __init mptcp_subflow_init(void)
 	subflow_v6m_specific.net_header_len = ipv4_specific.net_header_len;
 	subflow_v6m_specific.mtu_reduced = ipv4_specific.mtu_reduced;
 	subflow_v6m_specific.net_frag_header_len = 0;
+	subflow_v6m_specific.rebuild_header = subflow_rebuild_header;
 
 	tcpv6_prot_override = tcpv6_prot;
 	tcpv6_prot_override.release_cb = tcp_release_cb_override;
-- 
2.34.1



* Re: [PATCH v4 mptcp-next 2/3] mptcp: more careful RM_ADDR generation
  2022-02-14 15:38 ` [PATCH v4 mptcp-next 2/3] mptcp: more careful RM_ADDR generation Paolo Abeni
@ 2022-02-14 18:41   ` Paolo Abeni
  2022-02-15  1:19     ` Mat Martineau
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2022-02-14 18:41 UTC (permalink / raw)
  To: mptcp

On Mon, 2022-02-14 at 16:38 +0100, Paolo Abeni wrote:
> In some edge scenarios, an MPTCP subflow can use a local address
> mapped by an "implicit" endpoint created by the in-kernel path manager.
> 
> When such an endpoint is deleted, the in-kernel PM sends an RM_ADDR MPTCP
> suboption. That is somewhat unexpected, as an MPTCP listener will keep
> accepting incoming subflows targeting such an address, and the unexpected
> options can confuse some self-tests.
> 
> Be more conservative about RM_ADDR generation: explicitly track the
> implicit endpoints with an appropriate flag and exclude them from
> RM_ADDR generation.
> 
> Additionally, allow user-space to replace an implicit endpoint with
> user-provided data at endpoint creation time.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

The CI still reports an issue on top of this patch:

https://cirrus-ci.com/task/5380046008352768?logs=test#L7283

Even if the symptom is the same, the root cause is different. I think
this failure is caused by the flush operation deleting both all the
existing endpoints and all the existing subflows for each MPTCP socket,
even when a subflow is not attached to any known, non-implicit,
endpoint.

When we have a simultaneous flush on both sides, deleting a subflow
(with no endpoint) on one end can make the subflow on the other end
(tied to a local endpoint) disappear before the PM can generate the
relevant RM_ADDR.

Overall, the number of generated RM_ADDRs becomes unpredictable, even if
the testcase is quite stable, to the point that I could not replicate
the failure reported by the CI.

AFAICS, there are 2 possible solutions:

- avoid flushing both ends in test-cases "flush subflows" and "flush
addresses". This is very simple, but could hide other (currently
unknown) problems.
- change mptcp_nl_cmd_flush_addrs() to only delete subflows tied to
known, non-implicit, endpoints. That is possibly a saner behavior for ip
mptcp endpoint flush, but will require adjusting more testcases and
is still a change of behaviour.

Any hint on the preferred option more than welcome!

/P



* Re: [PATCH v4 mptcp-next 2/3] mptcp: more careful RM_ADDR generation
  2022-02-14 18:41   ` Paolo Abeni
@ 2022-02-15  1:19     ` Mat Martineau
  2022-02-15 17:07       ` Paolo Abeni
  0 siblings, 1 reply; 11+ messages in thread
From: Mat Martineau @ 2022-02-15  1:19 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Mon, 14 Feb 2022, Paolo Abeni wrote:

> On Mon, 2022-02-14 at 16:38 +0100, Paolo Abeni wrote:
>> In some edge scenarios, an MPTCP subflow can use a local address
>> mapped by an "implicit" endpoint created by the in-kernel path manager.
>>
>> When such an endpoint is deleted, the in-kernel PM sends an RM_ADDR MPTCP
>> suboption. That is somewhat unexpected, as an MPTCP listener will keep
>> accepting incoming subflows targeting such an address, and the unexpected
>> options can confuse some self-tests.
>>
>> Be more conservative about RM_ADDR generation: explicitly track the
>> implicit endpoints with an appropriate flag and exclude them from
>> RM_ADDR generation.
>>
>> Additionally, allow user-space to replace an implicit endpoint with
>> user-provided data at endpoint creation time.
>>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>
> The CI still reports an issue on top of this patch:
>
> https://cirrus-ci.com/task/5380046008352768?logs=test#L7283
>
> Even if the symptom is the same, the root cause is different. I think
> such failure is caused by the flush operation deleting both all
> existing endpoints and all the existing subflows for each MPTCP socket.
>
> Even when the subflow is not attached to any known, non implicit,
> endpoint.
>
> When we have a simultaneous flush on both sides, deleting the subflow (with no
> endpoint) on one end can make disappear the subflow on the other end
> (tied to a local endpoint) before the PM could generate the relevant
> RM_ADDR.
>

If we aren't closing all subflows during the flush, shouldn't there be at 
least one subflow where the RM_ADDR can be sent?

Maybe mptcp_pm_nl_addr_send_ack() needs to pick a better subflow for 
sending the RM_ADDR. Do you think it would help to make 
mptcp_subflow_send_ack() return a bool, and only return true if the ack 
was sent? Then it could retry the ack on other subflows in the conn_list, 
until it finds one that works or they all fail.
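
Roughly like this, as a sketch (a hypothetical Python model of the
proposed retry; send_ack() is just a stand-in for a bool-returning
mptcp_subflow_send_ack(), not the real API):

```python
def send_ack(subflow):
    # Stand-in for a bool-returning mptcp_subflow_send_ack(): True only
    # if the ack (carrying the pending RM_ADDR) was actually sent.
    return subflow["can_send"]

def addr_send_ack(conn_list):
    # Walk the connection list until one subflow manages to send the
    # ack, instead of giving up after the first candidate.
    for subflow in conn_list:
        if send_ack(subflow):
            return True
    return False

print(addr_send_ack([{"can_send": False}, {"can_send": True}]))  # True
print(addr_send_ack([{"can_send": False}]))                      # False
```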


> Overall, the number of generated RM_ADDRs becomes unpredictable, even if
> the testcase is quite stable, to the point I could not replicate the
> failure reported by the CI.
>
> AFAICS, there are 2 possible solutions:
>
> - avoid flushing both ends in test-cases "flush subflows" and "flush
> addresses". This is very simple, but could hide other (currently
> unknown) problems.
> - change mptcp_nl_cmd_flush_addrs() to only delete subflows tied to
> known, non implicit, endpoint. That is possibly a saner behavior for ip
> mptcp endpoint flush, but will need more testcases to be adjusted and
> is still a change of behaviour.
>
> Any hint on the preferred option more than welcome!



--
Mat Martineau
Intel


* Re: [PATCH v4 mptcp-next 1/3] Squash-to: "mptcp: constify a bunch of helpers"
  2022-02-14 15:38 ` [PATCH v4 mptcp-next 1/3] Squash-to: "mptcp: constify a bunch of helpers" Paolo Abeni
@ 2022-02-15  9:48   ` Matthieu Baerts
  0 siblings, 0 replies; 11+ messages in thread
From: Matthieu Baerts @ 2022-02-15  9:48 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

Hi Paolo, Mat,

On 14/02/2022 16:38, Paolo Abeni wrote:
> There are a few more helpers that can be const-ified and I missed
> in the previous patch, just squash these change into there.

I just applied this patch which was already OK in v3:

- ef783d7d6771: "squashed" patch 1/3 in "mptcp: constify a bunch of of
helpers"
- Results: a9327cdfe5bf..7cf2ef8f7b27

Builds and tests are now in progress:

https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20220215T094543
https://github.com/multipath-tcp/mptcp_net-next/actions/workflows/build-validation.yml?query=branch:export

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net


* Re: [PATCH v4 mptcp-next 2/3] mptcp: more careful RM_ADDR generation
  2022-02-15  1:19     ` Mat Martineau
@ 2022-02-15 17:07       ` Paolo Abeni
  2022-02-15 19:15         ` Mat Martineau
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2022-02-15 17:07 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Mon, 2022-02-14 at 17:19 -0800, Mat Martineau wrote:
> On Mon, 14 Feb 2022, Paolo Abeni wrote:
> 
> > On Mon, 2022-02-14 at 16:38 +0100, Paolo Abeni wrote:
> > > In some edge scenarios, an MPTCP subflow can use a local address
> > > mapped by an "implicit" endpoint created by the in-kernel path manager.
> > > 
> > > When such an endpoint is deleted, the in-kernel PM sends an RM_ADDR MPTCP
> > > suboption. That is somewhat unexpected, as an MPTCP listener will keep
> > > accepting incoming subflows targeting such an address, and the unexpected
> > > options can confuse some self-tests.
> > > 
> > > Be more conservative about RM_ADDR generation: explicitly track the
> > > implicit endpoints with an appropriate flag and exclude them from
> > > RM_ADDR generation.
> > > 
> > > Additionally, allow user-space to replace an implicit endpoint with
> > > user-provided data at endpoint creation time.
> > > 
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > 
> > The CI still reports an issue on top of this patch:
> > 
> > https://cirrus-ci.com/task/5380046008352768?logs=test#L7283
> > 
> > Even if the symptom is the same, the root cause is different. I think
> > such failure is caused by the flush operation deleting both all
> > existing endpoints and all the existing subflows for each MPTCP socket.
> > 
> > Even when the subflow is not attached to any known, non implicit,
> > endpoint.
> > 
> > When we have a simultaneous flush on both sides, deleting the subflow (with no
> > endpoint) on one end can make disappear the subflow on the other end
> > (tied to a local endpoint) before the PM could generate the relevant
> > RM_ADDR.
> > 
> 
> If we aren't closing all subflows during the flush, shouldn't there be at 
> least one subflow where the RM_ADDR can be sent?
> 
> Maybe mptcp_pm_nl_addr_send_ack() needs to pick a better subflow for 
> sending the RM_ADDR. Do you think it would help to make 
> mptcp_subflow_send_ack() return a bool, and only return true if the ack 
> was sent? Then it could retry the ack on other subflows in the conn_list, 
> until it finds one that works or they all fail.

I'm sorry, it looks like I was unclear.

What I mean is that in the following scenario:

Client					Server 
[endpoint1] ------- MPTCP-subflow ----> [addr0/no endpoint]
[endpoint2] ------- subflow 1  -------> [addr0/no endpoint]
[endpoint3] ------- subflow 2  -------> [addr0/no endpoint]

if we simultaneously flush the endpoints on both the server and the
client (which is what the failing selftest currently does), the
number of RM_ADDRs generated by the client is unpredictable.

On endpoint flush, the server will try to delete all the subflows,
regardless of whether any endpoint is attached to them. The server can
delete subflow 1 and/or subflow 2 before the client processes the
relevant endpoint removal on its side. If that happens, the client will
correctly not generate the related RM_ADDR.
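
The counting argument can be illustrated with a toy model (purely
illustrative Python; it captures only the ordering dependency, not the
kernel logic, and all names are invented):

```python
def client_rm_addrs(client_endpoints, subflows_closed_by_server):
    # In this toy model the client emits one RM_ADDR per flushed
    # endpoint whose subflow is still alive when the flush runs.
    return sum(1 for ep in client_endpoints
               if ep not in subflows_closed_by_server)

endpoints = {"endpoint2", "endpoint3"}
# Client's flush wins the race: both RM_ADDRs are generated.
print(client_rm_addrs(endpoints, set()))          # 2
# Server's flush already closed one subflow: only one RM_ADDR.
print(client_rm_addrs(endpoints, {"endpoint3"}))  # 1
```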

The testcase currently expects exactly 2 RM_ADDRs in the above
scenario (well, actually 3, because the testcase uses 3 MPJ subflows).

Picking a different subflow to send the RM_ADDR will not change the
results.

Not sure if the above is any clearer.

I don't see viable options other than the two mentioned in my
previous email.

Thanks,

Paolo



* Re: [PATCH v4 mptcp-next 2/3] mptcp: more careful RM_ADDR generation
  2022-02-15 17:07       ` Paolo Abeni
@ 2022-02-15 19:15         ` Mat Martineau
  2022-02-16 18:38           ` Paolo Abeni
  0 siblings, 1 reply; 11+ messages in thread
From: Mat Martineau @ 2022-02-15 19:15 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Tue, 15 Feb 2022, Paolo Abeni wrote:

> On Mon, 2022-02-14 at 17:19 -0800, Mat Martineau wrote:
>> On Mon, 14 Feb 2022, Paolo Abeni wrote:
>>
>>> On Mon, 2022-02-14 at 16:38 +0100, Paolo Abeni wrote:
>>>> In some edge scenarios, an MPTCP subflow can use a local address
>>>> mapped by an "implicit" endpoint created by the in-kernel path manager.
>>>>
>>>> When such an endpoint is deleted, the in-kernel PM sends an RM_ADDR MPTCP
>>>> suboption. That is somewhat unexpected, as an MPTCP listener will keep
>>>> accepting incoming subflows targeting such an address, and the unexpected
>>>> options can confuse some self-tests.
>>>>
>>>> Be more conservative about RM_ADDR generation: explicitly track the
>>>> implicit endpoints with an appropriate flag and exclude them from
>>>> RM_ADDR generation.
>>>>
>>>> Additionally, allow user-space to replace an implicit endpoint with
>>>> user-provided data at endpoint creation time.
>>>>
>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>
>>> The CI still reports an issue on top of this patch:
>>>
>>> https://cirrus-ci.com/task/5380046008352768?logs=test#L7283
>>>
>>> Even if the symptom is the same, the root cause is different. I think
>>> such failure is caused by the flush operation deleting both all
>>> existing endpoints and all the existing subflows for each MPTCP socket.
>>>
>>> Even when the subflow is not attached to any known, non implicit,
>>> endpoint.
>>>
>>> When we have a simultaneous flush on both sides, deleting the subflow (with no
>>> endpoint) on one end can make disappear the subflow on the other end
>>> (tied to a local endpoint) before the PM could generate the relevant
>>> RM_ADDR.
>>>
>>
>> If we aren't closing all subflows during the flush, shouldn't there be at
>> least one subflow where the RM_ADDR can be sent?
>>
>> Maybe mptcp_pm_nl_addr_send_ack() needs to pick a better subflow for
>> sending the RM_ADDR. Do you think it would help to make
>> mptcp_subflow_send_ack() return a bool, and only return true if the ack
>> was sent? Then it could retry the ack on other subflows in the conn_list,
>> until it finds one that works or they all fail.
>
> I'm sorry it looks like I was unclear.
>
> What I mean is that in the following scenario:
>
> Client					Server
> [endpoint1] ------- MPTCP-subflow ----> [addr0/no endpoint]
> [endpoint2] ------- subflow 1  -------> [addr0/no endpoint]
> [endpoint3] ------- subflow 2  -------> [addr0/no endpoint]
>
> if we simultaneously flush the endpoints on both the server and the
> client (which is what the failing selftest is currently doing), the
> number of RM_ADDR generated by the client is unpredictable.
>
> On endpoint flush the server will try to delete all the subflows,
> regardless of no endpoints attached there. The server can delete the
> subflow 1 and/or the subflow 2 before the client processes the relevant
> endpoint on the other side. If that happens, the client will not
> generate (correctly) the related RM_ADDR.

I'm not sure that's correct, though. Even if subflows 1 and/or 2 have been 
closed, if the client has sent an ADD_ADDR to advertise endpoints 2 and 3, 
it should send RM_ADDR to "unadvertise" them on the remaining open 
subflow. While RM_ADDR does in some cases trigger disconnects, its primary 
role is to inform the peer that previous advertisements have been revoked.

>
> The testcase currently expects exactly 2 RM_ADDRs in the above
> scenario (well actually 3, because the testcase uses 3 MPJ subflows)
>
> Picking a different subflow to send the RM_ADDR will not change the
> results.
>
> Not sure if the above is somewhat more clear.
>

I think your explanation is clear, but we are each trying to explain a 
different model of how things should work :)


In our in-kernel PM implementation, ADD_ADDR ends up being treated as a 
"request to connect" event and RM_ADDR is a "request to disconnect" event. 
That doesn't completely capture what the RFC intends: ADD_ADDR is an
advertisement that an endpoint is available for a peer PM to potentially 
connect to (at that moment or any later time before RM_ADDR or connection 
close). RM_ADDR revokes that advertisement. From the RFC (note middle 
sentence, especially):

3.4.2.  Remove Address

    If, during the lifetime of an MPTCP connection, a previously
    announced address becomes invalid (e.g., if the interface disappears
    or an IPv6 address is no longer preferred), the affected host SHOULD
    announce this situation so that the peer can remove subflows related
    to this address.  Even if an address is not in use by an MPTCP
    connection, if it has been previously announced, an implementation
    SHOULD announce its removal.  A host MAY also choose to announce that
    a valid IP address should not be used any longer -- for example, for
    make-before-break session continuity.


Those SHOULDs do suggest that our self-test may be stricter than the RFC, 
but if we fix our in-kernel PM to send a predictable number of RM_ADDRs 
then the existing tests can be satisfied.


--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 mptcp-next 2/3] mptcp: more careful RM_ADDR generation
  2022-02-15 19:15         ` Mat Martineau
@ 2022-02-16 18:38           ` Paolo Abeni
  2022-02-16 23:48             ` Mat Martineau
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2022-02-16 18:38 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Tue, 2022-02-15 at 11:15 -0800, Mat Martineau wrote:
> On Tue, 15 Feb 2022, Paolo Abeni wrote:
> 
> > On Mon, 2022-02-14 at 17:19 -0800, Mat Martineau wrote:
> > > On Mon, 14 Feb 2022, Paolo Abeni wrote:
> > > 
> > > > On Mon, 2022-02-14 at 16:38 +0100, Paolo Abeni wrote:
> > > > > In some edge scenarios, an MPTCP subflow can use a local address
> > > > > mapped by an "implicit" endpoint created by the in-kernel path manager.
> > > > > 
> > > > > When such an endpoint is deleted, the in-kernel PM sends a RM_ADDR MPTCP
> > > > > suboption. That is somewhat unexpected, as an MPTCP listener will keep
> > > > > accepting incoming subflows targeting such an address, and the unexpected
> > > > > options can confuse some self-tests.
> > > > > 
> > > > > Be more conservative about RM_ADDR generation: explicitly track
> > > > > implicit endpoints with an appropriate flag and exclude them from
> > > > > RM_ADDR generation.
> > > > > 
> > > > > Additionally, allow user-space to replace implicit endpoints with
> > > > > user-provided data at endpoint creation time.
> > > > > 
> > > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > 
> > > > The CI still reports an issue on top of this patch:
> > > > 
> > > > https://cirrus-ci.com/task/5380046008352768?logs=test#L7283
> > > > 
> > > > Even if the symptom is the same, the root cause is different. I think
> > > > such failure is caused by the flush operation deleting both the
> > > > existing endpoints and the existing subflows for each MPTCP socket.
> > > > 
> > > > Even when the subflow is not attached to any known, non-implicit
> > > > endpoint.
> > > > 
> > > > When we have a simultaneous flush on both sides, deleting the subflow
> > > > (with no endpoint) on one end can make the subflow on the other end
> > > > (tied to a local endpoint) disappear before the PM could generate the
> > > > relevant RM_ADDR.
> > > > 
> > > 
> > > If we aren't closing all subflows during the flush, shouldn't there be at
> > > least one subflow where the RM_ADDR can be sent?
> > > 
> > > Maybe mptcp_pm_nl_addr_send_ack() needs to pick a better subflow for
> > > sending the RM_ADDR. Do you think it would help to make
> > > mptcp_subflow_send_ack() return a bool, and only return true if the ack
> > > was sent? Then it could retry the ack on other subflows in the conn_list,
> > > until it finds one that works or they all fail.
> > 
> > I'm sorry it looks like I was unclear.
> > 
> > What I mean is that in the following scenario:
> > 
> > Client					Server
> > [endpoint1] ------- MPTCP-subflow ----> [addr0/no endpoint]
> > [endpoint2] ------- subflow 1  -------> [addr0/no endpoint]
> > [endpoint3] ------- subflow 2  -------> [addr0/no endpoint]
> > 
> > if we flush simultaneously the endpoints on both the server and the
> > client (which is what the failing selftest is currently doing), the
> > number of RM_ADDRs generated by the client is unpredictable.
> > 
> > On endpoint flush the server will try to delete all the subflows,
> > even though no endpoints are attached there. The server can delete
> > subflow 1 and/or subflow 2 before the client processes the relevant
> > endpoint on the other side. If that happens, the client will
> > (correctly) not generate the related RM_ADDR.
> 
> I'm not sure that's correct, though. Even if subflows 1 and/or 2 have been 
> closed, if the client has sent an ADD_ADDR to advertise endpoints 2 and 3, 
> it should send RM_ADDR to "unadvertise" them on the remaining open 
> subflow. 

If the peer sent an ADD_ADDR for a given address, it will generate the
RM_ADDR, regardless of the related subflow being already closed. We
don't have bugs - at least none shown by the current self-test
failures - in that situation.

The critical scenario is a bit different: the client created the
subflows, but it did not advertise any address with ADD_ADDR (the client
endpoints have the 'SUBFLOW' flag, not the 'SIGNAL' one).

If I read the discussion about ADD_ADDR handling correctly, a correct
and simple solution would be to send RM_ADDR only for 'SIGNAL'
endpoints, and to update the test-cases accordingly.

It's not clear to me whether, in case of an 'endpoint flush', we should
keep deleting all the subflows - including the ones not tied to 'SIGNAL'
endpoints on either end.

Cheers,

P



* Re: [PATCH v4 mptcp-next 2/3] mptcp: more careful RM_ADDR generation
  2022-02-16 18:38           ` Paolo Abeni
@ 2022-02-16 23:48             ` Mat Martineau
  0 siblings, 0 replies; 11+ messages in thread
From: Mat Martineau @ 2022-02-16 23:48 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Wed, 16 Feb 2022, Paolo Abeni wrote:

> On Tue, 2022-02-15 at 11:15 -0800, Mat Martineau wrote:
>> On Tue, 15 Feb 2022, Paolo Abeni wrote:
>>
>>> On Mon, 2022-02-14 at 17:19 -0800, Mat Martineau wrote:
>>>> On Mon, 14 Feb 2022, Paolo Abeni wrote:
>>>>
>>>>> On Mon, 2022-02-14 at 16:38 +0100, Paolo Abeni wrote:
>>>>>> In some edge scenarios, an MPTCP subflow can use a local address
>>>>>> mapped by an "implicit" endpoint created by the in-kernel path manager.
>>>>>>
>>>>>> When such an endpoint is deleted, the in-kernel PM sends a RM_ADDR MPTCP
>>>>>> suboption. That is somewhat unexpected, as an MPTCP listener will keep
>>>>>> accepting incoming subflows targeting such an address, and the unexpected
>>>>>> options can confuse some self-tests.
>>>>>>
>>>>>> Be more conservative about RM_ADDR generation: explicitly track
>>>>>> implicit endpoints with an appropriate flag and exclude them from
>>>>>> RM_ADDR generation.
>>>>>>
>>>>>> Additionally, allow user-space to replace implicit endpoints with
>>>>>> user-provided data at endpoint creation time.
>>>>>>
>>>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>>>
>>>>> The CI still reports an issue on top of this patch:
>>>>>
>>>>> https://cirrus-ci.com/task/5380046008352768?logs=test#L7283
>>>>>
>>>>> Even if the symptom is the same, the root cause is different. I think
>>>>> such failure is caused by the flush operation deleting both the
>>>>> existing endpoints and the existing subflows for each MPTCP socket.
>>>>>
>>>>> Even when the subflow is not attached to any known, non-implicit
>>>>> endpoint.
>>>>>
>>>>> When we have a simultaneous flush on both sides, deleting the subflow
>>>>> (with no endpoint) on one end can make the subflow on the other end
>>>>> (tied to a local endpoint) disappear before the PM could generate the
>>>>> relevant RM_ADDR.
>>>>>
>>>>
>>>> If we aren't closing all subflows during the flush, shouldn't there be at
>>>> least one subflow where the RM_ADDR can be sent?
>>>>
>>>> Maybe mptcp_pm_nl_addr_send_ack() needs to pick a better subflow for
>>>> sending the RM_ADDR. Do you think it would help to make
>>>> mptcp_subflow_send_ack() return a bool, and only return true if the ack
>>>> was sent? Then it could retry the ack on other subflows in the conn_list,
>>>> until it finds one that works or they all fail.
>>>
>>> I'm sorry it looks like I was unclear.
>>>
>>> What I mean is that in the following scenario:
>>>
>>> Client					Server
>>> [endpoint1] ------- MPTCP-subflow ----> [addr0/no endpoint]
>>> [endpoint2] ------- subflow 1  -------> [addr0/no endpoint]
>>> [endpoint3] ------- subflow 2  -------> [addr0/no endpoint]
>>>
>>> if we flush simultaneously the endpoints on both the server and the
>>> client (which is what the failing selftest is currently doing), the
>>> number of RM_ADDRs generated by the client is unpredictable.
>>>
>>> On endpoint flush the server will try to delete all the subflows,
>>> even though no endpoints are attached there. The server can delete
>>> subflow 1 and/or subflow 2 before the client processes the relevant
>>> endpoint on the other side. If that happens, the client will
>>> (correctly) not generate the related RM_ADDR.
>>
>> I'm not sure that's correct, though. Even if subflows 1 and/or 2 have been
>> closed, if the client has sent an ADD_ADDR to advertise endpoints 2 and 3,
>> it should send RM_ADDR to "unadvertise" them on the remaining open
>> subflow.
>
> If the peer sent an ADD_ADDR for a given address, it will generate the
> RM_ADDR, regardless of the related subflow being already closed. We
> don't have bugs - at least none shown by the current self-test
> failures - in that situation.
>
> The critical scenario is a bit different: the client created the
> subflows, but it did not advertise any address with ADD_ADDR (the client
> endpoints have the 'SUBFLOW' flag, not the 'SIGNAL' one).
>

Ok, that's the part I had missed.

> If I read the discussion about ADD_ADDR handling correctly, a correct
> and simple solution would be to send RM_ADDR only for 'SIGNAL'
> endpoints, and to update the test-cases accordingly.
>

Yeah, I think that would work.

> It's not clear to me whether, in case of an 'endpoint flush', we should
> keep deleting all the subflows - including the ones not tied to 'SIGNAL'
> endpoints on either end.
>

I like the idea of separating the "unadvertising" from subflow deletion. 
MPTCP_PM_CMD_FLUSH_ADDRS should continue to act the same as "foreach 
endpoint ID call MPTCP_PM_CMD_DEL_ADDR". Could add a flag to optionally 
delete/keep subflows?

--
Mat Martineau
Intel


end of thread, other threads:[~2022-02-16 23:48 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-14 15:38 [PATCH v4 mptcp-next 0/3] mptcp: more self-tests improvements Paolo Abeni
2022-02-14 15:38 ` [PATCH v4 mptcp-next 1/3] Squash-to: "mptcp: constify a bunch of helpers" Paolo Abeni
2022-02-15  9:48   ` Matthieu Baerts
2022-02-14 15:38 ` [PATCH v4 mptcp-next 2/3] mptcp: more careful RM_ADDR generation Paolo Abeni
2022-02-14 18:41   ` Paolo Abeni
2022-02-15  1:19     ` Mat Martineau
2022-02-15 17:07       ` Paolo Abeni
2022-02-15 19:15         ` Mat Martineau
2022-02-16 18:38           ` Paolo Abeni
2022-02-16 23:48             ` Mat Martineau
2022-02-14 15:38 ` [PATCH v4 mptcp-next 3/3] mptcp: strict local address ID selection Paolo Abeni
