All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH mptcp-next v2 00/21] mptcp: support userspace path management
@ 2022-01-12 22:15 Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 01/21] mptcp: do not restrict subflows with non-kernel PMs Kishen Maloor
                   ` (20 more replies)
  0 siblings, 21 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This patch series brings together all the required changes to
flexibly manage paths/subflows over MPTCP connections
from path manager implementations running in userspace. Path
management decisions may be made on either end of MPTCP
connections based on state captured through MPTCP netlink events.

The series starts with a set of general fixes and enhancements
in the related kernel code. It is followed with base functionality
and new netlink APIs for handling userspace path management. Further,
it extends the MPTCP self-test framework with the new netlink APIs,
along with the ability to capture MPTCP netlink events to aid in 
functional/behavioral validations. Lastly, it adds a new self-test
script with a suite of test cases covering all the userspace path
management capabilities.

v1 -> v2:
fixed formatting, check for 3rd ACK retransmission only on passive side of
the MPJ handshake, fixed build error when IPv6 is not enabled

Florian Westphal (2):
  mptcp: netlink: split mptcp_pm_parse_addr into two functions
  mptcp: netlink: allow userspace-driven subflow establishment

Kishen Maloor (19):
  mptcp: do not restrict subflows with non-kernel PMs
  mptcp: store remote id from MP_JOIN SYN/ACK in local ctx
  mptcp: reflect remote port (not 0) in ANNOUNCED events
  mptcp: establish subflows from either end of connection
  mptcp: netlink: store per namespace list of refcounted listen socks
  mptcp: netlink: store lsk ref in mptcp_pm_addr_entry
  mptcp: netlink: process IPv6 addrs in creating listening sockets
  mptcp: attempt to add listening sockets for announced addrs
  mptcp: allow ADD_ADDR reissuance by userspace PMs
  mptcp: handle local addrs announced by userspace PMs
  mptcp: read attributes of addr entries managed by userspace PMs
  mptcp: netlink: Add MPTCP_PM_CMD_ANNOUNCE
  mptcp: selftests: support MPTCP_PM_CMD_ANNOUNCE
  mptcp: netlink: Add MPTCP_PM_CMD_REMOVE
  mptcp: selftests: support MPTCP_PM_CMD_REMOVE
  mptcp: selftests: support MPTCP_PM_CMD_SUBFLOW_CREATE
  mptcp: selftests: support MPTCP_PM_CMD_SUBFLOW_DESTROY
  mptcp: selftests: capture netlink events
  selftests: mptcp: functional tests for the userspace PM type

 include/uapi/linux/mptcp.h                    |   7 +
 net/mptcp/options.c                           |   6 +-
 net/mptcp/pm.c                                |  12 +-
 net/mptcp/pm_netlink.c                        | 844 ++++++++++++++++--
 net/mptcp/protocol.c                          |   7 +-
 net/mptcp/protocol.h                          |  12 +-
 net/mptcp/subflow.c                           |   6 +-
 tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 587 +++++++++++-
 .../selftests/net/mptcp/userspace_pm.sh       | 561 ++++++++++++
 9 files changed, 1938 insertions(+), 104 deletions(-)
 create mode 100755 tools/testing/selftests/net/mptcp/userspace_pm.sh


base-commit: f81a8b95bfe9cae8ff02739e3e263d9310422af7
-- 
2.31.1


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 01/21] mptcp: do not restrict subflows with non-kernel PMs
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 02/21] mptcp: store remote id from MP_JOIN SYN/ACK in local ctx Kishen Maloor
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

All numerical restrictions on # of addresses/subflows
currently apply only to in-kernel PM managed connections.
Thus this change removes limitations on adding new subflows
by non-kernel (e.g. userspace) PMs.

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 net/mptcp/pm.c      | 7 +++++++
 net/mptcp/subflow.c | 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 1f8878cc29e3..5f35fe8a5e82 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -87,6 +87,13 @@ bool mptcp_pm_allow_new_subflow(struct mptcp_sock *msk)
 	unsigned int subflows_max;
 	int ret = 0;
 
+	if (READ_ONCE(pm->pm_type) != MPTCP_PM_TYPE_KERNEL) {
+		spin_lock_bh(&msk->pm.lock);
+		++pm->subflows;
+		spin_unlock_bh(&msk->pm.lock);
+		return true;
+	}
+
 	subflows_max = mptcp_pm_get_subflows_max(msk);
 
 	pr_debug("msk=%p subflows=%d max=%d allow=%d", msk, pm->subflows,
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index fd5fdb639088..b009d0f7fb77 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -62,7 +62,8 @@ static void subflow_generate_hmac(u64 key1, u64 key2, u32 nonce1, u32 nonce2,
 static bool mptcp_can_accept_new_subflow(const struct mptcp_sock *msk)
 {
 	return mptcp_is_fully_established((void *)msk) &&
-	       READ_ONCE(msk->pm.accept_subflow);
+	       ((READ_ONCE(msk->pm.pm_type) != MPTCP_PM_TYPE_KERNEL) ||
+		READ_ONCE(msk->pm.accept_subflow));
 }
 
 /* validate received token and create truncated hmac and nonce for SYN-ACK */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 02/21] mptcp: store remote id from MP_JOIN SYN/ACK in local ctx
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 01/21] mptcp: do not restrict subflows with non-kernel PMs Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 03/21] mptcp: reflect remote port (not 0) in ANNOUNCED events Kishen Maloor
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change reads the addr id assigned to the remote endpoint
of a subflow from the MP_JOIN SYN/ACK message and stores it
in the related subflow context. The remote id was not being
captured prior to this change, and will now provide a consistent
view of remote endpoints and their ids as seen through netlink
events.

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 net/mptcp/subflow.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index b009d0f7fb77..b75b7b186d34 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -444,6 +444,7 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
 		subflow->backup = mp_opt.backup;
 		subflow->thmac = mp_opt.thmac;
 		subflow->remote_nonce = mp_opt.nonce;
+		subflow->remote_id = mp_opt.join_id;
 		pr_debug("subflow=%p, thmac=%llu, remote_nonce=%u backup=%d",
 			 subflow, subflow->thmac, subflow->remote_nonce,
 			 subflow->backup);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 03/21] mptcp: reflect remote port (not 0) in ANNOUNCED events
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 01/21] mptcp: do not restrict subflows with non-kernel PMs Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 02/21] mptcp: store remote id from MP_JOIN SYN/ACK in local ctx Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 04/21] mptcp: establish subflows from either end of connection Kishen Maloor
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

Per RFC 8684, if no port is specified in an ADD_ADDR message, MPTCP
SHOULD attempt to connect to the specified address on the same port
as the port that is already in use by the subflow on which the
ADD_ADDR signal was sent.

To facilitate that, this change reflects the specific remote port in
use by that subflow in MPTCP_EVENT_ANNOUNCED events.

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 net/mptcp/options.c    | 2 +-
 net/mptcp/pm.c         | 5 +++--
 net/mptcp/pm_netlink.c | 8 ++++++--
 net/mptcp/protocol.h   | 6 ++++--
 4 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 7a6a39b71633..cceba8c7806d 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1123,7 +1123,7 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 		if ((mp_opt.suboptions & OPTION_MPTCP_ADD_ADDR) &&
 		    add_addr_hmac_valid(msk, &mp_opt)) {
 			if (!mp_opt.echo) {
-				mptcp_pm_add_addr_received(msk, &mp_opt.addr);
+				mptcp_pm_add_addr_received(msk, &mp_opt.addr, sk);
 				MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_ADDADDR);
 			} else {
 				mptcp_pm_add_addr_echoed(msk, &mp_opt.addr);
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 5f35fe8a5e82..db889ff60326 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -204,14 +204,15 @@ void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct sock *ssk,
 }
 
 void mptcp_pm_add_addr_received(struct mptcp_sock *msk,
-				const struct mptcp_addr_info *addr)
+				const struct mptcp_addr_info *addr,
+				const struct sock *ssk)
 {
 	struct mptcp_pm_data *pm = &msk->pm;
 
 	pr_debug("msk=%p remote_id=%d accept=%d", msk, addr->id,
 		 READ_ONCE(pm->accept_addr));
 
-	mptcp_event_addr_announced(msk, addr);
+	mptcp_event_addr_announced(msk, addr, ssk);
 
 	spin_lock_bh(&pm->lock);
 
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index f12effa71942..fc07ab9a53ba 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1933,7 +1933,8 @@ void mptcp_event_addr_removed(const struct mptcp_sock *msk, uint8_t id)
 }
 
 void mptcp_event_addr_announced(const struct mptcp_sock *msk,
-				const struct mptcp_addr_info *info)
+				const struct mptcp_addr_info *info,
+				const struct sock *ssk)
 {
 	struct net *net = sock_net((const struct sock *)msk);
 	struct nlmsghdr *nlh;
@@ -1957,7 +1958,10 @@ void mptcp_event_addr_announced(const struct mptcp_sock *msk,
 	if (nla_put_u8(skb, MPTCP_ATTR_REM_ID, info->id))
 		goto nla_put_failure;
 
-	if (nla_put_be16(skb, MPTCP_ATTR_DPORT, info->port))
+	if (nla_put_be16(skb, MPTCP_ATTR_DPORT,
+			 info->port  == 0 ?
+			 ((struct inet_sock *)inet_sk(ssk))->inet_dport :
+			 info->port))
 		goto nla_put_failure;
 
 	switch (info->family) {
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index d1b46c0d8c40..e2a67d3469f6 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -752,7 +752,8 @@ bool mptcp_pm_nl_check_work_pending(struct mptcp_sock *msk);
 void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct sock *ssk,
 				 const struct mptcp_subflow_context *subflow);
 void mptcp_pm_add_addr_received(struct mptcp_sock *msk,
-				const struct mptcp_addr_info *addr);
+				const struct mptcp_addr_info *addr,
+				const struct sock *ssk);
 void mptcp_pm_add_addr_echoed(struct mptcp_sock *msk,
 			      struct mptcp_addr_info *addr);
 void mptcp_pm_add_addr_send_ack(struct mptcp_sock *msk);
@@ -780,7 +781,8 @@ int mptcp_pm_remove_subflow(struct mptcp_sock *msk, const struct mptcp_rm_list *
 
 void mptcp_event(enum mptcp_event_type type, const struct mptcp_sock *msk,
 		 const struct sock *ssk, gfp_t gfp);
-void mptcp_event_addr_announced(const struct mptcp_sock *msk, const struct mptcp_addr_info *info);
+void mptcp_event_addr_announced(const struct mptcp_sock *msk, const struct mptcp_addr_info *info,
+				const struct sock *ssk);
 void mptcp_event_addr_removed(const struct mptcp_sock *msk, u8 id);
 
 static inline bool mptcp_pm_should_add_signal(struct mptcp_sock *msk)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 04/21] mptcp: establish subflows from either end of connection
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (2 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 03/21] mptcp: reflect remote port (not 0) in ANNOUNCED events Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-14 22:43   ` Mat Martineau
  2022-01-12 22:15 ` [PATCH mptcp-next v2 05/21] mptcp: netlink: store per namespace list of refcounted listen socks Kishen Maloor
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change updates internal logic to permit subflows to be
established from either the client or server ends of MPTCP
connections. This symmetry and added flexibility may be
harnessed by PM implementations running on either end in
creating new subflows.

The essence of this change lies in not relying on the
"server_side" flag (which continues to be available if needed).

v2: check for 3rd ACK retransmission only on passive side
of the MPJ handshake

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 net/mptcp/options.c  | 2 +-
 net/mptcp/protocol.c | 5 +----
 net/mptcp/protocol.h | 2 --
 3 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index cceba8c7806d..89808816db06 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -921,7 +921,7 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
 		if (TCP_SKB_CB(skb)->seq == subflow->ssn_offset + 1 &&
 		    TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq &&
 		    subflow->mp_join && (mp_opt->suboptions & OPTIONS_MPTCP_MPJ) &&
-		    READ_ONCE(msk->pm.server_side))
+		    !subflow->request_join)
 			tcp_send_ack(ssk);
 		goto fully_established;
 	}
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 4012f844eec1..408a05bff633 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3248,15 +3248,12 @@ bool mptcp_finish_join(struct sock *ssk)
 		return false;
 	}
 
-	if (!msk->pm.server_side)
+	if (!list_empty(&subflow->node))
 		goto out;
 
 	if (!mptcp_pm_allow_new_subflow(msk))
 		goto err_prohibited;
 
-	if (WARN_ON_ONCE(!list_empty(&subflow->node)))
-		goto err_prohibited;
-
 	/* active connections are already on conn_list.
 	 * If we can't acquire msk socket lock here, let the release callback
 	 * handle it
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index e2a67d3469f6..c50247673c7e 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -908,10 +908,8 @@ static inline bool mptcp_check_infinite_map(struct sk_buff *skb)
 static inline bool subflow_simultaneous_connect(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
-	struct sock *parent = subflow->conn;
 
 	return sk->sk_state == TCP_ESTABLISHED &&
-	       !mptcp_sk(parent)->pm.server_side &&
 	       !subflow->conn_finished;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 05/21] mptcp: netlink: store per namespace list of refcounted listen socks
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (3 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 04/21] mptcp: establish subflows from either end of connection Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 06/21] mptcp: netlink: store lsk ref in mptcp_pm_addr_entry Kishen Maloor
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

The kernel maintains listening sockets bound to announced addresses
via the ADD_ADDR option to be able to receive MP_JOIN requests. Path
managers may choose to advertise the same address over multiple
MPTCP connections. So this change provides a simple framework to
manage a list of all distinct listning sockets created in a
namespace by encapsulating it in a structure that is ref counted
and can be shared across multiple connections. The sockets (and
their enclosing structure) are released when there are no more
references.

v2: fixed formatting

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 net/mptcp/pm_netlink.c | 76 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index fc07ab9a53ba..4ad43310d50d 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -22,6 +22,14 @@ static struct genl_family mptcp_genl_family;
 
 static int pm_nl_pernet_id;
 
+struct mptcp_local_lsk {
+	struct list_head	list;
+	struct mptcp_addr_info	addr;
+	struct socket		*lsk;
+	struct rcu_head		rcu;
+	refcount_t		refcount;
+};
+
 struct mptcp_pm_addr_entry {
 	struct list_head	list;
 	struct mptcp_addr_info	addr;
@@ -41,7 +49,10 @@ struct mptcp_pm_add_entry {
 struct pm_nl_pernet {
 	/* protects pernet updates */
 	spinlock_t		lock;
+	/* protects access to pernet lsk list */
+	spinlock_t		lsk_list_lock;
 	struct list_head	local_addr_list;
+	struct list_head	lsk_list;
 	unsigned int		addrs;
 	unsigned int		stale_loss_cnt;
 	unsigned int		add_addr_signal_max;
@@ -83,6 +94,69 @@ static bool addresses_equal(const struct mptcp_addr_info *a,
 	return a->port == b->port;
 }
 
+static struct mptcp_local_lsk *lsk_list_find(struct pm_nl_pernet *pernet,
+					     struct mptcp_addr_info *addr)
+{
+	struct mptcp_local_lsk *lsk_ref = NULL;
+	struct mptcp_local_lsk *i;
+
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(i, &pernet->lsk_list, list) {
+		if (addresses_equal(&i->addr, addr, true)) {
+			if (refcount_inc_not_zero(&i->refcount)) {
+				lsk_ref = i;
+				break;
+			}
+		}
+	}
+
+	rcu_read_unlock();
+
+	return lsk_ref;
+}
+
+static void lsk_list_add_ref(struct mptcp_local_lsk *lsk_ref)
+{
+	refcount_inc(&lsk_ref->refcount);
+}
+
+static struct mptcp_local_lsk *lsk_list_add(struct pm_nl_pernet *pernet,
+					    struct mptcp_addr_info *addr,
+					    struct socket *lsk)
+{
+	struct mptcp_local_lsk *lsk_ref;
+
+	lsk_ref = kmalloc(sizeof(*lsk_ref), GFP_ATOMIC);
+
+	if (!lsk_ref)
+		return NULL;
+
+	lsk_ref->lsk = lsk;
+	memcpy(&lsk_ref->addr, addr, sizeof(struct mptcp_addr_info));
+	refcount_set(&lsk_ref->refcount, 1);
+
+	spin_lock_bh(&pernet->lsk_list_lock);
+	list_add_rcu(&lsk_ref->list, &pernet->lsk_list);
+	spin_unlock_bh(&pernet->lsk_list_lock);
+
+	return lsk_ref;
+}
+
+static void lsk_list_release(struct pm_nl_pernet *pernet,
+			     struct mptcp_local_lsk *lsk_ref)
+{
+	if (lsk_ref && refcount_dec_and_test(&lsk_ref->refcount)) {
+		sock_release(lsk_ref->lsk);
+
+		spin_lock_bh(&pernet->lsk_list_lock);
+		list_del_rcu(&lsk_ref->list);
+		spin_unlock_bh(&pernet->lsk_list_lock);
+
+		kfree_rcu(lsk_ref, rcu);
+	}
+}
+
 static bool address_zero(const struct mptcp_addr_info *addr)
 {
 	struct mptcp_addr_info zero;
@@ -2098,12 +2172,14 @@ static int __net_init pm_nl_init_net(struct net *net)
 	struct pm_nl_pernet *pernet = net_generic(net, pm_nl_pernet_id);
 
 	INIT_LIST_HEAD_RCU(&pernet->local_addr_list);
+	INIT_LIST_HEAD_RCU(&pernet->lsk_list);
 
 	/* Cit. 2 subflows ought to be enough for anybody. */
 	pernet->subflows_max = 2;
 	pernet->next_id = 1;
 	pernet->stale_loss_cnt = 4;
 	spin_lock_init(&pernet->lock);
+	spin_lock_init(&pernet->lsk_list_lock);
 
 	/* No need to initialize other pernet fields, the struct is zeroed at
 	 * allocation time.
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 06/21] mptcp: netlink: store lsk ref in mptcp_pm_addr_entry
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (4 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 05/21] mptcp: netlink: store per namespace list of refcounted listen socks Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 07/21] mptcp: netlink: process IPv6 addrs in creating listening sockets Kishen Maloor
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change updates struct mptcp_pm_addr_entry to store a
listening socket (lsk) reference, i.e. a pointer to a reference
counted structure containing the lsk (struct socket *) instead
of the lsk itself. Code blocks that directly operated on
the lsk in struct mptcp_pm_addr_entry have been updated to work
with the lsk ref instead, utilizing the new helper functions that
operate on lsk refs.

v2: fixed formatting

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 net/mptcp/pm_netlink.c | 62 ++++++++++++++++++++++++++++--------------
 1 file changed, 41 insertions(+), 21 deletions(-)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 4ad43310d50d..4c1895dbc2a5 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -35,7 +35,7 @@ struct mptcp_pm_addr_entry {
 	struct mptcp_addr_info	addr;
 	u8			flags;
 	int			ifindex;
-	struct socket		*lsk;
+	struct mptcp_local_lsk	*lsk_ref;
 };
 
 struct mptcp_pm_add_entry {
@@ -983,7 +983,8 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
 }
 
 static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
-					    struct mptcp_pm_addr_entry *entry)
+					    struct mptcp_pm_addr_entry *entry,
+					    struct socket **lsk)
 {
 	struct sockaddr_storage addr;
 	struct mptcp_sock *msk;
@@ -992,11 +993,11 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
 	int err;
 
 	err = sock_create_kern(sock_net(sk), entry->addr.family,
-			       SOCK_STREAM, IPPROTO_MPTCP, &entry->lsk);
+			       SOCK_STREAM, IPPROTO_MPTCP, lsk);
 	if (err)
 		return err;
 
-	msk = mptcp_sk(entry->lsk->sk);
+	msk = mptcp_sk((*lsk)->sk);
 	if (!msk) {
 		err = -EINVAL;
 		goto out;
@@ -1025,7 +1026,8 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
 	return 0;
 
 out:
-	sock_release(entry->lsk);
+	sock_release(*lsk);
+	*lsk = NULL;
 	return err;
 }
 
@@ -1074,7 +1076,7 @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct sock_common *skc)
 	entry->addr.port = 0;
 	entry->ifindex = 0;
 	entry->flags = 0;
-	entry->lsk = NULL;
+	entry->lsk_ref = NULL;
 	ret = mptcp_pm_nl_append_new_local_addr(pernet, entry);
 	if (ret < 0)
 		kfree(entry);
@@ -1270,6 +1272,7 @@ static int mptcp_nl_cmd_add_addr(struct sk_buff *skb, struct genl_info *info)
 	struct nlattr *attr = info->attrs[MPTCP_PM_ATTR_ADDR];
 	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
 	struct mptcp_pm_addr_entry addr, *entry;
+	struct socket *lsk;
 	int ret;
 
 	ret = mptcp_pm_parse_addr(attr, info, true, &addr);
@@ -1284,18 +1287,34 @@ static int mptcp_nl_cmd_add_addr(struct sk_buff *skb, struct genl_info *info)
 
 	*entry = addr;
 	if (entry->addr.port) {
-		ret = mptcp_pm_nl_create_listen_socket(skb->sk, entry);
-		if (ret) {
-			GENL_SET_ERR_MSG(info, "create listen socket error");
-			kfree(entry);
-			return ret;
+		entry->lsk_ref = lsk_list_find(pernet, &entry->addr);
+
+		if (!entry->lsk_ref) {
+			ret = mptcp_pm_nl_create_listen_socket(skb->sk, entry, &lsk);
+
+			if (ret) {
+				GENL_SET_ERR_MSG(info, "create listen socket error");
+				kfree(entry);
+				return ret;
+			}
+
+			entry->lsk_ref = lsk_list_add(pernet, &entry->addr, lsk);
+
+			if (!entry->lsk_ref) {
+				GENL_SET_ERR_MSG(info, "can't allocate lsk ref");
+				sock_release(lsk);
+				kfree(entry);
+				return -ENOMEM;
+			}
 		}
 	}
+
 	ret = mptcp_pm_nl_append_new_local_addr(pernet, entry);
+
 	if (ret < 0) {
 		GENL_SET_ERR_MSG(info, "too many addresses or duplicate one");
-		if (entry->lsk)
-			sock_release(entry->lsk);
+		if (entry->lsk_ref)
+			lsk_list_release(pernet, entry->lsk_ref);
 		kfree(entry);
 		return ret;
 	}
@@ -1398,10 +1417,11 @@ static int mptcp_nl_remove_subflow_and_signal_addr(struct net *net,
 }
 
 /* caller must ensure the RCU grace period is already elapsed */
-static void __mptcp_pm_release_addr_entry(struct mptcp_pm_addr_entry *entry)
+static void __mptcp_pm_release_addr_entry(struct pm_nl_pernet *pernet,
+					  struct mptcp_pm_addr_entry *entry)
 {
-	if (entry->lsk)
-		sock_release(entry->lsk);
+	if (entry->lsk_ref)
+		lsk_list_release(pernet, entry->lsk_ref);
 	kfree(entry);
 }
 
@@ -1483,7 +1503,7 @@ static int mptcp_nl_cmd_del_addr(struct sk_buff *skb, struct genl_info *info)
 
 	mptcp_nl_remove_subflow_and_signal_addr(sock_net(skb->sk), &entry->addr);
 	synchronize_rcu();
-	__mptcp_pm_release_addr_entry(entry);
+	__mptcp_pm_release_addr_entry(pernet, entry);
 
 	return ret;
 }
@@ -1539,7 +1559,7 @@ static void mptcp_nl_remove_addrs_list(struct net *net,
 }
 
 /* caller must ensure the RCU grace period is already elapsed */
-static void __flush_addrs(struct list_head *list)
+static void __flush_addrs(struct pm_nl_pernet *pernet, struct list_head *list)
 {
 	while (!list_empty(list)) {
 		struct mptcp_pm_addr_entry *cur;
@@ -1547,7 +1567,7 @@ static void __flush_addrs(struct list_head *list)
 		cur = list_entry(list->next,
 				 struct mptcp_pm_addr_entry, list);
 		list_del_rcu(&cur->list);
-		__mptcp_pm_release_addr_entry(cur);
+		__mptcp_pm_release_addr_entry(pernet, cur);
 	}
 }
 
@@ -1572,7 +1592,7 @@ static int mptcp_nl_cmd_flush_addrs(struct sk_buff *skb, struct genl_info *info)
 	spin_unlock_bh(&pernet->lock);
 	mptcp_nl_remove_addrs_list(sock_net(skb->sk), &free_list);
 	synchronize_rcu();
-	__flush_addrs(&free_list);
+	__flush_addrs(pernet, &free_list);
 	return 0;
 }
 
@@ -2199,7 +2219,7 @@ static void __net_exit pm_nl_exit_net(struct list_head *net_list)
 		 * other modifiers, also netns core already waited for a
 		 * RCU grace period.
 		 */
-		__flush_addrs(&pernet->local_addr_list);
+		__flush_addrs(pernet, &pernet->local_addr_list);
 	}
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 07/21] mptcp: netlink: process IPv6 addrs in creating listening sockets
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (5 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 06/21] mptcp: netlink: store lsk ref in mptcp_pm_addr_entry Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-14 15:27   ` Paolo Abeni
  2022-01-12 22:15 ` [PATCH mptcp-next v2 08/21] mptcp: attempt to add listening sockets for announced addrs Kishen Maloor
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change updates mptcp_pm_nl_create_listen_socket() to create
listening sockets bound to IPv6 addresses (where IPv6 is supported).

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 net/mptcp/pm_netlink.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 4c1895dbc2a5..779ec9d375f0 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -986,6 +986,7 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
 					    struct mptcp_pm_addr_entry *entry,
 					    struct socket **lsk)
 {
+	int addrlen = sizeof(struct sockaddr_in);
 	struct sockaddr_storage addr;
 	struct mptcp_sock *msk;
 	struct socket *ssock;
@@ -1010,8 +1011,11 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
 	}
 
 	mptcp_info2sockaddr(&entry->addr, &addr, entry->addr.family);
-	err = kernel_bind(ssock, (struct sockaddr *)&addr,
-			  sizeof(struct sockaddr_in));
+#if IS_ENABLED(CONFIG_MPTCP_IPV6)
+	if (entry->addr.family == AF_INET6)
+		addrlen = sizeof(struct sockaddr_in6);
+#endif
+	err = kernel_bind(ssock, (struct sockaddr *)&addr, addrlen);
 	if (err) {
 		pr_warn("kernel_bind error, err=%d", err);
 		goto out;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 08/21] mptcp: attempt to add listening sockets for announced addrs
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (6 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 07/21] mptcp: netlink: process IPv6 addrs in creating listening sockets Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-14 15:54   ` Paolo Abeni
  2022-01-12 22:15 ` [PATCH mptcp-next v2 09/21] mptcp: allow ADD_ADDR reissuance by userspace PMs Kishen Maloor
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

When ADD_ADDR announcements use the port associated with an
active subflow, this change ensures that a listening socket is
bound to the announced address and port for subsequently
receiving MP_JOINs from the remote end. In case there's
a recorded lsk bound to that address+port, it is reused.
But if a listening socket for this address is already held by the
application then no further action is taken.

When a listening socket is created, it is stored in
struct mptcp_pm_add_entry and released accordingly.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/203

v2: fixed formatting

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 net/mptcp/pm_netlink.c | 56 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 779ec9d375f0..e2211f3b8c8c 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -43,6 +43,7 @@ struct mptcp_pm_add_entry {
 	struct mptcp_addr_info	addr;
 	struct timer_list	add_timer;
 	struct mptcp_sock	*sock;
+	struct mptcp_local_lsk	*lsk_ref;
 	u8			retrans_times;
 };
 
@@ -66,6 +67,10 @@ struct pm_nl_pernet {
 #define MPTCP_PM_ADDR_MAX	8
 #define ADD_ADDR_RETRANS_MAX	3
 
+static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
+					    struct mptcp_pm_addr_entry *entry,
+					    struct socket **lsk);
+
 static bool addresses_equal(const struct mptcp_addr_info *a,
 			    const struct mptcp_addr_info *b, bool use_port)
 {
@@ -438,7 +443,8 @@ mptcp_pm_del_add_timer(struct mptcp_sock *msk,
 }
 
 static bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
-				     struct mptcp_pm_addr_entry *entry)
+				     struct mptcp_pm_addr_entry *entry,
+				     struct mptcp_local_lsk *lsk_ref)
 {
 	struct mptcp_pm_add_entry *add_entry = NULL;
 	struct sock *sk = (struct sock *)msk;
@@ -458,6 +464,10 @@ static bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
 	add_entry->addr = entry->addr;
 	add_entry->sock = msk;
 	add_entry->retrans_times = 0;
+	add_entry->lsk_ref = lsk_ref;
+
+	if (lsk_ref)
+		lsk_list_add_ref(lsk_ref);
 
 	timer_setup(&add_entry->add_timer, mptcp_pm_add_timer, 0);
 	sk_reset_timer(sk, &add_entry->add_timer,
@@ -470,8 +480,11 @@ void mptcp_pm_free_anno_list(struct mptcp_sock *msk)
 {
 	struct mptcp_pm_add_entry *entry, *tmp;
 	struct sock *sk = (struct sock *)msk;
+	struct pm_nl_pernet *pernet;
 	LIST_HEAD(free_list);
 
+	pernet = net_generic(sock_net(sk), pm_nl_pernet_id);
+
 	pr_debug("msk=%p", msk);
 
 	spin_lock_bh(&msk->pm.lock);
@@ -480,6 +493,8 @@ void mptcp_pm_free_anno_list(struct mptcp_sock *msk)
 
 	list_for_each_entry_safe(entry, tmp, &free_list, list) {
 		sk_stop_timer_sync(sk, &entry->add_timer);
+		if (entry->lsk_ref)
+			lsk_list_release(pernet, entry->lsk_ref);
 		kfree(entry);
 	}
 }
@@ -570,13 +585,16 @@ lookup_id_by_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *add
 }
 
 static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
+	__must_hold(&msk->pm.lock)
 {
+	struct mptcp_local_lsk *lsk_ref = NULL;
 	struct sock *sk = (struct sock *)msk;
 	struct mptcp_pm_addr_entry *local;
 	unsigned int add_addr_signal_max;
 	unsigned int local_addr_max;
 	struct pm_nl_pernet *pernet;
 	unsigned int subflows_max;
+	struct socket *lsk;
 
 	pernet = net_generic(sock_net(sk), pm_nl_pernet_id);
 
@@ -607,12 +625,39 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 		local = select_signal_address(pernet, msk);
 
 		if (local) {
-			if (mptcp_pm_alloc_anno_list(msk, local)) {
+			if (!local->addr.port) {
+				local->addr.port =
+					((struct inet_sock *)inet_sk
+					 ((struct sock *)msk))->inet_sport;
+
+				lsk_ref = lsk_list_find(pernet, &local->addr);
+
+				if (!lsk_ref) {
+					spin_unlock_bh(&msk->pm.lock);
+
+					mptcp_pm_nl_create_listen_socket(sk, local, &lsk);
+
+					spin_lock_bh(&msk->pm.lock);
+
+					if (lsk)
+						lsk_ref = lsk_list_add(pernet, &local->addr, lsk);
+
+					if (lsk && !lsk_ref)
+						sock_release(lsk);
+				}
+
+				local->addr.port = 0;
+			}
+
+			if (mptcp_pm_alloc_anno_list(msk, local, lsk_ref)) {
 				__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
 				msk->pm.add_addr_signaled++;
 				mptcp_pm_announce_addr(msk, &local->addr, false);
 				mptcp_pm_nl_addr_send_ack(msk);
 			}
+
+			if (lsk_ref)
+				lsk_list_release(pernet, lsk_ref);
 		}
 	}
 
@@ -704,6 +749,7 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
 }
 
 static void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk)
+	__must_hold(&msk->pm.lock)
 {
 	struct mptcp_addr_info addrs[MPTCP_PM_ADDR_MAX];
 	struct sock *sk = (struct sock *)msk;
@@ -1352,11 +1398,17 @@ int mptcp_pm_get_flags_and_ifindex_by_id(struct net *net, unsigned int id,
 static bool remove_anno_list_by_saddr(struct mptcp_sock *msk,
 				      struct mptcp_addr_info *addr)
 {
+	struct sock *sk = (struct sock *)msk;
 	struct mptcp_pm_add_entry *entry;
+	struct pm_nl_pernet *pernet;
+
+	pernet = net_generic(sock_net(sk), pm_nl_pernet_id);
 
 	entry = mptcp_pm_del_add_timer(msk, addr, false);
 	if (entry) {
 		list_del(&entry->list);
+		if (entry->lsk_ref)
+			lsk_list_release(pernet, entry->lsk_ref);
 		kfree(entry);
 		return true;
 	}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 09/21] mptcp: allow ADD_ADDR reissuance by userspace PMs
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (7 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 08/21] mptcp: attempt to add listening sockets for announced addrs Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 10/21] mptcp: handle local addrs announced " Kishen Maloor
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change allows userspace PM implementations to reissue ADD_ADDR
announcements (if necessary) based on their chosen policy.

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 net/mptcp/pm_netlink.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index e2211f3b8c8c..a8c9a89ee1c5 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -452,8 +452,16 @@ static bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
 
 	lockdep_assert_held(&msk->pm.lock);
 
-	if (mptcp_lookup_anno_list_by_saddr(msk, &entry->addr))
-		return false;
+	add_entry = mptcp_lookup_anno_list_by_saddr(msk, &entry->addr);
+
+	if (add_entry) {
+		if (READ_ONCE(msk->pm.pm_type) == MPTCP_PM_TYPE_KERNEL)
+			return false;
+
+		sk_reset_timer(sk, &add_entry->add_timer,
+			       jiffies + mptcp_get_add_addr_timeout(net));
+		return true;
+	}
 
 	add_entry = kmalloc(sizeof(*add_entry), GFP_ATOMIC);
 	if (!add_entry)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 10/21] mptcp: handle local addrs announced by userspace PMs
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (8 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 09/21] mptcp: allow ADD_ADDR reissuance by userspace PMs Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-14 17:11   ` Paolo Abeni
  2022-01-12 22:15 ` [PATCH mptcp-next v2 11/21] mptcp: read attributes of addr entries managed " Kishen Maloor
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change adds a new internal function to store/retrieve local
addrs announced by userspace PM implementations from the kernel
context. The function does not stipulate any limitation on the #
of addrs, and handles the requirements of three scenarios:
1) ADD_ADDR announcements (which require that a local id be
provided), 2) retrieving the local id associated with an address,
also where one may need to be assigned, and 3) reissuance of
ADD_ADDRs when there's a successful match of addr/id.

The list of all stored local addr entries is held under the
MPTCP sock structure. This list, if not released by the REMOVE_ADDR
flow is freed while the sock is destructed.

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 net/mptcp/pm_netlink.c | 79 ++++++++++++++++++++++++++++++++++++++++++
 net/mptcp/protocol.c   |  2 ++
 net/mptcp/protocol.h   |  2 ++
 3 files changed, 83 insertions(+)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index a8c9a89ee1c5..052a803a7f71 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -484,6 +484,31 @@ static bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
 	return true;
 }
 
+void mptcp_free_local_addr_list(struct mptcp_sock *msk)
+{
+	struct mptcp_pm_addr_entry *entry, *tmp;
+	struct sock *sk = (struct sock *)msk;
+	struct pm_nl_pernet *pernet;
+	LIST_HEAD(free_list);
+
+	if (READ_ONCE(msk->pm.pm_type) == MPTCP_PM_TYPE_KERNEL)
+		return;
+
+	pernet = net_generic(sock_net(sk), pm_nl_pernet_id);
+
+	pr_debug("msk=%p", msk);
+
+	mptcp_data_lock(sk);
+	list_splice_init(&msk->local_addr_list, &free_list);
+	mptcp_data_unlock(sk);
+
+	list_for_each_entry_safe(entry, tmp, &free_list, list) {
+		if (entry->lsk_ref)
+			lsk_list_release(pernet, entry->lsk_ref);
+		kfree(entry);
+	}
+}
+
 void mptcp_pm_free_anno_list(struct mptcp_sock *msk)
 {
 	struct mptcp_pm_add_entry *entry, *tmp;
@@ -972,6 +997,60 @@ static bool address_use_port(struct mptcp_pm_addr_entry *entry)
 		MPTCP_PM_ADDR_FLAG_SIGNAL;
 }
 
+static int mptcp_userspace_pm_append_new_local_addr(struct mptcp_sock *msk,
+						    struct mptcp_pm_addr_entry *entry)
+{
+	DECLARE_BITMAP(id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1);
+	struct mptcp_pm_addr_entry *match = NULL;
+	struct sock *sk = (struct sock *)msk;
+	struct mptcp_pm_addr_entry *e;
+	bool addr_match = false;
+	bool id_match = false;
+	int ret = -EINVAL;
+
+	bitmap_zero(id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1);
+
+	mptcp_data_lock(sk);
+	list_for_each_entry(e, &msk->local_addr_list, list) {
+		addr_match = addresses_equal(&e->addr, &entry->addr, true);
+		if (addr_match && entry->addr.id == 0)
+			entry->addr.id = e->addr.id;
+		id_match = (e->addr.id == entry->addr.id);
+		if (addr_match && id_match) {
+			match = e;
+			break;
+		} else if (addr_match || id_match) {
+			break;
+		}
+		__set_bit(e->addr.id, id_bitmap);
+	}
+
+	if (!match && !addr_match && !id_match) {
+		e = kmalloc(sizeof(*e), GFP_ATOMIC);
+		if (!e) {
+			mptcp_data_unlock(sk);
+			return -ENOMEM;
+		}
+
+		*e = *entry;
+		if (!e->addr.id)
+			e->addr.id = find_next_zero_bit(id_bitmap,
+							MPTCP_PM_MAX_ADDR_ID + 1,
+							1);
+		list_add_tail_rcu(&e->list, &msk->local_addr_list);
+		ret = e->addr.id;
+
+		if (e->lsk_ref && e->addr.port)
+			lsk_list_add_ref(e->lsk_ref);
+	} else if (match) {
+		ret = entry->addr.id;
+	}
+
+	mptcp_data_unlock(sk);
+
+	return ret;
+}
+
 static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
 					     struct mptcp_pm_addr_entry *entry)
 {
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 408a05bff633..331c1080396d 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2531,6 +2531,7 @@ static int __mptcp_init_sock(struct sock *sk)
 	INIT_LIST_HEAD(&msk->conn_list);
 	INIT_LIST_HEAD(&msk->join_list);
 	INIT_LIST_HEAD(&msk->rtx_queue);
+	INIT_LIST_HEAD(&msk->local_addr_list);
 	INIT_WORK(&msk->work, mptcp_worker);
 	__skb_queue_head_init(&msk->receive_queue);
 	msk->out_of_order_queue = RB_ROOT;
@@ -3027,6 +3028,7 @@ void mptcp_destroy_common(struct mptcp_sock *msk)
 	msk->rmem_fwd_alloc = 0;
 	mptcp_token_destroy(msk);
 	mptcp_pm_free_anno_list(msk);
+	mptcp_free_local_addr_list(msk);
 }
 
 static void mptcp_destroy(struct sock *sk)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index c50247673c7e..63b4ea850d07 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -281,6 +281,7 @@ struct mptcp_sock {
 	struct sk_buff_head receive_queue;
 	struct list_head conn_list;
 	struct list_head rtx_queue;
+	struct list_head local_addr_list;
 	struct mptcp_data_frag *first_pending;
 	struct list_head join_list;
 	struct socket	*subflow; /* outgoing connect/listener/!mp_capable */
@@ -733,6 +734,7 @@ struct mptcp_sock *mptcp_token_get_sock(struct net *net, u32 token);
 struct mptcp_sock *mptcp_token_iter_next(const struct net *net, long *s_slot,
 					 long *s_num);
 void mptcp_token_destroy(struct mptcp_sock *msk);
+void mptcp_free_local_addr_list(struct mptcp_sock *msk);
 
 void mptcp_crypto_key_sha(u64 key, u32 *token, u64 *idsn);
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 11/21] mptcp: read attributes of addr entries managed by userspace PMs
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (9 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 10/21] mptcp: handle local addrs announced " Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 12/21] mptcp: netlink: split mptcp_pm_parse_addr into two functions Kishen Maloor
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change introduces a parallel path in the kernel for retrieving
the local id, flags, if_index for an addr entry in the context of
an MPTCP connection that's being managed by a userspace PM. The
userspace and in-kernel PM modes deviate in their procedures for
obtaining this information.

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 net/mptcp/pm_netlink.c | 101 ++++++++++++++++++++++++++++-------------
 net/mptcp/protocol.h   |   2 +-
 net/mptcp/subflow.c    |   2 +-
 3 files changed, 71 insertions(+), 34 deletions(-)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 052a803a7f71..40c41a0498fc 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1175,6 +1175,7 @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct sock_common *skc)
 	struct mptcp_addr_info msk_local;
 	struct pm_nl_pernet *pernet;
 	int ret = -1;
+	int pm_type;
 
 	if (WARN_ON_ONCE(!msk))
 		return -1;
@@ -1192,31 +1193,50 @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct sock_common *skc)
 
 	pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) {
-		if (addresses_equal(&entry->addr, &skc_local, entry->addr.port)) {
-			ret = entry->addr.id;
-			break;
-		}
-	}
-	rcu_read_unlock();
-	if (ret >= 0)
-		return ret;
-
 	/* address not found, add to local list */
-	entry = kmalloc(sizeof(*entry), GFP_ATOMIC);
-	if (!entry)
-		return -ENOMEM;
-
-	entry->addr = skc_local;
-	entry->addr.id = 0;
-	entry->addr.port = 0;
-	entry->ifindex = 0;
-	entry->flags = 0;
-	entry->lsk_ref = NULL;
-	ret = mptcp_pm_nl_append_new_local_addr(pernet, entry);
-	if (ret < 0)
-		kfree(entry);
+
+	pm_type = READ_ONCE(msk->pm.pm_type);
+
+	if (pm_type == MPTCP_PM_TYPE_KERNEL) {
+		rcu_read_lock();
+		list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) {
+			if (addresses_equal(&entry->addr, &skc_local, entry->addr.port)) {
+				ret = entry->addr.id;
+				break;
+			}
+		}
+		rcu_read_unlock();
+
+		if (ret >= 0)
+			return ret;
+
+		entry = kmalloc(sizeof(*entry), GFP_ATOMIC);
+		if (!entry)
+			return -ENOMEM;
+
+		entry->addr = skc_local;
+		entry->addr.id = 0;
+		entry->addr.port = 0;
+		entry->ifindex = 0;
+		entry->flags = 0;
+		entry->lsk_ref = NULL;
+		ret = mptcp_pm_nl_append_new_local_addr(pernet, entry);
+		if (ret < 0)
+			kfree(entry);
+	} else if (pm_type == MPTCP_PM_TYPE_USERSPACE) {
+		struct mptcp_pm_addr_entry new_entry;
+		__be16 msk_sport =  ((struct inet_sock *)
+				     inet_sk((struct sock *)msk))->inet_sport;
+
+		memset(&new_entry, 0, sizeof(struct mptcp_pm_addr_entry));
+		new_entry.addr = skc_local;
+		new_entry.addr.id = 0;
+
+		if (new_entry.addr.port == msk_sport)
+			new_entry.addr.port = 0;
+
+		ret = mptcp_userspace_pm_append_new_local_addr(msk, &new_entry);
+	}
 
 	return ret;
 }
@@ -1461,22 +1481,39 @@ static int mptcp_nl_cmd_add_addr(struct sk_buff *skb, struct genl_info *info)
 	return 0;
 }
 
-int mptcp_pm_get_flags_and_ifindex_by_id(struct net *net, unsigned int id,
+int mptcp_pm_get_flags_and_ifindex_by_id(struct mptcp_sock *msk, unsigned int id,
 					 u8 *flags, int *ifindex)
 {
-	struct mptcp_pm_addr_entry *entry;
+	struct mptcp_pm_addr_entry *entry, *match = NULL;
+	struct sock *sk = (struct sock *)msk;
+	struct net *net = sock_net(sk);
 
 	*flags = 0;
 	*ifindex = 0;
 
 	if (id) {
-		rcu_read_lock();
-		entry = __lookup_addr_by_id(net_generic(net, pm_nl_pernet_id), id);
-		if (entry) {
-			*flags = entry->flags;
-			*ifindex = entry->ifindex;
+		if (READ_ONCE(msk->pm.pm_type) == MPTCP_PM_TYPE_KERNEL) {
+			rcu_read_lock();
+			entry = __lookup_addr_by_id(net_generic(net, pm_nl_pernet_id), id);
+			if (entry) {
+				*flags = entry->flags;
+				*ifindex = entry->ifindex;
+			}
+			rcu_read_unlock();
+		} else {
+			mptcp_data_lock(sk);
+			list_for_each_entry(entry, &msk->local_addr_list, list) {
+				if (id == entry->addr.id) {
+					match = entry;
+					break;
+				}
+			}
+			mptcp_data_unlock(sk);
+			if (match) {
+				*flags = match->flags;
+				*ifindex = match->ifindex;
+			}
 		}
-		rcu_read_unlock();
 	}
 
 	return 0;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 63b4ea850d07..c6f7c22d0e11 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -772,7 +772,7 @@ mptcp_pm_del_add_timer(struct mptcp_sock *msk,
 struct mptcp_pm_add_entry *
 mptcp_lookup_anno_list_by_saddr(struct mptcp_sock *msk,
 				struct mptcp_addr_info *addr);
-int mptcp_pm_get_flags_and_ifindex_by_id(struct net *net, unsigned int id,
+int mptcp_pm_get_flags_and_ifindex_by_id(struct mptcp_sock *msk, unsigned int id,
 					 u8 *flags, int *ifindex);
 
 int mptcp_pm_announce_addr(struct mptcp_sock *msk,
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index b75b7b186d34..29e51986c985 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1419,7 +1419,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
 		local_id = err;
 	}
 
-	mptcp_pm_get_flags_and_ifindex_by_id(sock_net(sk), local_id,
+	mptcp_pm_get_flags_and_ifindex_by_id(msk, local_id,
 					     &flags, &ifindex);
 	subflow->remote_key = msk->remote_key;
 	subflow->local_key = msk->local_key;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 12/21] mptcp: netlink: split mptcp_pm_parse_addr into two functions
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (10 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 11/21] mptcp: read attributes of addr entries managed " Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 13/21] mptcp: netlink: Add MPTCP_PM_CMD_ANNOUNCE Kishen Maloor
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp; +Cc: Florian Westphal

From: Florian Westphal <fw@strlen.de>

Next patch will need to parse MPTCP_PM_ATTR_ADDR attributes and
fill an mptcp_addr_info structure from a different genl command
callback.

To avoid copy-paste, split the existing function to a helper
that does the common part and then call the helper from the
(renamed)mptcp_pm_parse_entry function.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/pm_netlink.c | 60 +++++++++++++++++++++++++++---------------
 1 file changed, 39 insertions(+), 21 deletions(-)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 40c41a0498fc..07536abab3e5 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1316,11 +1316,12 @@ static int mptcp_pm_family_to_addr(int family)
 	return MPTCP_PM_ADDR_ATTR_ADDR4;
 }
 
-static int mptcp_pm_parse_addr(struct nlattr *attr, struct genl_info *info,
-			       bool require_family,
-			       struct mptcp_pm_addr_entry *entry)
+static int mptcp_pm_parse_pm_addr_attr(struct nlattr *tb[],
+				       const struct nlattr *attr,
+				       struct genl_info *info,
+				       struct mptcp_addr_info *addr,
+				       bool require_family)
 {
-	struct nlattr *tb[MPTCP_PM_ADDR_ATTR_MAX + 1];
 	int err, addr_addr;
 
 	if (!attr) {
@@ -1334,27 +1335,29 @@ static int mptcp_pm_parse_addr(struct nlattr *attr, struct genl_info *info,
 	if (err)
 		return err;
 
-	memset(entry, 0, sizeof(*entry));
+	if (tb[MPTCP_PM_ADDR_ATTR_ID])
+		addr->id = nla_get_u8(tb[MPTCP_PM_ADDR_ATTR_ID]);
+
 	if (!tb[MPTCP_PM_ADDR_ATTR_FAMILY]) {
 		if (!require_family)
-			goto skip_family;
+			return err;
 
 		NL_SET_ERR_MSG_ATTR(info->extack, attr,
 				    "missing family");
 		return -EINVAL;
 	}
 
-	entry->addr.family = nla_get_u16(tb[MPTCP_PM_ADDR_ATTR_FAMILY]);
-	if (entry->addr.family != AF_INET
+	addr->family = nla_get_u16(tb[MPTCP_PM_ADDR_ATTR_FAMILY]);
+	if (addr->family != AF_INET
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
-	    && entry->addr.family != AF_INET6
+	    && addr->family != AF_INET6
 #endif
 	    ) {
 		NL_SET_ERR_MSG_ATTR(info->extack, attr,
 				    "unknown address family");
 		return -EINVAL;
 	}
-	addr_addr = mptcp_pm_family_to_addr(entry->addr.family);
+	addr_addr = mptcp_pm_family_to_addr(addr->family);
 	if (!tb[addr_addr]) {
 		NL_SET_ERR_MSG_ATTR(info->extack, attr,
 				    "missing address data");
@@ -1362,22 +1365,37 @@ static int mptcp_pm_parse_addr(struct nlattr *attr, struct genl_info *info,
 	}
 
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
-	if (entry->addr.family == AF_INET6)
-		entry->addr.addr6 = nla_get_in6_addr(tb[addr_addr]);
+	if (addr->family == AF_INET6)
+		addr->addr6 = nla_get_in6_addr(tb[addr_addr]);
 	else
 #endif
-		entry->addr.addr.s_addr = nla_get_in_addr(tb[addr_addr]);
+		addr->addr.s_addr = nla_get_in_addr(tb[addr_addr]);
+
+	if (tb[MPTCP_PM_ADDR_ATTR_PORT])
+		addr->port = htons(nla_get_u16(tb[MPTCP_PM_ADDR_ATTR_PORT]));
+
+	return err;
+}
+
+static int mptcp_pm_parse_entry(struct nlattr *attr, struct genl_info *info,
+				bool require_family,
+				struct mptcp_pm_addr_entry *entry)
+{
+	struct nlattr *tb[MPTCP_PM_ADDR_ATTR_MAX + 1];
+	int err;
+
+	memset(entry, 0, sizeof(*entry));
+
+	err = mptcp_pm_parse_pm_addr_attr(tb, attr, info, &entry->addr, require_family);
+	if (err)
+		return err;
 
-skip_family:
 	if (tb[MPTCP_PM_ADDR_ATTR_IF_IDX]) {
 		u32 val = nla_get_s32(tb[MPTCP_PM_ADDR_ATTR_IF_IDX]);
 
 		entry->ifindex = val;
 	}
 
-	if (tb[MPTCP_PM_ADDR_ATTR_ID])
-		entry->addr.id = nla_get_u8(tb[MPTCP_PM_ADDR_ATTR_ID]);
-
 	if (tb[MPTCP_PM_ADDR_ATTR_FLAGS])
 		entry->flags = nla_get_u32(tb[MPTCP_PM_ADDR_ATTR_FLAGS]);
 
@@ -1432,7 +1450,7 @@ static int mptcp_nl_cmd_add_addr(struct sk_buff *skb, struct genl_info *info)
 	struct socket *lsk;
 	int ret;
 
-	ret = mptcp_pm_parse_addr(attr, info, true, &addr);
+	ret = mptcp_pm_parse_entry(attr, info, true, &addr);
 	if (ret < 0)
 		return ret;
 
@@ -1648,7 +1666,7 @@ static int mptcp_nl_cmd_del_addr(struct sk_buff *skb, struct genl_info *info)
 	unsigned int addr_max;
 	int ret;
 
-	ret = mptcp_pm_parse_addr(attr, info, false, &addr);
+	ret = mptcp_pm_parse_entry(attr, info, false, &addr);
 	if (ret < 0)
 		return ret;
 
@@ -1824,7 +1842,7 @@ static int mptcp_nl_cmd_get_addr(struct sk_buff *skb, struct genl_info *info)
 	void *reply;
 	int ret;
 
-	ret = mptcp_pm_parse_addr(attr, info, false, &addr);
+	ret = mptcp_pm_parse_entry(attr, info, false, &addr);
 	if (ret < 0)
 		return ret;
 
@@ -2019,7 +2037,7 @@ static int mptcp_nl_cmd_set_flags(struct sk_buff *skb, struct genl_info *info)
 	u8 bkup = 0, lookup_by_id = 0;
 	int ret;
 
-	ret = mptcp_pm_parse_addr(attr, info, false, &addr);
+	ret = mptcp_pm_parse_entry(attr, info, false, &addr);
 	if (ret < 0)
 		return ret;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 13/21] mptcp: netlink: Add MPTCP_PM_CMD_ANNOUNCE
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (11 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 12/21] mptcp: netlink: split mptcp_pm_parse_addr into two functions Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-14 17:04   ` Paolo Abeni
  2022-01-12 22:15 ` [PATCH mptcp-next v2 14/21] mptcp: selftests: support MPTCP_PM_CMD_ANNOUNCE Kishen Maloor
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change adds a MPTCP netlink interface for issuing
ADD_ADDR advertisements over the chosen MPTCP connection from a
userspace path manager.

The command requires the following parameters:
{ token, { loc_id, family, daddr4 | daddr6 [, dport] } [, if_idx],
flags/signal }.

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 include/uapi/linux/mptcp.h |   2 +
 net/mptcp/pm_netlink.c     | 111 +++++++++++++++++++++++++++++++++++++
 2 files changed, 113 insertions(+)

diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
index f106a3941cdf..40380be396c8 100644
--- a/include/uapi/linux/mptcp.h
+++ b/include/uapi/linux/mptcp.h
@@ -55,6 +55,7 @@ enum {
 	MPTCP_PM_ATTR_ADDR,				/* nested address */
 	MPTCP_PM_ATTR_RCV_ADD_ADDRS,			/* u32 */
 	MPTCP_PM_ATTR_SUBFLOWS,				/* u32 */
+	MPTCP_PM_ATTR_TOKEN,				/* u32 */
 
 	__MPTCP_PM_ATTR_MAX
 };
@@ -92,6 +93,7 @@ enum {
 	MPTCP_PM_CMD_SET_LIMITS,
 	MPTCP_PM_CMD_GET_LIMITS,
 	MPTCP_PM_CMD_SET_FLAGS,
+	MPTCP_PM_CMD_ANNOUNCE,
 
 	__MPTCP_PM_CMD_AFTER_LAST
 };
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 07536abab3e5..18228b2dfa2d 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1268,6 +1268,7 @@ static const struct nla_policy mptcp_pm_policy[MPTCP_PM_ATTR_MAX + 1] = {
 					NLA_POLICY_NESTED(mptcp_pm_addr_policy),
 	[MPTCP_PM_ATTR_RCV_ADD_ADDRS]	= { .type	= NLA_U32,	},
 	[MPTCP_PM_ATTR_SUBFLOWS]	= { .type	= NLA_U32,	},
+	[MPTCP_PM_ATTR_TOKEN]		= { .type	= NLA_U32,	},
 };
 
 void mptcp_pm_nl_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk)
@@ -2028,6 +2029,111 @@ static int mptcp_nl_addr_backup(struct net *net,
 	return ret;
 }
 
+static int mptcp_nl_cmd_announce(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlattr *token = info->attrs[MPTCP_PM_ATTR_TOKEN];
+	struct nlattr *addr = info->attrs[MPTCP_PM_ATTR_ADDR];
+	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
+	struct mptcp_pm_addr_entry addr_val;
+	struct mptcp_local_lsk *lsk_ref;
+	bool reuse_port = false;
+	struct mptcp_sock *msk;
+	struct socket *lsk;
+	u32 token_val;
+	int err;
+
+	if (!addr || !token) {
+		GENL_SET_ERR_MSG(info, "missing required inputs");
+		return -EINVAL;
+	}
+
+	token_val = nla_get_u32(token);
+
+	msk = mptcp_token_get_sock(sock_net(skb->sk), token_val);
+	if (!msk) {
+		NL_SET_ERR_MSG_ATTR(info->extack, token, "invalid token");
+		return -EINVAL;
+	}
+
+	if (READ_ONCE(msk->pm.pm_type) != MPTCP_PM_TYPE_USERSPACE) {
+		GENL_SET_ERR_MSG(info, "invalid request; userspace PM not selected");
+		return -EINVAL;
+	}
+
+	err = mptcp_pm_parse_entry(addr, info, true, &addr_val);
+	if (err < 0) {
+		GENL_SET_ERR_MSG(info, "error parsing local address");
+		return err;
+	}
+
+	if (addr_val.addr.id == 0 || !(addr_val.flags & MPTCP_PM_ADDR_FLAG_SIGNAL)) {
+		GENL_SET_ERR_MSG(info, "invalid addr id or flags");
+		return -EINVAL;
+	}
+
+	if (!addr_val.addr.port) {
+		addr_val.addr.port =
+			((struct inet_sock *)inet_sk
+			 ((struct sock *)msk))->inet_sport;
+
+		reuse_port = true;
+	}
+
+	lsk_ref = lsk_list_find(pernet, &addr_val.addr);
+
+	if (!lsk_ref) {
+		err = mptcp_pm_nl_create_listen_socket(skb->sk, &addr_val, &lsk);
+		if ((err && !reuse_port) || (err && (err != -EADDRINUSE) && reuse_port)) {
+			GENL_SET_ERR_MSG(info, "error creating listen socket");
+			return err;
+		}
+
+		if (lsk) {
+			lsk_ref = lsk_list_add(pernet, &addr_val.addr, lsk);
+			if (!lsk_ref) {
+				GENL_SET_ERR_MSG(info, "can't allocate lsk ref");
+				sock_release(lsk);
+				return -ENOMEM;
+			}
+		}
+	}
+
+	if (!reuse_port) {
+		addr_val.lsk_ref = lsk_ref;
+		lsk_ref = NULL;
+	} else {
+		addr_val.addr.port = 0;
+	}
+
+	err = mptcp_userspace_pm_append_new_local_addr(msk, &addr_val);
+	if (err < 0) {
+		if (addr_val.lsk_ref)
+			lsk_list_release(pernet, addr_val.lsk_ref);
+		else if (lsk_ref)
+			lsk_list_release(pernet, lsk_ref);
+		GENL_SET_ERR_MSG(info, "did not match address and id");
+		return err;
+	}
+
+	lock_sock((struct sock *)msk);
+	spin_lock_bh(&msk->pm.lock);
+
+	if (mptcp_pm_alloc_anno_list(msk, &addr_val, lsk_ref)) {
+		mptcp_pm_announce_addr(msk, &addr_val.addr, false);
+		mptcp_pm_nl_addr_send_ack(msk);
+	}
+
+	spin_unlock_bh(&msk->pm.lock);
+	release_sock((struct sock *)msk);
+
+	if (addr_val.lsk_ref)
+		lsk_list_release(pernet, addr_val.lsk_ref);
+	else if (lsk_ref)
+		lsk_list_release(pernet, lsk_ref);
+
+	return 0;
+}
+
 static int mptcp_nl_cmd_set_flags(struct sk_buff *skb, struct genl_info *info)
 {
 	struct mptcp_pm_addr_entry addr = { .addr = { .family = AF_UNSPEC }, }, *entry;
@@ -2370,6 +2476,11 @@ static const struct genl_small_ops mptcp_pm_ops[] = {
 		.doit   = mptcp_nl_cmd_set_flags,
 		.flags  = GENL_ADMIN_PERM,
 	},
+	{
+		.cmd    = MPTCP_PM_CMD_ANNOUNCE,
+		.doit   = mptcp_nl_cmd_announce,
+		.flags  = GENL_ADMIN_PERM,
+	},
 };
 
 static struct genl_family mptcp_genl_family __ro_after_init = {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 14/21] mptcp: selftests: support MPTCP_PM_CMD_ANNOUNCE
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (12 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 13/21] mptcp: netlink: Add MPTCP_PM_CMD_ANNOUNCE Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 15/21] mptcp: netlink: Add MPTCP_PM_CMD_REMOVE Kishen Maloor
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change updates the "pm_nl_ctl" testing sample with an "ann" (announce)
option to support the newly added netlink interface command
MPTCP_PM_CMD_ANNOUNCE to issue ADD_ADDR advertisements over the
chosen MPTCP connection.

E.g. ./pm_nl_ctl ann 192.168.122.75 token 823274047 id 25 dev enp1s0

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 130 ++++++++++++++++++
 1 file changed, 130 insertions(+)

diff --git a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
index 354784512748..76dfb5a8984a 100644
--- a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
+++ b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
@@ -6,6 +6,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#include <limits.h>
 
 #include <sys/socket.h>
 #include <sys/types.h>
@@ -26,6 +27,7 @@ static void syntax(char *argv[])
 {
 	fprintf(stderr, "%s add|get|set|del|flush|dump|accept [<args>]\n", argv[0]);
 	fprintf(stderr, "\tadd [flags signal|subflow|backup|fullmesh] [id <nr>] [dev <name>] <ip>\n");
+	fprintf(stderr, "\tann <local-ip> id <local-id> token <token> [port <local-port>] [dev <name>]\n");
 	fprintf(stderr, "\tdel <id> [<ip>]\n");
 	fprintf(stderr, "\tget <id>\n");
 	fprintf(stderr, "\tset <ip> [flags backup|nobackup]\n");
@@ -170,6 +172,132 @@ static int resolve_mptcp_pm_netlink(int fd)
 	return genl_parse_getfamily((void *)data);
 }
 
+int announce_addr(int fd, int pm_family, int argc, char *argv[])
+{
+	char data[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		  NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		  1024];
+	u_int32_t flags = MPTCP_PM_ADDR_FLAG_SIGNAL;
+	u_int32_t token = UINT_MAX;
+	struct rtattr *rta, *addr;
+	u_int32_t id = UINT_MAX;
+	struct nlmsghdr *nh;
+	u_int16_t family;
+	int addr_start;
+	int off = 0;
+	int arg;
+
+	memset(data, 0, sizeof(data));
+	nh = (void *)data;
+	off = init_genl_req(data, pm_family, MPTCP_PM_CMD_ANNOUNCE,
+			    MPTCP_PM_VER);
+
+	if (argc < 7)
+		syntax(argv);
+
+	/* local-ip header */
+	addr_start = off;
+	addr = (void *)(data + off);
+	addr->rta_type = NLA_F_NESTED | MPTCP_PM_ATTR_ADDR;
+	addr->rta_len = RTA_LENGTH(0);
+	off += NLMSG_ALIGN(addr->rta_len);
+
+	/* local-ip data */
+	/* record addr type */
+	rta = (void *)(data + off);
+	if (inet_pton(AF_INET, argv[2], RTA_DATA(rta))) {
+		family = AF_INET;
+		rta->rta_type = MPTCP_PM_ADDR_ATTR_ADDR4;
+		rta->rta_len = RTA_LENGTH(4);
+	} else if (inet_pton(AF_INET6, argv[2], RTA_DATA(rta))) {
+		family = AF_INET6;
+		rta->rta_type = MPTCP_PM_ADDR_ATTR_ADDR6;
+		rta->rta_len = RTA_LENGTH(16);
+	} else
+		error(1, errno, "can't parse ip %s", argv[2]);
+	off += NLMSG_ALIGN(rta->rta_len);
+
+	/* addr family */
+	rta = (void *)(data + off);
+	rta->rta_type = MPTCP_PM_ADDR_ATTR_FAMILY;
+	rta->rta_len = RTA_LENGTH(2);
+	memcpy(RTA_DATA(rta), &family, 2);
+	off += NLMSG_ALIGN(rta->rta_len);
+
+	/* addr flags (..SIGNAL) */
+	rta = (void *)(data + off);
+	rta->rta_type = MPTCP_PM_ADDR_ATTR_FLAGS;
+	rta->rta_len = RTA_LENGTH(4);
+	memcpy(RTA_DATA(rta), &flags, 4);
+	off += NLMSG_ALIGN(rta->rta_len);
+
+	for (arg = 3; arg < argc; arg++) {
+		if (!strcmp(argv[arg], "id")) {
+			/* local-id */
+			if (++arg >= argc)
+				error(1, 0, " missing id value");
+
+			id = atoi(argv[arg]);
+			rta = (void *)(data + off);
+			rta->rta_type = MPTCP_PM_ADDR_ATTR_ID;
+			rta->rta_len = RTA_LENGTH(1);
+			memcpy(RTA_DATA(rta), &id, 1);
+			off += NLMSG_ALIGN(rta->rta_len);
+		} else if (!strcmp(argv[arg], "dev")) {
+			/* for the if_index */
+			int32_t ifindex;
+
+			if (++arg >= argc)
+				error(1, 0, " missing dev name");
+
+			ifindex = if_nametoindex(argv[arg]);
+			if (!ifindex)
+				error(1, errno, "unknown device %s", argv[arg]);
+
+			rta = (void *)(data + off);
+			rta->rta_type = MPTCP_PM_ADDR_ATTR_IF_IDX;
+			rta->rta_len = RTA_LENGTH(4);
+			memcpy(RTA_DATA(rta), &ifindex, 4);
+			off += NLMSG_ALIGN(rta->rta_len);
+		} else if (!strcmp(argv[arg], "port")) {
+			/* local-port (optional) */
+			u_int16_t port;
+
+			if (++arg >= argc)
+				error(1, 0, " missing port value");
+
+			port = atoi(argv[arg]);
+			rta = (void *)(data + off);
+			rta->rta_type = MPTCP_PM_ADDR_ATTR_PORT;
+			rta->rta_len = RTA_LENGTH(2);
+			memcpy(RTA_DATA(rta), &port, 2);
+			off += NLMSG_ALIGN(rta->rta_len);
+		} else if (!strcmp(argv[arg], "token")) {
+			/* MPTCP connection token */
+			if (++arg >= argc)
+				error(1, 0, " missing token value");
+
+			token = atoi(argv[arg]);
+		} else
+			error(1, 0, "unknown keyword %s", argv[arg]);
+	}
+	addr->rta_len = off - addr_start;
+
+	if (id == UINT_MAX || token == UINT_MAX)
+		error(1, 0, " missing mandatory inputs");
+
+	/* token */
+	rta = (void *)(data + off);
+	rta->rta_type = MPTCP_PM_ATTR_TOKEN;
+	rta->rta_len = RTA_LENGTH(4);
+	memcpy(RTA_DATA(rta), &token, 4);
+	off += NLMSG_ALIGN(rta->rta_len);
+
+	do_nl_req(fd, nh, off, 0);
+
+	return 0;
+}
+
 int add_addr(int fd, int pm_family, int argc, char *argv[])
 {
 	char data[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
@@ -744,6 +872,8 @@ int main(int argc, char *argv[])
 
 	if (!strcmp(argv[1], "add"))
 		return add_addr(fd, pm_family, argc, argv);
+	else if (!strcmp(argv[1], "ann"))
+		return announce_addr(fd, pm_family, argc, argv);
 	else if (!strcmp(argv[1], "del"))
 		return del_addr(fd, pm_family, argc, argv);
 	else if (!strcmp(argv[1], "flush"))
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 15/21] mptcp: netlink: Add MPTCP_PM_CMD_REMOVE
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (13 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 14/21] mptcp: selftests: support MPTCP_PM_CMD_ANNOUNCE Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 16/21] mptcp: selftests: support MPTCP_PM_CMD_REMOVE Kishen Maloor
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change adds a MPTCP netlink command for issuing
REMOVE_ADDR signals for a specific address over the chosen MPTCP
connection from a userspace path manager.

The command requires the following parameters: {token, loc_id}.

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 include/uapi/linux/mptcp.h |  2 ++
 net/mptcp/pm_netlink.c     | 67 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
index 40380be396c8..ec63f9382dbe 100644
--- a/include/uapi/linux/mptcp.h
+++ b/include/uapi/linux/mptcp.h
@@ -56,6 +56,7 @@ enum {
 	MPTCP_PM_ATTR_RCV_ADD_ADDRS,			/* u32 */
 	MPTCP_PM_ATTR_SUBFLOWS,				/* u32 */
 	MPTCP_PM_ATTR_TOKEN,				/* u32 */
+	MPTCP_PM_ATTR_LOC_ID,				/* u8 */
 
 	__MPTCP_PM_ATTR_MAX
 };
@@ -94,6 +95,7 @@ enum {
 	MPTCP_PM_CMD_GET_LIMITS,
 	MPTCP_PM_CMD_SET_FLAGS,
 	MPTCP_PM_CMD_ANNOUNCE,
+	MPTCP_PM_CMD_REMOVE,
 
 	__MPTCP_PM_CMD_AFTER_LAST
 };
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 18228b2dfa2d..bec45560060a 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1269,6 +1269,7 @@ static const struct nla_policy mptcp_pm_policy[MPTCP_PM_ATTR_MAX + 1] = {
 	[MPTCP_PM_ATTR_RCV_ADD_ADDRS]	= { .type	= NLA_U32,	},
 	[MPTCP_PM_ATTR_SUBFLOWS]	= { .type	= NLA_U32,	},
 	[MPTCP_PM_ATTR_TOKEN]		= { .type	= NLA_U32,	},
+	[MPTCP_PM_ATTR_LOC_ID]		= { .type	= NLA_U8,	},
 };
 
 void mptcp_pm_nl_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk)
@@ -1719,6 +1720,7 @@ static void mptcp_pm_remove_addrs_and_subflows(struct mptcp_sock *msk,
 		    slist.nr < MPTCP_RM_IDS_MAX) {
 			alist.ids[alist.nr++] = entry->addr.id;
 			slist.ids[slist.nr++] = entry->addr.id;
+			remove_anno_list_by_saddr(msk, &entry->addr);
 		} else if (remove_anno_list_by_saddr(msk, &entry->addr) &&
 			 alist.nr < MPTCP_RM_IDS_MAX) {
 			alist.ids[alist.nr++] = entry->addr.id;
@@ -2330,6 +2332,66 @@ void mptcp_event_addr_removed(const struct mptcp_sock *msk, uint8_t id)
 	kfree_skb(skb);
 }
 
+static int mptcp_nl_cmd_remove(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlattr *token = info->attrs[MPTCP_PM_ATTR_TOKEN];
+	struct nlattr *id = info->attrs[MPTCP_PM_ATTR_LOC_ID];
+	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
+	struct mptcp_pm_addr_entry *match = NULL;
+	struct mptcp_pm_addr_entry *entry;
+	struct mptcp_sock *msk;
+	LIST_HEAD(free_list);
+	u32 token_val;
+	u8 id_val;
+
+	if (!id || !token) {
+		GENL_SET_ERR_MSG(info, "missing required inputs");
+		return -EINVAL;
+	}
+
+	id_val = nla_get_u8(id);
+	token_val = nla_get_u32(token);
+
+	msk = mptcp_token_get_sock(sock_net(skb->sk), token_val);
+	if (!msk) {
+		NL_SET_ERR_MSG_ATTR(info->extack, token, "invalid token");
+		return -EINVAL;
+	}
+
+	if (READ_ONCE(msk->pm.pm_type) != MPTCP_PM_TYPE_USERSPACE) {
+		GENL_SET_ERR_MSG(info, "invalid request; userspace PM not selected");
+		return -EINVAL;
+	}
+
+	lock_sock((struct sock *)msk);
+
+	list_for_each_entry(entry, &msk->local_addr_list, list) {
+		if (entry->addr.id == id_val) {
+			match = entry;
+			break;
+		}
+	}
+
+	if (!match) {
+		GENL_SET_ERR_MSG(info, "address with specified id not found");
+		release_sock((struct sock *)msk);
+		return -EINVAL;
+	}
+
+	list_move(&match->list, &free_list);
+
+	mptcp_pm_remove_addrs_and_subflows(msk, &free_list);
+
+	release_sock((struct sock *)msk);
+
+	list_for_each_entry_safe(match, entry, &free_list, list) {
+		if (match->lsk_ref)
+			lsk_list_release(pernet, match->lsk_ref);
+		kfree(match);
+	}
+	return 0;
+}
+
 void mptcp_event_addr_announced(const struct mptcp_sock *msk,
 				const struct mptcp_addr_info *info,
 				const struct sock *ssk)
@@ -2481,6 +2543,11 @@ static const struct genl_small_ops mptcp_pm_ops[] = {
 		.doit   = mptcp_nl_cmd_announce,
 		.flags  = GENL_ADMIN_PERM,
 	},
+	{
+		.cmd    = MPTCP_PM_CMD_REMOVE,
+		.doit   = mptcp_nl_cmd_remove,
+		.flags  = GENL_ADMIN_PERM,
+	},
 };
 
 static struct genl_family mptcp_genl_family __ro_after_init = {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 16/21] mptcp: selftests: support MPTCP_PM_CMD_REMOVE
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (14 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 15/21] mptcp: netlink: Add MPTCP_PM_CMD_REMOVE Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 17/21] mptcp: netlink: allow userspace-driven subflow establishment Kishen Maloor
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change updates the "pm_nl_ctl" testing sample with a "rem" (remove)
option to support the newly added netlink interface command
MPTCP_PM_CMD_REMOVE to issue REMOVE_ADDR signals over the
chosen MPTCP connection.

E.g. ./pm_nl_ctl rem token 823274047 id 23

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 52 +++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
index 76dfb5a8984a..8892bf16693a 100644
--- a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
+++ b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
@@ -28,6 +28,7 @@ static void syntax(char *argv[])
 	fprintf(stderr, "%s add|get|set|del|flush|dump|accept [<args>]\n", argv[0]);
 	fprintf(stderr, "\tadd [flags signal|subflow|backup|fullmesh] [id <nr>] [dev <name>] <ip>\n");
 	fprintf(stderr, "\tann <local-ip> id <local-id> token <token> [port <local-port>] [dev <name>]\n");
+	fprintf(stderr, "\trem id <local-id> token <token>\n");
 	fprintf(stderr, "\tdel <id> [<ip>]\n");
 	fprintf(stderr, "\tget <id>\n");
 	fprintf(stderr, "\tset <ip> [flags backup|nobackup]\n");
@@ -172,6 +173,55 @@ static int resolve_mptcp_pm_netlink(int fd)
 	return genl_parse_getfamily((void *)data);
 }
 
+int remove_addr(int fd, int pm_family, int argc, char *argv[])
+{
+	char data[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		  NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		  1024];
+	struct nlmsghdr *nh;
+	struct rtattr *rta;
+	u_int32_t token;
+	u_int8_t id;
+	int off = 0;
+	int arg;
+
+	memset(data, 0, sizeof(data));
+	nh = (void *)data;
+	off = init_genl_req(data, pm_family, MPTCP_PM_CMD_REMOVE,
+			    MPTCP_PM_VER);
+
+	if (argc < 6)
+		syntax(argv);
+
+	for (arg = 2; arg < argc; arg++) {
+		if (!strcmp(argv[arg], "id")) {
+			if (++arg >= argc)
+				error(1, 0, " missing id value");
+
+			id = atoi(argv[arg]);
+			rta = (void *)(data + off);
+			rta->rta_type = MPTCP_PM_ATTR_LOC_ID;
+			rta->rta_len = RTA_LENGTH(1);
+			memcpy(RTA_DATA(rta), &id, 1);
+			off += NLMSG_ALIGN(rta->rta_len);
+		} else if (!strcmp(argv[arg], "token")) {
+			if (++arg >= argc)
+				error(1, 0, " missing token value");
+
+			token = atoi(argv[arg]);
+			rta = (void *)(data + off);
+			rta->rta_type = MPTCP_PM_ATTR_TOKEN;
+			rta->rta_len = RTA_LENGTH(4);
+			memcpy(RTA_DATA(rta), &token, 4);
+			off += NLMSG_ALIGN(rta->rta_len);
+		} else
+			error(1, 0, "unknown keyword %s", argv[arg]);
+	}
+
+	do_nl_req(fd, nh, off, 0);
+	return 0;
+}
+
 int announce_addr(int fd, int pm_family, int argc, char *argv[])
 {
 	char data[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
@@ -874,6 +924,8 @@ int main(int argc, char *argv[])
 		return add_addr(fd, pm_family, argc, argv);
 	else if (!strcmp(argv[1], "ann"))
 		return announce_addr(fd, pm_family, argc, argv);
+	else if (!strcmp(argv[1], "rem"))
+		return remove_addr(fd, pm_family, argc, argv);
 	else if (!strcmp(argv[1], "del"))
 		return del_addr(fd, pm_family, argc, argv);
 	else if (!strcmp(argv[1], "flush"))
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 17/21] mptcp: netlink: allow userspace-driven subflow establishment
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (15 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 16/21] mptcp: selftests: support MPTCP_PM_CMD_REMOVE Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 18/21] mptcp: selftests: support MPTCP_PM_CMD_SUBFLOW_CREATE Kishen Maloor
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp; +Cc: Florian Westphal

From: Florian Westphal <fw@strlen.de>

This allows userspace to tell kernel to add a new subflow to an existing
mptcp connection.

Userspace provides the token to identify the mptcp-level connection
that needs a change in active subflows and the local and remote
addresses of the new or the to-be-removed subflow.

MPTCP_PM_CMD_SUBFLOW_CREATE requires the following parameters:
{ token, { loc_id, family, loc_addr4 | loc_addr6 }, { family, rem_addr4 |
rem_addr6, rem_port }

MPTCP_PM_CMD_SUBFLOW_DESTROY requires the following parameters:
{ token, { family, loc_addr4 | loc_addr6, loc_port }, { family, rem_addr4 |
rem_addr6, rem_port }

v2: fix build error when IPv6 is not enabled

Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 include/uapi/linux/mptcp.h |   3 +
 net/mptcp/pm_netlink.c     | 206 +++++++++++++++++++++++++++++++++++++
 2 files changed, 209 insertions(+)

diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
index ec63f9382dbe..25fd6c679bfa 100644
--- a/include/uapi/linux/mptcp.h
+++ b/include/uapi/linux/mptcp.h
@@ -57,6 +57,7 @@ enum {
 	MPTCP_PM_ATTR_SUBFLOWS,				/* u32 */
 	MPTCP_PM_ATTR_TOKEN,				/* u32 */
 	MPTCP_PM_ATTR_LOC_ID,				/* u8 */
+	MPTCP_PM_ATTR_ADDR_REMOTE,			/* nested address */
 
 	__MPTCP_PM_ATTR_MAX
 };
@@ -96,6 +97,8 @@ enum {
 	MPTCP_PM_CMD_SET_FLAGS,
 	MPTCP_PM_CMD_ANNOUNCE,
 	MPTCP_PM_CMD_REMOVE,
+	MPTCP_PM_CMD_SUBFLOW_CREATE,
+	MPTCP_PM_CMD_SUBFLOW_DESTROY,
 
 	__MPTCP_PM_CMD_AFTER_LAST
 };
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index bec45560060a..c5cb0630af47 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1270,6 +1270,8 @@ static const struct nla_policy mptcp_pm_policy[MPTCP_PM_ATTR_MAX + 1] = {
 	[MPTCP_PM_ATTR_SUBFLOWS]	= { .type	= NLA_U32,	},
 	[MPTCP_PM_ATTR_TOKEN]		= { .type	= NLA_U32,	},
 	[MPTCP_PM_ATTR_LOC_ID]		= { .type	= NLA_U8,	},
+	[MPTCP_PM_ATTR_ADDR_REMOTE]	=
+					NLA_POLICY_NESTED(mptcp_pm_addr_policy),
 };
 
 void mptcp_pm_nl_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk)
@@ -1379,6 +1381,16 @@ static int mptcp_pm_parse_pm_addr_attr(struct nlattr *tb[],
 	return err;
 }
 
+static int mptcp_pm_parse_addr(struct nlattr *attr, struct genl_info *info,
+			       struct mptcp_addr_info *addr)
+{
+	struct nlattr *tb[MPTCP_PM_ADDR_ATTR_MAX + 1];
+
+	memset(addr, 0, sizeof(*addr));
+
+	return mptcp_pm_parse_pm_addr_attr(tb, attr, info, addr, true);
+}
+
 static int mptcp_pm_parse_entry(struct nlattr *attr, struct genl_info *info,
 				bool require_family,
 				struct mptcp_pm_addr_entry *entry)
@@ -2503,6 +2515,190 @@ void mptcp_event(enum mptcp_event_type type, const struct mptcp_sock *msk,
 	kfree_skb(skb);
 }
 
+static int mptcp_nl_cmd_sf_create(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlattr *raddr = info->attrs[MPTCP_PM_ATTR_ADDR_REMOTE];
+	struct nlattr *token = info->attrs[MPTCP_PM_ATTR_TOKEN];
+	struct nlattr *laddr = info->attrs[MPTCP_PM_ATTR_ADDR];
+	struct mptcp_addr_info addr_r;
+	struct mptcp_addr_info addr_l;
+	struct mptcp_sock *msk;
+	struct sock *sk;
+	u32 token_val;
+	int ret;
+
+	if (!laddr || !raddr || !token) {
+		GENL_SET_ERR_MSG(info, "missing required inputs");
+		return -EINVAL;
+	}
+
+	token_val = nla_get_u32(token);
+
+	msk = mptcp_token_get_sock(genl_info_net(info), token_val);
+	if (!msk) {
+		NL_SET_ERR_MSG_ATTR(info->extack, token, "invalid token");
+		return -EINVAL;
+	}
+
+	if (READ_ONCE(msk->pm.pm_type) != MPTCP_PM_TYPE_USERSPACE) {
+		GENL_SET_ERR_MSG(info, "invalid request; userspace PM not selected");
+		return -EINVAL;
+	}
+
+	ret = mptcp_pm_parse_addr(laddr, info, &addr_l);
+	if (ret < 0) {
+		NL_SET_ERR_MSG_ATTR(info->extack, laddr, "error parsing local addr");
+		return -EINVAL;
+	}
+
+	if (addr_l.id == 0) {
+		NL_SET_ERR_MSG_ATTR(info->extack, laddr, "missing local addr id");
+		return -EINVAL;
+	}
+
+	ret = mptcp_pm_parse_addr(raddr, info, &addr_r);
+	if (ret < 0) {
+		NL_SET_ERR_MSG_ATTR(info->extack, raddr, "error parsing remote addr");
+		return -EINVAL;
+	}
+
+	sk = &msk->sk.icsk_inet.sk;
+	lock_sock(sk);
+
+	ret = __mptcp_subflow_connect(sk, &addr_l, &addr_r);
+
+	spin_lock_bh(&msk->pm.lock);
+	if (ret == 0)
+		msk->pm.local_addr_used++;
+	spin_unlock_bh(&msk->pm.lock);
+
+	release_sock(sk);
+
+	return ret;
+}
+
+static struct sock *mptcp_nl_find_ssk(struct mptcp_sock *msk,
+				      const struct mptcp_addr_info *local,
+				      const struct mptcp_addr_info *remote)
+{
+	struct sock *sk = &msk->sk.icsk_inet.sk;
+	struct mptcp_subflow_context *subflow;
+	struct sock *found = NULL;
+
+	if (local->family != remote->family)
+		return NULL;
+
+	lock_sock(sk);
+
+	mptcp_for_each_subflow(msk, subflow) {
+		const struct ipv6_pinfo *pinfo;
+		const struct inet_sock *issk;
+		struct sock *ssk;
+
+		ssk = mptcp_subflow_tcp_sock(subflow);
+
+		if (local->family != ssk->sk_family)
+			continue;
+
+		issk = inet_sk(ssk);
+
+		switch (ssk->sk_family) {
+		case AF_INET:
+			if (issk->inet_saddr != local->addr.s_addr ||
+			    issk->inet_daddr != remote->addr.s_addr)
+				continue;
+			break;
+#if IS_ENABLED(CONFIG_MPTCP_IPV6)
+		case AF_INET6:
+			pinfo = inet6_sk(ssk);
+			if (!ipv6_addr_equal(&local->addr6, &pinfo->saddr) ||
+			    !ipv6_addr_equal(&remote->addr6, &ssk->sk_v6_daddr))
+				continue;
+			break;
+#endif
+		default:
+			continue;
+		}
+
+		if (issk->inet_sport == local->port &&
+		    issk->inet_dport == remote->port) {
+			found = ssk;
+			goto found;
+		}
+	}
+
+found:
+	release_sock(sk);
+
+	return found;
+}
+
+static int mptcp_nl_cmd_sf_destroy(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlattr *raddr = info->attrs[MPTCP_PM_ATTR_ADDR_REMOTE];
+	struct nlattr *token = info->attrs[MPTCP_PM_ATTR_TOKEN];
+	struct nlattr *laddr = info->attrs[MPTCP_PM_ATTR_ADDR];
+	struct mptcp_addr_info addr_l;
+	struct mptcp_addr_info addr_r;
+	struct mptcp_sock *msk;
+	struct sock *sk, *ssk;
+	u32 token_val;
+	int ret;
+
+	if (!laddr || !raddr || !token) {
+		GENL_SET_ERR_MSG(info, "missing required inputs");
+		return -EINVAL;
+	}
+
+	token_val = nla_get_u32(token);
+
+	msk = mptcp_token_get_sock(genl_info_net(info), token_val);
+	if (!msk) {
+		NL_SET_ERR_MSG_ATTR(info->extack, token, "invalid token");
+		return -EINVAL;
+	}
+
+	if (READ_ONCE(msk->pm.pm_type) != MPTCP_PM_TYPE_USERSPACE) {
+		GENL_SET_ERR_MSG(info, "invalid request; userspace PM not selected");
+		return -EINVAL;
+	}
+
+	ret = mptcp_pm_parse_addr(laddr, info, &addr_l);
+	if (ret < 0) {
+		NL_SET_ERR_MSG_ATTR(info->extack, laddr, "error parsing local addr");
+		return ret;
+	}
+
+	ret = mptcp_pm_parse_addr(raddr, info, &addr_r);
+	if (ret < 0) {
+		NL_SET_ERR_MSG_ATTR(info->extack, raddr, "error parsing remote addr");
+		return ret;
+	}
+
+	if (addr_l.family != addr_r.family) {
+		GENL_SET_ERR_MSG(info, "address families do not match");
+		return -EINVAL;
+	}
+
+	if (!addr_l.port || !addr_r.port) {
+		GENL_SET_ERR_MSG(info, "missing local or remote port");
+		return -EINVAL;
+	}
+
+	sk = &msk->sk.icsk_inet.sk;
+	ssk = mptcp_nl_find_ssk(msk, &addr_l, &addr_r);
+	if (ssk) {
+		struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+
+		mptcp_subflow_shutdown(sk, ssk, RCV_SHUTDOWN | SEND_SHUTDOWN);
+		mptcp_close_ssk(sk, ssk, subflow);
+	} else {
+		ret = -ESRCH;
+	}
+
+	return ret;
+}
+
 static const struct genl_small_ops mptcp_pm_ops[] = {
 	{
 		.cmd    = MPTCP_PM_CMD_ADD_ADDR,
@@ -2548,6 +2744,16 @@ static const struct genl_small_ops mptcp_pm_ops[] = {
 		.doit   = mptcp_nl_cmd_remove,
 		.flags  = GENL_ADMIN_PERM,
 	},
+	{
+		.cmd    = MPTCP_PM_CMD_SUBFLOW_CREATE,
+		.doit   = mptcp_nl_cmd_sf_create,
+		.flags  = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd    = MPTCP_PM_CMD_SUBFLOW_DESTROY,
+		.doit   = mptcp_nl_cmd_sf_destroy,
+		.flags  = GENL_ADMIN_PERM,
+	},
 };
 
 static struct genl_family mptcp_genl_family __ro_after_init = {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 18/21] mptcp: selftests: support MPTCP_PM_CMD_SUBFLOW_CREATE
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (16 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 17/21] mptcp: netlink: allow userspace-driven subflow establishment Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 19/21] mptcp: selftests: support MPTCP_PM_CMD_SUBFLOW_DESTROY Kishen Maloor
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change updates the "pm_nl_ctl" testing sample with a "csf"
(create subflow) option to support the newly added netlink interface
command MPTCP_PM_CMD_SUBFLOW_CREATE over the chosen MPTCP connection.

E.g. ./pm_nl_ctl csf lip 10.0.2.1 lid 23 rip 10.0.2.2 rport 56789
token 823274047

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 129 ++++++++++++++++++
 1 file changed, 129 insertions(+)

diff --git a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
index 8892bf16693a..e3ae1d851a39 100644
--- a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
+++ b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
@@ -29,6 +29,7 @@ static void syntax(char *argv[])
 	fprintf(stderr, "\tadd [flags signal|subflow|backup|fullmesh] [id <nr>] [dev <name>] <ip>\n");
 	fprintf(stderr, "\tann <local-ip> id <local-id> token <token> [port <local-port>] [dev <name>]\n");
 	fprintf(stderr, "\trem id <local-id> token <token>\n");
+	fprintf(stderr, "\tcsf lip <local-ip> lid <local-id> rip <remote-ip> rport <remote-port> token <token>\n");
 	fprintf(stderr, "\tdel <id> [<ip>]\n");
 	fprintf(stderr, "\tget <id>\n");
 	fprintf(stderr, "\tset <ip> [flags backup|nobackup]\n");
@@ -173,6 +174,132 @@ static int resolve_mptcp_pm_netlink(int fd)
 	return genl_parse_getfamily((void *)data);
 }
 
+int csf(int fd, int pm_family, int argc, char *argv[])
+{
+	char data[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		  NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		  1024];
+	const char *params[5];
+	struct nlmsghdr *nh;
+	struct rtattr *addr;
+	struct rtattr *rta;
+	u_int16_t family;
+	u_int32_t token;
+	u_int16_t port;
+	int addr_start;
+	u_int8_t id;
+	int off = 0;
+	int arg;
+
+	memset(params, 0, 5 * sizeof(const char *));
+
+	memset(data, 0, sizeof(data));
+	nh = (void *)data;
+	off = init_genl_req(data, pm_family, MPTCP_PM_CMD_SUBFLOW_CREATE,
+			    MPTCP_PM_VER);
+
+	if (argc < 12)
+		syntax(argv);
+
+	/* Params recorded in this order:
+	 * <local-ip>, <local-id>, <remote-ip>, <remote-port>, <token>
+	 */
+	for (arg = 2; arg < argc; arg++) {
+		if (!strcmp(argv[arg], "lip")) {
+			if (++arg >= argc)
+				error(1, 0, " missing local IP");
+
+			params[0] = argv[arg];
+		} else if (!strcmp(argv[arg], "lid")) {
+			if (++arg >= argc)
+				error(1, 0, " missing local id");
+
+			params[1] = argv[arg];
+		} else if (!strcmp(argv[arg], "rip")) {
+			if (++arg >= argc)
+				error(1, 0, " missing remote ip");
+
+			params[2] = argv[arg];
+		} else if (!strcmp(argv[arg], "rport")) {
+			if (++arg >= argc)
+				error(1, 0, " missing remote port");
+
+			params[3] = argv[arg];
+		} else if (!strcmp(argv[arg], "token")) {
+			if (++arg >= argc)
+				error(1, 0, " missing token");
+
+			params[4] = argv[arg];
+		} else
+			error(1, 0, "unknown param %s", argv[arg]);
+	}
+
+	for (arg = 0; arg < 4; arg = arg + 2) {
+		/*  addr header */
+		addr_start = off;
+		addr = (void *)(data + off);
+		addr->rta_type = NLA_F_NESTED |
+			((arg == 0) ? MPTCP_PM_ATTR_ADDR : MPTCP_PM_ATTR_ADDR_REMOTE);
+		addr->rta_len = RTA_LENGTH(0);
+		off += NLMSG_ALIGN(addr->rta_len);
+
+		/*  addr data */
+		rta = (void *)(data + off);
+		if (inet_pton(AF_INET, params[arg], RTA_DATA(rta))) {
+			family = AF_INET;
+			rta->rta_type = MPTCP_PM_ADDR_ATTR_ADDR4;
+			rta->rta_len = RTA_LENGTH(4);
+		} else if (inet_pton(AF_INET6, params[arg], RTA_DATA(rta))) {
+			family = AF_INET6;
+			rta->rta_type = MPTCP_PM_ADDR_ATTR_ADDR6;
+			rta->rta_len = RTA_LENGTH(16);
+		} else
+			error(1, errno, "can't parse ip %s", params[arg]);
+		off += NLMSG_ALIGN(rta->rta_len);
+
+		/* family */
+		rta = (void *)(data + off);
+		rta->rta_type = MPTCP_PM_ADDR_ATTR_FAMILY;
+		rta->rta_len = RTA_LENGTH(2);
+		memcpy(RTA_DATA(rta), &family, 2);
+		off += NLMSG_ALIGN(rta->rta_len);
+
+		if (arg == 2) {
+			/*  port */
+			port = atoi(params[arg + 1]);
+			rta = (void *)(data + off);
+			rta->rta_type = MPTCP_PM_ADDR_ATTR_PORT;
+			rta->rta_len = RTA_LENGTH(2);
+			memcpy(RTA_DATA(rta), &port, 2);
+			off += NLMSG_ALIGN(rta->rta_len);
+		}
+
+		if (arg == 0) {
+			/* id */
+			id = atoi(params[arg + 1]);
+			rta = (void *)(data + off);
+			rta->rta_type = MPTCP_PM_ADDR_ATTR_ID;
+			rta->rta_len = RTA_LENGTH(1);
+			memcpy(RTA_DATA(rta), &id, 1);
+			off += NLMSG_ALIGN(rta->rta_len);
+		}
+
+		addr->rta_len = off - addr_start;
+	}
+
+	/* token */
+	token = atoi(params[4]);
+	rta = (void *)(data + off);
+	rta->rta_type = MPTCP_PM_ATTR_TOKEN;
+	rta->rta_len = RTA_LENGTH(4);
+	memcpy(RTA_DATA(rta), &token, 4);
+	off += NLMSG_ALIGN(rta->rta_len);
+
+	do_nl_req(fd, nh, off, 0);
+
+	return 0;
+}
+
 int remove_addr(int fd, int pm_family, int argc, char *argv[])
 {
 	char data[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
@@ -926,6 +1053,8 @@ int main(int argc, char *argv[])
 		return announce_addr(fd, pm_family, argc, argv);
 	else if (!strcmp(argv[1], "rem"))
 		return remove_addr(fd, pm_family, argc, argv);
+	else if (!strcmp(argv[1], "csf"))
+		return csf(fd, pm_family, argc, argv);
 	else if (!strcmp(argv[1], "del"))
 		return del_addr(fd, pm_family, argc, argv);
 	else if (!strcmp(argv[1], "flush"))
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 19/21] mptcp: selftests: support MPTCP_PM_CMD_SUBFLOW_DESTROY
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (17 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 18/21] mptcp: selftests: support MPTCP_PM_CMD_SUBFLOW_CREATE Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 20/21] mptcp: selftests: capture netlink events Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 21/21] selftests: mptcp: functional tests for the userspace PM type Kishen Maloor
  20 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change updates the "pm_nl_ctl" testing sample with a "dsf"
(destroy subflow) option to support the newly added netlink interface
command MPTCP_PM_CMD_SUBFLOW_DESTROY over the chosen MPTCP connection.

E.g. ./pm_nl_ctl dsf lip 10.0.2.1 lport 44567 rip 10.0.2.2 rport 56789
token 823274047

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)

diff --git a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
index e3ae1d851a39..f28d80895a9c 100644
--- a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
+++ b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
@@ -30,6 +30,7 @@ static void syntax(char *argv[])
 	fprintf(stderr, "\tann <local-ip> id <local-id> token <token> [port <local-port>] [dev <name>]\n");
 	fprintf(stderr, "\trem id <local-id> token <token>\n");
 	fprintf(stderr, "\tcsf lip <local-ip> lid <local-id> rip <remote-ip> rport <remote-port> token <token>\n");
+	fprintf(stderr, "\tdsf lip <local-ip> lport <local-port> rip <remote-ip> rport <remote-port> token <token>\n");
 	fprintf(stderr, "\tdel <id> [<ip>]\n");
 	fprintf(stderr, "\tget <id>\n");
 	fprintf(stderr, "\tset <ip> [flags backup|nobackup]\n");
@@ -174,6 +175,118 @@ static int resolve_mptcp_pm_netlink(int fd)
 	return genl_parse_getfamily((void *)data);
 }
 
+int dsf(int fd, int pm_family, int argc, char *argv[])
+{
+	char data[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		  NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		  1024];
+	struct rtattr *rta, *addr;
+	u_int16_t family, port;
+	struct nlmsghdr *nh;
+	u_int32_t token;
+	int addr_start;
+	int off = 0;
+	int arg;
+
+	const char *params[5];
+
+	memset(params, 0, 5 * sizeof(const char *));
+
+	memset(data, 0, sizeof(data));
+	nh = (void *)data;
+	off = init_genl_req(data, pm_family, MPTCP_PM_CMD_SUBFLOW_DESTROY,
+			    MPTCP_PM_VER);
+
+	if (argc < 12)
+		syntax(argv);
+
+	/* Params recorded in this order:
+	 * <local-ip>, <local-port>, <remote-ip>, <remote-port>, <token>
+	 */
+	for (arg = 2; arg < argc; arg++) {
+		if (!strcmp(argv[arg], "lip")) {
+			if (++arg >= argc)
+				error(1, 0, " missing local IP");
+
+			params[0] = argv[arg];
+		} else if (!strcmp(argv[arg], "lport")) {
+			if (++arg >= argc)
+				error(1, 0, " missing local port");
+
+			params[1] = argv[arg];
+		} else if (!strcmp(argv[arg], "rip")) {
+			if (++arg >= argc)
+				error(1, 0, " missing remote IP");
+
+			params[2] = argv[arg];
+		} else if (!strcmp(argv[arg], "rport")) {
+			if (++arg >= argc)
+				error(1, 0, " missing remote port");
+
+			params[3] = argv[arg];
+		} else if (!strcmp(argv[arg], "token")) {
+			if (++arg >= argc)
+				error(1, 0, " missing token");
+
+			params[4] = argv[arg];
+		} else
+			error(1, 0, "unknown keyword %s", argv[arg]);
+	}
+
+	for (arg = 0; arg < 4; arg = arg + 2) {
+		/*  addr header */
+		addr_start = off;
+		addr = (void *)(data + off);
+		addr->rta_type = NLA_F_NESTED |
+			((arg == 0) ? MPTCP_PM_ATTR_ADDR : MPTCP_PM_ATTR_ADDR_REMOTE);
+		addr->rta_len = RTA_LENGTH(0);
+		off += NLMSG_ALIGN(addr->rta_len);
+
+		/*  addr data */
+		rta = (void *)(data + off);
+		if (inet_pton(AF_INET, params[arg], RTA_DATA(rta))) {
+			family = AF_INET;
+			rta->rta_type = MPTCP_PM_ADDR_ATTR_ADDR4;
+			rta->rta_len = RTA_LENGTH(4);
+		} else if (inet_pton(AF_INET6, params[arg], RTA_DATA(rta))) {
+			family = AF_INET6;
+			rta->rta_type = MPTCP_PM_ADDR_ATTR_ADDR6;
+			rta->rta_len = RTA_LENGTH(16);
+		} else
+			error(1, errno, "can't parse ip %s", params[arg]);
+		off += NLMSG_ALIGN(rta->rta_len);
+
+		/* family */
+		rta = (void *)(data + off);
+		rta->rta_type = MPTCP_PM_ADDR_ATTR_FAMILY;
+		rta->rta_len = RTA_LENGTH(2);
+		memcpy(RTA_DATA(rta), &family, 2);
+		off += NLMSG_ALIGN(rta->rta_len);
+
+		/*  port */
+		port = atoi(params[arg + 1]);
+		rta = (void *)(data + off);
+		rta->rta_type = MPTCP_PM_ADDR_ATTR_PORT;
+		rta->rta_len = RTA_LENGTH(2);
+		memcpy(RTA_DATA(rta), &port, 2);
+		off += NLMSG_ALIGN(rta->rta_len);
+
+		addr->rta_len = off - addr_start;
+	}
+
+	/* token */
+	token = atoi(params[4]);
+	rta = (void *)(data + off);
+	rta->rta_type = MPTCP_PM_ATTR_TOKEN;
+	rta->rta_len = RTA_LENGTH(4);
+	memcpy(RTA_DATA(rta), &token, 4);
+	off += NLMSG_ALIGN(rta->rta_len);
+
+	do_nl_req(fd, nh, off, 0);
+
+	return 0;
+}
+
 int csf(int fd, int pm_family, int argc, char *argv[])
 {
 	char data[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
@@ -1055,6 +1168,8 @@ int main(int argc, char *argv[])
 		return remove_addr(fd, pm_family, argc, argv);
 	else if (!strcmp(argv[1], "csf"))
 		return csf(fd, pm_family, argc, argv);
+	else if (!strcmp(argv[1], "dsf"))
+		return dsf(fd, pm_family, argc, argv);
 	else if (!strcmp(argv[1], "del"))
 		return del_addr(fd, pm_family, argc, argv);
 	else if (!strcmp(argv[1], "flush"))
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 20/21] mptcp: selftests: capture netlink events
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (18 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 19/21] mptcp: selftests: support MPTCP_PM_CMD_SUBFLOW_DESTROY Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-12 22:15 ` [PATCH mptcp-next v2 21/21] selftests: mptcp: functional tests for the userspace PM type Kishen Maloor
  20 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change adds to self-testing support for the MPTCP netlink interface
by capturing various MPTCP netlink events (and all their metadata)
associated with connections, subflows and local address announcements.
It can be incorporated into self-test scripts that exercise the
MPTCP netlink commands to then precisely validate those operations
through the dispatched MPTCP netlink events in response to those
commands.

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 161 +++++++++++++++++-
 1 file changed, 154 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
index f28d80895a9c..e770df9b0e2a 100644
--- a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
+++ b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
@@ -22,6 +22,9 @@
 #ifndef MPTCP_PM_NAME
 #define MPTCP_PM_NAME		"mptcp_pm"
 #endif
+#ifndef MPTCP_PM_EVENTS
+#define MPTCP_PM_EVENTS		"mptcp_pm_events"
+#endif
 
 static void syntax(char *argv[])
 {
@@ -37,6 +40,7 @@ static void syntax(char *argv[])
 	fprintf(stderr, "\tflush\n");
 	fprintf(stderr, "\tdump\n");
 	fprintf(stderr, "\tlimits [<rcv addr max> <subflow max>]\n");
+	fprintf(stderr, "\tevents\n");
 	exit(0);
 }
 
@@ -88,6 +92,105 @@ static void nl_error(struct nlmsghdr *nh)
 	}
 }
 
+static int capture_events(int fd, int event_group)
+{
+	u_int8_t buffer[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+			NLMSG_ALIGN(sizeof(struct genlmsghdr)) + 1024];
+	struct genlmsghdr *ghdr;
+	struct rtattr *attrs;
+	struct nlmsghdr *nh;
+	int ret = 0;
+	int res_len;
+	int msg_len;
+	fd_set rfds;
+
+	if (setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
+		       &event_group, sizeof(event_group)) < 0)
+		error(1, errno, "could not join the " MPTCP_PM_EVENTS " mcast group");
+
+	do {
+		FD_ZERO(&rfds);
+		FD_SET(fd, &rfds);
+		res_len = NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		  NLMSG_ALIGN(sizeof(struct genlmsghdr)) + 1024;
+
+		ret = select(FD_SETSIZE, &rfds, NULL, NULL, NULL);
+
+		if (ret < 0)
+			error(1, ret, "error in select() on NL socket");
+
+		res_len = recv(fd, buffer, res_len, 0);
+		if (res_len < 0)
+			error(1, res_len, "error on recv() from NL socket");
+
+		nh = (struct nlmsghdr *)buffer;
+
+		for (; NLMSG_OK(nh, res_len); nh = NLMSG_NEXT(nh, res_len)) {
+			if (nh->nlmsg_type == NLMSG_ERROR)
+				error(1, NLMSG_ERROR, "received invalid NL message");
+
+			ghdr = (struct genlmsghdr *)NLMSG_DATA(nh);
+
+			if (ghdr->cmd == 0)
+				continue;
+
+			fprintf(stderr, "type:%d", ghdr->cmd);
+
+			msg_len = nh->nlmsg_len - NLMSG_LENGTH(GENL_HDRLEN);
+
+			attrs = (struct rtattr *) ((char *) ghdr + GENL_HDRLEN);
+			while (RTA_OK(attrs, msg_len)) {
+				if (attrs->rta_type == MPTCP_ATTR_TOKEN)
+					fprintf(stderr, ",token:%u", *(__u32 *)RTA_DATA(attrs));
+				else if (attrs->rta_type == MPTCP_ATTR_FAMILY)
+					fprintf(stderr, ",family:%u", *(__u16 *)RTA_DATA(attrs));
+				else if (attrs->rta_type == MPTCP_ATTR_LOC_ID)
+					fprintf(stderr, ",loc_id:%u", *(__u8 *)RTA_DATA(attrs));
+				else if (attrs->rta_type == MPTCP_ATTR_REM_ID)
+					fprintf(stderr, ",rem_id:%u", *(__u8 *)RTA_DATA(attrs));
+				else if (attrs->rta_type == MPTCP_ATTR_SADDR4) {
+					u_int32_t saddr4 = ntohl(*(__u32 *)RTA_DATA(attrs));
+
+					fprintf(stderr, ",saddr4:%u.%u.%u.%u", saddr4 >> 24,
+					       (saddr4 >> 16) & 0xFF, (saddr4 >> 8) & 0xFF,
+					       (saddr4 & 0xFF));
+				} else if (attrs->rta_type == MPTCP_ATTR_SADDR6) {
+					char buf[INET6_ADDRSTRLEN];
+
+					if (inet_ntop(AF_INET6, RTA_DATA(attrs), buf,
+						      sizeof(buf)) != NULL)
+						fprintf(stderr, ",saddr6:%s", buf);
+				} else if (attrs->rta_type == MPTCP_ATTR_DADDR4) {
+					u_int32_t daddr4 = ntohl(*(__u32 *)RTA_DATA(attrs));
+
+					fprintf(stderr, ",daddr4:%u.%u.%u.%u", daddr4 >> 24,
+					       (daddr4 >> 16) & 0xFF, (daddr4 >> 8) & 0xFF,
+					       (daddr4 & 0xFF));
+				} else if (attrs->rta_type == MPTCP_ATTR_DADDR6) {
+					char buf[INET6_ADDRSTRLEN];
+
+					if (inet_ntop(AF_INET6, RTA_DATA(attrs), buf,
+						      sizeof(buf)) != NULL)
+						fprintf(stderr, ",daddr6:%s", buf);
+				} else if (attrs->rta_type == MPTCP_ATTR_SPORT)
+					fprintf(stderr, ",sport:%u",
+						ntohs(*(__u16 *)RTA_DATA(attrs)));
+				else if (attrs->rta_type == MPTCP_ATTR_DPORT)
+					fprintf(stderr, ",dport:%u",
+						ntohs(*(__u16 *)RTA_DATA(attrs)));
+				else if (attrs->rta_type == MPTCP_ATTR_BACKUP)
+					fprintf(stderr, ",backup:%u", *(__u8 *)RTA_DATA(attrs));
+				else if (attrs->rta_type == MPTCP_ATTR_ERROR)
+					fprintf(stderr, ",error:%u", *(__u8 *)RTA_DATA(attrs));
+				attrs = RTA_NEXT(attrs, msg_len);
+			}
+		}
+		fprintf(stderr, "\n");
+	} while (1);
+
+	return 0;
+}
+
 /* do a netlink command and, if max > 0, fetch the reply  */
 static int do_nl_req(int fd, struct nlmsghdr *nh, int len, int max)
 {
@@ -121,11 +224,18 @@ static int do_nl_req(int fd, struct nlmsghdr *nh, int len, int max)
 	return ret;
 }
 
-static int genl_parse_getfamily(struct nlmsghdr *nlh)
+static int genl_parse_getfamily(struct nlmsghdr *nlh, int *pm_family,
+				int *events_mcast_grp)
 {
 	struct genlmsghdr *ghdr = NLMSG_DATA(nlh);
 	int len = nlh->nlmsg_len;
 	struct rtattr *attrs;
+	struct rtattr *grps;
+	struct rtattr *grp;
+	int got_events_grp;
+	int got_family;
+	int grps_len;
+	int grp_len;
 
 	if (nlh->nlmsg_type != GENL_ID_CTRL)
 		error(1, errno, "Not a controller message, len=%d type=0x%x\n",
@@ -140,9 +250,42 @@ static int genl_parse_getfamily(struct nlmsghdr *nlh)
 		error(1, errno, "Unknown controller command %d\n", ghdr->cmd);
 
 	attrs = (struct rtattr *) ((char *) ghdr + GENL_HDRLEN);
+	got_family = 0;
+	got_events_grp = 0;
+
 	while (RTA_OK(attrs, len)) {
-		if (attrs->rta_type == CTRL_ATTR_FAMILY_ID)
-			return *(__u16 *)RTA_DATA(attrs);
+		if (attrs->rta_type == CTRL_ATTR_FAMILY_ID) {
+			*pm_family = *(__u16 *)RTA_DATA(attrs);
+			got_family = 1;
+		} else if (attrs->rta_type == CTRL_ATTR_MCAST_GROUPS) {
+			grps = RTA_DATA(attrs);
+			grps_len = RTA_PAYLOAD(attrs);
+
+			while (RTA_OK(grps, grps_len)) {
+				grp = RTA_DATA(grps);
+				grp_len = RTA_PAYLOAD(grps);
+				got_events_grp = 0;
+
+				while (RTA_OK(grp, grp_len)) {
+					if (grp->rta_type == CTRL_ATTR_MCAST_GRP_ID)
+						*events_mcast_grp = *(__u32 *)RTA_DATA(grp);
+					else if (grp->rta_type == CTRL_ATTR_MCAST_GRP_NAME &&
+						 !strcmp(RTA_DATA(grp), MPTCP_PM_EVENTS))
+						got_events_grp = 1;
+
+					grp = RTA_NEXT(grp, grp_len);
+				}
+
+				if (got_events_grp)
+					break;
+
+				grps = RTA_NEXT(grps, grps_len);
+			}
+		}
+
+		if (got_family && got_events_grp)
+			return 0;
+
 		attrs = RTA_NEXT(attrs, len);
 	}
 
@@ -150,7 +293,7 @@ static int genl_parse_getfamily(struct nlmsghdr *nlh)
 	return -1;
 }
 
-static int resolve_mptcp_pm_netlink(int fd)
+static int resolve_mptcp_pm_netlink(int fd, int *pm_family, int *events_mcast_grp)
 {
 	char data[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
 		  NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
@@ -172,7 +315,7 @@ static int resolve_mptcp_pm_netlink(int fd)
 	off += NLMSG_ALIGN(rta->rta_len);
 
 	do_nl_req(fd, nh, off, sizeof(data));
-	return genl_parse_getfamily((void *)data);
+	return genl_parse_getfamily((void *)data, pm_family, events_mcast_grp);
 }
 
 int dsf(int fd, int pm_family, int argc, char *argv[])
@@ -1149,7 +1292,9 @@ int set_flags(int fd, int pm_family, int argc, char *argv[])
 
 int main(int argc, char *argv[])
 {
-	int fd, pm_family;
+	int events_mcast_grp;
+	int pm_family;
+	int fd;
 
 	if (argc < 2)
 		syntax(argv);
@@ -1158,7 +1303,7 @@ int main(int argc, char *argv[])
 	if (fd == -1)
 		error(1, errno, "socket netlink");
 
-	pm_family = resolve_mptcp_pm_netlink(fd);
+	resolve_mptcp_pm_netlink(fd, &pm_family, &events_mcast_grp);
 
 	if (!strcmp(argv[1], "add"))
 		return add_addr(fd, pm_family, argc, argv);
@@ -1182,6 +1327,8 @@ int main(int argc, char *argv[])
 		return get_set_limits(fd, pm_family, argc, argv);
 	else if (!strcmp(argv[1], "set"))
 		return set_flags(fd, pm_family, argc, argv);
+	else if (!strcmp(argv[1], "events"))
+		return capture_events(fd, events_mcast_grp);
 
 	fprintf(stderr, "unknown sub-command: %s", argv[1]);
 	syntax(argv);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH mptcp-next v2 21/21] selftests: mptcp: functional tests for the userspace PM type
  2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
                   ` (19 preceding siblings ...)
  2022-01-12 22:15 ` [PATCH mptcp-next v2 20/21] mptcp: selftests: capture netlink events Kishen Maloor
@ 2022-01-12 22:15 ` Kishen Maloor
  2022-01-12 22:35   ` selftests: mptcp: functional tests for the userspace PM type: Build Failure MPTCP CI
  20 siblings, 1 reply; 45+ messages in thread
From: Kishen Maloor @ 2022-01-12 22:15 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change adds a selftest script that performs a comprehensive
behavioral/functional test of all userspace PM capabilities by exercising
all the newly added APIs and changes to support said capabilities.

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 .../selftests/net/mptcp/userspace_pm.sh       | 561 ++++++++++++++++++
 1 file changed, 561 insertions(+)
 create mode 100755 tools/testing/selftests/net/mptcp/userspace_pm.sh

diff --git a/tools/testing/selftests/net/mptcp/userspace_pm.sh b/tools/testing/selftests/net/mptcp/userspace_pm.sh
new file mode 100755
index 000000000000..4149bb0f7cfe
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/userspace_pm.sh
@@ -0,0 +1,561 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+ip -Version > /dev/null 2>&1
+if [ $? -ne 0 ];then
+	echo "SKIP: Cannot not run test without ip tool"
+	exit 1
+fi
+
+ANNOUNCED=6        # MPTCP_EVENT_ANNOUNCED
+REMOVED=7          # MPTCP_EVENT_REMOVED
+SUB_ESTABLISHED=10 # MPTCP_EVENT_SUB_ESTABLISHED
+SUB_CLOSED=11      # MPTCP_EVENT_SUB_CLOSED
+
+AF_INET=2
+AF_INET6=10
+
+evts_pid=0
+client4_pid=0
+server4_pid=0
+client6_pid=0
+server6_pid=0
+client4_token=""
+server4_token=""
+client6_token=""
+server6_token=""
+client4_port=0;
+client6_port=0;
+app4_port=50002
+new4_port=50003
+app6_port=50004
+client_addr_id=${RANDOM:0:2}
+server_addr_id=${RANDOM:0:2}
+
+sec=$(date +%s)
+rndh=$(printf %x $sec)-$(mktemp -u XXXXXX)
+ns1="ns1-$rndh"
+ns2="ns2-$rndh"
+
+cleanup()
+{
+	echo "cleanup"
+
+	# Terminate the MPTCP connection and related processes
+	kill -SIGUSR1 $client4_pid > /dev/null 2>&1
+	kill $server4_pid > /dev/null 2>&1
+	kill -SIGUSR1 $client6_pid > /dev/null 2>&1
+	kill $server6_pid > /dev/null 2>&1
+
+	kill $evts_pid > /dev/null 2>&1
+
+	local netns
+	for netns in "$ns1" "$ns2" ;do
+		ip netns del $netns
+	done
+}
+
+trap cleanup EXIT
+
+for i in "$ns1" "$ns2" ;do
+	ip netns add $i || exit 1
+	ip -net $i link set lo up
+	ip netns exec $i sysctl -q net.mptcp.enabled=1
+	ip netns exec $i sysctl -q net.mptcp.pm_type=1
+done
+
+#  "$ns1"              ns2
+#     ns1eth2    ns2eth1
+
+ip link add ns1eth2 netns "$ns1" type veth peer name ns2eth1 netns "$ns2"
+
+ip -net "$ns1" addr add 10.0.1.1/24 dev ns1eth2
+ip -net "$ns1" addr add 10.0.2.1/24 dev ns1eth2
+ip -net "$ns1" addr add dead:beef:1::1/64 dev ns1eth2 nodad
+ip -net "$ns1" addr add dead:beef:2::1/64 dev ns1eth2 nodad
+ip -net "$ns1" link set ns1eth2 up
+
+ip -net "$ns2" addr add 10.0.1.2/24 dev ns2eth1
+ip -net "$ns2" addr add 10.0.2.2/24 dev ns2eth1
+ip -net "$ns2" addr add dead:beef:1::2/64 dev ns2eth1 nodad
+ip -net "$ns2" addr add dead:beef:2::2/64 dev ns2eth1 nodad
+ip -net "$ns2" link set ns2eth1 up
+
+printf "Created network namespaces ns1, ns2         \t\t\t[OK]\n"
+
+make_file()
+{
+	local name=$1
+	local who=$2
+	local ksize=1
+
+	dd if=/dev/urandom of="$name" bs=1024 count=$ksize 2> /dev/null
+	echo -e "\nMPTCP_TEST_FILE_END_MARKER" >> "$name"
+}
+
+make_connection()
+{
+	local file=$(mktemp)
+	make_file "$file" "client"
+
+	local is_v6=$1
+	local app_port=$app4_port
+	local connect_addr="10.0.1.1"
+	local listen_addr="0.0.0.0"
+	if [ "$is_v6" = "v6" ]
+	then
+		connect_addr="dead:beef:1::1"
+		listen_addr="::"
+		app_port=$app6_port
+	else
+		is_v6="v4"
+	fi
+
+	local client_evts=$(mktemp)
+	:>"$client_evts"
+	ip netns exec $ns2 ./pm_nl_ctl events >> "$client_evts" 2>&1 &
+	local client_evts_pid=$!
+	local server_evts=$(mktemp)
+	:>"$server_evts"
+	ip netns exec $ns1 ./pm_nl_ctl events >> "$server_evts" 2>&1 &
+	local server_evts_pid=$!
+	sleep 0.1
+
+	# Run the server
+	ip netns exec $ns1 \
+			./mptcp_connect -s MPTCP -w 300 -p $app_port -l $listen_addr 2>&1 > /dev/null &
+	local server_pid=$!
+	sleep 0.1
+
+	# Run the client, transfer $file and stay connected to the server
+	# to conduct tests
+	ip netns exec $ns2 \
+			./mptcp_connect -s MPTCP -w 300 -m sendfile -p $app_port $connect_addr 2>&1 > /dev/null < $file &
+	local client_pid=$!
+	sleep 0.1
+
+	kill $client_evts_pid
+	local client_token=$(sed -n 's/.*\(token:\)\([[:digit:]]*\).*$/\2/p;q' "$client_evts")
+	local client_port=$(sed -n 's/.*\(sport:\)\([[:digit:]]*\).*$/\2/p;q' "$client_evts")
+
+	kill $server_evts_pid
+	local server_token=$(sed -n 's/.*\(token:\)\([[:digit:]]*\).*$/\2/p;q' "$server_evts")
+	rm -f "$client_evts" "$server_evts" "$file"
+
+	if [ $client_token != "" ] && [ $server_token != "" ]
+	then
+		printf "Established IP%s MPTCP Connection ns2 => ns1    \t\t[OK]\n" $is_v6
+	else
+		exit 1
+	fi
+
+	if [ "$is_v6" = "v6" ]
+	then
+		client6_token=$client_token
+		server6_token=$server_token
+		client6_port=$client_port
+		client6_pid=$client_pid
+		server6_pid=$server_pid
+	else
+		client4_token=$client_token
+		server4_token=$server_token
+		client4_port=$client_port
+		client4_pid=$client_pid
+		server4_pid=$server_pid
+	fi
+}
+
+verify_announce_event()
+{
+	local evt=$1
+	local e_type=$2
+	local e_token=$3
+	local e_addr=$4
+	local e_id=$5
+	local e_dport=$6
+	local e_af=$7
+
+	local type=$(sed -n 's/.*\(type:\)\([[:digit:]]*\).*$/\2/p;q' "$evt")
+	local token=$(sed -n 's/.*\(token:\)\([[:digit:]]*\).*$/\2/p;q' "$evt")
+	local addr=""
+	if [ "$e_af" = "v6" ]
+	then
+		addr=$(sed -n 's/.*\(daddr6:\)\([0-9a-f:.]*\).*$/\2/p;q' "$evt")
+	else
+		addr=$(sed -n 's/.*\(daddr4:\)\([0-9.]*\).*$/\2/p;q' "$evt")
+	fi
+	local dport=$(sed -n 's/.*\(dport:\)\([[:digit:]]*\).*$/\2/p;q' "$evt")
+	local id=$(sed -n 's/.*\(rem_id:\)\([[:digit:]]*\).*$/\2/p;q' "$evt")
+        if [ "$type" = "$e_type" ] && [ "$token" = "$e_token" ] && [ "$addr" = "$e_addr" ] && [ "$dport" = "$e_dport" ] && [ "$id" = "$e_id" ]
+	then
+		printf "[OK]\n"
+		return 0
+	fi
+	printf "[FAIL]\n"
+	exit 1
+}
+
+test_announce()
+{
+	local evts=$(mktemp)
+	:>"$evts"
+	ip netns exec $ns1 ./pm_nl_ctl events >> "$evts" 2>&1 &
+	evts_pid=$!
+	sleep 0.1
+
+	local invalid_token=$(( $client4_token - 1))
+	ip netns exec $ns2 ./pm_nl_ctl ann 10.0.2.2 token $invalid_token id $client_addr_id dev ns2eth1 2>&1 > /dev/null
+	local type=$(sed -n 's/.*\(type:\)\([[:digit:]]*\).*$/\2/p;q' "$evts")
+	printf "ADD_ADDR 10.0.2.2 (ns2) => ns1, invalid token    \t\t"
+        if [ "$type" = "" ]
+	then
+		printf "[OK]\n"
+	else
+		printf "[FAIL]\n"
+		exit 1
+	fi
+
+	:>"$evts"
+	ip netns exec $ns2 ./pm_nl_ctl ann 10.0.2.2 token $client4_token id $client_addr_id dev ns2eth1 2>&1 > /dev/null
+	printf "ADD_ADDR id:%d 10.0.2.2 (ns2) => ns1, reuse port \t\t" $client_addr_id
+	sleep 0.1
+	verify_announce_event "$evts" "$ANNOUNCED" "$server4_token" "10.0.2.2" "$client_addr_id" "$client4_port"
+
+	:>"$evts"
+	ip netns exec $ns2 ./pm_nl_ctl ann dead:beef:2::2 token $client6_token id $client_addr_id dev ns2eth1 2>&1 > /dev/null
+	printf "ADD_ADDR6 id:%d dead:beef:2::2 (ns2) => ns1, reuse port\t\t" $client_addr_id
+	sleep 0.1
+	verify_announce_event "$evts" "$ANNOUNCED" "$server6_token" "dead:beef:2::2" "$client_addr_id" "$client6_port" "v6"
+
+	:>"$evts"
+	client_addr_id=$((client_addr_id+1))
+	ip netns exec $ns2 ./pm_nl_ctl ann 10.0.2.2 token $client4_token id $client_addr_id dev ns2eth1 port $new4_port 2>&1 > /dev/null
+	printf "ADD_ADDR id:%d 10.0.2.2 (ns2) => ns1, new port \t\t\t" $client_addr_id
+	sleep 0.1
+	verify_announce_event "$evts" "$ANNOUNCED" "$server4_token" "10.0.2.2" "$client_addr_id" "$new4_port"
+
+	kill $evts_pid
+
+	:>"$evts"
+	ip netns exec $ns2 ./pm_nl_ctl events >> "$evts" 2>&1 &
+	evts_pid=$!
+	sleep 0.1
+
+	ip netns exec $ns1 ./pm_nl_ctl ann 10.0.2.1 token $server4_token id $server_addr_id dev ns1eth2 2>&1 > /dev/null
+	printf "ADD_ADDR id:%d 10.0.2.1 (ns1) => ns2, reuse port \t\t" $server_addr_id
+	sleep 0.1
+	verify_announce_event "$evts" "$ANNOUNCED" "$client4_token" "10.0.2.1" "$server_addr_id" "$app4_port"
+
+	:>"$evts"
+	ip netns exec $ns1 ./pm_nl_ctl ann dead:beef:2::1 token $server6_token id $server_addr_id dev ns1eth2 2>&1 > /dev/null
+	printf "ADD_ADDR6 id:%d dead:beef:2::1 (ns1) => ns2, reuse port\t\t" $server_addr_id
+	sleep 0.1
+	verify_announce_event "$evts" "$ANNOUNCED" "$client6_token" "dead:beef:2::1" "$server_addr_id" "$app6_port" "v6"
+
+	:>"$evts"
+	server_addr_id=$((server_addr_id+1))
+	ip netns exec $ns1 ./pm_nl_ctl ann 10.0.2.1 token $server4_token id $server_addr_id dev ns1eth2 port $new4_port 2>&1 > /dev/null
+	printf "ADD_ADDR id:%d 10.0.2.1 (ns1) => ns2, new port \t\t\t" $server_addr_id
+	sleep 0.1
+	verify_announce_event "$evts" "$ANNOUNCED" "$client4_token" "10.0.2.1" "$server_addr_id" "$new4_port"
+
+	kill $evts_pid
+	rm -f "$evts"
+}
+
+verify_remove_event()
+{
+	local evt=$1
+	local e_type=$2
+	local e_token=$3
+	local e_id=$4
+
+	local type=$(sed -n 's/.*\(type:\)\([[:digit:]]*\).*$/\2/p;q' "$evt")
+	local token=$(sed -n 's/.*\(token:\)\([[:digit:]]*\).*$/\2/p;q' "$evt")
+	local id=$(sed -n 's/.*\(rem_id:\)\([[:digit:]]*\).*$/\2/p;q' "$evt")
+        if [ "$type" = "$e_type" ] && [ "$token" = "$e_token" ] && [ "$id" = "$e_id" ]
+	then
+		printf "[OK]\n"
+		return 0
+	fi
+	printf "[FAIL]\n"
+	exit 1
+}
+
+test_remove()
+{
+	local evts=$(mktemp)
+	:>"$evts"
+	ip netns exec $ns1 ./pm_nl_ctl events >> "$evts" 2>&1 &
+	evts_pid=$!
+	sleep 0.1
+
+	local invalid_token=$(( $client4_token - 1 ))
+	ip netns exec $ns2 ./pm_nl_ctl rem token $invalid_token id $client_addr_id 2>&1 > /dev/null
+	printf "RM_ADDR id:%d ns2 => ns1, invalid token                    \t" $client_addr_id
+	local type=$(sed -n 's/.*\(type:\)\([[:digit:]]*\).*$/\2/p;q' "$evts")
+	if [ "$type" = "" ]
+	then
+		printf "[OK]\n"
+	else
+		printf "[FAIL]\n"
+	fi
+
+	local invalid_id=$(( $client_addr_id + 1 ))
+	ip netns exec $ns2 ./pm_nl_ctl rem token $client4_token id $invalid_id 2>&1 > /dev/null
+	printf "RM_ADDR id:%d ns2 => ns1, invalid id                    \t" $invalid_id
+	type=$(sed -n 's/.*\(type:\)\([[:digit:]]*\).*$/\2/p;q' "$evts")
+	if [ "$type" = "" ]
+	then
+		printf "[OK]\n"
+	else
+		printf "[FAIL]\n"
+	fi
+
+	:>"$evts"
+	ip netns exec $ns2 ./pm_nl_ctl rem token $client4_token id $client_addr_id 2>&1 > /dev/null
+	printf "RM_ADDR id:%d ns2 => ns1                                \t" $client_addr_id
+	sleep 0.1
+	verify_remove_event "$evts" "$REMOVED" "$server4_token" "$client_addr_id"
+
+	:>"$evts"
+	client_addr_id=$(( $client_addr_id - 1 ))
+	ip netns exec $ns2 ./pm_nl_ctl rem token $client4_token id $client_addr_id 2>&1 > /dev/null
+	printf "RM_ADDR id:%d ns2 => ns1                                \t" $client_addr_id
+	sleep 0.1
+	verify_remove_event "$evts" "$REMOVED" "$server4_token" "$client_addr_id"
+
+	:>"$evts"
+	ip netns exec $ns2 ./pm_nl_ctl rem token $client6_token id $client_addr_id 2>&1 > /dev/null
+	printf "RM_ADDR6 id:%d ns2 => ns1                               \t" $client_addr_id
+	sleep 0.1
+	verify_remove_event "$evts" "$REMOVED" "$server6_token" "$client_addr_id"
+
+	kill $evts_pid
+
+	:>"$evts"
+	ip netns exec $ns2 ./pm_nl_ctl events >> "$evts" 2>&1 &
+	evts_pid=$!
+	sleep 0.1
+
+	ip netns exec $ns1 ./pm_nl_ctl rem token $server4_token id $server_addr_id 2>&1 > /dev/null
+	printf "RM_ADDR id:%d ns1 => ns2                                \t" $server_addr_id
+	sleep 0.1
+	verify_remove_event "$evts" "$REMOVED" "$client4_token" "$server_addr_id"
+
+	:>"$evts"
+	server_addr_id=$(( $server_addr_id - 1 ))
+	ip netns exec $ns1 ./pm_nl_ctl rem token $server4_token id $server_addr_id 2>&1 > /dev/null
+	printf "RM_ADDR id:%d ns1 => ns2                                \t" $server_addr_id
+	sleep 0.1
+	verify_remove_event "$evts" "$REMOVED" "$client4_token" "$server_addr_id"
+
+	:>"$evts"
+	ip netns exec $ns1 ./pm_nl_ctl rem token $server6_token id $server_addr_id 2>&1 > /dev/null
+	printf "RM_ADDR6 id:%d ns1 => ns2                               \t" $server_addr_id
+	sleep 0.1
+	verify_remove_event "$evts" "$REMOVED" "$client6_token" "$server_addr_id"
+
+	kill $evts_pid
+	rm -f "$evts"
+}
+
+verify_subflow_events()
+{
+	local evt=$1
+	local e_type=$2
+	local e_token=$3
+	local e_family=$4
+	local e_saddr=$5
+	local e_daddr=$6
+	local e_dport=$7
+	local e_locid=$8
+	local e_remid=$9
+	shift 2
+	local e_from=$8
+	local e_to=$9
+
+	if [ "$e_type" = "$SUB_ESTABLISHED" ]
+	then
+		if [ "$e_family" = "$AF_INET6" ]
+		then
+			printf "CREATE_SUBFLOW6 %s (%s) => %s (%s)    " $e_saddr $e_from $e_daddr $e_to
+		else
+			printf "CREATE_SUBFLOW %s (%s) => %s (%s)         \t" $e_saddr $e_from $e_daddr $e_to
+		fi
+	else
+		if [ "$e_family" = "$AF_INET6" ]
+		then
+			printf "DESTROY_SUBFLOW6 %s (%s) => %s (%s)   " $e_saddr $e_from $e_daddr $e_to
+		else
+			printf "DESTROY_SUBFLOW %s (%s) => %s (%s)         \t" $e_saddr $e_from $e_daddr $e_to
+		fi
+	fi
+
+	local type=$(sed -n 's/.*\(type:\)\([[:digit:]]*\).*$/\2/p;q' "$evt")
+	local token=$(sed -n 's/.*\(token:\)\([[:digit:]]*\).*$/\2/p;q' "$evt")
+	local family=$(sed -n 's/.*\(family:\)\([[:digit:]]*\).*$/\2/p;q' "$evt")
+	local dport=$(sed -n 's/.*\(dport:\)\([[:digit:]]*\).*$/\2/p;q' "$evt")
+	local locid=$(sed -n 's/.*\(loc_id:\)\([[:digit:]]*\).*$/\2/p;q' "$evt")
+	local remid=$(sed -n 's/.*\(rem_id:\)\([[:digit:]]*\).*$/\2/p;q' "$evt")
+	local saddr=""
+	local daddr=""
+	if [ "$family" = "$AF_INET6" ]
+	then
+		saddr=$(sed -n 's/.*\(saddr6:\)\([0-9a-f:.]*\).*$/\2/p;q' "$evt")
+		daddr=$(sed -n 's/.*\(daddr6:\)\([0-9a-f:.]*\).*$/\2/p;q' "$evt")
+	else
+		saddr=$(sed -n 's/.*\(saddr4:\)\([0-9.]*\).*$/\2/p;q' "$evt")
+		daddr=$(sed -n 's/.*\(daddr4:\)\([0-9.]*\).*$/\2/p;q' "$evt")
+	fi
+
+        if [ "$type" = "$e_type" ] && [ "$token" = "$e_token" ] && [ "$daddr" = "$e_daddr" ] && [ "$e_dport" = "$dport" ] && [ "$family" = "$e_family" ] && [ "$saddr" = "$e_saddr" ] && [ "$e_locid" = "$locid" ] && [ "$e_remid" = "$remid" ]
+	then
+		printf "[OK]\n"
+		return 0
+	fi
+	printf "[FAIL]\n"
+	exit 1
+}
+
+test_subflows()
+{
+	local evts=$(mktemp)
+	:>"$evts"
+	ip netns exec $ns1 ./pm_nl_ctl events >> "$evts" 2>&1 &
+	evts_pid=$!
+	sleep 0.1
+
+	ip netns exec $ns2 ./pm_nl_ctl ann 10.0.2.2 token $client4_token id $client_addr_id 2>&1 > /dev/null
+	sleep 0.1
+	printf "ADD_ADDR 10.0.2.2 (ns2) => ns1, reuse port              \t[OK]\n"
+
+	:>"$evts"
+	ip netns exec $ns1 ./pm_nl_ctl csf lip 10.0.2.1 lid 23 rip 10.0.2.2 rport $client4_port token $server4_token 2>&1 > /dev/null
+	sleep 0.1
+	verify_subflow_events "$evts" "$SUB_ESTABLISHED" "$server4_token" "$AF_INET" "10.0.2.1" "10.0.2.2" "$client4_port" "23" "$client_addr_id" "ns1" "ns2"
+
+	local sport=$(sed -n 's/.*\(sport:\)\([[:digit:]]*\).*$/\2/p;q' "$evts")
+
+	:>"$evts"
+	ip netns exec $ns1 ./pm_nl_ctl dsf lip 10.0.2.1 lport $sport rip 10.0.2.2 rport $client4_port token $server4_token 2>&1 > /dev/null
+	sleep 0.1
+	verify_subflow_events "$evts" "$SUB_CLOSED" "$server4_token" "$AF_INET" "10.0.2.1" "10.0.2.2" "$client4_port" "23" "$client_addr_id" "ns1" "ns2"
+
+	ip netns exec $ns2 ./pm_nl_ctl rem id $client_addr_id token $client4_token 2>&1 > /dev/null
+	sleep 0.1
+
+	:>"$evts"
+	ip netns exec $ns2 ./pm_nl_ctl ann dead:beef:2::2 token $client6_token id $client_addr_id 2>&1 > /dev/null
+	sleep 0.1
+	printf "ADD_ADDR6 dead:beef:2::2 (ns2) => ns1, reuse port              \t[OK]\n"
+
+	:>"$evts"
+	ip netns exec $ns1 ./pm_nl_ctl csf lip dead:beef:2::1 lid 23 rip dead:beef:2::2 rport $client6_port token $server6_token 2>&1 > /dev/null
+	sleep 0.1
+	verify_subflow_events "$evts" "$SUB_ESTABLISHED" "$server6_token" "$AF_INET6" "dead:beef:2::1" "dead:beef:2::2" "$client6_port" "23" "$client_addr_id" "ns1" "ns2"
+
+	local sport=$(sed -n 's/.*\(sport:\)\([[:digit:]]*\).*$/\2/p;q' "$evts")
+
+	:>"$evts"
+	ip netns exec $ns1 ./pm_nl_ctl dsf lip dead:beef:2::1 lport $sport rip dead:beef:2::2 rport $client6_port token $server6_token 2>&1 > /dev/null
+	sleep 0.1
+	verify_subflow_events "$evts" "$SUB_CLOSED" "$server6_token" "$AF_INET6" "dead:beef:2::1" "dead:beef:2::2" "$client6_port" "23" "$client_addr_id" "ns1" "ns2"
+
+	ip netns exec $ns2 ./pm_nl_ctl rem id $client_addr_id token $client6_token 2>&1 > /dev/null
+	sleep 0.1
+
+	:>"$evts"
+	ip netns exec $ns2 ./pm_nl_ctl ann 10.0.2.2 token $client4_token id $client_addr_id  port $new4_port 2>&1 > /dev/null
+	sleep 0.1
+	printf "ADD_ADDR 10.0.2.2 (ns2) => ns1, new port                \t[OK]\n"
+
+	:>"$evts"
+	ip netns exec $ns1 ./pm_nl_ctl csf lip 10.0.2.1 lid 23 rip 10.0.2.2 rport $new4_port token $server4_token 2>&1 > /dev/null
+	sleep 0.1
+	verify_subflow_events "$evts" "$SUB_ESTABLISHED" "$server4_token" "$AF_INET" "10.0.2.1" "10.0.2.2" "$new4_port" "23" "$client_addr_id" "ns1" "ns2"
+
+        sport=$(sed -n 's/.*\(sport:\)\([[:digit:]]*\).*$/\2/p;q' "$evts")
+
+	:>"$evts"
+	ip netns exec $ns1 ./pm_nl_ctl dsf lip 10.0.2.1 lport $sport rip 10.0.2.2 rport $new4_port token $server4_token 2>&1 > /dev/null
+	sleep 0.1
+	verify_subflow_events "$evts" "$SUB_CLOSED" "$server4_token" "$AF_INET" "10.0.2.1" "10.0.2.2" "$new4_port" "23" "$client_addr_id" "ns1" "ns2"
+
+	ip netns exec $ns2 ./pm_nl_ctl rem id $client_addr_id token $client4_token 2>&1 > /dev/null
+
+	kill $evts_pid
+
+	:>"$evts"
+	ip netns exec $ns2 ./pm_nl_ctl events >> "$evts" 2>&1 &
+	evts_pid=$!
+	sleep 0.1
+
+	ip netns exec $ns1 ./pm_nl_ctl ann 10.0.2.1 token $server4_token id $server_addr_id 2>&1 > /dev/null
+	sleep 0.1
+	printf "ADD_ADDR 10.0.2.1 (ns1) => ns2, reuse port              \t[OK]\n"
+
+	:>"$evts"
+	ip netns exec $ns2 ./pm_nl_ctl csf lip 10.0.2.2 lid 23 rip 10.0.2.1 rport $app4_port token $client4_token 2>&1 > /dev/null
+	sleep 0.1
+	verify_subflow_events "$evts" "$SUB_ESTABLISHED" "$client4_token" "$AF_INET" "10.0.2.2" "10.0.2.1" "$app4_port" "23" "$server_addr_id" "ns2" "ns1"
+
+        sport=$(sed -n 's/.*\(sport:\)\([[:digit:]]*\).*$/\2/p;q' "$evts")
+
+	:>"$evts"
+	ip netns exec $ns2 ./pm_nl_ctl dsf lip 10.0.2.2 lport $sport rip 10.0.2.1 rport $app4_port token $client4_token 2>&1 > /dev/null
+	sleep 0.1
+	verify_subflow_events "$evts" "$SUB_CLOSED" "$client4_token" "$AF_INET" "10.0.2.2" "10.0.2.1" "$app4_port" "23" "$server_addr_id" "ns2" "ns1"
+
+	ip netns exec $ns1 ./pm_nl_ctl rem id $server_addr_id token $server4_token 2>&1 > /dev/null
+	sleep 0.1
+
+	:>"$evts"
+	ip netns exec $ns1 ./pm_nl_ctl ann dead:beef:2::1 token $server6_token id $server_addr_id 2>&1 > /dev/null
+	sleep 0.1
+	printf "ADD_ADDR6 dead:beef:2::1 (ns1) => ns2, reuse port              \t[OK]\n"
+
+	:>"$evts"
+	ip netns exec $ns2 ./pm_nl_ctl csf lip dead:beef:2::2 lid 23 rip dead:beef:2::1 rport $app6_port token $client6_token 2>&1 > /dev/null
+	sleep 0.1
+	verify_subflow_events "$evts" "$SUB_ESTABLISHED" "$client6_token" "$AF_INET6" "dead:beef:2::2" "dead:beef:2::1" "$app6_port" "23" "$server_addr_id" "ns2" "ns1"
+
+	local sport=$(sed -n 's/.*\(sport:\)\([[:digit:]]*\).*$/\2/p;q' "$evts")
+
+	:>"$evts"
+	ip netns exec $ns2 ./pm_nl_ctl dsf lip dead:beef:2::2 lport $sport rip dead:beef:2::1 rport $app6_port token $client6_token 2>&1 > /dev/null
+	sleep 0.1
+	verify_subflow_events "$evts" "$SUB_CLOSED" "$client6_token" "$AF_INET6" "dead:beef:2::2" "dead:beef:2::1" "$app6_port" "23" "$server_addr_id" "ns2" "ns1"
+
+	ip netns exec $ns1 ./pm_nl_ctl rem id $server_addr_id token $server6_token 2>&1 > /dev/null
+	sleep 0.1
+
+	:>"$evts"
+	ip netns exec $ns1 ./pm_nl_ctl ann 10.0.2.1 token $server4_token id $server_addr_id  port $new4_port 2>&1 > /dev/null
+	sleep 0.1
+	printf "ADD_ADDR 10.0.2.1 (ns1) => ns2, new port                \t[OK]\n"
+
+	:>"$evts"
+	ip netns exec $ns2 ./pm_nl_ctl csf lip 10.0.2.2 lid 23 rip 10.0.2.1 rport $new4_port token $client4_token 2>&1 > /dev/null
+	sleep 0.1
+	verify_subflow_events "$evts" "$SUB_ESTABLISHED" "$client4_token" "$AF_INET" "10.0.2.2" "10.0.2.1" "$new4_port" "23" "$server_addr_id" "ns2" "ns1"
+
+        sport=$(sed -n 's/.*\(sport:\)\([[:digit:]]*\).*$/\2/p;q' "$evts")
+
+	:>"$evts"
+	ip netns exec $ns2 ./pm_nl_ctl dsf lip 10.0.2.2 lport $sport rip 10.0.2.1 rport $new4_port token $client4_token 2>&1 > /dev/null
+	sleep 0.1
+	verify_subflow_events "$evts" "$SUB_CLOSED" "$client4_token" "$AF_INET" "10.0.2.2" "10.0.2.1" "$new4_port" "23" "$server_addr_id" "ns2" "ns1"
+
+	ip netns exec $ns1 ./pm_nl_ctl rem id $server_addr_id token $server4_token 2>&1 > /dev/null
+
+	kill $evts_pid
+	rm -f "$evts"
+}
+
+make_connection
+make_connection "v6"
+test_announce
+test_remove
+test_subflows
+exit 0
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: selftests: mptcp: functional tests for the userspace PM type: Build Failure
  2022-01-12 22:15 ` [PATCH mptcp-next v2 21/21] selftests: mptcp: functional tests for the userspace PM type Kishen Maloor
@ 2022-01-12 22:35   ` MPTCP CI
  0 siblings, 0 replies; 45+ messages in thread
From: MPTCP CI @ 2022-01-12 22:35 UTC (permalink / raw)
  To: Kishen Maloor; +Cc: mptcp

Hi Kishen,

Thank you for your modifications, that's great!

But sadly, our CI spotted some issues with it when trying to build it.

You can find more details there:

  https://patchwork.kernel.org/project/mptcp/patch/20220112221523.1829397-22-kishen.maloor@intel.com/
  https://github.com/multipath-tcp/mptcp_net-next/actions/runs/1689981079

Status: failure
Initiator: MPTCPimporter
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/bce46a55a41c

Feel free to reply to this email if you cannot access logs, if you need
some support to fix the error, if this doesn't seem to be caused by your
modifications or if the error is a false positive one.

Cheers,
MPTCP GH Action bot

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 07/21] mptcp: netlink: process IPv6 addrs in creating listening sockets
  2022-01-12 22:15 ` [PATCH mptcp-next v2 07/21] mptcp: netlink: process IPv6 addrs in creating listening sockets Kishen Maloor
@ 2022-01-14 15:27   ` Paolo Abeni
  2022-01-14 22:09     ` Mat Martineau
  0 siblings, 1 reply; 45+ messages in thread
From: Paolo Abeni @ 2022-01-14 15:27 UTC (permalink / raw)
  To: Kishen Maloor, mptcp

Hello,

On Wed, 2022-01-12 at 17:15 -0500, Kishen Maloor wrote:
> This change updates mptcp_pm_nl_create_listen_socket() to create
> listening sockets bound to IPv6 addresses (where IPv6 is supported).
> 
> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
> ---
>  net/mptcp/pm_netlink.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
> index 4c1895dbc2a5..779ec9d375f0 100644
> --- a/net/mptcp/pm_netlink.c
> +++ b/net/mptcp/pm_netlink.c
> @@ -986,6 +986,7 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
>  					    struct mptcp_pm_addr_entry *entry,
>  					    struct socket **lsk)
>  {
> +	int addrlen = sizeof(struct sockaddr_in);
>  	struct sockaddr_storage addr;
>  	struct mptcp_sock *msk;
>  	struct socket *ssock;
> @@ -1010,8 +1011,11 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
>  	}
>  
>  	mptcp_info2sockaddr(&entry->addr, &addr, entry->addr.family);
> -	err = kernel_bind(ssock, (struct sockaddr *)&addr,
> -			  sizeof(struct sockaddr_in));
> +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
> +	if (entry->addr.family == AF_INET6)
> +		addrlen = sizeof(struct sockaddr_in6);
> +#endif
> +	err = kernel_bind(ssock, (struct sockaddr *)&addr, addrlen);
>  	if (err) {
>  		pr_warn("kernel_bind error, err=%d", err);
>  		goto out;

This looks a bugfix for -net to me?

Possibly worthy additional an additional mp_join self-test for the ipv6
case.

Thanks!

/P


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 08/21] mptcp: attempt to add listening sockets for announced addrs
  2022-01-12 22:15 ` [PATCH mptcp-next v2 08/21] mptcp: attempt to add listening sockets for announced addrs Kishen Maloor
@ 2022-01-14 15:54   ` Paolo Abeni
  2022-01-14 17:47     ` Matthieu Baerts
                       ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Paolo Abeni @ 2022-01-14 15:54 UTC (permalink / raw)
  To: Kishen Maloor, mptcp

Hello,

On Wed, 2022-01-12 at 17:15 -0500, Kishen Maloor wrote:
> When ADD_ADDR announcements use the port associated with an
> active subflow, this change ensures that a listening socket is
> bound to the announced address and port for subsequently
> receiving MP_JOINs from the remote end. In case there's
> a recorded lsk bound to that address+port, it is reused.
> But if a listening socket for this address is already held by the
> application then no further action is taken.
> 
> When a listening socket is created, it is stored in
> struct mptcp_pm_add_entry and released accordingly.
> 
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/203
> 
> v2: fixed formatting
> 
> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>

should be either:

"""
<changelog>

<tags>
"""

or:

"""
<tags>
---
<changelog>
"""

we usually keep the changelog outside the commit message for
development history before landing on netdev, that is:

"""
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/203
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
v2: fixed formatting
"""

> ---
>  net/mptcp/pm_netlink.c | 56 ++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 54 insertions(+), 2 deletions(-)
> 
> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
> index 779ec9d375f0..e2211f3b8c8c 100644
> --- a/net/mptcp/pm_netlink.c
> +++ b/net/mptcp/pm_netlink.c
> @@ -43,6 +43,7 @@ struct mptcp_pm_add_entry {
>  	struct mptcp_addr_info	addr;
>  	struct timer_list	add_timer;
>  	struct mptcp_sock	*sock;
> +	struct mptcp_local_lsk	*lsk_ref;
>  	u8			retrans_times;
>  };
>  
> @@ -66,6 +67,10 @@ struct pm_nl_pernet {
>  #define MPTCP_PM_ADDR_MAX	8
>  #define ADD_ADDR_RETRANS_MAX	3
>  
> +static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
> +					    struct mptcp_pm_addr_entry *entry,
> +					    struct socket **lsk);
> +
>  static bool addresses_equal(const struct mptcp_addr_info *a,
>  			    const struct mptcp_addr_info *b, bool use_port)
>  {
> @@ -438,7 +443,8 @@ mptcp_pm_del_add_timer(struct mptcp_sock *msk,
>  }
>  
>  static bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
> -				     struct mptcp_pm_addr_entry *entry)
> +				     struct mptcp_pm_addr_entry *entry,
> +				     struct mptcp_local_lsk *lsk_ref)
>  {
>  	struct mptcp_pm_add_entry *add_entry = NULL;
>  	struct sock *sk = (struct sock *)msk;
> @@ -458,6 +464,10 @@ static bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
>  	add_entry->addr = entry->addr;
>  	add_entry->sock = msk;
>  	add_entry->retrans_times = 0;
> +	add_entry->lsk_ref = lsk_ref;
> +
> +	if (lsk_ref)
> +		lsk_list_add_ref(lsk_ref);
>  
>  	timer_setup(&add_entry->add_timer, mptcp_pm_add_timer, 0);
>  	sk_reset_timer(sk, &add_entry->add_timer,
> @@ -470,8 +480,11 @@ void mptcp_pm_free_anno_list(struct mptcp_sock *msk)
>  {
>  	struct mptcp_pm_add_entry *entry, *tmp;
>  	struct sock *sk = (struct sock *)msk;
> +	struct pm_nl_pernet *pernet;
>  	LIST_HEAD(free_list);
>  
> +	pernet = net_generic(sock_net(sk), pm_nl_pernet_id);
> +
>  	pr_debug("msk=%p", msk);
>  
>  	spin_lock_bh(&msk->pm.lock);
> @@ -480,6 +493,8 @@ void mptcp_pm_free_anno_list(struct mptcp_sock *msk)
>  
>  	list_for_each_entry_safe(entry, tmp, &free_list, list) {
>  		sk_stop_timer_sync(sk, &entry->add_timer);
> +		if (entry->lsk_ref)
> +			lsk_list_release(pernet, entry->lsk_ref);
>  		kfree(entry);
>  	}
>  }
> @@ -570,13 +585,16 @@ lookup_id_by_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *add
>  }
>  
>  static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
> +	__must_hold(&msk->pm.lock)
>  {
> +	struct mptcp_local_lsk *lsk_ref = NULL;
>  	struct sock *sk = (struct sock *)msk;
>  	struct mptcp_pm_addr_entry *local;
>  	unsigned int add_addr_signal_max;
>  	unsigned int local_addr_max;
>  	struct pm_nl_pernet *pernet;
>  	unsigned int subflows_max;
> +	struct socket *lsk;
>  
>  	pernet = net_generic(sock_net(sk), pm_nl_pernet_id);
>  
> @@ -607,12 +625,39 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
>  		local = select_signal_address(pernet, msk);
>  
>  		if (local) {
> -			if (mptcp_pm_alloc_anno_list(msk, local)) {
> +			if (!local->addr.port) {
> +				local->addr.port =
> +					((struct inet_sock *)inet_sk
> +					 ((struct sock *)msk))->inet_sport;
> +
> +				lsk_ref = lsk_list_find(pernet, &local->addr);
> +
> +				if (!lsk_ref) {
> +					spin_unlock_bh(&msk->pm.lock);
> +
> +					mptcp_pm_nl_create_listen_socket(sk, local, &lsk);
> +
> +					spin_lock_bh(&msk->pm.lock);
> +
> +					if (lsk)
> +						lsk_ref = lsk_list_add(pernet, &local->addr, lsk);
> +
> +					if (lsk && !lsk_ref)
> +						sock_release(lsk);

Let's suppose an user-space application listen on 2 different address
(A, B) and does:

"""
s1 = socket()
bind(s1, A)
listen(s1)
 // at this point incoming MPTCP connection can be established on s1
 // and ADD_ADDR sub-options could be sent back 

s2 = socket()
bind(s2, B)
listen(s2)
"""

If there is a signal endpoint on B, the above listen can race with the 
mptcp_pm_nl_create_listen_socket() call, leading to hard to track
startup issues for user-space application.

I really think we want at list a configuration option, off by default,
for this feature. Some specific self-test would be a plus.

It will help reviewing, splitting this series in at least 2 chunks:
* pre-reqs up to ~this patch
* user-space PM specific stuff

Side note: it would be nice reducing the level of intentation, e.g.
factoring-out part of the inner code in some helper.

> +				}
> +
> +				local->addr.port = 0;
> +			}
> +
> +			if (mptcp_pm_alloc_anno_list(msk, local, lsk_ref)) {
>  				__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
>  				msk->pm.add_addr_signaled++;
>  				mptcp_pm_announce_addr(msk, &local->addr, false);
>  				mptcp_pm_nl_addr_send_ack(msk);
>  			}
> +
> +			if (lsk_ref)
> +				lsk_list_release(pernet, lsk_ref);

Probaly not very relevant, but something alike:

	rcu_read_lock()
	lsk_ref = __lsk_list_find();
	if (lst_ref)
		if (mptcp_pm_alloc_anno_list(...)
	rcu_read_unlock()

would save a pair of possibly contended atomic operations in the common
case.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 13/21] mptcp: netlink: Add MPTCP_PM_CMD_ANNOUNCE
  2022-01-12 22:15 ` [PATCH mptcp-next v2 13/21] mptcp: netlink: Add MPTCP_PM_CMD_ANNOUNCE Kishen Maloor
@ 2022-01-14 17:04   ` Paolo Abeni
  2022-01-19  1:21     ` Kishen Maloor
  0 siblings, 1 reply; 45+ messages in thread
From: Paolo Abeni @ 2022-01-14 17:04 UTC (permalink / raw)
  To: Kishen Maloor, mptcp

On Wed, 2022-01-12 at 17:15 -0500, Kishen Maloor wrote:
> This change adds a MPTCP netlink interface for issuing
> ADD_ADDR advertisements over the chosen MPTCP connection from a
> userspace path manager.
> 
> The command requires the following parameters:
> { token, { loc_id, family, daddr4 | daddr6 [, dport] } [, if_idx],
> flags/signal }.
> 
> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
> ---
>  include/uapi/linux/mptcp.h |   2 +
>  net/mptcp/pm_netlink.c     | 111 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 113 insertions(+)
> 
> diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
> index f106a3941cdf..40380be396c8 100644
> --- a/include/uapi/linux/mptcp.h
> +++ b/include/uapi/linux/mptcp.h
> @@ -55,6 +55,7 @@ enum {
>  	MPTCP_PM_ATTR_ADDR,				/* nested address */
>  	MPTCP_PM_ATTR_RCV_ADD_ADDRS,			/* u32 */
>  	MPTCP_PM_ATTR_SUBFLOWS,				/* u32 */
> +	MPTCP_PM_ATTR_TOKEN,				/* u32 */
>  
>  	__MPTCP_PM_ATTR_MAX
>  };
> @@ -92,6 +93,7 @@ enum {
>  	MPTCP_PM_CMD_SET_LIMITS,
>  	MPTCP_PM_CMD_GET_LIMITS,
>  	MPTCP_PM_CMD_SET_FLAGS,
> +	MPTCP_PM_CMD_ANNOUNCE,
>  
>  	__MPTCP_PM_CMD_AFTER_LAST
>  };
> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
> index 07536abab3e5..18228b2dfa2d 100644
> --- a/net/mptcp/pm_netlink.c
> +++ b/net/mptcp/pm_netlink.c
> @@ -1268,6 +1268,7 @@ static const struct nla_policy mptcp_pm_policy[MPTCP_PM_ATTR_MAX + 1] = {
>  					NLA_POLICY_NESTED(mptcp_pm_addr_policy),
>  	[MPTCP_PM_ATTR_RCV_ADD_ADDRS]	= { .type	= NLA_U32,	},
>  	[MPTCP_PM_ATTR_SUBFLOWS]	= { .type	= NLA_U32,	},
> +	[MPTCP_PM_ATTR_TOKEN]		= { .type	= NLA_U32,	},
>  };
>  
>  void mptcp_pm_nl_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk)
> @@ -2028,6 +2029,111 @@ static int mptcp_nl_addr_backup(struct net *net,
>  	return ret;
>  }
>  
> +static int mptcp_nl_cmd_announce(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct nlattr *token = info->attrs[MPTCP_PM_ATTR_TOKEN];
> +	struct nlattr *addr = info->attrs[MPTCP_PM_ATTR_ADDR];
> +	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
> +	struct mptcp_pm_addr_entry addr_val;
> +	struct mptcp_local_lsk *lsk_ref;
> +	bool reuse_port = false;
> +	struct mptcp_sock *msk;
> +	struct socket *lsk;
> +	u32 token_val;
> +	int err;
> +
> +	if (!addr || !token) {
> +		GENL_SET_ERR_MSG(info, "missing required inputs");
> +		return -EINVAL;
> +	}
> +
> +	token_val = nla_get_u32(token);
> +
> +	msk = mptcp_token_get_sock(sock_net(skb->sk), token_val);
> +	if (!msk) {
> +		NL_SET_ERR_MSG_ATTR(info->extack, token, "invalid token");
> +		return -EINVAL;
> +	}
> +
> +	if (READ_ONCE(msk->pm.pm_type) != MPTCP_PM_TYPE_USERSPACE) {
> +		GENL_SET_ERR_MSG(info, "invalid request; userspace PM not selected");
> +		return -EINVAL;
> +	}
> +
> +	err = mptcp_pm_parse_entry(addr, info, true, &addr_val);
> +	if (err < 0) {
> +		GENL_SET_ERR_MSG(info, "error parsing local address");
> +		return err;
> +	}
> +
> +	if (addr_val.addr.id == 0 || !(addr_val.flags & MPTCP_PM_ADDR_FLAG_SIGNAL)) {
> +		GENL_SET_ERR_MSG(info, "invalid addr id or flags");
> +		return -EINVAL;
> +	}
> +
> +	if (!addr_val.addr.port) {
> +		addr_val.addr.port =
> +			((struct inet_sock *)inet_sk
> +			 ((struct sock *)msk))->inet_sport;
> +
> +		reuse_port = true;
> +	}
> +
> +	lsk_ref = lsk_list_find(pernet, &addr_val.addr);
> +
> +	if (!lsk_ref) {
> +		err = mptcp_pm_nl_create_listen_socket(skb->sk, &addr_val, &lsk);
> +		if ((err && !reuse_port) || (err && (err != -EADDRINUSE) && reuse_port)) {
> +			GENL_SET_ERR_MSG(info, "error creating listen socket");
> +			return err;
> +		}
> +
> +		if (lsk) {
> +			lsk_ref = lsk_list_add(pernet, &addr_val.addr, lsk);
> +			if (!lsk_ref) {
> +				GENL_SET_ERR_MSG(info, "can't allocate lsk ref");
> +				sock_release(lsk);
> +				return -ENOMEM;
> +			}
> +		}
> +	}

It looks like that the above chunk of code is sort of recurring. it's
likely worthy to factor it in some helper alike
"lsk_list_find_or_create()"

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 10/21] mptcp: handle local addrs announced by userspace PMs
  2022-01-12 22:15 ` [PATCH mptcp-next v2 10/21] mptcp: handle local addrs announced " Kishen Maloor
@ 2022-01-14 17:11   ` Paolo Abeni
  2022-01-19  1:24     ` Kishen Maloor
  0 siblings, 1 reply; 45+ messages in thread
From: Paolo Abeni @ 2022-01-14 17:11 UTC (permalink / raw)
  To: Kishen Maloor, mptcp

On Wed, 2022-01-12 at 17:15 -0500, Kishen Maloor wrote:
> This change adds a new internal function to store/retrieve local
> addrs announced by userspace PM implementations from the kernel
> context. The function does not stipulate any limitation on the #
> of addrs, and handles the requirements of three scenarios:
> 1) ADD_ADDR announcements (which require that a local id be
> provided), 2) retrieving the local id associated with an address,
> also where one may need to be assigned, and 3) reissuance of
> ADD_ADDRs when there's a successful match of addr/id.
> 
> The list of all stored local addr entries is held under the
> MPTCP sock structure. This list, if not released by the REMOVE_ADDR
> flow is freed while the sock is destructed.

It feels strange to me that we need to maintain an additional addresses
list inside the kernel for the user-space PM - which should take care
of all the status information.

Why anno_list is not enough? Why isn't cheapest to extend it? 

Being the list unlimited a malicius (or buggy) user-space could consume
all the kernel memory. I think we need some limits, or at least some
accounting.

/P


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 08/21] mptcp: attempt to add listening sockets for announced addrs
  2022-01-14 15:54   ` Paolo Abeni
@ 2022-01-14 17:47     ` Matthieu Baerts
  2022-01-14 22:26     ` Mat Martineau
  2022-01-19  1:20     ` Kishen Maloor
  2 siblings, 0 replies; 45+ messages in thread
From: Matthieu Baerts @ 2022-01-14 17:47 UTC (permalink / raw)
  To: Paolo Abeni, Kishen Maloor, mptcp

Hi Kishen, Paolo,

On 14/01/2022 16:54, Paolo Abeni wrote:
> I really think we want at list a configuration option, off by default,
> for this feature. Some specific self-test would be a plus.
> 
> It will help reviewing, splitting this series in at least 2 chunks:
> * pre-reqs up to ~this patch
> * user-space PM specific stuff

Good idea, I think it will help indeed. And it will be required when we
will send these patches to netdev.

For me, it makes sense to have more features only for the userspace PM
and of course more flexibility. But we still need to make sure the
in-kernel PM can be controlled to avoid some behaviours. For example,
when sending the command to emit an ADD_ADDR, a parameter could be used
to (try to) create a listening socket as well.

I think an in-kernel PM makes sense for a typical server mainly having
to send ADD_ADDRs and accept subflows. Having a userspace PM daemon can
be costly for such "basic task".

Also a server could also send an ADD_ADDR for an IP that it doesn't
directly "own" and we might save some resources by not creating or
trying to create listening sockets. We could also have a new flag with
'ip mptcp' to create listening sockets for the IP we want to "signal".

I mean: even if it is now easier to create listening sockets, we might
not want to do it all the time.

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 07/21] mptcp: netlink: process IPv6 addrs in creating listening sockets
  2022-01-14 15:27   ` Paolo Abeni
@ 2022-01-14 22:09     ` Mat Martineau
  2022-01-19  1:25       ` Kishen Maloor
  0 siblings, 1 reply; 45+ messages in thread
From: Mat Martineau @ 2022-01-14 22:09 UTC (permalink / raw)
  To: Paolo Abeni, Kishen Maloor; +Cc: mptcp

On Fri, 14 Jan 2022, Paolo Abeni wrote:

> Hello,
>
> On Wed, 2022-01-12 at 17:15 -0500, Kishen Maloor wrote:
>> This change updates mptcp_pm_nl_create_listen_socket() to create
>> listening sockets bound to IPv6 addresses (where IPv6 is supported).
>>

This tag -

Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")

would help with backporting to stable.

>> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
>> ---
>>  net/mptcp/pm_netlink.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
>> index 4c1895dbc2a5..779ec9d375f0 100644
>> --- a/net/mptcp/pm_netlink.c
>> +++ b/net/mptcp/pm_netlink.c
>> @@ -986,6 +986,7 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
>>  					    struct mptcp_pm_addr_entry *entry,
>>  					    struct socket **lsk)
>>  {
>> +	int addrlen = sizeof(struct sockaddr_in);
>>  	struct sockaddr_storage addr;
>>  	struct mptcp_sock *msk;
>>  	struct socket *ssock;
>> @@ -1010,8 +1011,11 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
>>  	}
>>
>>  	mptcp_info2sockaddr(&entry->addr, &addr, entry->addr.family);
>> -	err = kernel_bind(ssock, (struct sockaddr *)&addr,
>> -			  sizeof(struct sockaddr_in));
>> +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
>> +	if (entry->addr.family == AF_INET6)
>> +		addrlen = sizeof(struct sockaddr_in6);
>> +#endif
>> +	err = kernel_bind(ssock, (struct sockaddr *)&addr, addrlen);
>>  	if (err) {
>>  		pr_warn("kernel_bind error, err=%d", err);
>>  		goto out;
>
> This looks a bugfix for -net to me?
>
> Possibly worthy additional an additional mp_join self-test for the ipv6
> case.
>

I agree, this would be good for -net. Kishen can you add the suggested 
selftest too and repost separately for mptcp-net?

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 08/21] mptcp: attempt to add listening sockets for announced addrs
  2022-01-14 15:54   ` Paolo Abeni
  2022-01-14 17:47     ` Matthieu Baerts
@ 2022-01-14 22:26     ` Mat Martineau
  2022-01-19  1:20     ` Kishen Maloor
  2 siblings, 0 replies; 45+ messages in thread
From: Mat Martineau @ 2022-01-14 22:26 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Kishen Maloor, mptcp

On Fri, 14 Jan 2022, Paolo Abeni wrote:

...

> It will help reviewing, splitting this series in at least 2 chunks:
> * pre-reqs up to ~this patch
> * user-space PM specific stuff
>

Seems like the way to go here is to split patches 1-6 into another series, 
and have patch 7 as a standalone (or separate series w/ selftest) for 
mptcp-net.

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 04/21] mptcp: establish subflows from either end of connection
  2022-01-12 22:15 ` [PATCH mptcp-next v2 04/21] mptcp: establish subflows from either end of connection Kishen Maloor
@ 2022-01-14 22:43   ` Mat Martineau
  2022-01-17  8:59     ` Paolo Abeni
  0 siblings, 1 reply; 45+ messages in thread
From: Mat Martineau @ 2022-01-14 22:43 UTC (permalink / raw)
  To: Kishen Maloor; +Cc: mptcp

On Wed, 12 Jan 2022, Kishen Maloor wrote:

> This change updates internal logic to permit subflows to be
> established from either the client or server ends of MPTCP
> connections. This symmetry and added flexibility may be
> harnessed by PM implementations running on either end in
> creating new subflows.
>
> The essence of this change lies in not relying on the
> "server_side" flag (which continues to be available if needed).
>

After this patch, server_side is unused except for some debug output. If 
it's really no longer referenced (see below first), may be better to just 
remove it.

> v2: check for 3rd ACK retransmission only on passive side
> of the MPJ handshake
>
> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
> ---
> net/mptcp/options.c  | 2 +-
> net/mptcp/protocol.c | 5 +----
> net/mptcp/protocol.h | 2 --
> 3 files changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> index cceba8c7806d..89808816db06 100644
> --- a/net/mptcp/options.c
> +++ b/net/mptcp/options.c
> @@ -921,7 +921,7 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
> 		if (TCP_SKB_CB(skb)->seq == subflow->ssn_offset + 1 &&
> 		    TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq &&
> 		    subflow->mp_join && (mp_opt->suboptions & OPTIONS_MPTCP_MPJ) &&
> -		    READ_ONCE(msk->pm.server_side))
> +		    !subflow->request_join)
> 			tcp_send_ack(ssk);
> 		goto fully_established;
> 	}
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 4012f844eec1..408a05bff633 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -3248,15 +3248,12 @@ bool mptcp_finish_join(struct sock *ssk)
> 		return false;
> 	}
>
> -	if (!msk->pm.server_side)
> +	if (!list_empty(&subflow->node))
> 		goto out;
>
> 	if (!mptcp_pm_allow_new_subflow(msk))
> 		goto err_prohibited;
>
> -	if (WARN_ON_ONCE(!list_empty(&subflow->node)))
> -		goto err_prohibited;
> -
> 	/* active connections are already on conn_list.
> 	 * If we can't acquire msk socket lock here, let the release callback
> 	 * handle it
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index e2a67d3469f6..c50247673c7e 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -908,10 +908,8 @@ static inline bool mptcp_check_infinite_map(struct sk_buff *skb)
> static inline bool subflow_simultaneous_connect(struct sock *sk)
> {
> 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
> -	struct sock *parent = subflow->conn;
>
> 	return sk->sk_state == TCP_ESTABLISHED &&
> -	       !mptcp_sk(parent)->pm.server_side &&
> 	       !subflow->conn_finished;
> }

Are changes in this function needed? This code was added to address
https://github.com/multipath-tcp/mptcp_net-next/issues/35
which was a weird case encountered with syzkaller where a socket tried to 
connect to itself, triggering the "simultaneous open" code path. It 
doesn't seem to be applicable to MP_JOINs.

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 04/21] mptcp: establish subflows from either end of connection
  2022-01-14 22:43   ` Mat Martineau
@ 2022-01-17  8:59     ` Paolo Abeni
  2022-01-19  1:26       ` Kishen Maloor
  0 siblings, 1 reply; 45+ messages in thread
From: Paolo Abeni @ 2022-01-17  8:59 UTC (permalink / raw)
  To: Mat Martineau, Kishen Maloor; +Cc: mptcp

On Fri, 2022-01-14 at 14:43 -0800, Mat Martineau wrote:
> On Wed, 12 Jan 2022, Kishen Maloor wrote:
> 
> > This change updates internal logic to permit subflows to be
> > established from either the client or server ends of MPTCP
> > connections. This symmetry and added flexibility may be
> > harnessed by PM implementations running on either end in
> > creating new subflows.
> > 
> > The essence of this change lies in not relying on the
> > "server_side" flag (which continues to be available if needed).
> > 
> 
> After this patch, server_side is unused except for some debug output. If 
> it's really no longer referenced (see below first), may be better to just 
> remove it.
> 
> > v2: check for 3rd ACK retransmission only on passive side
> > of the MPJ handshake
> > 
> > Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
> > ---
> > net/mptcp/options.c  | 2 +-
> > net/mptcp/protocol.c | 5 +----
> > net/mptcp/protocol.h | 2 --
> > 3 files changed, 2 insertions(+), 7 deletions(-)
> > 
> > diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> > index cceba8c7806d..89808816db06 100644
> > --- a/net/mptcp/options.c
> > +++ b/net/mptcp/options.c
> > @@ -921,7 +921,7 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
> > 		if (TCP_SKB_CB(skb)->seq == subflow->ssn_offset + 1 &&
> > 		    TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq &&
> > 		    subflow->mp_join && (mp_opt->suboptions & OPTIONS_MPTCP_MPJ) &&
> > -		    READ_ONCE(msk->pm.server_side))
> > +		    !subflow->request_join)
> > 			tcp_send_ack(ssk);
> > 		goto fully_established;
> > 	}
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index 4012f844eec1..408a05bff633 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -3248,15 +3248,12 @@ bool mptcp_finish_join(struct sock *ssk)
> > 		return false;
> > 	}
> > 
> > -	if (!msk->pm.server_side)
> > +	if (!list_empty(&subflow->node))
> > 		goto out;
> > 
> > 	if (!mptcp_pm_allow_new_subflow(msk))
> > 		goto err_prohibited;
> > 
> > -	if (WARN_ON_ONCE(!list_empty(&subflow->node)))
> > -		goto err_prohibited;
> > -
> > 	/* active connections are already on conn_list.
> > 	 * If we can't acquire msk socket lock here, let the release callback
> > 	 * handle it
> > diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> > index e2a67d3469f6..c50247673c7e 100644
> > --- a/net/mptcp/protocol.h
> > +++ b/net/mptcp/protocol.h
> > @@ -908,10 +908,8 @@ static inline bool mptcp_check_infinite_map(struct sk_buff *skb)
> > static inline bool subflow_simultaneous_connect(struct sock *sk)
> > {
> > 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
> > -	struct sock *parent = subflow->conn;
> > 
> > 	return sk->sk_state == TCP_ESTABLISHED &&
> > -	       !mptcp_sk(parent)->pm.server_side &&
> > 	       !subflow->conn_finished;
> > }
> 
> Are changes in this function needed? This code was added to address
> https://github.com/multipath-tcp/mptcp_net-next/issues/35
> which was a weird case encountered with syzkaller where a socket tried to 
> connect to itself, triggering the "simultaneous open" code path. It 
> doesn't seem to be applicable to MP_JOINs.

I *think* we can reach here even with MPJ, e.g. if we have port-based
endpoint matching the src address/port - possibly syzkaller could be
able to reach such scenario.

Additionally I fear dropping the '!mptcp_sk(parent)->pm.server_side &&'
check could re-introduce the initial issue, at least for the MPC
handshake.

/P


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 08/21] mptcp: attempt to add listening sockets for announced addrs
  2022-01-14 15:54   ` Paolo Abeni
  2022-01-14 17:47     ` Matthieu Baerts
  2022-01-14 22:26     ` Mat Martineau
@ 2022-01-19  1:20     ` Kishen Maloor
  2 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-19  1:20 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

Hi Paolo, Matthieu,

On 1/14/22 7:54 AM, Paolo Abeni wrote:
> Hello,
> 
> On Wed, 2022-01-12 at 17:15 -0500, Kishen Maloor wrote:
>> When ADD_ADDR announcements use the port associated with an
>> active subflow, this change ensures that a listening socket is
>> bound to the announced address and port for subsequently
>> receiving MP_JOINs from the remote end. In case there's
>> a recorded lsk bound to that address+port, it is reused.
>> But if a listening socket for this address is already held by the
>> application then no further action is taken.
>>
>> When a listening socket is created, it is stored in
>> struct mptcp_pm_add_entry and released accordingly.
>>
>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/203
>>
>> v2: fixed formatting
>>
>> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
> 
> should be either:
> 
> """
> <changelog>
> 
> <tags>
> """
> 
> or:
> 
> """
> <tags>
> ---
> <changelog>
> """
> 
> we usually keep the changelog outside the commit message for
> development history before landing on netdev, that is:

Thanks! I shall reflect this change in the related commit messages.

> 
> """
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/203
> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
> ---
> v2: fixed formatting
> """
> 
>> ---
>>  net/mptcp/pm_netlink.c | 56 ++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 54 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
>> index 779ec9d375f0..e2211f3b8c8c 100644
>> --- a/net/mptcp/pm_netlink.c
>> +++ b/net/mptcp/pm_netlink.c
>> @@ -43,6 +43,7 @@ struct mptcp_pm_add_entry {
>>  	struct mptcp_addr_info	addr;
>>  	struct timer_list	add_timer;
>>  	struct mptcp_sock	*sock;
>> +	struct mptcp_local_lsk	*lsk_ref;
>>  	u8			retrans_times;
>>  };
>>  
>> @@ -66,6 +67,10 @@ struct pm_nl_pernet {
>>  #define MPTCP_PM_ADDR_MAX	8
>>  #define ADD_ADDR_RETRANS_MAX	3
>>  
>> +static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
>> +					    struct mptcp_pm_addr_entry *entry,
>> +					    struct socket **lsk);
>> +
>>  static bool addresses_equal(const struct mptcp_addr_info *a,
>>  			    const struct mptcp_addr_info *b, bool use_port)
>>  {
>> @@ -438,7 +443,8 @@ mptcp_pm_del_add_timer(struct mptcp_sock *msk,
>>  }
>>  
>>  static bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
>> -				     struct mptcp_pm_addr_entry *entry)
>> +				     struct mptcp_pm_addr_entry *entry,
>> +				     struct mptcp_local_lsk *lsk_ref)
>>  {
>>  	struct mptcp_pm_add_entry *add_entry = NULL;
>>  	struct sock *sk = (struct sock *)msk;
>> @@ -458,6 +464,10 @@ static bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
>>  	add_entry->addr = entry->addr;
>>  	add_entry->sock = msk;
>>  	add_entry->retrans_times = 0;
>> +	add_entry->lsk_ref = lsk_ref;
>> +
>> +	if (lsk_ref)
>> +		lsk_list_add_ref(lsk_ref);
>>  
>>  	timer_setup(&add_entry->add_timer, mptcp_pm_add_timer, 0);
>>  	sk_reset_timer(sk, &add_entry->add_timer,
>> @@ -470,8 +480,11 @@ void mptcp_pm_free_anno_list(struct mptcp_sock *msk)
>>  {
>>  	struct mptcp_pm_add_entry *entry, *tmp;
>>  	struct sock *sk = (struct sock *)msk;
>> +	struct pm_nl_pernet *pernet;
>>  	LIST_HEAD(free_list);
>>  
>> +	pernet = net_generic(sock_net(sk), pm_nl_pernet_id);
>> +
>>  	pr_debug("msk=%p", msk);
>>  
>>  	spin_lock_bh(&msk->pm.lock);
>> @@ -480,6 +493,8 @@ void mptcp_pm_free_anno_list(struct mptcp_sock *msk)
>>  
>>  	list_for_each_entry_safe(entry, tmp, &free_list, list) {
>>  		sk_stop_timer_sync(sk, &entry->add_timer);
>> +		if (entry->lsk_ref)
>> +			lsk_list_release(pernet, entry->lsk_ref);
>>  		kfree(entry);
>>  	}
>>  }
>> @@ -570,13 +585,16 @@ lookup_id_by_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *add
>>  }
>>  
>>  static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
>> +	__must_hold(&msk->pm.lock)
>>  {
>> +	struct mptcp_local_lsk *lsk_ref = NULL;
>>  	struct sock *sk = (struct sock *)msk;
>>  	struct mptcp_pm_addr_entry *local;
>>  	unsigned int add_addr_signal_max;
>>  	unsigned int local_addr_max;
>>  	struct pm_nl_pernet *pernet;
>>  	unsigned int subflows_max;
>> +	struct socket *lsk;
>>  
>>  	pernet = net_generic(sock_net(sk), pm_nl_pernet_id);
>>  
>> @@ -607,12 +625,39 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
>>  		local = select_signal_address(pernet, msk);
>>  
>>  		if (local) {
>> -			if (mptcp_pm_alloc_anno_list(msk, local)) {
>> +			if (!local->addr.port) {
>> +				local->addr.port =
>> +					((struct inet_sock *)inet_sk
>> +					 ((struct sock *)msk))->inet_sport;
>> +
>> +				lsk_ref = lsk_list_find(pernet, &local->addr);
>> +
>> +				if (!lsk_ref) {
>> +					spin_unlock_bh(&msk->pm.lock);
>> +
>> +					mptcp_pm_nl_create_listen_socket(sk, local, &lsk);
>> +
>> +					spin_lock_bh(&msk->pm.lock);
>> +
>> +					if (lsk)
>> +						lsk_ref = lsk_list_add(pernet, &local->addr, lsk);
>> +
>> +					if (lsk && !lsk_ref)
>> +						sock_release(lsk);
> 
> Let's suppose an user-space application listen on 2 different address
> (A, B) and does:
> 
> """
> s1 = socket()
> bind(s1, A)
> listen(s1)
>  // at this point incoming MPTCP connection can be established on s1
>  // and ADD_ADDR sub-options could be sent back 
> 
> s2 = socket()
> bind(s2, B)
> listen(s2)
> """
> 
> If there is a signal endpoint on B, the above listen can race with the 
> mptcp_pm_nl_create_listen_socket() call, leading to hard to track
> startup issues for user-space application.
> 
> I really think we want at list a configuration option, off by default,
> for this feature. Some specific self-test would be a plus.

Looking at your example above, assuming both A and B are bound to the same port
then yes, a race such as you suggest could occur. 

But if you consider bug #203, it arose only because there was no listener in
the application (and unexpectedly so). So if the path manager creates a listener (at the time 
of the address advertisement) to facilitate MPJs then that would have the usual
side effects of creating listeners (in general).

For e.g,. I think this clash could also occur with the existing code in the kernel PM and using a 
port-based endpoint when the app happens to bind a socket to that specific addr+port. 

The other scenario where the path manager needs to always establish a listener is when
running alongside an MPTCP client.

We could certainly add an "attempt to create lsk" option to the ADD_ADDR netlink commands,
as I believe you've both suggested, but perhaps we should think further about the guidance
regarding usage of this option.

For e.g., if creating lsks is not the default behavior, then bug #203 would persist 
unless the entity that issues the ADD_ADDR command exercises this option.

> 
> It will help reviewing, splitting this series in at least 2 chunks:
> * pre-reqs up to ~this patch
> * user-space PM specific stuff
> 
> Side note: it would be nice reducing the level of intentation, e.g.
> factoring-out part of the inner code in some helper.
> 
>> +				}
>> +
>> +				local->addr.port = 0;
>> +			}
>> +
>> +			if (mptcp_pm_alloc_anno_list(msk, local, lsk_ref)) {
>>  				__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
>>  				msk->pm.add_addr_signaled++;
>>  				mptcp_pm_announce_addr(msk, &local->addr, false);
>>  				mptcp_pm_nl_addr_send_ack(msk);
>>  			}
>> +
>> +			if (lsk_ref)
>> +				lsk_list_release(pernet, lsk_ref);
> 
> Probaly not very relevant, but something alike:
> 
> 	rcu_read_lock()
> 	lsk_ref = __lsk_list_find();
> 	if (lst_ref)
> 		if (mptcp_pm_alloc_anno_list(...)
> 	rcu_read_unlock()
> 
> would save a pair of possibly contended atomic operations in the common
> case.
> 
> Thanks!
> 
> Paolo
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 13/21] mptcp: netlink: Add MPTCP_PM_CMD_ANNOUNCE
  2022-01-14 17:04   ` Paolo Abeni
@ 2022-01-19  1:21     ` Kishen Maloor
  0 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-19  1:21 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

On 1/14/22 9:04 AM, Paolo Abeni wrote:
> On Wed, 2022-01-12 at 17:15 -0500, Kishen Maloor wrote:
>> This change adds a MPTCP netlink interface for issuing
>> ADD_ADDR advertisements over the chosen MPTCP connection from a
>> userspace path manager.
>>
>> The command requires the following parameters:
>> { token, { loc_id, family, daddr4 | daddr6 [, dport] } [, if_idx],
>> flags/signal }.
>>
>> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
>> ---
>>  include/uapi/linux/mptcp.h |   2 +
>>  net/mptcp/pm_netlink.c     | 111 +++++++++++++++++++++++++++++++++++++
>>  2 files changed, 113 insertions(+)
>>
>> diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
>> index f106a3941cdf..40380be396c8 100644
>> --- a/include/uapi/linux/mptcp.h
>> +++ b/include/uapi/linux/mptcp.h
>> @@ -55,6 +55,7 @@ enum {
>>  	MPTCP_PM_ATTR_ADDR,				/* nested address */
>>  	MPTCP_PM_ATTR_RCV_ADD_ADDRS,			/* u32 */
>>  	MPTCP_PM_ATTR_SUBFLOWS,				/* u32 */
>> +	MPTCP_PM_ATTR_TOKEN,				/* u32 */
>>  
>>  	__MPTCP_PM_ATTR_MAX
>>  };
>> @@ -92,6 +93,7 @@ enum {
>>  	MPTCP_PM_CMD_SET_LIMITS,
>>  	MPTCP_PM_CMD_GET_LIMITS,
>>  	MPTCP_PM_CMD_SET_FLAGS,
>> +	MPTCP_PM_CMD_ANNOUNCE,
>>  
>>  	__MPTCP_PM_CMD_AFTER_LAST
>>  };
>> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
>> index 07536abab3e5..18228b2dfa2d 100644
>> --- a/net/mptcp/pm_netlink.c
>> +++ b/net/mptcp/pm_netlink.c
>> @@ -1268,6 +1268,7 @@ static const struct nla_policy mptcp_pm_policy[MPTCP_PM_ATTR_MAX + 1] = {
>>  					NLA_POLICY_NESTED(mptcp_pm_addr_policy),
>>  	[MPTCP_PM_ATTR_RCV_ADD_ADDRS]	= { .type	= NLA_U32,	},
>>  	[MPTCP_PM_ATTR_SUBFLOWS]	= { .type	= NLA_U32,	},
>> +	[MPTCP_PM_ATTR_TOKEN]		= { .type	= NLA_U32,	},
>>  };
>>  
>>  void mptcp_pm_nl_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk)
>> @@ -2028,6 +2029,111 @@ static int mptcp_nl_addr_backup(struct net *net,
>>  	return ret;
>>  }
>>  
>> +static int mptcp_nl_cmd_announce(struct sk_buff *skb, struct genl_info *info)
>> +{
>> +	struct nlattr *token = info->attrs[MPTCP_PM_ATTR_TOKEN];
>> +	struct nlattr *addr = info->attrs[MPTCP_PM_ATTR_ADDR];
>> +	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
>> +	struct mptcp_pm_addr_entry addr_val;
>> +	struct mptcp_local_lsk *lsk_ref;
>> +	bool reuse_port = false;
>> +	struct mptcp_sock *msk;
>> +	struct socket *lsk;
>> +	u32 token_val;
>> +	int err;
>> +
>> +	if (!addr || !token) {
>> +		GENL_SET_ERR_MSG(info, "missing required inputs");
>> +		return -EINVAL;
>> +	}
>> +
>> +	token_val = nla_get_u32(token);
>> +
>> +	msk = mptcp_token_get_sock(sock_net(skb->sk), token_val);
>> +	if (!msk) {
>> +		NL_SET_ERR_MSG_ATTR(info->extack, token, "invalid token");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (READ_ONCE(msk->pm.pm_type) != MPTCP_PM_TYPE_USERSPACE) {
>> +		GENL_SET_ERR_MSG(info, "invalid request; userspace PM not selected");
>> +		return -EINVAL;
>> +	}
>> +
>> +	err = mptcp_pm_parse_entry(addr, info, true, &addr_val);
>> +	if (err < 0) {
>> +		GENL_SET_ERR_MSG(info, "error parsing local address");
>> +		return err;
>> +	}
>> +
>> +	if (addr_val.addr.id == 0 || !(addr_val.flags & MPTCP_PM_ADDR_FLAG_SIGNAL)) {
>> +		GENL_SET_ERR_MSG(info, "invalid addr id or flags");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!addr_val.addr.port) {
>> +		addr_val.addr.port =
>> +			((struct inet_sock *)inet_sk
>> +			 ((struct sock *)msk))->inet_sport;
>> +
>> +		reuse_port = true;
>> +	}
>> +
>> +	lsk_ref = lsk_list_find(pernet, &addr_val.addr);
>> +
>> +	if (!lsk_ref) {
>> +		err = mptcp_pm_nl_create_listen_socket(skb->sk, &addr_val, &lsk);
>> +		if ((err && !reuse_port) || (err && (err != -EADDRINUSE) && reuse_port)) {
>> +			GENL_SET_ERR_MSG(info, "error creating listen socket");
>> +			return err;
>> +		}
>> +
>> +		if (lsk) {
>> +			lsk_ref = lsk_list_add(pernet, &addr_val.addr, lsk);
>> +			if (!lsk_ref) {
>> +				GENL_SET_ERR_MSG(info, "can't allocate lsk ref");
>> +				sock_release(lsk);
>> +				return -ENOMEM;
>> +			}
>> +		}
>> +	}
> 
> It looks like that the above chunk of code is sort of recurring. it's
> likely worthy to factor it in some helper alike
> "lsk_list_find_or_create()"

The find->create_listen_sock sequence is run in 3 places, but there are deviations in
the logic that follows a failed create_listen_sock. Hence, I did not over optimize.
But I will check if it makes sense to refactor.

> 
> Cheers,
> 
> Paolo
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 10/21] mptcp: handle local addrs announced by userspace PMs
  2022-01-14 17:11   ` Paolo Abeni
@ 2022-01-19  1:24     ` Kishen Maloor
  2022-01-19 19:20       ` Mat Martineau
  0 siblings, 1 reply; 45+ messages in thread
From: Kishen Maloor @ 2022-01-19  1:24 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

On 1/14/22 9:11 AM, Paolo Abeni wrote:
> On Wed, 2022-01-12 at 17:15 -0500, Kishen Maloor wrote:
>> This change adds a new internal function to store/retrieve local
>> addrs announced by userspace PM implementations from the kernel
>> context. The function does not stipulate any limitation on the #
>> of addrs, and handles the requirements of three scenarios:
>> 1) ADD_ADDR announcements (which require that a local id be
>> provided), 2) retrieving the local id associated with an address,
>> also where one may need to be assigned, and 3) reissuance of
>> ADD_ADDRs when there's a successful match of addr/id.
>>
>> The list of all stored local addr entries is held under the
>> MPTCP sock structure. This list, if not released by the REMOVE_ADDR
>> flow is freed while the sock is destructed.
> 
> It feels strange to me that we need to maintain an additional addresses
> list inside the kernel for the user-space PM - which should take care
> of all the status information.

The PM daemon will have the complete picture, but the protocol needs to know the
local ids assigned to addresses so as such the kernel has to store addresses 
(with their ids) regardless of PM. 

> 
> Why anno_list is not enough? Why isn't cheapest to extend it? 

The anno_list is the list of addresses that were announced via the ADD_ADDR command, to
be used specifically for doing re-transmissions.

However, the implementation can also accept connections at local addresses that were not
explicitly announced (hence not in anno_list), and in this case the kernel records the address
and assigns it a local id unique in the scope of its connection.

So basically msk->local_addr_list is the list of all known local addresses, both announced 
and not so their ids can be later retrieved.

To give you more context, in my last iteration of the code before I posted the series, I was storing local addrs
in pernet->local_addr_list just as its done for the kernel PM, but later moved it to
a per-msk list to eliminate contention (in accessing that list) with other userspace or kernel
PM managed connections.

> 
> Being the list unlimited a malicius (or buggy) user-space could consume
> all the kernel memory. I think we need some limits, or at least some
> accounting.

At this point we're taking the PM daemon as a trusted entity which can issue these
netlink commands for path management. So there is currently no configurable ceiling
in the kernel on the size of the PM's kernel stored context.

> 
> /P
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 07/21] mptcp: netlink: process IPv6 addrs in creating listening sockets
  2022-01-14 22:09     ` Mat Martineau
@ 2022-01-19  1:25       ` Kishen Maloor
  2022-01-19 19:14         ` Mat Martineau
  0 siblings, 1 reply; 45+ messages in thread
From: Kishen Maloor @ 2022-01-19  1:25 UTC (permalink / raw)
  To: Mat Martineau, Paolo Abeni; +Cc: mptcp

On 1/14/22 2:09 PM, Mat Martineau wrote:
> On Fri, 14 Jan 2022, Paolo Abeni wrote:
> 
>> Hello,
>>
>> On Wed, 2022-01-12 at 17:15 -0500, Kishen Maloor wrote:
>>> This change updates mptcp_pm_nl_create_listen_socket() to create
>>> listening sockets bound to IPv6 addresses (where IPv6 is supported).
>>>
> 
> This tag -
> 
> Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")
> 
> would help with backporting to stable.
> 
>>> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
>>> ---
>>>  net/mptcp/pm_netlink.c | 8 ++++++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
>>> index 4c1895dbc2a5..779ec9d375f0 100644
>>> --- a/net/mptcp/pm_netlink.c
>>> +++ b/net/mptcp/pm_netlink.c
>>> @@ -986,6 +986,7 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
>>>                          struct mptcp_pm_addr_entry *entry,
>>>                          struct socket **lsk)
>>>  {
>>> +    int addrlen = sizeof(struct sockaddr_in);
>>>      struct sockaddr_storage addr;
>>>      struct mptcp_sock *msk;
>>>      struct socket *ssock;
>>> @@ -1010,8 +1011,11 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
>>>      }
>>>
>>>      mptcp_info2sockaddr(&entry->addr, &addr, entry->addr.family);
>>> -    err = kernel_bind(ssock, (struct sockaddr *)&addr,
>>> -              sizeof(struct sockaddr_in));
>>> +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
>>> +    if (entry->addr.family == AF_INET6)
>>> +        addrlen = sizeof(struct sockaddr_in6);
>>> +#endif
>>> +    err = kernel_bind(ssock, (struct sockaddr *)&addr, addrlen);
>>>      if (err) {
>>>          pr_warn("kernel_bind error, err=%d", err);
>>>          goto out;
>>
>> This looks a bugfix for -net to me?

Sure, we can record a bug and the fact that this commit fixes it.

>>
>> Possibly worthy additional an additional mp_join self-test for the ipv6
>> case.
>>
> 
> I agree, this would be good for -net. Kishen can you add the suggested selftest too and repost separately for mptcp-net?
> 

Actually, this path is currently exercised by self-tests in userspace_pm.sh through address 
advertisements from the namespace containing the MPTCP client, for which a listening 
socket is created to subsequently receive MPJs. 

> -- 
> Mat Martineau
> Intel


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 04/21] mptcp: establish subflows from either end of connection
  2022-01-17  8:59     ` Paolo Abeni
@ 2022-01-19  1:26       ` Kishen Maloor
  2022-01-19 12:01         ` Paolo Abeni
  0 siblings, 1 reply; 45+ messages in thread
From: Kishen Maloor @ 2022-01-19  1:26 UTC (permalink / raw)
  To: Paolo Abeni, Mat Martineau; +Cc: mptcp

On 1/17/22 12:59 AM, Paolo Abeni wrote:
> On Fri, 2022-01-14 at 14:43 -0800, Mat Martineau wrote:
>> On Wed, 12 Jan 2022, Kishen Maloor wrote:
>>
>>> This change updates internal logic to permit subflows to be
>>> established from either the client or server ends of MPTCP
>>> connections. This symmetry and added flexibility may be
>>> harnessed by PM implementations running on either end in
>>> creating new subflows.
>>>
>>> The essence of this change lies in not relying on the
>>> "server_side" flag (which continues to be available if needed).
>>>
>>
>> After this patch, server_side is unused except for some debug output. If 
>> it's really no longer referenced (see below first), may be better to just 
>> remove it.

I think we'll need it. 

One possible use is to inform the PM daemon of the type of application 
(client/server) associated with a connection.

As we're discussing the idea of creating listening sockets only upon request, the 
knowledge that we are on the client end of a connection is something a PM daemon could
use to request creation of listening sockets whenever it advertises an address.

>>
>>> v2: check for 3rd ACK retransmission only on passive side
>>> of the MPJ handshake
>>>
>>> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
>>> ---
>>> net/mptcp/options.c  | 2 +-
>>> net/mptcp/protocol.c | 5 +----
>>> net/mptcp/protocol.h | 2 --
>>> 3 files changed, 2 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
>>> index cceba8c7806d..89808816db06 100644
>>> --- a/net/mptcp/options.c
>>> +++ b/net/mptcp/options.c
>>> @@ -921,7 +921,7 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
>>> 		if (TCP_SKB_CB(skb)->seq == subflow->ssn_offset + 1 &&
>>> 		    TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq &&
>>> 		    subflow->mp_join && (mp_opt->suboptions & OPTIONS_MPTCP_MPJ) &&
>>> -		    READ_ONCE(msk->pm.server_side))
>>> +		    !subflow->request_join)
>>> 			tcp_send_ack(ssk);
>>> 		goto fully_established;
>>> 	}
>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>> index 4012f844eec1..408a05bff633 100644
>>> --- a/net/mptcp/protocol.c
>>> +++ b/net/mptcp/protocol.c
>>> @@ -3248,15 +3248,12 @@ bool mptcp_finish_join(struct sock *ssk)
>>> 		return false;
>>> 	}
>>>
>>> -	if (!msk->pm.server_side)
>>> +	if (!list_empty(&subflow->node))
>>> 		goto out;
>>>
>>> 	if (!mptcp_pm_allow_new_subflow(msk))
>>> 		goto err_prohibited;
>>>
>>> -	if (WARN_ON_ONCE(!list_empty(&subflow->node)))
>>> -		goto err_prohibited;
>>> -
>>> 	/* active connections are already on conn_list.
>>> 	 * If we can't acquire msk socket lock here, let the release callback
>>> 	 * handle it
>>> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
>>> index e2a67d3469f6..c50247673c7e 100644
>>> --- a/net/mptcp/protocol.h
>>> +++ b/net/mptcp/protocol.h
>>> @@ -908,10 +908,8 @@ static inline bool mptcp_check_infinite_map(struct sk_buff *skb)
>>> static inline bool subflow_simultaneous_connect(struct sock *sk)
>>> {
>>> 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
>>> -	struct sock *parent = subflow->conn;
>>>
>>> 	return sk->sk_state == TCP_ESTABLISHED &&
>>> -	       !mptcp_sk(parent)->pm.server_side &&
>>> 	       !subflow->conn_finished;
>>> }
>>
>> Are changes in this function needed? This code was added to address
>> https://github.com/multipath-tcp/mptcp_net-next/issues/35
>> which was a weird case encountered with syzkaller where a socket tried to 
>> connect to itself, triggering the "simultaneous open" code path. It 
>> doesn't seem to be applicable to MP_JOINs.
> 
> I *think* we can reach here even with MPJ, e.g. if we have port-based
> endpoint matching the src address/port - possibly syzkaller could be
> able to reach such scenario.
> 
> Additionally I fear dropping the '!mptcp_sk(parent)->pm.server_side &&'
> check could re-introduce the initial issue, at least for the MPC
> handshake.

The purpose of this commit is to permit the MPJ sequence to happen at either end of 
the connection. The change in mptcp_finish_join() had the most direct impact for enabling 
this, so I looked at other server_side checks mostly in terms of establishing that
bi-directional symmetry.

If the above condition holds only with MPCs, i.e. for the initial connection, then I think it would 
be fine to preserve the server_side check. But if it also holds for MPJs, then we might
have to account for both cases. Perhaps replace:

!mptcp_sk(parent)->pm.server_side)

with something like this?

((subflow->request_mptcp && !mptcp_sk(parent)->pm.server_side) ||
	subflow->request_join)

> 
> /P
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 04/21] mptcp: establish subflows from either end of connection
  2022-01-19  1:26       ` Kishen Maloor
@ 2022-01-19 12:01         ` Paolo Abeni
  2022-01-19 17:59           ` Kishen Maloor
  0 siblings, 1 reply; 45+ messages in thread
From: Paolo Abeni @ 2022-01-19 12:01 UTC (permalink / raw)
  To: Kishen Maloor, Mat Martineau; +Cc: mptcp

On Tue, 2022-01-18 at 17:26 -0800, Kishen Maloor wrote:
> On 1/17/22 12:59 AM, Paolo Abeni wrote:
> > On Fri, 2022-01-14 at 14:43 -0800, Mat Martineau wrote:
> > > On Wed, 12 Jan 2022, Kishen Maloor wrote:
> > > 
> > > > This change updates internal logic to permit subflows to be
> > > > established from either the client or server ends of MPTCP
> > > > connections. This symmetry and added flexibility may be
> > > > harnessed by PM implementations running on either end in
> > > > creating new subflows.
> > > > 
> > > > The essence of this change lies in not relying on the
> > > > "server_side" flag (which continues to be available if needed).
> > > > 
> > > 
> > > After this patch, server_side is unused except for some debug output. If 
> > > it's really no longer referenced (see below first), may be better to just 
> > > remove it.
> 
> I think we'll need it. 
> 
> One possible use is to inform the PM daemon of the type of application 
> (client/server) associated with a connection.
> 
> As we're discussing the idea of creating listening sockets only upon request, the 
> knowledge that we are on the client end of a connection is something a PM daemon could
> use to request creation of listening sockets whenever it advertises an address.
> 
> > > 
> > > > v2: check for 3rd ACK retransmission only on passive side
> > > > of the MPJ handshake
> > > > 
> > > > Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
> > > > ---
> > > > net/mptcp/options.c  | 2 +-
> > > > net/mptcp/protocol.c | 5 +----
> > > > net/mptcp/protocol.h | 2 --
> > > > 3 files changed, 2 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> > > > index cceba8c7806d..89808816db06 100644
> > > > --- a/net/mptcp/options.c
> > > > +++ b/net/mptcp/options.c
> > > > @@ -921,7 +921,7 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
> > > > 		if (TCP_SKB_CB(skb)->seq == subflow->ssn_offset + 1 &&
> > > > 		    TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq &&
> > > > 		    subflow->mp_join && (mp_opt->suboptions & OPTIONS_MPTCP_MPJ) &&
> > > > -		    READ_ONCE(msk->pm.server_side))
> > > > +		    !subflow->request_join)
> > > > 			tcp_send_ack(ssk);
> > > > 		goto fully_established;
> > > > 	}
> > > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > > > index 4012f844eec1..408a05bff633 100644
> > > > --- a/net/mptcp/protocol.c
> > > > +++ b/net/mptcp/protocol.c
> > > > @@ -3248,15 +3248,12 @@ bool mptcp_finish_join(struct sock *ssk)
> > > > 		return false;
> > > > 	}
> > > > 
> > > > -	if (!msk->pm.server_side)
> > > > +	if (!list_empty(&subflow->node))
> > > > 		goto out;
> > > > 
> > > > 	if (!mptcp_pm_allow_new_subflow(msk))
> > > > 		goto err_prohibited;
> > > > 
> > > > -	if (WARN_ON_ONCE(!list_empty(&subflow->node)))
> > > > -		goto err_prohibited;
> > > > -
> > > > 	/* active connections are already on conn_list.
> > > > 	 * If we can't acquire msk socket lock here, let the release callback
> > > > 	 * handle it
> > > > diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> > > > index e2a67d3469f6..c50247673c7e 100644
> > > > --- a/net/mptcp/protocol.h
> > > > +++ b/net/mptcp/protocol.h
> > > > @@ -908,10 +908,8 @@ static inline bool mptcp_check_infinite_map(struct sk_buff *skb)
> > > > static inline bool subflow_simultaneous_connect(struct sock *sk)
> > > > {
> > > > 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
> > > > -	struct sock *parent = subflow->conn;
> > > > 
> > > > 	return sk->sk_state == TCP_ESTABLISHED &&
> > > > -	       !mptcp_sk(parent)->pm.server_side &&
> > > > 	       !subflow->conn_finished;
> > > > }
> > > 
> > > Are changes in this function needed? This code was added to address
> > > https://github.com/multipath-tcp/mptcp_net-next/issues/35
> > > which was a weird case encountered with syzkaller where a socket tried to 
> > > connect to itself, triggering the "simultaneous open" code path. It 
> > > doesn't seem to be applicable to MP_JOINs.
> > 
> > I *think* we can reach here even with MPJ, e.g. if we have port-based
> > endpoint matching the src address/port - possibly syzkaller could be
> > able to reach such scenario.
> > 
> > Additionally I fear dropping the '!mptcp_sk(parent)->pm.server_side &&'
> > check could re-introduce the initial issue, at least for the MPC
> > handshake.
> 
> The purpose of this commit is to permit the MPJ sequence to happen at either end of 
> the connection. The change in mptcp_finish_join() had the most direct impact for enabling 
> this, so I looked at other server_side checks mostly in terms of establishing that
> bi-directional symmetry.
> 
> If the above condition holds only with MPCs, i.e. for the initial connection, then I think it would 
> be fine to preserve the server_side check. But if it also holds for MPJs, then we might
> have to account for both cases. Perhaps replace:
> 
> !mptcp_sk(parent)->pm.server_side)
> 
> with something like this?
> 
> ((subflow->request_mptcp && !mptcp_sk(parent)->pm.server_side) ||
> 	subflow->request_join)

It looks like 'request_mptcp' is 0 on passive sockets, so:

 	subflow->request_mptcp || subflow->request_join

should suffice.

Possibly factoring out an helper for the above condition would make the
code more readable.

/P


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 04/21] mptcp: establish subflows from either end of connection
  2022-01-19 12:01         ` Paolo Abeni
@ 2022-01-19 17:59           ` Kishen Maloor
  2022-01-19 18:59             ` Mat Martineau
  0 siblings, 1 reply; 45+ messages in thread
From: Kishen Maloor @ 2022-01-19 17:59 UTC (permalink / raw)
  To: Paolo Abeni, Mat Martineau; +Cc: mptcp

On 1/19/22 4:01 AM, Paolo Abeni wrote:
> On Tue, 2022-01-18 at 17:26 -0800, Kishen Maloor wrote:
>> On 1/17/22 12:59 AM, Paolo Abeni wrote:
>>> On Fri, 2022-01-14 at 14:43 -0800, Mat Martineau wrote:
>>>> On Wed, 12 Jan 2022, Kishen Maloor wrote:
>>>>
>>>>> This change updates internal logic to permit subflows to be
>>>>> established from either the client or server ends of MPTCP
>>>>> connections. This symmetry and added flexibility may be
>>>>> harnessed by PM implementations running on either end in
>>>>> creating new subflows.
>>>>>
>>>>> The essence of this change lies in not relying on the
>>>>> "server_side" flag (which continues to be available if needed).
>>>>>
>>>>
>>>> After this patch, server_side is unused except for some debug output. If 
>>>> it's really no longer referenced (see below first), may be better to just 
>>>> remove it.
>>
>> I think we'll need it. 
>>
>> One possible use is to inform the PM daemon of the type of application 
>> (client/server) associated with a connection.
>>
>> As we're discussing the idea of creating listening sockets only upon request, the 
>> knowledge that we are on the client end of a connection is something a PM daemon could
>> use to request creation of listening sockets whenever it advertises an address.
>>
>>>>
>>>>> v2: check for 3rd ACK retransmission only on passive side
>>>>> of the MPJ handshake
>>>>>
>>>>> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
>>>>> ---
>>>>> net/mptcp/options.c  | 2 +-
>>>>> net/mptcp/protocol.c | 5 +----
>>>>> net/mptcp/protocol.h | 2 --
>>>>> 3 files changed, 2 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
>>>>> index cceba8c7806d..89808816db06 100644
>>>>> --- a/net/mptcp/options.c
>>>>> +++ b/net/mptcp/options.c
>>>>> @@ -921,7 +921,7 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
>>>>> 		if (TCP_SKB_CB(skb)->seq == subflow->ssn_offset + 1 &&
>>>>> 		    TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq &&
>>>>> 		    subflow->mp_join && (mp_opt->suboptions & OPTIONS_MPTCP_MPJ) &&
>>>>> -		    READ_ONCE(msk->pm.server_side))
>>>>> +		    !subflow->request_join)
>>>>> 			tcp_send_ack(ssk);
>>>>> 		goto fully_established;
>>>>> 	}
>>>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>>>> index 4012f844eec1..408a05bff633 100644
>>>>> --- a/net/mptcp/protocol.c
>>>>> +++ b/net/mptcp/protocol.c
>>>>> @@ -3248,15 +3248,12 @@ bool mptcp_finish_join(struct sock *ssk)
>>>>> 		return false;
>>>>> 	}
>>>>>
>>>>> -	if (!msk->pm.server_side)
>>>>> +	if (!list_empty(&subflow->node))
>>>>> 		goto out;
>>>>>
>>>>> 	if (!mptcp_pm_allow_new_subflow(msk))
>>>>> 		goto err_prohibited;
>>>>>
>>>>> -	if (WARN_ON_ONCE(!list_empty(&subflow->node)))
>>>>> -		goto err_prohibited;
>>>>> -
>>>>> 	/* active connections are already on conn_list.
>>>>> 	 * If we can't acquire msk socket lock here, let the release callback
>>>>> 	 * handle it
>>>>> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
>>>>> index e2a67d3469f6..c50247673c7e 100644
>>>>> --- a/net/mptcp/protocol.h
>>>>> +++ b/net/mptcp/protocol.h
>>>>> @@ -908,10 +908,8 @@ static inline bool mptcp_check_infinite_map(struct sk_buff *skb)
>>>>> static inline bool subflow_simultaneous_connect(struct sock *sk)
>>>>> {
>>>>> 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
>>>>> -	struct sock *parent = subflow->conn;
>>>>>
>>>>> 	return sk->sk_state == TCP_ESTABLISHED &&
>>>>> -	       !mptcp_sk(parent)->pm.server_side &&
>>>>> 	       !subflow->conn_finished;
>>>>> }
>>>>
>>>> Are changes in this function needed? This code was added to address
>>>> https://github.com/multipath-tcp/mptcp_net-next/issues/35
>>>> which was a weird case encountered with syzkaller where a socket tried to 
>>>> connect to itself, triggering the "simultaneous open" code path. It 
>>>> doesn't seem to be applicable to MP_JOINs.
>>>
>>> I *think* we can reach here even with MPJ, e.g. if we have port-based
>>> endpoint matching the src address/port - possibly syzkaller could be
>>> able to reach such scenario.
>>>
>>> Additionally I fear dropping the '!mptcp_sk(parent)->pm.server_side &&'
>>> check could re-introduce the initial issue, at least for the MPC
>>> handshake.
>>
>> The purpose of this commit is to permit the MPJ sequence to happen at either end of 
>> the connection. The change in mptcp_finish_join() had the most direct impact for enabling 
>> this, so I looked at other server_side checks mostly in terms of establishing that
>> bi-directional symmetry.
>>
>> If the above condition holds only with MPCs, i.e. for the initial connection, then I think it would 
>> be fine to preserve the server_side check. But if it also holds for MPJs, then we might
>> have to account for both cases. Perhaps replace:
>>
>> !mptcp_sk(parent)->pm.server_side)
>>
>> with something like this?
>>
>> ((subflow->request_mptcp && !mptcp_sk(parent)->pm.server_side) ||
>> 	subflow->request_join)
> 
> It looks like 'request_mptcp' is 0 on passive sockets, so:
> 
>  	subflow->request_mptcp || subflow->request_join
> 
> should suffice.

That's right. Makes sense.

> 
> Possibly factoring out an helper for the above condition would make the
> code more readable.

Thanks! Will do so.

> 
> /P
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 04/21] mptcp: establish subflows from either end of connection
  2022-01-19 17:59           ` Kishen Maloor
@ 2022-01-19 18:59             ` Mat Martineau
  0 siblings, 0 replies; 45+ messages in thread
From: Mat Martineau @ 2022-01-19 18:59 UTC (permalink / raw)
  To: Kishen Maloor; +Cc: Paolo Abeni, mptcp

On Wed, 19 Jan 2022, Kishen Maloor wrote:

> On 1/19/22 4:01 AM, Paolo Abeni wrote:
>> On Tue, 2022-01-18 at 17:26 -0800, Kishen Maloor wrote:
>>> On 1/17/22 12:59 AM, Paolo Abeni wrote:
>>>> On Fri, 2022-01-14 at 14:43 -0800, Mat Martineau wrote:
>>>>> On Wed, 12 Jan 2022, Kishen Maloor wrote:
>>>>>
>>>>>> This change updates internal logic to permit subflows to be
>>>>>> established from either the client or server ends of MPTCP
>>>>>> connections. This symmetry and added flexibility may be
>>>>>> harnessed by PM implementations running on either end in
>>>>>> creating new subflows.
>>>>>>
>>>>>> The essence of this change lies in not relying on the
>>>>>> "server_side" flag (which continues to be available if needed).
>>>>>>
>>>>>
>>>>> After this patch, server_side is unused except for some debug output. If
>>>>> it's really no longer referenced (see below first), may be better to just
>>>>> remove it.
>>>
>>> I think we'll need it.
>>>
>>> One possible use is to inform the PM daemon of the type of application
>>> (client/server) associated with a connection.
>>>
>>> As we're discussing the idea of creating listening sockets only upon request, the
>>> knowledge that we are on the client end of a connection is something a PM daemon could
>>> use to request creation of listening sockets whenever it advertises an address.
>>>

Ok, makes sense.

>>>>>
>>>>>> v2: check for 3rd ACK retransmission only on passive side
>>>>>> of the MPJ handshake
>>>>>>
>>>>>> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
>>>>>> ---
>>>>>> net/mptcp/options.c  | 2 +-
>>>>>> net/mptcp/protocol.c | 5 +----
>>>>>> net/mptcp/protocol.h | 2 --
>>>>>> 3 files changed, 2 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
>>>>>> index cceba8c7806d..89808816db06 100644
>>>>>> --- a/net/mptcp/options.c
>>>>>> +++ b/net/mptcp/options.c
>>>>>> @@ -921,7 +921,7 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
>>>>>> 		if (TCP_SKB_CB(skb)->seq == subflow->ssn_offset + 1 &&
>>>>>> 		    TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq &&
>>>>>> 		    subflow->mp_join && (mp_opt->suboptions & OPTIONS_MPTCP_MPJ) &&
>>>>>> -		    READ_ONCE(msk->pm.server_side))
>>>>>> +		    !subflow->request_join)
>>>>>> 			tcp_send_ack(ssk);
>>>>>> 		goto fully_established;
>>>>>> 	}
>>>>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>>>>> index 4012f844eec1..408a05bff633 100644
>>>>>> --- a/net/mptcp/protocol.c
>>>>>> +++ b/net/mptcp/protocol.c
>>>>>> @@ -3248,15 +3248,12 @@ bool mptcp_finish_join(struct sock *ssk)
>>>>>> 		return false;
>>>>>> 	}
>>>>>>
>>>>>> -	if (!msk->pm.server_side)
>>>>>> +	if (!list_empty(&subflow->node))
>>>>>> 		goto out;
>>>>>>
>>>>>> 	if (!mptcp_pm_allow_new_subflow(msk))
>>>>>> 		goto err_prohibited;
>>>>>>
>>>>>> -	if (WARN_ON_ONCE(!list_empty(&subflow->node)))
>>>>>> -		goto err_prohibited;
>>>>>> -
>>>>>> 	/* active connections are already on conn_list.
>>>>>> 	 * If we can't acquire msk socket lock here, let the release callback
>>>>>> 	 * handle it
>>>>>> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
>>>>>> index e2a67d3469f6..c50247673c7e 100644
>>>>>> --- a/net/mptcp/protocol.h
>>>>>> +++ b/net/mptcp/protocol.h
>>>>>> @@ -908,10 +908,8 @@ static inline bool mptcp_check_infinite_map(struct sk_buff *skb)
>>>>>> static inline bool subflow_simultaneous_connect(struct sock *sk)
>>>>>> {
>>>>>> 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
>>>>>> -	struct sock *parent = subflow->conn;
>>>>>>
>>>>>> 	return sk->sk_state == TCP_ESTABLISHED &&
>>>>>> -	       !mptcp_sk(parent)->pm.server_side &&
>>>>>> 	       !subflow->conn_finished;
>>>>>> }
>>>>>
>>>>> Are changes in this function needed? This code was added to address
>>>>> https://github.com/multipath-tcp/mptcp_net-next/issues/35
>>>>> which was a weird case encountered with syzkaller where a socket tried to
>>>>> connect to itself, triggering the "simultaneous open" code path. It
>>>>> doesn't seem to be applicable to MP_JOINs.
>>>>
>>>> I *think* we can reach here even with MPJ, e.g. if we have port-based
>>>> endpoint matching the src address/port - possibly syzkaller could be
>>>> able to reach such scenario.
>>>>
>>>> Additionally I fear dropping the '!mptcp_sk(parent)->pm.server_side &&'
>>>> check could re-introduce the initial issue, at least for the MPC
>>>> handshake.
>>>
>>> The purpose of this commit is to permit the MPJ sequence to happen at either end of
>>> the connection. The change in mptcp_finish_join() had the most direct impact for enabling
>>> this, so I looked at other server_side checks mostly in terms of establishing that
>>> bi-directional symmetry.
>>>
>>> If the above condition holds only with MPCs, i.e. for the initial connection, then I think it would
>>> be fine to preserve the server_side check. But if it also holds for MPJs, then we might
>>> have to account for both cases. Perhaps replace:
>>>
>>> !mptcp_sk(parent)->pm.server_side)
>>>
>>> with something like this?
>>>
>>> ((subflow->request_mptcp && !mptcp_sk(parent)->pm.server_side) ||
>>> 	subflow->request_join)
>>
>> It looks like 'request_mptcp' is 0 on passive sockets, so:
>>
>>  	subflow->request_mptcp || subflow->request_join
>>
>> should suffice.
>
> That's right. Makes sense.
>

Yeah, good to simplify this even if 'pm.server_side' is still around.

>>
>> Possibly factoring out an helper for the above condition would make the
>> code more readable.
>
> Thanks! Will do so.
>



--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 07/21] mptcp: netlink: process IPv6 addrs in creating listening sockets
  2022-01-19  1:25       ` Kishen Maloor
@ 2022-01-19 19:14         ` Mat Martineau
  0 siblings, 0 replies; 45+ messages in thread
From: Mat Martineau @ 2022-01-19 19:14 UTC (permalink / raw)
  To: Kishen Maloor; +Cc: Paolo Abeni, mptcp

[-- Attachment #1: Type: text/plain, Size: 2803 bytes --]

On Tue, 18 Jan 2022, Kishen Maloor wrote:

> On 1/14/22 2:09 PM, Mat Martineau wrote:
>> On Fri, 14 Jan 2022, Paolo Abeni wrote:
>>
>>> Hello,
>>>
>>> On Wed, 2022-01-12 at 17:15 -0500, Kishen Maloor wrote:
>>>> This change updates mptcp_pm_nl_create_listen_socket() to create
>>>> listening sockets bound to IPv6 addresses (where IPv6 is supported).
>>>>
>>
>> This tag -
>>
>> Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")
>>
>> would help with backporting to stable.
>>
>>>> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
>>>> ---
>>>>  net/mptcp/pm_netlink.c | 8 ++++++--
>>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
>>>> index 4c1895dbc2a5..779ec9d375f0 100644
>>>> --- a/net/mptcp/pm_netlink.c
>>>> +++ b/net/mptcp/pm_netlink.c
>>>> @@ -986,6 +986,7 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
>>>>                          struct mptcp_pm_addr_entry *entry,
>>>>                          struct socket **lsk)
>>>>  {
>>>> +    int addrlen = sizeof(struct sockaddr_in);
>>>>      struct sockaddr_storage addr;
>>>>      struct mptcp_sock *msk;
>>>>      struct socket *ssock;
>>>> @@ -1010,8 +1011,11 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
>>>>      }
>>>>
>>>>      mptcp_info2sockaddr(&entry->addr, &addr, entry->addr.family);
>>>> -    err = kernel_bind(ssock, (struct sockaddr *)&addr,
>>>> -              sizeof(struct sockaddr_in));
>>>> +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
>>>> +    if (entry->addr.family == AF_INET6)
>>>> +        addrlen = sizeof(struct sockaddr_in6);
>>>> +#endif
>>>> +    err = kernel_bind(ssock, (struct sockaddr *)&addr, addrlen);
>>>>      if (err) {
>>>>          pr_warn("kernel_bind error, err=%d", err);
>>>>          goto out;
>>>
>>> This looks a bugfix for -net to me?
>
> Sure, we can record a bug and the fact that this commit fixes it.
>
>>>
>>> Possibly worthy additional an additional mp_join self-test for the ipv6
>>> case.
>>>
>>
>> I agree, this would be good for -net. Kishen can you add the suggested selftest too and repost separately for mptcp-net?
>>
>
> Actually, this path is currently exercised by self-tests in userspace_pm.sh through address
> advertisements from the namespace containing the MPTCP client, for which a listening
> socket is created to subsequently receive MPJs.

Ok, it's good that it's covered going forward. Re-reading Paolo's comment, 
maybe that was the main purpose of the requested test (rather than adding 
to self test coverage in -stable). Should be enough to have the kernel fix 
itself for the -net patch.

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 10/21] mptcp: handle local addrs announced by userspace PMs
  2022-01-19  1:24     ` Kishen Maloor
@ 2022-01-19 19:20       ` Mat Martineau
  2022-01-19 20:27         ` Kishen Maloor
  0 siblings, 1 reply; 45+ messages in thread
From: Mat Martineau @ 2022-01-19 19:20 UTC (permalink / raw)
  To: Kishen Maloor; +Cc: Paolo Abeni, mptcp

On Tue, 18 Jan 2022, Kishen Maloor wrote:

> On 1/14/22 9:11 AM, Paolo Abeni wrote:
>> On Wed, 2022-01-12 at 17:15 -0500, Kishen Maloor wrote:
>>> This change adds a new internal function to store/retrieve local
>>> addrs announced by userspace PM implementations from the kernel
>>> context. The function does not stipulate any limitation on the #
>>> of addrs, and handles the requirements of three scenarios:
>>> 1) ADD_ADDR announcements (which require that a local id be
>>> provided), 2) retrieving the local id associated with an address,
>>> also where one may need to be assigned, and 3) reissuance of
>>> ADD_ADDRs when there's a successful match of addr/id.
>>>
>>> The list of all stored local addr entries is held under the
>>> MPTCP sock structure. This list, if not released by the REMOVE_ADDR
>>> flow is freed while the sock is destructed.
>>
>> It feels strange to me that we need to maintain an additional addresses
>> list inside the kernel for the user-space PM - which should take care
>> of all the status information.
>
> The PM daemon will have the complete picture, but the protocol needs to know the
> local ids assigned to addresses so as such the kernel has to store addresses
> (with their ids) regardless of PM.
>
>>
>> Why anno_list is not enough? Why isn't cheapest to extend it?
>
> The anno_list is the list of addresses that were announced via the ADD_ADDR command, to
> be used specifically for doing re-transmissions.
>
> However, the implementation can also accept connections at local addresses that were not
> explicitly announced (hence not in anno_list), and in this case the kernel records the address
> and assigns it a local id unique in the scope of its connection.
>
> So basically msk->local_addr_list is the list of all known local addresses, both announced
> and not so their ids can be later retrieved.
>
> To give you more context, in my last iteration of the code before I posted the series, I was storing local addrs
> in pernet->local_addr_list just as its done for the kernel PM, but later moved it to
> a per-msk list to eliminate contention (in accessing that list) with other userspace or kernel
> PM managed connections.
>
>>
>> Being the list unlimited a malicius (or buggy) user-space could consume
>> all the kernel memory. I think we need some limits, or at least some
>> accounting.
>
> At this point we're taking the PM daemon as a trusted entity which can issue these
> netlink commands for path management. So there is currently no configurable ceiling
> in the kernel on the size of the PM's kernel stored context.
>

I think it's still worthwhile to have some limits/accounting as Paolo 
suggests - part of the point of pushing PM code to userspace is so bugs or 
other vulnerabilities don't take down the whole machine.

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 10/21] mptcp: handle local addrs announced by userspace PMs
  2022-01-19 19:20       ` Mat Martineau
@ 2022-01-19 20:27         ` Kishen Maloor
  2022-01-19 20:44           ` Mat Martineau
  0 siblings, 1 reply; 45+ messages in thread
From: Kishen Maloor @ 2022-01-19 20:27 UTC (permalink / raw)
  To: Mat Martineau; +Cc: Paolo Abeni, mptcp

On 1/19/22 11:20 AM, Mat Martineau wrote:
> On Tue, 18 Jan 2022, Kishen Maloor wrote:
> 
>> On 1/14/22 9:11 AM, Paolo Abeni wrote:
>>> On Wed, 2022-01-12 at 17:15 -0500, Kishen Maloor wrote:
>>>> This change adds a new internal function to store/retrieve local
>>>> addrs announced by userspace PM implementations from the kernel
>>>> context. The function does not stipulate any limitation on the #
>>>> of addrs, and handles the requirements of three scenarios:
>>>> 1) ADD_ADDR announcements (which require that a local id be
>>>> provided), 2) retrieving the local id associated with an address,
>>>> also where one may need to be assigned, and 3) reissuance of
>>>> ADD_ADDRs when there's a successful match of addr/id.
>>>>
>>>> The list of all stored local addr entries is held under the
>>>> MPTCP sock structure. This list, if not released by the REMOVE_ADDR
>>>> flow is freed while the sock is destructed.
>>>
>>> It feels strange to me that we need to maintain an additional addresses
>>> list inside the kernel for the user-space PM - which should take care
>>> of all the status information.
>>
>> The PM daemon will have the complete picture, but the protocol needs to know the
>> local ids assigned to addresses so as such the kernel has to store addresses
>> (with their ids) regardless of PM.
>>
>>>
>>> Why anno_list is not enough? Why isn't cheapest to extend it?
>>
>> The anno_list is the list of addresses that were announced via the ADD_ADDR command, to
>> be used specifically for doing re-transmissions.
>>
>> However, the implementation can also accept connections at local addresses that were not
>> explicitly announced (hence not in anno_list), and in this case the kernel records the address
>> and assigns it a local id unique in the scope of its connection.
>>
>> So basically msk->local_addr_list is the list of all known local addresses, both announced
>> and not so their ids can be later retrieved.
>>
>> To give you more context, in my last iteration of the code before I posted the series, I was storing local addrs
>> in pernet->local_addr_list just as its done for the kernel PM, but later moved it to
>> a per-msk list to eliminate contention (in accessing that list) with other userspace or kernel
>> PM managed connections.
>>
>>>
>>> Being the list unlimited a malicius (or buggy) user-space could consume
>>> all the kernel memory. I think we need some limits, or at least some
>>> accounting.
>>
>> At this point we're taking the PM daemon as a trusted entity which can issue these
>> netlink commands for path management. So there is currently no configurable ceiling
>> in the kernel on the size of the PM's kernel stored context.
>>
> 
> I think it's still worthwhile to have some limits/accounting as Paolo suggests - part of the point of pushing PM code to userspace is so bugs or other vulnerabilities don't take down the whole machine.
> 

Agreed, but do we put circuit breakers inside the kernel or the PM daemon? (for e.g.,
if a PM plug-in can see limitations imposed by the daemon?)

If we consider userspace path management as providing "more flexibility",
then conceivably the kernel would expose hooks to the PM daemon to bump up any 
kernel enforced limits when necessary. So a malicious or buggy PM could also raise 
those limits. 

Since interactions over the netlink interface are to be carried out by a privileged
entity, I've been assuming that the PM daemon is to be trusted. 

On the other hand, if we're talking about having fixed upper bounds in the kernel wrt
# of addrs/subflows (which cannot be changed by the PM daemon), then that could make sense.

> -- 
> Mat Martineau
> Intel


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 10/21] mptcp: handle local addrs announced by userspace PMs
  2022-01-19 20:27         ` Kishen Maloor
@ 2022-01-19 20:44           ` Mat Martineau
  2022-01-19 21:30             ` Kishen Maloor
  0 siblings, 1 reply; 45+ messages in thread
From: Mat Martineau @ 2022-01-19 20:44 UTC (permalink / raw)
  To: Kishen Maloor; +Cc: Paolo Abeni, mptcp

On Wed, 19 Jan 2022, Kishen Maloor wrote:

> On 1/19/22 11:20 AM, Mat Martineau wrote:
>> On Tue, 18 Jan 2022, Kishen Maloor wrote:
>>
>>> On 1/14/22 9:11 AM, Paolo Abeni wrote:
>>>> On Wed, 2022-01-12 at 17:15 -0500, Kishen Maloor wrote:
>>>>> This change adds a new internal function to store/retrieve local
>>>>> addrs announced by userspace PM implementations from the kernel
>>>>> context. The function does not stipulate any limitation on the #
>>>>> of addrs, and handles the requirements of three scenarios:
>>>>> 1) ADD_ADDR announcements (which require that a local id be
>>>>> provided), 2) retrieving the local id associated with an address,
>>>>> also where one may need to be assigned, and 3) reissuance of
>>>>> ADD_ADDRs when there's a successful match of addr/id.
>>>>>
>>>>> The list of all stored local addr entries is held under the
>>>>> MPTCP sock structure. This list, if not released by the REMOVE_ADDR
>>>>> flow is freed while the sock is destructed.
>>>>
>>>> It feels strange to me that we need to maintain an additional addresses
>>>> list inside the kernel for the user-space PM - which should take care
>>>> of all the status information.
>>>
>>> The PM daemon will have the complete picture, but the protocol needs to know the
>>> local ids assigned to addresses so as such the kernel has to store addresses
>>> (with their ids) regardless of PM.
>>>
>>>>
>>>> Why anno_list is not enough? Why isn't cheapest to extend it?
>>>
>>> The anno_list is the list of addresses that were announced via the ADD_ADDR command, to
>>> be used specifically for doing re-transmissions.
>>>
>>> However, the implementation can also accept connections at local addresses that were not
>>> explicitly announced (hence not in anno_list), and in this case the kernel records the address
>>> and assigns it a local id unique in the scope of its connection.
>>>
>>> So basically msk->local_addr_list is the list of all known local addresses, both announced
>>> and not so their ids can be later retrieved.
>>>
>>> To give you more context, in my last iteration of the code before I posted the series, I was storing local addrs
>>> in pernet->local_addr_list just as its done for the kernel PM, but later moved it to
>>> a per-msk list to eliminate contention (in accessing that list) with other userspace or kernel
>>> PM managed connections.
>>>
>>>>
>>>> Being the list unlimited a malicius (or buggy) user-space could consume
>>>> all the kernel memory. I think we need some limits, or at least some
>>>> accounting.
>>>
>>> At this point we're taking the PM daemon as a trusted entity which can issue these
>>> netlink commands for path management. So there is currently no configurable ceiling
>>> in the kernel on the size of the PM's kernel stored context.
>>>
>>
>> I think it's still worthwhile to have some limits/accounting as Paolo suggests - part of the point of pushing PM code to userspace is so bugs or other vulnerabilities don't take down the whole machine.
>>
>
> Agreed, but do we put circuit breakers inside the kernel or the PM daemon? (for e.g.,
> if a PM plug-in can see limitations imposed by the daemon?)
>

Kernel, I'd say.

> If we consider userspace path management as providing "more flexibility",
> then conceivably the kernel would expose hooks to the PM daemon to bump up any
> kernel enforced limits when necessary. So a malicious or buggy PM could also raise
> those limits.
>

True, an admin (or bad actor) can get themselves into trouble by setting 
nonsensical limits. But the limits remain useful for stability and 
managing resources.

> Since interactions over the netlink interface are to be carried out by a privileged
> entity, I've been assuming that the PM daemon is to be trusted.
>

We're trusting the PM daemon with certain (granular) 
privileges/capabilities using CAP_NET_ADMIN, not the old 
"root-controls-all" model.

> On the other hand, if we're talking about having fixed upper bounds in the kernel wrt
> # of addrs/subflows (which cannot be changed by the PM daemon), then that could make sense.
>

That's pretty much what I'm talking about, maybe tunable (within a range) 
by a sysctl.

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH mptcp-next v2 10/21] mptcp: handle local addrs announced by userspace PMs
  2022-01-19 20:44           ` Mat Martineau
@ 2022-01-19 21:30             ` Kishen Maloor
  0 siblings, 0 replies; 45+ messages in thread
From: Kishen Maloor @ 2022-01-19 21:30 UTC (permalink / raw)
  To: Mat Martineau; +Cc: Paolo Abeni, mptcp

On 1/19/22 12:44 PM, Mat Martineau wrote:
> On Wed, 19 Jan 2022, Kishen Maloor wrote:
> 
>> On 1/19/22 11:20 AM, Mat Martineau wrote:
>>> On Tue, 18 Jan 2022, Kishen Maloor wrote:
>>>
>>>> On 1/14/22 9:11 AM, Paolo Abeni wrote:
>>>>> On Wed, 2022-01-12 at 17:15 -0500, Kishen Maloor wrote:
>>>>>> This change adds a new internal function to store/retrieve local
>>>>>> addrs announced by userspace PM implementations from the kernel
>>>>>> context. The function does not stipulate any limitation on the #
>>>>>> of addrs, and handles the requirements of three scenarios:
>>>>>> 1) ADD_ADDR announcements (which require that a local id be
>>>>>> provided), 2) retrieving the local id associated with an address,
>>>>>> also where one may need to be assigned, and 3) reissuance of
>>>>>> ADD_ADDRs when there's a successful match of addr/id.
>>>>>>
>>>>>> The list of all stored local addr entries is held under the
>>>>>> MPTCP sock structure. This list, if not released by the REMOVE_ADDR
>>>>>> flow is freed while the sock is destructed.
>>>>>
>>>>> It feels strange to me that we need to maintain an additional addresses
>>>>> list inside the kernel for the user-space PM - which should take care
>>>>> of all the status information.
>>>>
>>>> The PM daemon will have the complete picture, but the protocol needs to know the
>>>> local ids assigned to addresses so as such the kernel has to store addresses
>>>> (with their ids) regardless of PM.
>>>>
>>>>>
>>>>> Why anno_list is not enough? Why isn't cheapest to extend it?
>>>>
>>>> The anno_list is the list of addresses that were announced via the ADD_ADDR command, to
>>>> be used specifically for doing re-transmissions.
>>>>
>>>> However, the implementation can also accept connections at local addresses that were not
>>>> explicitly announced (hence not in anno_list), and in this case the kernel records the address
>>>> and assigns it a local id unique in the scope of its connection.
>>>>
>>>> So basically msk->local_addr_list is the list of all known local addresses, both announced
>>>> and not so their ids can be later retrieved.
>>>>
>>>> To give you more context, in my last iteration of the code before I posted the series, I was storing local addrs
>>>> in pernet->local_addr_list just as its done for the kernel PM, but later moved it to
>>>> a per-msk list to eliminate contention (in accessing that list) with other userspace or kernel
>>>> PM managed connections.
>>>>
>>>>>
>>>>> Being the list unlimited a malicius (or buggy) user-space could consume
>>>>> all the kernel memory. I think we need some limits, or at least some
>>>>> accounting.
>>>>
>>>> At this point we're taking the PM daemon as a trusted entity which can issue these
>>>> netlink commands for path management. So there is currently no configurable ceiling
>>>> in the kernel on the size of the PM's kernel stored context.
>>>>
>>>
>>> I think it's still worthwhile to have some limits/accounting as Paolo suggests - part of the point of pushing PM code to userspace is so bugs or other vulnerabilities don't take down the whole machine.
>>>
>>
>> Agreed, but do we put circuit breakers inside the kernel or the PM daemon? (for e.g.,
>> if a PM plug-in can see limitations imposed by the daemon?)
>>
> 
> Kernel, I'd say.
> 
>> If we consider userspace path management as providing "more flexibility",
>> then conceivably the kernel would expose hooks to the PM daemon to bump up any
>> kernel enforced limits when necessary. So a malicious or buggy PM could also raise
>> those limits.
>>
> 
> True, an admin (or bad actor) can get themselves into trouble by setting nonsensical limits. But the limits remain useful for stability and managing resources.
> 
>> Since interactions over the netlink interface are to be carried out by a privileged
>> entity, I've been assuming that the PM daemon is to be trusted.
>>
> 
> We're trusting the PM daemon with certain (granular) privileges/capabilities using CAP_NET_ADMIN, not the old "root-controls-all" model.
> 
>> On the other hand, if we're talking about having fixed upper bounds in the kernel wrt
>> # of addrs/subflows (which cannot be changed by the PM daemon), then that could make sense.
>>
> 
> That's pretty much what I'm talking about, maybe tunable (within a range) by a sysctl.

Sounds good, I will look into adding a couple new params, configurable via sysctl for regulating activities by userspace PMs.
Note that these will be unrelated to the equivalent limits that we have in place for the kernel PM. 

> 
> -- 
> Mat Martineau
> Intel


^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2022-01-19 21:31 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-12 22:15 [PATCH mptcp-next v2 00/21] mptcp: support userspace path management Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 01/21] mptcp: do not restrict subflows with non-kernel PMs Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 02/21] mptcp: store remote id from MP_JOIN SYN/ACK in local ctx Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 03/21] mptcp: reflect remote port (not 0) in ANNOUNCED events Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 04/21] mptcp: establish subflows from either end of connection Kishen Maloor
2022-01-14 22:43   ` Mat Martineau
2022-01-17  8:59     ` Paolo Abeni
2022-01-19  1:26       ` Kishen Maloor
2022-01-19 12:01         ` Paolo Abeni
2022-01-19 17:59           ` Kishen Maloor
2022-01-19 18:59             ` Mat Martineau
2022-01-12 22:15 ` [PATCH mptcp-next v2 05/21] mptcp: netlink: store per namespace list of refcounted listen socks Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 06/21] mptcp: netlink: store lsk ref in mptcp_pm_addr_entry Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 07/21] mptcp: netlink: process IPv6 addrs in creating listening sockets Kishen Maloor
2022-01-14 15:27   ` Paolo Abeni
2022-01-14 22:09     ` Mat Martineau
2022-01-19  1:25       ` Kishen Maloor
2022-01-19 19:14         ` Mat Martineau
2022-01-12 22:15 ` [PATCH mptcp-next v2 08/21] mptcp: attempt to add listening sockets for announced addrs Kishen Maloor
2022-01-14 15:54   ` Paolo Abeni
2022-01-14 17:47     ` Matthieu Baerts
2022-01-14 22:26     ` Mat Martineau
2022-01-19  1:20     ` Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 09/21] mptcp: allow ADD_ADDR reissuance by userspace PMs Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 10/21] mptcp: handle local addrs announced " Kishen Maloor
2022-01-14 17:11   ` Paolo Abeni
2022-01-19  1:24     ` Kishen Maloor
2022-01-19 19:20       ` Mat Martineau
2022-01-19 20:27         ` Kishen Maloor
2022-01-19 20:44           ` Mat Martineau
2022-01-19 21:30             ` Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 11/21] mptcp: read attributes of addr entries managed " Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 12/21] mptcp: netlink: split mptcp_pm_parse_addr into two functions Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 13/21] mptcp: netlink: Add MPTCP_PM_CMD_ANNOUNCE Kishen Maloor
2022-01-14 17:04   ` Paolo Abeni
2022-01-19  1:21     ` Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 14/21] mptcp: selftests: support MPTCP_PM_CMD_ANNOUNCE Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 15/21] mptcp: netlink: Add MPTCP_PM_CMD_REMOVE Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 16/21] mptcp: selftests: support MPTCP_PM_CMD_REMOVE Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 17/21] mptcp: netlink: allow userspace-driven subflow establishment Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 18/21] mptcp: selftests: support MPTCP_PM_CMD_SUBFLOW_CREATE Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 19/21] mptcp: selftests: support MPTCP_PM_CMD_SUBFLOW_DESTROY Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 20/21] mptcp: selftests: capture netlink events Kishen Maloor
2022-01-12 22:15 ` [PATCH mptcp-next v2 21/21] selftests: mptcp: functional tests for the userspace PM type Kishen Maloor
2022-01-12 22:35   ` selftests: mptcp: functional tests for the userspace PM type: Build Failure MPTCP CI

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.