* [PATCH mptcp-next v5 0/8] mptcp: fixes and enhancements related to path management
@ 2022-02-03  7:25 Kishen Maloor
  2022-02-03  7:25 ` [PATCH mptcp-next v5 1/8] mptcp: bypass in-kernel PM restrictions for non-kernel PMs Kishen Maloor
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Kishen Maloor @ 2022-02-03  7:25 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This patch series contains fixes and enhancements related to
path management over MPTCP connections, particularly in support of
out-of-kernel PMs. The changes ensure that the information a path
manager needs to make decisions is conveyed through MPTCP netlink
events, allow more flexibility in establishing paths from either
end of an MPTCP connection, and improve the handling of listening
sockets which serve in MPJ handshakes.

v1 -> v2:
-fixed formatting
-check_fully_established: check for 3rd ACK retransmission only on passive
side of the MPJ handshake

v2 -> v3:
-subflow_simultaneous_connect: check for active subflow socket
-new helper lsk_list_find_or_create()
-updated mptcp_pm_nl_create_listen_socket() to take struct net* as param
-new addr flag MPTCP_PM_ADDR_FLAG_NO_LISTEN to skip creating a
listening socket in the kernel during an ADD_ADDR request
-reflect the pm.server_side attribute in the MPTCP_EVENT_CREATED
and MPTCP_EVENT_ESTABLISHED events

v3 -> v4:
-refactor mptcp_pm_add_addr_received() and
mptcp_event_addr_announced() to eliminate a param
-add and use new internal API mptcp_pm_is_kernel()
-bypass accounting for non-kernel PM managed connections
-call lsk_list_find() after a failed lsk_list_find_or_create()
for a chance to retrieve a recently created lsk by a simultaneous
call

v4 -> v5:
-Fixed error: implicit declaration of function
'mptcp_pm_nl_create_listen_socket'

Kishen Maloor (8):
  mptcp: bypass in-kernel PM restrictions for non-kernel PMs
  mptcp: store remote id from MP_JOIN SYN/ACK in local ctx
  mptcp: reflect remote port (not 0) in ANNOUNCED events
  mptcp: establish subflows from either end of connection
  mptcp: netlink: store per namespace list of refcounted listen socks
  mptcp: netlink: store lsk ref in mptcp_pm_addr_entry
  mptcp: attempt to add listening sockets for announced addrs
  mptcp: expose server_side attribute in MPTCP netlink events

 include/uapi/linux/mptcp.h |   2 +
 net/mptcp/options.c        |   4 +-
 net/mptcp/pm.c             |  12 +-
 net/mptcp/pm_netlink.c     | 222 ++++++++++++++++++++++++++++++++-----
 net/mptcp/protocol.c       |   5 +-
 net/mptcp/protocol.h       |  21 +++-
 net/mptcp/subflow.c        |   4 +-
 7 files changed, 228 insertions(+), 42 deletions(-)


base-commit: a6d509111fdeec203af494abc83af6e746d3519f
-- 
2.31.1



* [PATCH mptcp-next v5 1/8] mptcp: bypass in-kernel PM restrictions for non-kernel PMs
  2022-02-03  7:25 [PATCH mptcp-next v5 0/8] mptcp: fixes and enhancements related to path management Kishen Maloor
@ 2022-02-03  7:25 ` Kishen Maloor
  2022-02-03  7:25 ` [PATCH mptcp-next v5 2/8] mptcp: store remote id from MP_JOIN SYN/ACK in local ctx Kishen Maloor
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Kishen Maloor @ 2022-02-03  7:25 UTC (permalink / raw)
  To: kishen.maloor, mptcp

Current limits on the # of addresses/subflows must apply only to
in-kernel PM managed sockets. Thus this change removes such
restrictions for connections overseen by non-kernel (e.g. userspace)
PMs.

This change also ensures that the kernel does not update the stats
in struct mptcp_pm_data along kernel code paths that are exercised
by non-kernel PMs.

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
v4: rephrased commit message, add API mptcp_pm_is_kernel(), bypass
accounting for non-kernel PM managed connections
---
 net/mptcp/pm.c         | 6 +++++-
 net/mptcp/pm_netlink.c | 3 +++
 net/mptcp/protocol.h   | 9 +++++++--
 net/mptcp/subflow.c    | 3 ++-
 4 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 1f8878cc29e3..3e053b759181 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -87,6 +87,9 @@ bool mptcp_pm_allow_new_subflow(struct mptcp_sock *msk)
 	unsigned int subflows_max;
 	int ret = 0;
 
+	if (!mptcp_pm_is_kernel(msk))
+		return true;
+
 	subflows_max = mptcp_pm_get_subflows_max(msk);
 
 	pr_debug("msk=%p subflows=%d max=%d allow=%d", msk, pm->subflows,
@@ -179,7 +182,8 @@ void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct sock *ssk,
 	bool update_subflows;
 
 	update_subflows = (ssk->sk_state == TCP_CLOSE) &&
-			  (subflow->request_join || subflow->mp_join);
+			  (subflow->request_join || subflow->mp_join) &&
+			  mptcp_pm_is_kernel(msk);
 	if (!READ_ONCE(pm->work_pending) && !update_subflows)
 		return;
 
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 93800f32fcb6..bf24c1a74e1d 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -795,6 +795,9 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
 		if (!removed)
 			continue;
 
+		if (!mptcp_pm_is_kernel(msk))
+			continue;
+
 		if (rm_type == MPTCP_MIB_RMADDR) {
 			msk->pm.add_addr_accepted--;
 			WRITE_ONCE(msk->pm.accept_addr, true);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index f37f087caab3..ac8b57d4f853 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -804,9 +804,14 @@ static inline bool mptcp_pm_should_rm_signal(struct mptcp_sock *msk)
 	return READ_ONCE(msk->pm.addr_signal) & BIT(MPTCP_RM_ADDR_SIGNAL);
 }
 
-static inline bool mptcp_pm_is_userspace(struct mptcp_sock *msk)
+static inline bool mptcp_pm_is_userspace(const struct mptcp_sock *msk)
 {
-	return READ_ONCE(msk->pm.pm_type) != MPTCP_PM_TYPE_KERNEL;
+	return READ_ONCE(msk->pm.pm_type) == MPTCP_PM_TYPE_USERSPACE;
+}
+
+static inline bool mptcp_pm_is_kernel(const struct mptcp_sock *msk)
+{
+	return READ_ONCE(msk->pm.pm_type) == MPTCP_PM_TYPE_KERNEL;
 }
 
 static inline unsigned int mptcp_add_addr_len(int family, bool echo, bool port)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 88ee94adc38c..8c25a1122bfd 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -62,7 +62,8 @@ static void subflow_generate_hmac(u64 key1, u64 key2, u32 nonce1, u32 nonce2,
 static bool mptcp_can_accept_new_subflow(const struct mptcp_sock *msk)
 {
 	return mptcp_is_fully_established((void *)msk) &&
-	       READ_ONCE(msk->pm.accept_subflow);
+		(!mptcp_pm_is_kernel(msk) ||
+		READ_ONCE(msk->pm.accept_subflow));
 }
 
 /* validate received token and create truncated hmac and nonce for SYN-ACK */
-- 
2.31.1
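
The gating described in the commit message above can be sketched in
plain user-space C. This is only an illustration under simplifying
assumptions: the enum, struct pm_state, and both function names are
hypothetical stand-ins rather than the kernel definitions, and the
locking and accept_subflow handling of the real code are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's PM types (not the real definitions). */
enum pm_type { PM_TYPE_KERNEL, PM_TYPE_USERSPACE };

struct pm_state {
	enum pm_type type;
	unsigned int subflows;     /* subflows counted so far */
	unsigned int subflows_max; /* limit enforced by the in-kernel PM */
};

static bool pm_is_kernel(const struct pm_state *pm)
{
	return pm->type == PM_TYPE_KERNEL;
}

/* Limits and accounting apply only when the in-kernel PM owns the connection. */
static bool pm_allow_new_subflow(struct pm_state *pm)
{
	assert(pm);

	if (!pm_is_kernel(pm))
		return true;	/* no limit check, no counter update */

	if (pm->subflows >= pm->subflows_max)
		return false;

	pm->subflows++;
	return true;
}
```

With a limit of one, a kernel-managed state accepts the first request
and rejects the second, while a userspace-managed state is always
accepted and its counter is never touched.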



* [PATCH mptcp-next v5 2/8] mptcp: store remote id from MP_JOIN SYN/ACK in local ctx
  2022-02-03  7:25 [PATCH mptcp-next v5 0/8] mptcp: fixes and enhancements related to path management Kishen Maloor
  2022-02-03  7:25 ` [PATCH mptcp-next v5 1/8] mptcp: bypass in-kernel PM restrictions for non-kernel PMs Kishen Maloor
@ 2022-02-03  7:25 ` Kishen Maloor
  2022-02-03  7:25 ` [PATCH mptcp-next v5 3/8] mptcp: reflect remote port (not 0) in ANNOUNCED events Kishen Maloor
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Kishen Maloor @ 2022-02-03  7:25 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change reads the addr id assigned to the remote endpoint
of a subflow from the MP_JOIN SYN/ACK message and stores it
in the related subflow context. The remote id was not being
captured prior to this change, and will now provide a consistent
view of remote endpoints and their ids as seen through netlink
events.

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 net/mptcp/subflow.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 8c25a1122bfd..d3691b95401a 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -444,6 +444,7 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
 		subflow->backup = mp_opt.backup;
 		subflow->thmac = mp_opt.thmac;
 		subflow->remote_nonce = mp_opt.nonce;
+		subflow->remote_id = mp_opt.join_id;
 		pr_debug("subflow=%p, thmac=%llu, remote_nonce=%u backup=%d",
 			 subflow, subflow->thmac, subflow->remote_nonce,
 			 subflow->backup);
-- 
2.31.1



* [PATCH mptcp-next v5 3/8] mptcp: reflect remote port (not 0) in ANNOUNCED events
  2022-02-03  7:25 [PATCH mptcp-next v5 0/8] mptcp: fixes and enhancements related to path management Kishen Maloor
  2022-02-03  7:25 ` [PATCH mptcp-next v5 1/8] mptcp: bypass in-kernel PM restrictions for non-kernel PMs Kishen Maloor
  2022-02-03  7:25 ` [PATCH mptcp-next v5 2/8] mptcp: store remote id from MP_JOIN SYN/ACK in local ctx Kishen Maloor
@ 2022-02-03  7:25 ` Kishen Maloor
  2022-02-03  7:25 ` [PATCH mptcp-next v5 4/8] mptcp: establish subflows from either end of connection Kishen Maloor
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Kishen Maloor @ 2022-02-03  7:25 UTC (permalink / raw)
  To: kishen.maloor, mptcp

Per RFC 8684, if no port is specified in an ADD_ADDR message, MPTCP
SHOULD attempt to connect to the specified address on the same port
as the port that is already in use by the subflow on which the
ADD_ADDR signal was sent.

To facilitate that, this change reflects the specific remote port in
use by that subflow in MPTCP_EVENT_ANNOUNCED events.

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
v4: refactor mptcp_pm_add_addr_received() and
mptcp_event_addr_announced() to eliminate a param
---
 net/mptcp/options.c    |  2 +-
 net/mptcp/pm.c         |  6 ++++--
 net/mptcp/pm_netlink.c | 11 ++++++++---
 net/mptcp/protocol.h   |  4 ++--
 4 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 7b615dc10897..6dfaa8e11331 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1131,7 +1131,7 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 		if ((mp_opt.suboptions & OPTION_MPTCP_ADD_ADDR) &&
 		    add_addr_hmac_valid(msk, &mp_opt)) {
 			if (!mp_opt.echo) {
-				mptcp_pm_add_addr_received(msk, &mp_opt.addr);
+				mptcp_pm_add_addr_received(sk, &mp_opt.addr);
 				MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_ADDADDR);
 			} else {
 				mptcp_pm_add_addr_echoed(msk, &mp_opt.addr);
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 3e053b759181..94f008b2d624 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -200,15 +200,17 @@ void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct sock *ssk,
 	spin_unlock_bh(&pm->lock);
 }
 
-void mptcp_pm_add_addr_received(struct mptcp_sock *msk,
+void mptcp_pm_add_addr_received(const struct sock *ssk,
 				const struct mptcp_addr_info *addr)
 {
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	struct mptcp_sock *msk = mptcp_sk(subflow->conn);
 	struct mptcp_pm_data *pm = &msk->pm;
 
 	pr_debug("msk=%p remote_id=%d accept=%d", msk, addr->id,
 		 READ_ONCE(pm->accept_addr));
 
-	mptcp_event_addr_announced(msk, addr);
+	mptcp_event_addr_announced(ssk, addr);
 
 	spin_lock_bh(&pm->lock);
 
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index bf24c1a74e1d..ff13012178ae 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1974,10 +1974,12 @@ void mptcp_event_addr_removed(const struct mptcp_sock *msk, uint8_t id)
 	kfree_skb(skb);
 }
 
-void mptcp_event_addr_announced(const struct mptcp_sock *msk,
+void mptcp_event_addr_announced(const struct sock *ssk,
 				const struct mptcp_addr_info *info)
 {
-	struct net *net = sock_net((const struct sock *)msk);
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	struct mptcp_sock *msk = mptcp_sk(subflow->conn);
+	struct net *net = sock_net(ssk);
 	struct nlmsghdr *nlh;
 	struct sk_buff *skb;
 
@@ -1999,7 +2001,10 @@ void mptcp_event_addr_announced(const struct mptcp_sock *msk,
 	if (nla_put_u8(skb, MPTCP_ATTR_REM_ID, info->id))
 		goto nla_put_failure;
 
-	if (nla_put_be16(skb, MPTCP_ATTR_DPORT, info->port))
+	if (nla_put_be16(skb, MPTCP_ATTR_DPORT,
+			 info->port == 0 ?
+			 ((struct inet_sock *)inet_sk(ssk))->inet_dport :
+			 info->port))
 		goto nla_put_failure;
 
 	switch (info->family) {
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index ac8b57d4f853..4371ac3fbde1 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -751,7 +751,7 @@ void mptcp_pm_subflow_established(struct mptcp_sock *msk);
 bool mptcp_pm_nl_check_work_pending(struct mptcp_sock *msk);
 void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct sock *ssk,
 				 const struct mptcp_subflow_context *subflow);
-void mptcp_pm_add_addr_received(struct mptcp_sock *msk,
+void mptcp_pm_add_addr_received(const struct sock *ssk,
 				const struct mptcp_addr_info *addr);
 void mptcp_pm_add_addr_echoed(struct mptcp_sock *msk,
 			      struct mptcp_addr_info *addr);
@@ -780,7 +780,7 @@ int mptcp_pm_remove_subflow(struct mptcp_sock *msk, const struct mptcp_rm_list *
 
 void mptcp_event(enum mptcp_event_type type, const struct mptcp_sock *msk,
 		 const struct sock *ssk, gfp_t gfp);
-void mptcp_event_addr_announced(const struct mptcp_sock *msk, const struct mptcp_addr_info *info);
+void mptcp_event_addr_announced(const struct sock *ssk, const struct mptcp_addr_info *info);
 void mptcp_event_addr_removed(const struct mptcp_sock *msk, u8 id);
 
 static inline bool mptcp_pm_should_add_signal(struct mptcp_sock *msk)
-- 
2.31.1
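
The port selection this patch applies to MPTCP_ATTR_DPORT reduces to
a single fallback rule, sketched here in user-space C. The function
name announced_dport() is hypothetical. Both ports are treated as
opaque 16-bit values; zero means "no port" in either byte order, so
no conversion is needed for the comparison.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the port to report in an ANNOUNCED event: the port carried by
 * ADD_ADDR if there is one, otherwise the remote port of the subflow
 * on which the ADD_ADDR arrived.
 */
static uint16_t announced_dport(uint16_t announced_port, uint16_t subflow_dport)
{
	return announced_port ? announced_port : subflow_dport;
}
```

A path manager consuming the event can then connect to the reported
port directly, without having to look up the originating subflow.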



* [PATCH mptcp-next v5 4/8] mptcp: establish subflows from either end of connection
  2022-02-03  7:25 [PATCH mptcp-next v5 0/8] mptcp: fixes and enhancements related to path management Kishen Maloor
                   ` (2 preceding siblings ...)
  2022-02-03  7:25 ` [PATCH mptcp-next v5 3/8] mptcp: reflect remote port (not 0) in ANNOUNCED events Kishen Maloor
@ 2022-02-03  7:25 ` Kishen Maloor
  2022-02-03  7:25 ` [PATCH mptcp-next v5 5/8] mptcp: netlink: store per namespace list of refcounted listen socks Kishen Maloor
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Kishen Maloor @ 2022-02-03  7:25 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change updates internal logic to permit subflows to be
established from either the client or server ends of MPTCP
connections. This symmetry and added flexibility may be
harnessed by PM implementations running on either end in
creating new subflows.

The essence of this change lies in not relying on the
"server_side" flag (which continues to be available if needed).

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
v2: check for 3rd ACK retransmission only on passive side
of the MPJ handshake
v3: check for active subflow socket in subflow_simultaneous_connect
---
 net/mptcp/options.c  | 2 +-
 net/mptcp/protocol.c | 5 +----
 net/mptcp/protocol.h | 8 ++++++--
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 6dfaa8e11331..4f56e874c542 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -929,7 +929,7 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
 		if (TCP_SKB_CB(skb)->seq == subflow->ssn_offset + 1 &&
 		    TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq &&
 		    subflow->mp_join && (mp_opt->suboptions & OPTIONS_MPTCP_MPJ) &&
-		    READ_ONCE(msk->pm.server_side))
+		    !subflow->request_join)
 			tcp_send_ack(ssk);
 		goto fully_established;
 	}
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3324e1c61576..6142b4b25769 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3255,15 +3255,12 @@ bool mptcp_finish_join(struct sock *ssk)
 		return false;
 	}
 
-	if (!msk->pm.server_side)
+	if (!list_empty(&subflow->node))
 		goto out;
 
 	if (!mptcp_pm_allow_new_subflow(msk))
 		goto err_prohibited;
 
-	if (WARN_ON_ONCE(!list_empty(&subflow->node)))
-		goto err_prohibited;
-
 	/* active connections are already on conn_list.
 	 * If we can't acquire msk socket lock here, let the release callback
 	 * handle it
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 4371ac3fbde1..1a8d09796627 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -908,13 +908,17 @@ static inline bool mptcp_check_infinite_map(struct sk_buff *skb)
 	return false;
 }
 
+static inline bool is_active_ssk(struct mptcp_subflow_context *subflow)
+{
+	return (subflow->request_mptcp || subflow->request_join);
+}
+
 static inline bool subflow_simultaneous_connect(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
-	struct sock *parent = subflow->conn;
 
 	return sk->sk_state == TCP_ESTABLISHED &&
-	       !mptcp_sk(parent)->pm.server_side &&
+	       is_active_ssk(subflow) &&
 	       !subflow->conn_finished;
 }
 
-- 
2.31.1
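
The per-subflow check that replaces the connection-wide "server_side"
test can be modelled as below. This is a reduced sketch: struct
subflow_view and its "established" field are hypothetical stand-ins
for the subflow context and for sk_state == TCP_ESTABLISHED.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced view of a subflow context. */
struct subflow_view {
	bool request_mptcp;	/* this end initiated with MP_CAPABLE */
	bool request_join;	/* this end initiated with MP_JOIN */
	bool conn_finished;	/* connect-side completion already ran */
	bool established;	/* stand-in for sk_state == TCP_ESTABLISHED */
};

/* A subflow is active if this end initiated it, whatever the role of the msk. */
static bool is_active(const struct subflow_view *sf)
{
	return sf->request_mptcp || sf->request_join;
}

static bool simultaneous_connect(const struct subflow_view *sf)
{
	assert(sf);
	return sf->established && is_active(sf) && !sf->conn_finished;
}
```

Because the decision depends only on the subflow's own flags, it gives
the same answer whether the subflow was opened from the client or from
the server end of the connection.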



* [PATCH mptcp-next v5 5/8] mptcp: netlink: store per namespace list of refcounted listen socks
  2022-02-03  7:25 [PATCH mptcp-next v5 0/8] mptcp: fixes and enhancements related to path management Kishen Maloor
                   ` (3 preceding siblings ...)
  2022-02-03  7:25 ` [PATCH mptcp-next v5 4/8] mptcp: establish subflows from either end of connection Kishen Maloor
@ 2022-02-03  7:25 ` Kishen Maloor
  2022-02-03 17:46   ` Florian Westphal
  2022-02-03  7:25 ` [PATCH mptcp-next v5 6/8] mptcp: netlink: store lsk ref in mptcp_pm_addr_entry Kishen Maloor
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Kishen Maloor @ 2022-02-03  7:25 UTC (permalink / raw)
  To: kishen.maloor, mptcp

The kernel can create listening sockets bound to announced addresses
via the ADD_ADDR option for receiving MP_JOIN requests. Path
managers may further choose to advertise the same addr+port over multiple
MPTCP connections. So this change provides a simple framework to
manage a list of all distinct listening sockets created in the kernel
over a namespace by encapsulating the socket in a structure that is
ref counted and can be shared across multiple connections. The sockets
are released when there are no more references.

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
v2: fixed formatting
---
 net/mptcp/pm_netlink.c | 76 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index ff13012178ae..3d6251baef26 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -22,6 +22,14 @@ static struct genl_family mptcp_genl_family;
 
 static int pm_nl_pernet_id;
 
+struct mptcp_local_lsk {
+	struct list_head	list;
+	struct mptcp_addr_info	addr;
+	struct socket		*lsk;
+	struct rcu_head		rcu;
+	refcount_t		refcount;
+};
+
 struct mptcp_pm_addr_entry {
 	struct list_head	list;
 	struct mptcp_addr_info	addr;
@@ -41,7 +49,10 @@ struct mptcp_pm_add_entry {
 struct pm_nl_pernet {
 	/* protects pernet updates */
 	spinlock_t		lock;
+	/* protects access to pernet lsk list */
+	spinlock_t		lsk_list_lock;
 	struct list_head	local_addr_list;
+	struct list_head	lsk_list;
 	unsigned int		addrs;
 	unsigned int		stale_loss_cnt;
 	unsigned int		add_addr_signal_max;
@@ -83,6 +94,69 @@ static bool addresses_equal(const struct mptcp_addr_info *a,
 	return a->port == b->port;
 }
 
+static struct mptcp_local_lsk *lsk_list_find(struct pm_nl_pernet *pernet,
+					     struct mptcp_addr_info *addr)
+{
+	struct mptcp_local_lsk *lsk_ref = NULL;
+	struct mptcp_local_lsk *i;
+
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(i, &pernet->lsk_list, list) {
+		if (addresses_equal(&i->addr, addr, true)) {
+			if (refcount_inc_not_zero(&i->refcount)) {
+				lsk_ref = i;
+				break;
+			}
+		}
+	}
+
+	rcu_read_unlock();
+
+	return lsk_ref;
+}
+
+static void lsk_list_add_ref(struct mptcp_local_lsk *lsk_ref)
+{
+	refcount_inc(&lsk_ref->refcount);
+}
+
+static struct mptcp_local_lsk *lsk_list_add(struct pm_nl_pernet *pernet,
+					    struct mptcp_addr_info *addr,
+					    struct socket *lsk)
+{
+	struct mptcp_local_lsk *lsk_ref;
+
+	lsk_ref = kmalloc(sizeof(*lsk_ref), GFP_ATOMIC);
+
+	if (!lsk_ref)
+		return NULL;
+
+	lsk_ref->lsk = lsk;
+	memcpy(&lsk_ref->addr, addr, sizeof(struct mptcp_addr_info));
+	refcount_set(&lsk_ref->refcount, 1);
+
+	spin_lock_bh(&pernet->lsk_list_lock);
+	list_add_rcu(&lsk_ref->list, &pernet->lsk_list);
+	spin_unlock_bh(&pernet->lsk_list_lock);
+
+	return lsk_ref;
+}
+
+static void lsk_list_release(struct pm_nl_pernet *pernet,
+			     struct mptcp_local_lsk *lsk_ref)
+{
+	if (lsk_ref && refcount_dec_and_test(&lsk_ref->refcount)) {
+		sock_release(lsk_ref->lsk);
+
+		spin_lock_bh(&pernet->lsk_list_lock);
+		list_del_rcu(&lsk_ref->list);
+		spin_unlock_bh(&pernet->lsk_list_lock);
+
+		kfree_rcu(lsk_ref, rcu);
+	}
+}
+
 static bool address_zero(const struct mptcp_addr_info *addr)
 {
 	struct mptcp_addr_info zero;
@@ -2141,12 +2215,14 @@ static int __net_init pm_nl_init_net(struct net *net)
 	struct pm_nl_pernet *pernet = net_generic(net, pm_nl_pernet_id);
 
 	INIT_LIST_HEAD_RCU(&pernet->local_addr_list);
+	INIT_LIST_HEAD_RCU(&pernet->lsk_list);
 
 	/* Cit. 2 subflows ought to be enough for anybody. */
 	pernet->subflows_max = 2;
 	pernet->next_id = 1;
 	pernet->stale_loss_cnt = 4;
 	spin_lock_init(&pernet->lock);
+	spin_lock_init(&pernet->lsk_list_lock);
 
 	/* No need to initialize other pernet fields, the struct is zeroed at
 	 * allocation time.
-- 
2.31.1
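
The sharing scheme can be illustrated with a small user-space model.
It is a sketch under stated assumptions, not the kernel code: every
name is hypothetical, an int handle stands in for struct socket *, a
plain counter replaces refcount_t, and there is no locking or RCU, so
it is single-threaded only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct lsk_ref {
	struct lsk_ref *next;
	uint32_t addr;
	uint16_t port;
	int handle;		/* stand-in for struct socket * */
	unsigned int refcount;
};

struct lsk_registry {
	struct lsk_ref *head;
};

/* Look up a listener by addr+port and take a reference on a hit. */
static struct lsk_ref *lsk_find(struct lsk_registry *reg, uint32_t addr,
				uint16_t port)
{
	assert(reg);
	for (struct lsk_ref *i = reg->head; i; i = i->next) {
		if (i->addr == addr && i->port == port) {
			i->refcount++;
			return i;
		}
	}
	return NULL;
}

/* Register a new listener; the caller owns the initial reference. */
static struct lsk_ref *lsk_add(struct lsk_registry *reg, uint32_t addr,
			       uint16_t port, int handle)
{
	struct lsk_ref *ref = malloc(sizeof(*ref));

	if (!ref)
		return NULL;
	ref->addr = addr;
	ref->port = port;
	ref->handle = handle;
	ref->refcount = 1;
	ref->next = reg->head;
	reg->head = ref;
	return ref;
}

/* Drop one reference; unlink and free on the last one. Returns 1 if freed. */
static int lsk_release(struct lsk_registry *reg, struct lsk_ref *ref)
{
	if (!ref || --ref->refcount > 0)
		return 0;
	for (struct lsk_ref **pp = &reg->head; *pp; pp = &(*pp)->next) {
		if (*pp == ref) {
			*pp = ref->next;
			break;
		}
	}
	free(ref);	/* a real implementation would also close the socket here */
	return 1;
}
```

Two connections that announce the same addr+port end up holding the
same entry; the listener goes away only when both have released it.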



* [PATCH mptcp-next v5 6/8] mptcp: netlink: store lsk ref in mptcp_pm_addr_entry
  2022-02-03  7:25 [PATCH mptcp-next v5 0/8] mptcp: fixes and enhancements related to path management Kishen Maloor
                   ` (4 preceding siblings ...)
  2022-02-03  7:25 ` [PATCH mptcp-next v5 5/8] mptcp: netlink: store per namespace list of refcounted listen socks Kishen Maloor
@ 2022-02-03  7:25 ` Kishen Maloor
  2022-02-16  3:56   ` Geliang Tang
  2022-02-03  7:25 ` [PATCH mptcp-next v5 7/8] mptcp: attempt to add listening sockets for announced addrs Kishen Maloor
  2022-02-03  7:25 ` [PATCH mptcp-next v5 8/8] mptcp: expose server_side attribute in MPTCP netlink events Kishen Maloor
  7 siblings, 1 reply; 17+ messages in thread
From: Kishen Maloor @ 2022-02-03  7:25 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change updates struct mptcp_pm_addr_entry to store a
listening socket (lsk) reference, i.e. a pointer to a reference
counted structure containing the lsk (struct socket *) instead
of the lsk itself. Code blocks that previously operated on
the lsk in struct mptcp_pm_addr_entry have been updated to work
with the lsk ref instead, utilizing new helper functions.

Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
v2: fixed formatting
v3: added helper lsk_list_find_or_create(), updated
mptcp_pm_nl_create_listen_socket() to take struct net* as param
v4: call lsk_list_find() after a failed lsk_list_find_or_create()
for a chance to retrieve a recently created lsk by a simultaneous
call
v5: fixed implicit declaration error
---
 net/mptcp/pm_netlink.c | 83 +++++++++++++++++++++++++++++++-----------
 1 file changed, 62 insertions(+), 21 deletions(-)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 3d6251baef26..a4fb9acbba51 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -35,7 +35,7 @@ struct mptcp_pm_addr_entry {
 	struct mptcp_addr_info	addr;
 	u8			flags;
 	int			ifindex;
-	struct socket		*lsk;
+	struct mptcp_local_lsk	*lsk_ref;
 };
 
 struct mptcp_pm_add_entry {
@@ -66,6 +66,10 @@ struct pm_nl_pernet {
 #define MPTCP_PM_ADDR_MAX	8
 #define ADD_ADDR_RETRANS_MAX	3
 
+static int mptcp_pm_nl_create_listen_socket(struct net *net,
+					    struct mptcp_pm_addr_entry *entry,
+					    struct socket **lsk);
+
 static bool addresses_equal(const struct mptcp_addr_info *a,
 			    const struct mptcp_addr_info *b, bool use_port)
 {
@@ -157,6 +161,33 @@ static void lsk_list_release(struct pm_nl_pernet *pernet,
 	}
 }
 
+static struct mptcp_local_lsk *lsk_list_find_or_create(struct net *net,
+						       struct pm_nl_pernet *pernet,
+						       struct mptcp_pm_addr_entry *entry,
+						       int *createlsk_err)
+{
+	struct mptcp_local_lsk *lsk_ref;
+	struct socket *lsk = NULL;
+	int err;
+
+	lsk_ref = lsk_list_find(pernet, &entry->addr);
+
+	if (!lsk_ref) {
+		err = mptcp_pm_nl_create_listen_socket(net, entry, &lsk);
+
+		if (createlsk_err)
+			*createlsk_err = err;
+
+		if (lsk)
+			lsk_ref = lsk_list_add(pernet, &entry->addr, lsk);
+
+		if (lsk && !lsk_ref)
+			sock_release(lsk);
+	}
+
+	return lsk_ref;
+}
+
 static bool address_zero(const struct mptcp_addr_info *addr)
 {
 	struct mptcp_addr_info zero;
@@ -999,8 +1030,9 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
 	return ret;
 }
 
-static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
-					    struct mptcp_pm_addr_entry *entry)
+static int mptcp_pm_nl_create_listen_socket(struct net *net,
+					    struct mptcp_pm_addr_entry *entry,
+					    struct socket **lsk)
 {
 	int addrlen = sizeof(struct sockaddr_in);
 	struct sockaddr_storage addr;
@@ -1009,12 +1041,12 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
 	int backlog = 1024;
 	int err;
 
-	err = sock_create_kern(sock_net(sk), entry->addr.family,
-			       SOCK_STREAM, IPPROTO_MPTCP, &entry->lsk);
+	err = sock_create_kern(net, entry->addr.family,
+			       SOCK_STREAM, IPPROTO_MPTCP, lsk);
 	if (err)
 		return err;
 
-	msk = mptcp_sk(entry->lsk->sk);
+	msk = mptcp_sk((*lsk)->sk);
 	if (!msk) {
 		err = -EINVAL;
 		goto out;
@@ -1046,7 +1078,8 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
 	return 0;
 
 out:
-	sock_release(entry->lsk);
+	sock_release(*lsk);
+	*lsk = NULL;
 	return err;
 }
 
@@ -1095,7 +1128,7 @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct sock_common *skc)
 	entry->addr.port = 0;
 	entry->ifindex = 0;
 	entry->flags = 0;
-	entry->lsk = NULL;
+	entry->lsk_ref = NULL;
 	ret = mptcp_pm_nl_append_new_local_addr(pernet, entry);
 	if (ret < 0)
 		kfree(entry);
@@ -1304,18 +1337,25 @@ static int mptcp_nl_cmd_add_addr(struct sk_buff *skb, struct genl_info *info)
 
 	*entry = addr;
 	if (entry->addr.port) {
-		ret = mptcp_pm_nl_create_listen_socket(skb->sk, entry);
-		if (ret) {
-			GENL_SET_ERR_MSG(info, "create listen socket error");
+		entry->lsk_ref = lsk_list_find_or_create(sock_net(skb->sk), pernet, entry, &ret);
+
+		if (!entry->lsk_ref)
+			entry->lsk_ref = lsk_list_find(pernet, &entry->addr);
+
+		if (!entry->lsk_ref) {
+			GENL_SET_ERR_MSG(info, "can't create/allocate lsk");
 			kfree(entry);
+			ret = (ret == 0) ? -ENOMEM : ret;
 			return ret;
 		}
 	}
+
 	ret = mptcp_pm_nl_append_new_local_addr(pernet, entry);
+
 	if (ret < 0) {
 		GENL_SET_ERR_MSG(info, "too many addresses or duplicate one");
-		if (entry->lsk)
-			sock_release(entry->lsk);
+		if (entry->lsk_ref)
+			lsk_list_release(pernet, entry->lsk_ref);
 		kfree(entry);
 		return ret;
 	}
@@ -1418,10 +1458,11 @@ static int mptcp_nl_remove_subflow_and_signal_addr(struct net *net,
 }
 
 /* caller must ensure the RCU grace period is already elapsed */
-static void __mptcp_pm_release_addr_entry(struct mptcp_pm_addr_entry *entry)
+static void __mptcp_pm_release_addr_entry(struct pm_nl_pernet *pernet,
+					  struct mptcp_pm_addr_entry *entry)
 {
-	if (entry->lsk)
-		sock_release(entry->lsk);
+	if (entry->lsk_ref)
+		lsk_list_release(pernet, entry->lsk_ref);
 	kfree(entry);
 }
 
@@ -1503,7 +1544,7 @@ static int mptcp_nl_cmd_del_addr(struct sk_buff *skb, struct genl_info *info)
 
 	mptcp_nl_remove_subflow_and_signal_addr(sock_net(skb->sk), &entry->addr);
 	synchronize_rcu();
-	__mptcp_pm_release_addr_entry(entry);
+	__mptcp_pm_release_addr_entry(pernet, entry);
 
 	return ret;
 }
@@ -1559,7 +1600,7 @@ static void mptcp_nl_remove_addrs_list(struct net *net,
 }
 
 /* caller must ensure the RCU grace period is already elapsed */
-static void __flush_addrs(struct list_head *list)
+static void __flush_addrs(struct pm_nl_pernet *pernet, struct list_head *list)
 {
 	while (!list_empty(list)) {
 		struct mptcp_pm_addr_entry *cur;
@@ -1567,7 +1608,7 @@ static void __flush_addrs(struct list_head *list)
 		cur = list_entry(list->next,
 				 struct mptcp_pm_addr_entry, list);
 		list_del_rcu(&cur->list);
-		__mptcp_pm_release_addr_entry(cur);
+		__mptcp_pm_release_addr_entry(pernet, cur);
 	}
 }
 
@@ -1592,7 +1633,7 @@ static int mptcp_nl_cmd_flush_addrs(struct sk_buff *skb, struct genl_info *info)
 	spin_unlock_bh(&pernet->lock);
 	mptcp_nl_remove_addrs_list(sock_net(skb->sk), &free_list);
 	synchronize_rcu();
-	__flush_addrs(&free_list);
+	__flush_addrs(pernet, &free_list);
 	return 0;
 }
 
@@ -2242,7 +2283,7 @@ static void __net_exit pm_nl_exit_net(struct list_head *net_list)
 		 * other modifiers, also netns core already waited for a
 		 * RCU grace period.
 		 */
-		__flush_addrs(&pernet->local_addr_list);
+		__flush_addrs(pernet, &pernet->local_addr_list);
 	}
 }
 
-- 
2.31.1
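
The find-or-create flow, including the way the creation error is
reported separately from the returned reference, might look like this
in a simplified user-space form. All names are hypothetical, a fixed
table replaces the per-namespace list, and fake_create() merely stands
in for socket creation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LSK 8

struct lsk_slot {
	uint16_t port;
	int handle;		/* stand-in for struct socket * */
	unsigned int refcount;	/* 0 marks a free slot */
};

static struct lsk_slot lsk_table[MAX_LSK];

/* Hypothetical creator: fails with -EINVAL for port 0, else returns a handle. */
static int fake_create(uint16_t port)
{
	return port ? (int)port + 100 : -EINVAL;
}

/*
 * Return a referenced slot for the port, creating the listener if none
 * exists. On failure return NULL and store a negative errno in *err.
 */
static struct lsk_slot *find_or_create(uint16_t port, int (*create)(uint16_t),
				       int *err)
{
	struct lsk_slot *free_slot = NULL;
	int handle;

	assert(create && err);
	*err = 0;

	for (size_t i = 0; i < MAX_LSK; i++) {
		if (lsk_table[i].refcount && lsk_table[i].port == port) {
			lsk_table[i].refcount++;	/* share the existing listener */
			return &lsk_table[i];
		}
		if (!lsk_table[i].refcount && !free_slot)
			free_slot = &lsk_table[i];
	}

	if (!free_slot) {
		*err = -ENOMEM;
		return NULL;
	}

	handle = create(port);
	if (handle < 0) {
		*err = handle;
		return NULL;
	}

	free_slot->port = port;
	free_slot->handle = handle;
	free_slot->refcount = 1;
	return free_slot;
}
```

Reserving the slot before calling the creator means a failed
allocation never leaves a freshly created listener to clean up, which
is a different ordering from the patch and is chosen here only to keep
the sketch short.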



* [PATCH mptcp-next v5 7/8] mptcp: attempt to add listening sockets for announced addrs
  2022-02-03  7:25 [PATCH mptcp-next v5 0/8] mptcp: fixes and enhancements related to path management Kishen Maloor
                   ` (5 preceding siblings ...)
  2022-02-03  7:25 ` [PATCH mptcp-next v5 6/8] mptcp: netlink: store lsk ref in mptcp_pm_addr_entry Kishen Maloor
@ 2022-02-03  7:25 ` Kishen Maloor
  2022-02-04 13:52   ` Geliang Tang
  2022-02-03  7:25 ` [PATCH mptcp-next v5 8/8] mptcp: expose server_side attribute in MPTCP netlink events Kishen Maloor
  7 siblings, 1 reply; 17+ messages in thread
From: Kishen Maloor @ 2022-02-03  7:25 UTC (permalink / raw)
  To: kishen.maloor, mptcp

When ADD_ADDR announcements use the port associated with an
active subflow, this change ensures that a listening socket is bound
to the announced addr+port in the kernel for subsequently receiving
MP_JOINs. But if a listening socket for this address is already held
by the application then no action is taken.

A listening socket is created (when there isn't a listener)
just prior to the addr advertisement. If it is desired to not create
a listening socket in the kernel for an address, then this can be
requested by including the MPTCP_PM_ADDR_FLAG_NO_LISTEN flag
with the address.

When a listening socket is created, it is stored in
struct mptcp_pm_add_entry and released accordingly.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/203
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
v2: fixed formatting
v3: added new addr flag MPTCP_PM_ADDR_FLAG_NO_LISTEN to skip creating a
listening socket in the kernel during an ADD_ADDR request, use this flag
along the in-kernel PM flow for ADD_ADDR requests (Note: listening sockets
are always created for port-based endpoints as before), use the
lsk_list_find_or_create() helper
v4: call lsk_list_find() after a failed lsk_list_find_or_create()
for a chance to retrieve a recently created lsk by a simultaneous
call
---
 include/uapi/linux/mptcp.h |  1 +
 net/mptcp/pm_netlink.c     | 46 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
index f106a3941cdf..265cabc0d7aa 100644
--- a/include/uapi/linux/mptcp.h
+++ b/include/uapi/linux/mptcp.h
@@ -81,6 +81,7 @@ enum {
 #define MPTCP_PM_ADDR_FLAG_SUBFLOW			(1 << 1)
 #define MPTCP_PM_ADDR_FLAG_BACKUP			(1 << 2)
 #define MPTCP_PM_ADDR_FLAG_FULLMESH			(1 << 3)
+#define MPTCP_PM_ADDR_FLAG_NO_LISTEN			(1 << 4)
 
 enum {
 	MPTCP_PM_CMD_UNSPEC,
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index a4fb9acbba51..9b3d871d3712 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -43,6 +43,7 @@ struct mptcp_pm_add_entry {
 	struct mptcp_addr_info	addr;
 	struct timer_list	add_timer;
 	struct mptcp_sock	*sock;
+	struct mptcp_local_lsk	*lsk_ref;
 	u8			retrans_times;
 };
 
@@ -469,7 +470,8 @@ mptcp_pm_del_add_timer(struct mptcp_sock *msk,
 }
 
 static bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
-				     struct mptcp_pm_addr_entry *entry)
+				     struct mptcp_pm_addr_entry *entry,
+				     struct mptcp_local_lsk *lsk_ref)
 {
 	struct mptcp_pm_add_entry *add_entry = NULL;
 	struct sock *sk = (struct sock *)msk;
@@ -489,6 +491,10 @@ static bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
 	add_entry->addr = entry->addr;
 	add_entry->sock = msk;
 	add_entry->retrans_times = 0;
+	add_entry->lsk_ref = lsk_ref;
+
+	if (lsk_ref)
+		lsk_list_add_ref(lsk_ref);
 
 	timer_setup(&add_entry->add_timer, mptcp_pm_add_timer, 0);
 	sk_reset_timer(sk, &add_entry->add_timer,
@@ -501,8 +507,11 @@ void mptcp_pm_free_anno_list(struct mptcp_sock *msk)
 {
 	struct mptcp_pm_add_entry *entry, *tmp;
 	struct sock *sk = (struct sock *)msk;
+	struct pm_nl_pernet *pernet;
 	LIST_HEAD(free_list);
 
+	pernet = net_generic(sock_net(sk), pm_nl_pernet_id);
+
 	pr_debug("msk=%p", msk);
 
 	spin_lock_bh(&msk->pm.lock);
@@ -511,6 +520,8 @@ void mptcp_pm_free_anno_list(struct mptcp_sock *msk)
 
 	list_for_each_entry_safe(entry, tmp, &free_list, list) {
 		sk_stop_timer_sync(sk, &entry->add_timer);
+		if (entry->lsk_ref)
+			lsk_list_release(pernet, entry->lsk_ref);
 		kfree(entry);
 	}
 }
@@ -615,7 +626,9 @@ lookup_id_by_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *add
 }
 
 static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
+	__must_hold(&msk->pm.lock)
 {
+	struct mptcp_local_lsk *lsk_ref = NULL;
 	struct sock *sk = (struct sock *)msk;
 	struct mptcp_pm_addr_entry *local;
 	unsigned int add_addr_signal_max;
@@ -652,12 +665,34 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 		local = select_signal_address(pernet, msk);
 
 		if (local) {
-			if (mptcp_pm_alloc_anno_list(msk, local)) {
+			if (!(local->flags & MPTCP_PM_ADDR_FLAG_NO_LISTEN) &&
+			    !local->addr.port) {
+				local->addr.port =
+					((struct inet_sock *)inet_sk
+					 ((struct sock *)msk))->inet_sport;
+
+				spin_unlock_bh(&msk->pm.lock);
+
+				lsk_ref = lsk_list_find_or_create(sock_net(sk), pernet,
+								  local, NULL);
+
+				spin_lock_bh(&msk->pm.lock);
+
+				if (!lsk_ref)
+					lsk_ref = lsk_list_find(pernet, &local->addr);
+
+				local->addr.port = 0;
+			}
+
+			if (mptcp_pm_alloc_anno_list(msk, local, lsk_ref)) {
 				__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
 				msk->pm.add_addr_signaled++;
 				mptcp_pm_announce_addr(msk, &local->addr, false);
 				mptcp_pm_nl_addr_send_ack(msk);
 			}
+
+			if (lsk_ref)
+				lsk_list_release(pernet, lsk_ref);
 		}
 	}
 
@@ -749,6 +784,7 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
 }
 
 static void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk)
+	__must_hold(&msk->pm.lock)
 {
 	struct mptcp_addr_info addrs[MPTCP_PM_ADDR_MAX];
 	struct sock *sk = (struct sock *)msk;
@@ -1389,11 +1425,17 @@ int mptcp_pm_get_flags_and_ifindex_by_id(struct net *net, unsigned int id,
 static bool remove_anno_list_by_saddr(struct mptcp_sock *msk,
 				      struct mptcp_addr_info *addr)
 {
+	struct sock *sk = (struct sock *)msk;
 	struct mptcp_pm_add_entry *entry;
+	struct pm_nl_pernet *pernet;
+
+	pernet = net_generic(sock_net(sk), pm_nl_pernet_id);
 
 	entry = mptcp_pm_del_add_timer(msk, addr, false);
 	if (entry) {
 		list_del(&entry->list);
+		if (entry->lsk_ref)
+			lsk_list_release(pernet, entry->lsk_ref);
 		kfree(entry);
 		return true;
 	}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH mptcp-next v5 8/8] mptcp: expose server_side attribute in MPTCP netlink events
  2022-02-03  7:25 [PATCH mptcp-next v5 0/8] mptcp: fixes and enhancements related to path management Kishen Maloor
                   ` (6 preceding siblings ...)
  2022-02-03  7:25 ` [PATCH mptcp-next v5 7/8] mptcp: attempt to add listening sockets for announced addrs Kishen Maloor
@ 2022-02-03  7:25 ` Kishen Maloor
  2022-02-03  7:38   ` mptcp: expose server_side attribute in MPTCP netlink events: Build Failure MPTCP CI
  7 siblings, 1 reply; 17+ messages in thread
From: Kishen Maloor @ 2022-02-03  7:25 UTC (permalink / raw)
  To: kishen.maloor, mptcp

This change records the server_side attribute in MPTCP_EVENT_CREATED
and MPTCP_EVENT_ESTABLISHED events to inform the recipient of the role
of the associated MPTCP application (Client/Server) that is handling
its end of the MPTCP connection.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/246
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 include/uapi/linux/mptcp.h | 1 +
 net/mptcp/pm_netlink.c     | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
index 265cabc0d7aa..0df44a116a31 100644
--- a/include/uapi/linux/mptcp.h
+++ b/include/uapi/linux/mptcp.h
@@ -188,6 +188,7 @@ enum mptcp_event_attr {
 	MPTCP_ATTR_IF_IDX,	/* s32 */
 	MPTCP_ATTR_RESET_REASON,/* u32 */
 	MPTCP_ATTR_RESET_FLAGS, /* u32 */
+	MPTCP_ATTR_SERVER_SIDE,	/* u8 */
 
 	__MPTCP_ATTR_AFTER_LAST
 };
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 9b3d871d3712..eaa1a5a21192 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -2097,6 +2097,9 @@ static int mptcp_event_created(struct sk_buff *skb,
 	if (err)
 		return err;
 
+	if (nla_put_u8(skb, MPTCP_ATTR_SERVER_SIDE, READ_ONCE(msk->pm.server_side)))
+		return -EMSGSIZE;
+
 	return mptcp_event_add_subflow(skb, ssk);
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: mptcp: expose server_side attribute in MPTCP netlink events: Build Failure
  2022-02-03  7:25 ` [PATCH mptcp-next v5 8/8] mptcp: expose server_side attribute in MPTCP netlink events Kishen Maloor
@ 2022-02-03  7:38   ` MPTCP CI
  0 siblings, 0 replies; 17+ messages in thread
From: MPTCP CI @ 2022-02-03  7:38 UTC (permalink / raw)
  To: Kishen Maloor; +Cc: mptcp

Hi Kishen,

Thank you for your modifications, that's great!

But sadly, our CI spotted some issues with it when trying to build it.

You can find more details there:

  https://patchwork.kernel.org/project/mptcp/patch/20220203072508.3072309-9-kishen.maloor@intel.com/
  https://github.com/multipath-tcp/mptcp_net-next/actions/runs/1788178430

Status: failure
Initiator: MPTCPimporter
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/23ad466ddec7

Feel free to reply to this email if you cannot access logs, if you need
some support to fix the error, if this doesn't seem to be caused by your
modifications or if the error is a false positive one.

Cheers,
MPTCP GH Action bot

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH mptcp-next v5 5/8] mptcp: netlink: store per namespace list of refcounted listen socks
  2022-02-03  7:25 ` [PATCH mptcp-next v5 5/8] mptcp: netlink: store per namespace list of refcounted listen socks Kishen Maloor
@ 2022-02-03 17:46   ` Florian Westphal
  2022-02-03 20:09     ` Kishen Maloor
  2022-02-04  1:02     ` Mat Martineau
  0 siblings, 2 replies; 17+ messages in thread
From: Florian Westphal @ 2022-02-03 17:46 UTC (permalink / raw)
  To: Kishen Maloor; +Cc: mptcp

Kishen Maloor <kishen.maloor@intel.com> wrote:
> The kernel can create listening sockets bound to announced addresses
> via the ADD_ADDR option for receiving MP_JOIN requests. Path
> managers may further choose to advertise the same addr+port over multiple
> MPTCP connections. So this change provides a simple framework to
> manage a list of all distinct listening sockets created in the kernel
> over a namespace by encapsulating the socket in a structure that is
> ref counted and can be shared across multiple connections. The sockets
> are released when there are no more references.

I think it makes sense to work on a hook in tcp v4/v6 input path
that gets called for th->syn && !th->ack && no-listener-found case.

The hook would:
1. retrieve join token, fetch mptcp_sock and allow 3whs to continue
   if things look ok from mptcp p.o.v.
2. return "go ahead and send tcp rst" or "mptcp magic, skb stolen"
to the tcp stack.

This also makes sure that plain tcp or mptcp connect requests will
not work for addresses that did not go through socket/bind/listen API.

I will try to prototype something next week.

Given that hook lives in an error path (from tcp point of view)
I think it's going to be OK from an upstreaming point of view.

It hopefully avoids the need for "magic listener sockets", and avoids
kernel fighting with userspace applications over which address:port
pairs are really useable.

The latter is a concern IMO, esp. with reuseport and other round-robin
schemes, I don't want the mptcp layer to interfere with other
applications running on the same host.
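
The hook described above can be modelled in userspace roughly as follows.
This is only a sketch of the decision logic, not kernel code: the verdict
enum, struct seg_info, struct token_table and no_listener_hook() are all
hypothetical names invented for illustration, and the real prototype may
look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts the hook could hand back to the TCP stack. */
enum join_verdict {
	JOIN_NOT_HANDLED,	/* hook does not apply to this segment */
	JOIN_SEND_RST,		/* "go ahead and send tcp rst" */
	JOIN_STOLEN,		/* "mptcp magic, skb stolen" */
};

/* Simplified stand-in for the relevant fields of a received segment. */
struct seg_info {
	bool syn;
	bool ack;
	bool has_mp_join;	/* MP_JOIN option present */
	uint32_t token;		/* token carried by the MP_JOIN option */
};

/* Stand-in for a per-namespace table of tokens of live MPTCP sockets. */
struct token_table {
	const uint32_t *tokens;
	size_t count;
};

static bool token_known(const struct token_table *tbl, uint32_t token)
{
	for (size_t i = 0; i < tbl->count; i++)
		if (tbl->tokens[i] == token)
			return true;
	return false;
}

/* Assumed to run only after the TCP listener lookup has already failed. */
static enum join_verdict no_listener_hook(const struct seg_info *seg,
					  const struct token_table *tbl)
{
	assert(seg && tbl);

	/* Only th->syn && !th->ack is of interest. */
	if (!seg->syn || seg->ack)
		return JOIN_NOT_HANDLED;

	/* Plain TCP/MPC SYNs and unknown tokens still get a reset. */
	if (!seg->has_mp_join || !token_known(tbl, seg->token))
		return JOIN_SEND_RST;

	/* Token matches an mptcp_sock: let the 3whs continue in MPTCP. */
	return JOIN_STOLEN;
}
```

In this model a plain SYN to an address without a listener is still reset,
which matches the point that only socket/bind/listen makes an address
connectable.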

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH mptcp-next v5 5/8] mptcp: netlink: store per namespace list of refcounted listen socks
  2022-02-03 17:46   ` Florian Westphal
@ 2022-02-03 20:09     ` Kishen Maloor
  2022-02-03 20:35       ` Florian Westphal
  2022-02-04  1:02     ` Mat Martineau
  1 sibling, 1 reply; 17+ messages in thread
From: Kishen Maloor @ 2022-02-03 20:09 UTC (permalink / raw)
  To: Florian Westphal; +Cc: mptcp

On 2/3/22 9:46 AM, Florian Westphal wrote:
> Kishen Maloor <kishen.maloor@intel.com> wrote:
>> The kernel can create listening sockets bound to announced addresses
>> via the ADD_ADDR option for receiving MP_JOIN requests. Path
>> managers may further choose to advertise the same addr+port over multiple
>> MPTCP connections. So this change provides a simple framework to
>> manage a list of all distinct listening sockets created in the kernel
>> over a namespace by encapsulating the socket in a structure that is
>> ref counted and can be shared across multiple connections. The sockets
>> are released when there are no more references.
> 
> I think it makes sense to work on a hook in tcp v4/v6 input path
> that gets called for th->syn && !th->ack && no-listener-found case.
> 
> The hook would:
> 1. retrieve join token, fetch mptcp_sock and allow 3whs to continue
>    if things look ok from mptcp p.o.v.
> 2. return "go ahead and send tcp rst" or "mptcp magic, skb stolen"
> to the tcp stack.
> 
> This also makes sure that plain tcp or mptcp connect requests will
> not work for addresses that did not go through socket/bind/listen API.
> 
> I will try to prototype something next week.
> 

Sounds good.

> Given that hook lives in an error path (from tcp point of view)
> I think it's going to be OK from an upstreaming point of view.
> 
> It hopefully avoids the need for "magic listener sockets", and avoids
> kernel fighting with userspace applications over which address:port
> pairs are really useable.
> 

Will this also obviate the need for listeners we currently create for port-based
endpoints?

> The latter is a concern IMO, esp. with reuseport and other round-robin
> schemes, I don't want the mptcp layer to interfere with other
> applications running on the same host.

Indeed if there are active/legacy TCP deployments that cannot be reconfigured with the 
NO_LISTEN flag, then we could choose to stick with the current default behavior
and introduce a LISTEN flag (and additionally a NO_LISTEN flag to not create listeners for
port-based endpoints as discussed earlier today). Further, if it's possible, we could 
also update the MPTCP layer to not accept MPC attempts over listeners created in the 
kernel to address that matter?
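
For reference, the listener policy as posted in this series (v5) can be
summarised with the small userspace model below. It is a sketch only:
struct endpoint and kernel_creates_listener() are hypothetical names, and
the flag value is the one added in patch 7/8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the uapi change in patch 7/8 of this series. */
#define MPTCP_PM_ADDR_FLAG_NO_LISTEN	(1 << 4)

/* Hypothetical, reduced stand-in for an endpoint entry. */
struct endpoint {
	uint8_t flags;
	uint16_t port;	/* 0 means "no explicit port" */
};

/* Returns true when the kernel would try to set up a listening socket. */
static bool kernel_creates_listener(const struct endpoint *ep)
{
	assert(ep);

	/* Port-based endpoints always get a listener, as before. */
	if (ep->port)
		return true;

	/* Portless endpoints get one unless NO_LISTEN was requested. */
	return !(ep->flags & MPTCP_PM_ADDR_FLAG_NO_LISTEN);
}
```

Flipping the default (an opt-in LISTEN flag) would only change the last
return statement of this model.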

My only immediate concern (on the surface) with your proposal (which may not actually
be a worry upon assessment) is the potential risk of spurious MPJ
attempts over what might be a "slow"(?) error path - just an early thought for
consideration. But I look forward to seeing your changes!

Cheers,
-Kishen. 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH mptcp-next v5 5/8] mptcp: netlink: store per namespace list of refcounted listen socks
  2022-02-03 20:09     ` Kishen Maloor
@ 2022-02-03 20:35       ` Florian Westphal
  0 siblings, 0 replies; 17+ messages in thread
From: Florian Westphal @ 2022-02-03 20:35 UTC (permalink / raw)
  To: Kishen Maloor; +Cc: Florian Westphal, mptcp

Kishen Maloor <kishen.maloor@intel.com> wrote:
> > Given that hook lives in an error path (from tcp point of view)
> > I think it's going to be OK from an upstreaming point of view.
> > 
> > It hopefully avoids the need for "magic listener sockets", and avoids
> > kernel fighting with userspace applications over which address:port
> > pairs are really useable.
> > 
> 
> Will this also obviate the need for listeners we currently create for port-based
> endpoints?

Hopefully yes.

> Indeed if there are active/legacy TCP deployments that cannot be reconfigured with the 
> NO_LISTEN flag, then we could choose to stick with the current default behavior
> and introduce a LISTEN flag (and additionally a NO_LISTEN flag to not create listeners for
> port-based endpoints as discussed earlier today). Further, if it's possible, we could 
> also update the MPTCP layer to not accept MPC attempts over listeners created in the 
> kernel to address that matter?

Yes, we could do that; I suggest waiting to see how the "syn/join hook"
works out.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH mptcp-next v5 5/8] mptcp: netlink: store per namespace list of refcounted listen socks
  2022-02-03 17:46   ` Florian Westphal
  2022-02-03 20:09     ` Kishen Maloor
@ 2022-02-04  1:02     ` Mat Martineau
  2022-02-04  9:47       ` Paolo Abeni
  1 sibling, 1 reply; 17+ messages in thread
From: Mat Martineau @ 2022-02-04  1:02 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Kishen Maloor, mptcp

On Thu, 3 Feb 2022, Florian Westphal wrote:

> Kishen Maloor <kishen.maloor@intel.com> wrote:
>> The kernel can create listening sockets bound to announced addresses
>> via the ADD_ADDR option for receiving MP_JOIN requests. Path
>> managers may further choose to advertise the same addr+port over multiple
>> MPTCP connections. So this change provides a simple framework to
>> manage a list of all distinct listening sockets created in the kernel
>> over a namespace by encapsulating the socket in a structure that is
>> ref counted and can be shared across multiple connections. The sockets
>> are released when there are no more references.
>
> I think it makes sense to work on a hook in tcp v4/v6 input path
> that gets called for th->syn && !th->ack && no-listener-found case.
>
> The hook would:
> 1. retrieve join token, fetch mptcp_sock and allow 3whs to continue
>   if things look ok from mptcp p.o.v.
> 2. return "go ahead and send tcp rst" or "mptcp magic, skb stolen"
> to the tcp stack.
>

This is basically the approach the multipath-tcp.org kernel takes. I think 
we initially decided to use listening sockets instead since it was less 
invasive, but now we have found the limits of the listener approach.

It even looks like your proposal adheres more closely to 
https://datatracker.ietf.org/doc/html/rfc8684.html#section-3.2 (last 
paragraph): "Demultiplexing subflow SYNs MUST be done using the token"

I hadn't noticed that "MUST" before.

Do you think the token lookups should depend on anything other than the 
token value and net namespace - for example, to make sure someone can't 
use a public interface to try to brute-force potential tokens in use on 
other interfaces? The HMACs later in the handshake would guard against an 
actual connection (but would burn some extra CPU cycles). I guess this 
isn't a fundamentally different problem than we have today if there are 
any MPTCP listeners on public interfaces, it just means they don't have to 
work out the port number first.

> This also makes sure that plain tcp or mptcp connect requests will
> not work for addresses that did not go through socket/bind/listen API.
>
> I will try to prototype something next week.
>

Thanks!

> Given that hook lives in an error path (from tcp point of view)
> I think it's going to be OK from an upstreaming point of view.
>

Seems reasonable to me.

> It hopefully avoids the need for "magic listener sockets", and avoids
> kernel fighting with userspace applications over which address:port
> pairs are really useable.
>
> The latter is a concern IMO, esp. with reuseport and other round-robin
> schemes, I don't want the mptcp layer to interfere with other
> applications running on the same host.

The complexity of the magic listener sockets - creating and managing them, 
as well as mysterious interactions with userspace - does seem more 
significant than the TCP changes, so I hope the prototype works out.


--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH mptcp-next v5 5/8] mptcp: netlink: store per namespace list of refcounted listen socks
  2022-02-04  1:02     ` Mat Martineau
@ 2022-02-04  9:47       ` Paolo Abeni
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Abeni @ 2022-02-04  9:47 UTC (permalink / raw)
  To: Mat Martineau, Florian Westphal; +Cc: Kishen Maloor, mptcp

On Thu, 2022-02-03 at 17:02 -0800, Mat Martineau wrote:
> On Thu, 3 Feb 2022, Florian Westphal wrote:
> 
> > Kishen Maloor <kishen.maloor@intel.com> wrote:
> > > The kernel can create listening sockets bound to announced addresses
> > > via the ADD_ADDR option for receiving MP_JOIN requests. Path
> > > managers may further choose to advertise the same addr+port over multiple
> > > MPTCP connections. So this change provides a simple framework to
> > > manage a list of all distinct listening sockets created in the kernel
> > > over a namespace by encapsulating the socket in a structure that is
> > > ref counted and can be shared across multiple connections. The sockets
> > > are released when there are no more references.
> > 
> > I think it makes sense to work on a hook in tcp v4/v6 input path
> > that gets called for th->syn && !th->ack && no-listener-found case.
> > 
> > The hook would:
> > 1. retrieve join token, fetch mptcp_sock and allow 3whs to continue
> >   if things look ok from mptcp p.o.v.
> > 2. return "go ahead and send tcp rst" or "mptcp magic, skb stolen"
> > to the tcp stack.
> > 
> 
> This is basically the approach the multipath-tcp.org kernel takes. I think 
> we initially decided to use listening sockets instead since it was less 
> invasive, but now we have found the limits of the listener approach.
> 
> It even looks like your proposal adheres more closely to 
> https://datatracker.ietf.org/doc/html/rfc8684.html#section-3.2 (last 
> paragraph): "Demultiplexing subflow SYNs MUST be done using the token"
> 
> I hadn't noticed that "MUST" before.
> 
> Do you think the token lookups should depend on anything other than the 
> token value and net namespace - for example, to make sure someone can't 
> use a public interface to try to brute-force potential tokens in use on 
> other interfaces? The HMACs later in the handshake would guard against an 
> actual connection (but would burn some extra CPU cycles). I guess this 
> isn't a fundamentally different problem than we have today if there are 
> any MPTCP listeners on public interfaces, it just means they don't have to 
> work out the port number first.

The matching MPJ token could/should still be validated vs (signal)
endpoints available in the relevant ns. Attackers should be able to
brute force only via IPs that are reachable and announced as signal
endpoints. It should be safe IMHO.
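
One way to picture that extra check is the userspace sketch below, which
accepts a join only if its destination address matches a signal endpoint.
It is an illustration under simplifying assumptions (IPv4 only, a flat
array instead of the pernet list); struct endpoint and join_dst_allowed()
are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as in include/uapi/linux/mptcp.h. */
#define MPTCP_PM_ADDR_FLAG_SIGNAL	(1 << 0)

/* Hypothetical, IPv4-only stand-in for a configured endpoint. */
struct endpoint {
	uint8_t flags;
	uint32_t addr4;
};

/*
 * After the token lookup succeeded, accept the join only if the SYN
 * was sent to an address that is configured as a signal endpoint.
 */
static bool join_dst_allowed(const struct endpoint *eps, size_t n,
			     uint32_t dst_addr4)
{
	assert(eps || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!(eps[i].flags & MPTCP_PM_ADDR_FLAG_SIGNAL))
			continue;
		if (eps[i].addr4 == dst_addr4)
			return true;
	}
	return false;
}
```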

Overall LGTM

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH mptcp-next v5 7/8] mptcp: attempt to add listening sockets for announced addrs
  2022-02-03  7:25 ` [PATCH mptcp-next v5 7/8] mptcp: attempt to add listening sockets for announced addrs Kishen Maloor
@ 2022-02-04 13:52   ` Geliang Tang
  0 siblings, 0 replies; 17+ messages in thread
From: Geliang Tang @ 2022-02-04 13:52 UTC (permalink / raw)
  To: Kishen Maloor; +Cc: MPTCP Upstream

Kishen Maloor <kishen.maloor@intel.com> wrote on Thu, Feb 3, 2022 at 15:25:
>
> When ADD_ADDR announcements use the port associated with an
> active subflow, this change ensures that a listening socket is bound
> to the announced addr+port in the kernel for subsequently receiving
> MP_JOINs. But if a listening socket for this address is already held
> by the application then no action is taken.
>
> A listening socket is created (when there isn't a listener)
> just prior to the addr advertisement. If it is desired to not create
> a listening socket in the kernel for an address, then this can be
> requested by including the MPTCP_PM_ADDR_FLAG_NO_LISTEN flag
> with the address.
>
> When a listening socket is created, it is stored in
> struct mptcp_pm_add_entry and released accordingly.
>
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/203
> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
> ---
> v2: fixed formatting
> v3: added new addr flag MPTCP_PM_ADDR_FLAG_NO_LISTEN to skip creating a
> listening socket in the kernel during an ADD_ADDR request, use this flag
> along the in-kernel PM flow for ADD_ADDR requests (Note: listening sockets
> are always created for port-based endpoints as before), use the
> lsk_list_find_or_create() helper
> v4: call lsk_list_find() after a failed lsk_list_find_or_create()
> for a chance to retrieve a recently created lsk by a simultaneous
> call
> ---
>  include/uapi/linux/mptcp.h |  1 +
>  net/mptcp/pm_netlink.c     | 46 ++++++++++++++++++++++++++++++++++++--
>  2 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
> index f106a3941cdf..265cabc0d7aa 100644
> --- a/include/uapi/linux/mptcp.h
> +++ b/include/uapi/linux/mptcp.h
> @@ -81,6 +81,7 @@ enum {
>  #define MPTCP_PM_ADDR_FLAG_SUBFLOW                     (1 << 1)
>  #define MPTCP_PM_ADDR_FLAG_BACKUP                      (1 << 2)
>  #define MPTCP_PM_ADDR_FLAG_FULLMESH                    (1 << 3)
> +#define MPTCP_PM_ADDR_FLAG_NO_LISTEN                   (1 << 4)
>
>  enum {
>         MPTCP_PM_CMD_UNSPEC,
> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
> index a4fb9acbba51..9b3d871d3712 100644
> --- a/net/mptcp/pm_netlink.c
> +++ b/net/mptcp/pm_netlink.c
> @@ -43,6 +43,7 @@ struct mptcp_pm_add_entry {
>         struct mptcp_addr_info  addr;
>         struct timer_list       add_timer;
>         struct mptcp_sock       *sock;
> +       struct mptcp_local_lsk  *lsk_ref;
>         u8                      retrans_times;
>  };
>
> @@ -469,7 +470,8 @@ mptcp_pm_del_add_timer(struct mptcp_sock *msk,
>  }
>
>  static bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
> -                                    struct mptcp_pm_addr_entry *entry)
> +                                    struct mptcp_pm_addr_entry *entry,
> +                                    struct mptcp_local_lsk *lsk_ref)
>  {
>         struct mptcp_pm_add_entry *add_entry = NULL;
>         struct sock *sk = (struct sock *)msk;
> @@ -489,6 +491,10 @@ static bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
>         add_entry->addr = entry->addr;
>         add_entry->sock = msk;
>         add_entry->retrans_times = 0;
> +       add_entry->lsk_ref = lsk_ref;
> +
> +       if (lsk_ref)
> +               lsk_list_add_ref(lsk_ref);
>
>         timer_setup(&add_entry->add_timer, mptcp_pm_add_timer, 0);
>         sk_reset_timer(sk, &add_entry->add_timer,
> @@ -501,8 +507,11 @@ void mptcp_pm_free_anno_list(struct mptcp_sock *msk)
>  {
>         struct mptcp_pm_add_entry *entry, *tmp;
>         struct sock *sk = (struct sock *)msk;
> +       struct pm_nl_pernet *pernet;
>         LIST_HEAD(free_list);
>
> +       pernet = net_generic(sock_net(sk), pm_nl_pernet_id);
> +
>         pr_debug("msk=%p", msk);
>
>         spin_lock_bh(&msk->pm.lock);
> @@ -511,6 +520,8 @@ void mptcp_pm_free_anno_list(struct mptcp_sock *msk)
>
>         list_for_each_entry_safe(entry, tmp, &free_list, list) {
>                 sk_stop_timer_sync(sk, &entry->add_timer);
> +               if (entry->lsk_ref)
> +                       lsk_list_release(pernet, entry->lsk_ref);
>                 kfree(entry);
>         }
>  }
> @@ -615,7 +626,9 @@ lookup_id_by_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *add
>  }
>
>  static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
> +       __must_hold(&msk->pm.lock)
>  {
> +       struct mptcp_local_lsk *lsk_ref = NULL;
>         struct sock *sk = (struct sock *)msk;
>         struct mptcp_pm_addr_entry *local;
>         unsigned int add_addr_signal_max;
> @@ -652,12 +665,34 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
>                 local = select_signal_address(pernet, msk);
>
>                 if (local) {
> -                       if (mptcp_pm_alloc_anno_list(msk, local)) {
> +                       if (!(local->flags & MPTCP_PM_ADDR_FLAG_NO_LISTEN) &&
> +                           !local->addr.port) {
> +                               local->addr.port =
> +                                       ((struct inet_sock *)inet_sk
> +                                        ((struct sock *)msk))->inet_sport;

How about putting them in one line:
local->addr.port = ((struct inet_sock *)inet_sk(sk))->inet_sport;

> +
> +                               spin_unlock_bh(&msk->pm.lock);
> +
> +                               lsk_ref = lsk_list_find_or_create(sock_net(sk), pernet,
> +                                                                 local, NULL);
> +
> +                               spin_lock_bh(&msk->pm.lock);
> +
> +                               if (!lsk_ref)
> +                                       lsk_ref = lsk_list_find(pernet, &local->addr);
> +
> +                               local->addr.port = 0;
> +                       }
> +
> +                       if (mptcp_pm_alloc_anno_list(msk, local, lsk_ref)) {
>                                 __clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
>                                 msk->pm.add_addr_signaled++;
>                                 mptcp_pm_announce_addr(msk, &local->addr, false);
>                                 mptcp_pm_nl_addr_send_ack(msk);
>                         }
> +
> +                       if (lsk_ref)
> +                               lsk_list_release(pernet, lsk_ref);
>                 }
>         }
>
> @@ -749,6 +784,7 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
>  }
>
>  static void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk)
> +       __must_hold(&msk->pm.lock)
>  {
>         struct mptcp_addr_info addrs[MPTCP_PM_ADDR_MAX];
>         struct sock *sk = (struct sock *)msk;
> @@ -1389,11 +1425,17 @@ int mptcp_pm_get_flags_and_ifindex_by_id(struct net *net, unsigned int id,
>  static bool remove_anno_list_by_saddr(struct mptcp_sock *msk,
>                                       struct mptcp_addr_info *addr)
>  {
> +       struct sock *sk = (struct sock *)msk;
>         struct mptcp_pm_add_entry *entry;
> +       struct pm_nl_pernet *pernet;
> +
> +       pernet = net_generic(sock_net(sk), pm_nl_pernet_id);
>
>         entry = mptcp_pm_del_add_timer(msk, addr, false);
>         if (entry) {
>                 list_del(&entry->list);
> +               if (entry->lsk_ref)
> +                       lsk_list_release(pernet, entry->lsk_ref);
>                 kfree(entry);
>                 return true;
>         }
> --
> 2.31.1
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH mptcp-next v5 6/8] mptcp: netlink: store lsk ref in mptcp_pm_addr_entry
  2022-02-03  7:25 ` [PATCH mptcp-next v5 6/8] mptcp: netlink: store lsk ref in mptcp_pm_addr_entry Kishen Maloor
@ 2022-02-16  3:56   ` Geliang Tang
  0 siblings, 0 replies; 17+ messages in thread
From: Geliang Tang @ 2022-02-16  3:56 UTC (permalink / raw)
  To: Kishen Maloor; +Cc: MPTCP Upstream

Hi Kishen,

Kishen Maloor <kishen.maloor@intel.com> wrote on Thu, Feb 3, 2022 at 15:25:
>
> This change updates struct mptcp_pm_addr_entry to store a
> listening socket (lsk) reference, i.e. a pointer to a reference
> counted structure containing the lsk (struct socket *) instead
> of the lsk itself. Code blocks that previously operated on
> the lsk in struct mptcp_pm_addr_entry have been updated to work
> with the lsk ref instead, utilizing new helper functions.
>
> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
> ---
> v2: fixed formatting
> v3: added helper lsk_list_find_or_create(), updated
> mptcp_pm_nl_create_listen_socket() to take struct net* as param
> v4: call lsk_list_find() after a failed lsk_list_find_or_create()
> for a chance to retrieve a recently created lsk by a simultaneous
> call
> v5: fixed implicit declaration error
> ---
>  net/mptcp/pm_netlink.c | 83 +++++++++++++++++++++++++++++++-----------
>  1 file changed, 62 insertions(+), 21 deletions(-)
>
> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
> index 3d6251baef26..a4fb9acbba51 100644
> --- a/net/mptcp/pm_netlink.c
> +++ b/net/mptcp/pm_netlink.c
> @@ -35,7 +35,7 @@ struct mptcp_pm_addr_entry {
>         struct mptcp_addr_info  addr;
>         u8                      flags;
>         int                     ifindex;
> -       struct socket           *lsk;
> +       struct mptcp_local_lsk  *lsk_ref;
>  };
>
>  struct mptcp_pm_add_entry {
> @@ -66,6 +66,10 @@ struct pm_nl_pernet {
>  #define MPTCP_PM_ADDR_MAX      8
>  #define ADD_ADDR_RETRANS_MAX   3
>
> +static int mptcp_pm_nl_create_listen_socket(struct net *net,
> +                                           struct mptcp_pm_addr_entry *entry,
> +                                           struct socket **lsk);
> +
>  static bool addresses_equal(const struct mptcp_addr_info *a,
>                             const struct mptcp_addr_info *b, bool use_port)
>  {
> @@ -157,6 +161,33 @@ static void lsk_list_release(struct pm_nl_pernet *pernet,
>         }
>  }
>
> +static struct mptcp_local_lsk *lsk_list_find_or_create(struct net *net,
> +                                                      struct pm_nl_pernet *pernet,
> +                                                      struct mptcp_pm_addr_entry *entry,
> +                                                      int *createlsk_err)
> +{
> +       struct mptcp_local_lsk *lsk_ref;
> +       struct socket *lsk;
> +       int err;
> +
> +       lsk_ref = lsk_list_find(pernet, &entry->addr);
> +
> +       if (!lsk_ref) {
> +               err = mptcp_pm_nl_create_listen_socket(net, entry, &lsk);
> +
> +               if (createlsk_err)
> +                       *createlsk_err = err;
> +
> +               if (lsk)
> +                       lsk_ref = lsk_list_add(pernet, &entry->addr, lsk);
> +
> +               if (lsk && !lsk_ref)
> +                       sock_release(lsk);
> +       }
> +
> +       return lsk_ref;
> +}
> +
>  static bool address_zero(const struct mptcp_addr_info *addr)
>  {
>         struct mptcp_addr_info zero;
> @@ -999,8 +1030,9 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
>         return ret;
>  }
>
> -static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
> -                                           struct mptcp_pm_addr_entry *entry)
> +static int mptcp_pm_nl_create_listen_socket(struct net *net,
> +                                           struct mptcp_pm_addr_entry *entry,
> +                                           struct socket **lsk)
>  {
>         int addrlen = sizeof(struct sockaddr_in);
>         struct sockaddr_storage addr;
> @@ -1009,12 +1041,12 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
>         int backlog = 1024;
>         int err;
>
> -       err = sock_create_kern(sock_net(sk), entry->addr.family,
> -                              SOCK_STREAM, IPPROTO_MPTCP, &entry->lsk);
> +       err = sock_create_kern(net, entry->addr.family,
> +                              SOCK_STREAM, IPPROTO_MPTCP, lsk);
>         if (err)
>                 return err;
>
> -       msk = mptcp_sk(entry->lsk->sk);
> +       msk = mptcp_sk((*lsk)->sk);
>         if (!msk) {
>                 err = -EINVAL;
>                 goto out;
> @@ -1046,7 +1078,8 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
>         return 0;
>
>  out:
> -       sock_release(entry->lsk);
> +       sock_release(*lsk);
> +       *lsk = NULL;
>         return err;
>  }
>
> @@ -1095,7 +1128,7 @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct sock_common *skc)
>         entry->addr.port = 0;
>         entry->ifindex = 0;
>         entry->flags = 0;
> -       entry->lsk = NULL;
> +       entry->lsk_ref = NULL;
>         ret = mptcp_pm_nl_append_new_local_addr(pernet, entry);
>         if (ret < 0)
>                 kfree(entry);
> @@ -1304,18 +1337,25 @@ static int mptcp_nl_cmd_add_addr(struct sk_buff *skb, struct genl_info *info)
>
>         *entry = addr;
>         if (entry->addr.port) {
> -               ret = mptcp_pm_nl_create_listen_socket(skb->sk, entry);
> -               if (ret) {
> -                       GENL_SET_ERR_MSG(info, "create listen socket error");
> +               entry->lsk_ref = lsk_list_find_or_create(sock_net(skb->sk), pernet, entry, &ret);
> +
> +               if (!entry->lsk_ref)
> +                       entry->lsk_ref = lsk_list_find(pernet, &entry->addr);
> +
> +               if (!entry->lsk_ref) {
> +                       GENL_SET_ERR_MSG(info, "can't create/allocate lsk");
>                         kfree(entry);
> +                       ret = (ret == 0) ? -ENOMEM : ret;
>                         return ret;
>                 }
>         }
> +

Blank lines aren't necessary here,

>         ret = mptcp_pm_nl_append_new_local_addr(pernet, entry);
> +

and here.

Thanks,

Geliang
SUSE

>         if (ret < 0) {
>                 GENL_SET_ERR_MSG(info, "too many addresses or duplicate one");
> -               if (entry->lsk)
> -                       sock_release(entry->lsk);
> +               if (entry->lsk_ref)
> +                       lsk_list_release(pernet, entry->lsk_ref);
>                 kfree(entry);
>                 return ret;
>         }
> @@ -1418,10 +1458,11 @@ static int mptcp_nl_remove_subflow_and_signal_addr(struct net *net,
>  }
>
>  /* caller must ensure the RCU grace period is already elapsed */
> -static void __mptcp_pm_release_addr_entry(struct mptcp_pm_addr_entry *entry)
> +static void __mptcp_pm_release_addr_entry(struct pm_nl_pernet *pernet,
> +                                         struct mptcp_pm_addr_entry *entry)
>  {
> -       if (entry->lsk)
> -               sock_release(entry->lsk);
> +       if (entry->lsk_ref)
> +               lsk_list_release(pernet, entry->lsk_ref);
>         kfree(entry);
>  }
>
> @@ -1503,7 +1544,7 @@ static int mptcp_nl_cmd_del_addr(struct sk_buff *skb, struct genl_info *info)
>
>         mptcp_nl_remove_subflow_and_signal_addr(sock_net(skb->sk), &entry->addr);
>         synchronize_rcu();
> -       __mptcp_pm_release_addr_entry(entry);
> +       __mptcp_pm_release_addr_entry(pernet, entry);
>
>         return ret;
>  }
> @@ -1559,7 +1600,7 @@ static void mptcp_nl_remove_addrs_list(struct net *net,
>  }
>
>  /* caller must ensure the RCU grace period is already elapsed */
> -static void __flush_addrs(struct list_head *list)
> +static void __flush_addrs(struct pm_nl_pernet *pernet, struct list_head *list)
>  {
>         while (!list_empty(list)) {
>                 struct mptcp_pm_addr_entry *cur;
> @@ -1567,7 +1608,7 @@ static void __flush_addrs(struct list_head *list)
>                 cur = list_entry(list->next,
>                                  struct mptcp_pm_addr_entry, list);
>                 list_del_rcu(&cur->list);
> -               __mptcp_pm_release_addr_entry(cur);
> +               __mptcp_pm_release_addr_entry(pernet, cur);
>         }
>  }
>
> @@ -1592,7 +1633,7 @@ static int mptcp_nl_cmd_flush_addrs(struct sk_buff *skb, struct genl_info *info)
>         spin_unlock_bh(&pernet->lock);
>         mptcp_nl_remove_addrs_list(sock_net(skb->sk), &free_list);
>         synchronize_rcu();
> -       __flush_addrs(&free_list);
> +       __flush_addrs(pernet, &free_list);
>         return 0;
>  }
>
> @@ -2242,7 +2283,7 @@ static void __net_exit pm_nl_exit_net(struct list_head *net_list)
>                  * other modifiers, also netns core already waited for a
>                  * RCU grace period.
>                  */
> -               __flush_addrs(&pernet->local_addr_list);
> +               __flush_addrs(pernet, &pernet->local_addr_list);
>         }
>  }
>
> --
> 2.31.1
>
>
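
For readers following the entry->lsk to entry->lsk_ref conversion quoted
above, the ownership idea can be sketched in user space. This is only an
illustrative model, not the patch's code: struct lsk_ref, struct lsk_list,
lsk_find(), lsk_find_or_create() and lsk_release() are hypothetical names,
the lookup key is a bare port number rather than a full mptcp_addr_info,
there is no locking or RCU, and it assumes that a successful lookup takes a
reference on the listener.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for one shared listening socket. */
struct lsk_ref {
	struct lsk_ref *next;
	int port;      /* lookup key; the patch compares full addresses */
	int refcount;  /* number of address entries sharing this listener */
};

/* Hypothetical stand-in for the per-namespace list of listeners. */
struct lsk_list {
	struct lsk_ref *head;
	int live;      /* listeners currently allocated */
};

/* Look up a listener by key; assumption: a hit takes a reference. */
static struct lsk_ref *lsk_find(struct lsk_list *l, int port)
{
	for (struct lsk_ref *r = l->head; r; r = r->next) {
		if (r->port == port) {
			r->refcount++;
			return r;
		}
	}
	return NULL;
}

/* Reuse an existing listener, or allocate one holding a single reference. */
static struct lsk_ref *lsk_find_or_create(struct lsk_list *l, int port)
{
	struct lsk_ref *r = lsk_find(l, port);

	if (r)
		return r;

	r = calloc(1, sizeof(*r));
	if (!r)
		return NULL;

	r->port = port;
	r->refcount = 1;
	r->next = l->head;
	l->head = r;
	l->live++;
	return r;
}

/* Drop one reference; the last holder unlinks and frees the listener. */
static void lsk_release(struct lsk_list *l, struct lsk_ref *ref)
{
	assert(ref->refcount > 0);
	if (--ref->refcount > 0)
		return;

	for (struct lsk_ref **pp = &l->head; *pp; pp = &(*pp)->next) {
		if (*pp == ref) {
			*pp = ref->next;
			break;
		}
	}
	l->live--;
	free(ref);
}
```

In this model, two address entries asking for the same key share a single
listener, and it is torn down only when the second one releases it. That is
the reason the release paths in the quoted diff go through
lsk_list_release() instead of calling sock_release() directly.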


end of thread (newest message ~2022-02-16  3:55 UTC)

Thread overview: 17+ messages
2022-02-03  7:25 [PATCH mptcp-next v5 0/8] mptcp: fixes and enhancements related to path management Kishen Maloor
2022-02-03  7:25 ` [PATCH mptcp-next v5 1/8] mptcp: bypass in-kernel PM restrictions for non-kernel PMs Kishen Maloor
2022-02-03  7:25 ` [PATCH mptcp-next v5 2/8] mptcp: store remote id from MP_JOIN SYN/ACK in local ctx Kishen Maloor
2022-02-03  7:25 ` [PATCH mptcp-next v5 3/8] mptcp: reflect remote port (not 0) in ANNOUNCED events Kishen Maloor
2022-02-03  7:25 ` [PATCH mptcp-next v5 4/8] mptcp: establish subflows from either end of connection Kishen Maloor
2022-02-03  7:25 ` [PATCH mptcp-next v5 5/8] mptcp: netlink: store per namespace list of refcounted listen socks Kishen Maloor
2022-02-03 17:46   ` Florian Westphal
2022-02-03 20:09     ` Kishen Maloor
2022-02-03 20:35       ` Florian Westphal
2022-02-04  1:02     ` Mat Martineau
2022-02-04  9:47       ` Paolo Abeni
2022-02-03  7:25 ` [PATCH mptcp-next v5 6/8] mptcp: netlink: store lsk ref in mptcp_pm_addr_entry Kishen Maloor
2022-02-16  3:56   ` Geliang Tang
2022-02-03  7:25 ` [PATCH mptcp-next v5 7/8] mptcp: attempt to add listening sockets for announced addrs Kishen Maloor
2022-02-04 13:52   ` Geliang Tang
2022-02-03  7:25 ` [PATCH mptcp-next v5 8/8] mptcp: expose server_side attribute in MPTCP netlink events Kishen Maloor
2022-02-03  7:38   ` mptcp: expose server_side attribute in MPTCP netlink events: Build Failure MPTCP CI
