All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH mptcp 00/10] mptcp: add SOL_SOCKET support
@ 2021-04-08 18:49 Florian Westphal
  2021-04-08 18:49 ` [PATCH mptcp 01/10] mptcp: add skeleton to sync msk socket options to subflows Florian Westphal
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Florian Westphal @ 2021-04-08 18:49 UTC (permalink / raw)
  To: mptcp; +Cc: Florian Westphal

This patch set improves support for several SOL_SOCKET tuneables,
addressing comments received for v1 patch set.

First patch adds skeleton synchronization functions to copy mptcp socket
settings to a subflow socket.

Second patch adds sequence counting scheme to avoid re-sync when subflow
and mptcp-level socket are known to have the same setting applied.

Notable changes:
I removed the sequence number from patch 1 and split it into second
patch to make those bits clearer.

TCP_CONGESTION is now applied to all subflows.

TCP_INFO retrieves info from the first subflow.  It seems better to
add MPTCP_INFO from out-of-tree patch set in a future change for
userspace that wants mptcp-level flow statistics.

I've not changed SO_LINGER either, this should be updated in a
future change to send a FASTCLOSE when mptcp socket gets closed
while the linger time is 0.

The last patch adds a simple test to check SO_MARK replication.

Florian Westphal (10):
  mptcp: add skeleton to sync msk socket options to subflows
  mptcp: tag sequence_seq with socket state
  mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY
  mptcp: setsockopt: handle receive/send buffer and device bind
  mptcp: setsockopt: support SO_LINGER
  mptcp: setsockopt: add SO_MARK support
  mptcp: setsockopt: add SO_INCOMING_CPU
  mptcp: setsockopt: SO_DEBUG and no-op options
  mptcp: sockopt: add TCP_CONGESTION and TCP_INFO
  selftests: mptcp: add packet mark test case

 net/mptcp/protocol.c                          |  55 ++-
 net/mptcp/protocol.h                          |  11 +
 net/mptcp/sockopt.c                           | 404 ++++++++++++++++++
 net/mptcp/subflow.c                           |   5 +
 tools/testing/selftests/net/mptcp/Makefile    |   2 +-
 .../selftests/net/mptcp/mptcp_connect.c       |  23 +-
 .../selftests/net/mptcp/mptcp_sockopt.sh      | 276 ++++++++++++
 7 files changed, 765 insertions(+), 11 deletions(-)
 create mode 100755 tools/testing/selftests/net/mptcp/mptcp_sockopt.sh

-- 
2.26.3


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH mptcp 01/10] mptcp: add skeleton to sync msk socket options to subflows
  2021-04-08 18:49 [PATCH mptcp 00/10] mptcp: add SOL_SOCKET support Florian Westphal
@ 2021-04-08 18:49 ` Florian Westphal
  2021-04-08 18:49 ` [PATCH mptcp 02/10] mptcp: tag sequence_seq with socket state Florian Westphal
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2021-04-08 18:49 UTC (permalink / raw)
  To: mptcp; +Cc: Florian Westphal

Handle following cases:
1. setsockopt is called with multiple subflows.
   Change might have to be mirrored to all of them.
   This is done directly in process context/setsockopt call.
2. Outgoing subflow is created after one or several setsockopt()
   calls have been made.  Old setsockopt changes should be
   synced to the new socket.
3. Incoming subflow, after setsockopt call(s).

Cases 2 and 3 are handled right after the join list is spliced to the conn
list.

Not all sockopt values can be just be copied by value, some require
helper calls.  Those can acquire socket lock (which can sleep).

If the join->conn list splicing is done from preemptible context,
synchronization can be done right away, otherwise its deferred to work
queue.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 43 +++++++++++++++++++++++++++++++++++--------
 net/mptcp/protocol.h |  7 +++++++
 net/mptcp/sockopt.c  | 19 +++++++++++++++++++
 net/mptcp/subflow.c  |  1 +
 4 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 2d895c3c8746..2233baf03c3a 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -721,18 +721,44 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 		sk->sk_data_ready(sk);
 }
 
-void __mptcp_flush_join_list(struct mptcp_sock *msk)
+static bool mptcp_do_flush_join_list(struct mptcp_sock *msk)
 {
 	struct mptcp_subflow_context *subflow;
+	bool ret;
 
 	if (likely(list_empty(&msk->join_list)))
-		return;
+		return false;
 
+	ret = true;
 	spin_lock_bh(&msk->join_list_lock);
 	list_for_each_entry(subflow, &msk->join_list, node)
 		mptcp_propagate_sndbuf((struct sock *)msk, mptcp_subflow_tcp_sock(subflow));
+
 	list_splice_tail_init(&msk->join_list, &msk->conn_list);
 	spin_unlock_bh(&msk->join_list_lock);
+
+	return ret;
+}
+
+void __mptcp_flush_join_list(struct mptcp_sock *msk)
+{
+	if (likely(!mptcp_do_flush_join_list(msk)))
+		return;
+
+	if (!test_and_set_bit(MPTCP_WORK_SYNC_SETSOCKOPT, &msk->flags))
+		mptcp_schedule_work((struct sock *)msk);
+}
+
+static void mptcp_flush_join_list(struct mptcp_sock *msk)
+{
+	bool sync_needed = test_and_clear_bit(MPTCP_WORK_SYNC_SETSOCKOPT, &msk->flags);
+
+	might_sleep();
+
+	if (!mptcp_do_flush_join_list(msk) && !sync_needed)
+		return;
+
+	mptcp_sockopt_sync_all(msk);
 }
 
 static bool mptcp_timer_pending(struct sock *sk)
@@ -1445,7 +1471,7 @@ static void __mptcp_push_pending(struct sock *sk, unsigned int flags)
 			int ret = 0;
 
 			prev_ssk = ssk;
-			__mptcp_flush_join_list(msk);
+			mptcp_flush_join_list(msk);
 			ssk = mptcp_subflow_get_send(msk);
 
 			/* try to keep the subflow socket lock across
@@ -1871,7 +1897,7 @@ static bool __mptcp_move_skbs(struct mptcp_sock *msk)
 	unsigned int moved = 0;
 	bool ret, done;
 
-	__mptcp_flush_join_list(msk);
+	mptcp_flush_join_list(msk);
 	do {
 		struct sock *ssk = mptcp_subflow_recv_lookup(msk);
 		bool slowpath;
@@ -2295,7 +2321,7 @@ static void mptcp_worker(struct work_struct *work)
 		goto unlock;
 
 	mptcp_check_data_fin_ack(sk);
-	__mptcp_flush_join_list(msk);
+	mptcp_flush_join_list(msk);
 
 	mptcp_check_fastclose(msk);
 
@@ -2499,7 +2525,7 @@ static void __mptcp_check_send_data_fin(struct sock *sk)
 		}
 	}
 
-	__mptcp_flush_join_list(msk);
+	mptcp_flush_join_list(msk);
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *tcp_sk = mptcp_subflow_tcp_sock(subflow);
 
@@ -2636,7 +2662,8 @@ static int mptcp_disconnect(struct sock *sk, int flags)
 	struct mptcp_subflow_context *subflow;
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
-	__mptcp_flush_join_list(msk);
+	mptcp_do_flush_join_list(msk);
+
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 
@@ -3204,7 +3231,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 		/* set ssk->sk_socket of accept()ed flows to mptcp socket.
 		 * This is needed so NOSPACE flag can be set from tcp stack.
 		 */
-		__mptcp_flush_join_list(msk);
+		mptcp_flush_join_list(msk);
 		mptcp_for_each_subflow(msk, subflow) {
 			struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index d9a489c73029..55fe05a40216 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -108,6 +108,7 @@
 #define MPTCP_CLEAN_UNA		7
 #define MPTCP_ERROR_REPORT	8
 #define MPTCP_RETRANSMIT	9
+#define MPTCP_WORK_SYNC_SETSOCKOPT 10
 
 static inline bool before64(__u64 seq1, __u64 seq2)
 {
@@ -747,6 +748,12 @@ unsigned int mptcp_pm_get_add_addr_accept_max(struct mptcp_sock *msk);
 unsigned int mptcp_pm_get_subflows_max(struct mptcp_sock *msk);
 unsigned int mptcp_pm_get_local_addr_max(struct mptcp_sock *msk);
 
+int mptcp_setsockopt(struct sock *sk, int level, int optname,
+		     sockptr_t optval, unsigned int optlen);
+
+void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk);
+void mptcp_sockopt_sync_all(struct mptcp_sock *msk);
+
 static inline struct mptcp_ext *mptcp_get_ext(const struct sk_buff *skb)
 {
 	return (struct mptcp_ext *)skb_ext_find(skb, SKB_EXT_MPTCP);
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index fb98fab252df..4fdc0ad6acf7 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -350,3 +350,22 @@ int mptcp_getsockopt(struct sock *sk, int level, int optname,
 	return -EOPNOTSUPP;
 }
 
+void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
+{
+	msk_owned_by_me(msk);
+}
+
+void mptcp_sockopt_sync_all(struct mptcp_sock *msk)
+{
+	struct mptcp_subflow_context *subflow;
+
+	msk_owned_by_me(msk);
+
+	mptcp_for_each_subflow(msk, subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+		mptcp_sockopt_sync(msk, ssk);
+
+		cond_resched();
+	}
+}
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index d79a99c2bfc4..fc7107f8d81b 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1315,6 +1315,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
 	mptcp_info2sockaddr(remote, &addr, ssk->sk_family);
 
 	mptcp_add_pending_subflow(msk, subflow);
+	mptcp_sockopt_sync(msk, ssk);
 	err = kernel_connect(sf, (struct sockaddr *)&addr, addrlen, O_NONBLOCK);
 	if (err && err != -EINPROGRESS)
 		goto failed_unlink;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH mptcp 02/10] mptcp: tag sequence_seq with socket state
  2021-04-08 18:49 [PATCH mptcp 00/10] mptcp: add SOL_SOCKET support Florian Westphal
  2021-04-08 18:49 ` [PATCH mptcp 01/10] mptcp: add skeleton to sync msk socket options to subflows Florian Westphal
@ 2021-04-08 18:49 ` Florian Westphal
  2021-04-10  0:12   ` Mat Martineau
  2021-04-13  9:58   ` Paolo Abeni
  2021-04-08 18:49 ` [PATCH mptcp 03/10] mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY Florian Westphal
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 16+ messages in thread
From: Florian Westphal @ 2021-04-08 18:49 UTC (permalink / raw)
  To: mptcp; +Cc: Florian Westphal

Paolo Abeni suggested to avoid re-syncing new subflows because
they inherit options from listener. In case options were set on
listener but are not set on mptcp-socket there is no need to
do any synchronisation for new subflows.

This change sets sockopt_seq of new mptcp sockets to the seq of
the mptcp listener sock.

Subflow sequence is set to the embedded tcp listener sk.
Add a comment explaing why sk_state is involved in sockopt_seq
generation.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 16 ++++++++++-----
 net/mptcp/protocol.h |  4 ++++
 net/mptcp/sockopt.c  | 47 ++++++++++++++++++++++++++++++++++++++++++--
 net/mptcp/subflow.c  |  4 ++++
 4 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 2233baf03c3a..26e3ffe8840e 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -729,11 +729,14 @@ static bool mptcp_do_flush_join_list(struct mptcp_sock *msk)
 	if (likely(list_empty(&msk->join_list)))
 		return false;
 
-	ret = true;
 	spin_lock_bh(&msk->join_list_lock);
-	list_for_each_entry(subflow, &msk->join_list, node)
-		mptcp_propagate_sndbuf((struct sock *)msk, mptcp_subflow_tcp_sock(subflow));
+	list_for_each_entry(subflow, &msk->join_list, node) {
+		u32 sseq = READ_ONCE(subflow->setsockopt_seq);
 
+		mptcp_propagate_sndbuf((struct sock *)msk, mptcp_subflow_tcp_sock(subflow));
+		if (READ_ONCE(msk->setsockopt_seq) != sseq)
+			ret = true;
+	}
 	list_splice_tail_init(&msk->join_list, &msk->conn_list);
 	spin_unlock_bh(&msk->join_list_lock);
 
@@ -745,7 +748,8 @@ void __mptcp_flush_join_list(struct mptcp_sock *msk)
 	if (likely(!mptcp_do_flush_join_list(msk)))
 		return;
 
-	if (!test_and_set_bit(MPTCP_WORK_SYNC_SETSOCKOPT, &msk->flags))
+	if (msk->setsockopt_seq &&
+	    !test_and_set_bit(MPTCP_WORK_SYNC_SETSOCKOPT, &msk->flags))
 		mptcp_schedule_work((struct sock *)msk);
 }
 
@@ -758,7 +762,8 @@ static void mptcp_flush_join_list(struct mptcp_sock *msk)
 	if (!mptcp_do_flush_join_list(msk) && !sync_needed)
 		return;
 
-	mptcp_sockopt_sync_all(msk);
+	if (msk->setsockopt_seq)
+		mptcp_sockopt_sync_all(msk);
 }
 
 static bool mptcp_timer_pending(struct sock *sk)
@@ -2712,6 +2717,7 @@ struct sock *mptcp_sk_clone(const struct sock *sk,
 	msk->snd_nxt = msk->write_seq;
 	msk->snd_una = msk->write_seq;
 	msk->wnd_end = msk->snd_nxt + req->rsk_rcv_wnd;
+	msk->setsockopt_seq = mptcp_sk(sk)->setsockopt_seq;
 
 	if (mp_opt->mp_capable) {
 		msk->can_ack = true;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 55fe05a40216..edc0128730df 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -256,6 +256,8 @@ struct mptcp_sock {
 		u64	time;	/* start time of measurement window */
 		u64	rtt_us; /* last maximum rtt of subflows */
 	} rcvq_space;
+
+	u32 setsockopt_seq;
 };
 
 #define mptcp_lock_sock(___sk, cb) do {					\
@@ -414,6 +416,8 @@ struct mptcp_subflow_context {
 	long	delegated_status;
 	struct	list_head delegated_node;   /* link into delegated_action, protected by local BH */
 
+	u32 setsockopt_seq;
+
 	struct	sock *tcp_sock;	    /* tcp sk backpointer */
 	struct	sock *conn;	    /* parent mptcp_sock */
 	const	struct inet_connection_sock_af_ops *icsk_af_ops;
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 4fdc0ad6acf7..b1950be15abe 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -24,6 +24,27 @@ static struct sock *__mptcp_tcp_fallback(struct mptcp_sock *msk)
 	return msk->first;
 }
 
+static u32 sockopt_seq_reset(const struct sock *sk)
+{
+	sock_owned_by_me(sk);
+
+	/* Highbits contain state.  Allows to distinguish sockopt_seq
+	 * of listener and established:
+	 * s0 = new_listener()
+	 * sockopt(s0) - seq is 1
+	 * s1 = accept(s0) - s1 inherits seq 1 if listener sk (s0)
+	 * sockopt(s0) - seq increments to 2 on s0
+	 * sockopt(s1) // seq increments to 2 on s1 (different option)
+	 * new ssk completes join, inherits options from s0 // seq 2
+	 * Needs sync from mptcp join logic, but ssk->seq == msk->seq
+	 *
+	 * Set High order bits to sk_state so ssk->seq == msk->seq test
+	 * will fail.
+	 */
+
+	return sk->sk_state << 24u;
+}
+
 static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 				       sockptr_t optval, unsigned int optlen)
 {
@@ -350,22 +371,44 @@ int mptcp_getsockopt(struct sock *sk, int level, int optname,
 	return -EOPNOTSUPP;
 }
 
+static void __mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
+{
+}
+
 void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
 {
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+
 	msk_owned_by_me(msk);
+
+	if (READ_ONCE(subflow->setsockopt_seq) != msk->setsockopt_seq) {
+		__mptcp_sockopt_sync(msk, ssk);
+
+		subflow->setsockopt_seq = msk->setsockopt_seq;
+	}
 }
 
 void mptcp_sockopt_sync_all(struct mptcp_sock *msk)
 {
 	struct mptcp_subflow_context *subflow;
+	struct sock *sk = (struct sock *)msk;
+	u32 seq;
 
-	msk_owned_by_me(msk);
+	seq = sockopt_seq_reset(sk);
 
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+		u32 sseq = READ_ONCE(subflow->setsockopt_seq);
 
-		mptcp_sockopt_sync(msk, ssk);
+		if (sseq != msk->setsockopt_seq) {
+			__mptcp_sockopt_sync(msk, ssk);
+			WRITE_ONCE(subflow->setsockopt_seq, seq);
+		} else if (sseq != seq) {
+			WRITE_ONCE(subflow->setsockopt_seq, seq);
+		}
 
 		cond_resched();
 	}
+
+	msk->setsockopt_seq = seq;
 }
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index fc7107f8d81b..82e91b00ad39 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -681,6 +681,9 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
 			goto out;
 		}
 
+		/* ssk inherits options of listener sk */
+		ctx->setsockopt_seq = listener->setsockopt_seq;
+
 		if (ctx->mp_capable) {
 			/* this can't race with mptcp_close(), as the msk is
 			 * not yet exposted to user-space
@@ -696,6 +699,7 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
 			 * created mptcp socket
 			 */
 			new_msk->sk_destruct = mptcp_sock_destruct;
+			mptcp_sk(new_msk)->setsockopt_seq = ctx->setsockopt_seq;
 			mptcp_pm_new_connection(mptcp_sk(new_msk), child, 1);
 			mptcp_token_accept(subflow_req, mptcp_sk(new_msk));
 			ctx->conn = new_msk;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH mptcp 03/10] mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY
  2021-04-08 18:49 [PATCH mptcp 00/10] mptcp: add SOL_SOCKET support Florian Westphal
  2021-04-08 18:49 ` [PATCH mptcp 01/10] mptcp: add skeleton to sync msk socket options to subflows Florian Westphal
  2021-04-08 18:49 ` [PATCH mptcp 02/10] mptcp: tag sequence_seq with socket state Florian Westphal
@ 2021-04-08 18:49 ` Florian Westphal
  2021-04-13 10:10   ` Paolo Abeni
  2021-04-08 18:49 ` [PATCH mptcp 04/10] mptcp: setsockopt: handle receive/send buffer and device bind Florian Westphal
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2021-04-08 18:49 UTC (permalink / raw)
  To: mptcp; +Cc: Florian Westphal

start with something simple: both take an integer value, both
need to be mirrored to all subflows.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/sockopt.c | 106 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 106 insertions(+)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index b1950be15abe..43da4696336f 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -45,6 +45,90 @@ static u32 sockopt_seq_reset(const struct sock *sk)
 	return sk->sk_state << 24u;
 }
 
+static void sockopt_seq_inc(struct mptcp_sock *msk)
+{
+	u32 seq = msk->setsockopt_seq & 0x00ffffff;
+
+	msk->setsockopt_seq = sockopt_seq_reset((struct sock *)msk) + seq + 1;
+}
+
+static int mptcp_get_int_option(struct mptcp_sock *msk, sockptr_t optval,
+				unsigned int optlen, int *val)
+{
+	if (optlen < sizeof(int))
+		return -EINVAL;
+
+	if (copy_from_sockptr(val, optval, sizeof(*val)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, int val)
+{
+	struct mptcp_subflow_context *subflow;
+	struct sock *sk = (struct sock *)msk;
+
+	lock_sock(sk);
+	sockopt_seq_inc(msk);
+
+	mptcp_for_each_subflow(msk, subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+		bool slow = lock_sock_fast(ssk);
+
+		switch (optname) {
+		case SO_KEEPALIVE:
+			if (ssk->sk_prot->keepalive)
+				ssk->sk_prot->keepalive(ssk, !!val);
+			sock_valbool_flag(ssk, SOCK_KEEPOPEN, !!val);
+			break;
+		case SO_PRIORITY:
+			ssk->sk_priority = val;
+			break;
+		}
+
+		subflow->setsockopt_seq = msk->setsockopt_seq;
+		unlock_sock_fast(ssk, slow);
+	}
+
+	release_sock(sk);
+}
+
+static int mptcp_sol_socket_intval(struct mptcp_sock *msk, int optname, int val)
+{
+	sockptr_t optval = KERNEL_SOCKPTR(&val);
+	struct sock *sk = (struct sock *)msk;
+	int ret;
+
+	ret = sock_setsockopt(sk->sk_socket, SOL_SOCKET, optname,
+			      optval, sizeof(val));
+	if (ret)
+		return ret;
+
+	mptcp_sol_socket_sync_intval(msk, optname, val);
+	return 0;
+}
+
+static int mptcp_setsockopt_sol_socket_int(struct mptcp_sock *msk, int optname,
+					   sockptr_t optval, unsigned int optlen)
+{
+	int val, ret;
+
+	ret = mptcp_get_int_option(msk, optval, optlen, &val);
+	if (ret)
+		return ret;
+
+	switch (optname) {
+	case SO_KEEPALIVE:
+		mptcp_sol_socket_sync_intval(msk, optname, val);
+		return 0;
+	case SO_PRIORITY:
+		return mptcp_sol_socket_intval(msk, optname, val);
+	}
+
+	return -ENOPROTOOPT;
+}
+
 static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 				       sockptr_t optval, unsigned int optlen)
 {
@@ -71,6 +155,9 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 		}
 		release_sock(sk);
 		return ret;
+	case SO_KEEPALIVE:
+	case SO_PRIORITY:
+		return mptcp_setsockopt_sol_socket_int(msk, optname, optval, optlen);
 	}
 
 	return sock_setsockopt(sk->sk_socket, SOL_SOCKET, optname, optval, optlen);
@@ -371,8 +458,27 @@ int mptcp_getsockopt(struct sock *sk, int level, int optname,
 	return -EOPNOTSUPP;
 }
 
+static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
+{
+	struct sock *sk = (struct sock *)msk;
+
+	if (ssk->sk_prot->keepalive) {
+		if (sock_flag(sk, SOCK_KEEPOPEN))
+			ssk->sk_prot->keepalive(ssk, 1);
+		else
+			ssk->sk_prot->keepalive(ssk, 0);
+	}
+
+	ssk->sk_priority = sk->sk_priority;
+}
+
 static void __mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
 {
+	bool slow = lock_sock_fast(ssk);
+
+	sync_socket_options(msk, ssk);
+
+	unlock_sock_fast(ssk, slow);
 }
 
 void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH mptcp 04/10] mptcp: setsockopt: handle receive/send buffer and device bind
  2021-04-08 18:49 [PATCH mptcp 00/10] mptcp: add SOL_SOCKET support Florian Westphal
                   ` (2 preceding siblings ...)
  2021-04-08 18:49 ` [PATCH mptcp 03/10] mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY Florian Westphal
@ 2021-04-08 18:49 ` Florian Westphal
  2021-04-08 18:49 ` [PATCH mptcp 05/10] mptcp: setsockopt: support SO_LINGER Florian Westphal
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2021-04-08 18:49 UTC (permalink / raw)
  To: mptcp; +Cc: Florian Westphal

Similar to previous patch: needs to be mirrored to all subflows.

Device bind is simpler: it is only done on the initial (listener) sk.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/sockopt.c | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 43da4696336f..bd60c335d6aa 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -85,6 +85,16 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
 		case SO_PRIORITY:
 			ssk->sk_priority = val;
 			break;
+		case SO_SNDBUF:
+		case SO_SNDBUFFORCE:
+			ssk->sk_userlocks |= SOCK_SNDBUF_LOCK;
+			WRITE_ONCE(ssk->sk_sndbuf, sk->sk_sndbuf);
+			break;
+		case SO_RCVBUF:
+		case SO_RCVBUFFORCE:
+			ssk->sk_userlocks |= SOCK_RCVBUF_LOCK;
+			WRITE_ONCE(ssk->sk_rcvbuf, sk->sk_rcvbuf);
+			break;
 		}
 
 		subflow->setsockopt_seq = msk->setsockopt_seq;
@@ -123,6 +133,10 @@ static int mptcp_setsockopt_sol_socket_int(struct mptcp_sock *msk, int optname,
 		mptcp_sol_socket_sync_intval(msk, optname, val);
 		return 0;
 	case SO_PRIORITY:
+	case SO_SNDBUF:
+	case SO_SNDBUFFORCE:
+	case SO_RCVBUF:
+	case SO_RCVBUFFORCE:
 		return mptcp_sol_socket_intval(msk, optname, val);
 	}
 
@@ -139,6 +153,8 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 	switch (optname) {
 	case SO_REUSEPORT:
 	case SO_REUSEADDR:
+	case SO_BINDTODEVICE:
+	case SO_BINDTOIFINDEX:
 		lock_sock(sk);
 		ssock = __mptcp_nmpc_socket(msk);
 		if (!ssock) {
@@ -152,11 +168,19 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 				sk->sk_reuseport = ssock->sk->sk_reuseport;
 			else if (optname == SO_REUSEADDR)
 				sk->sk_reuse = ssock->sk->sk_reuse;
+			else if (optname == SO_BINDTODEVICE)
+				sk->sk_bound_dev_if = ssock->sk->sk_bound_dev_if;
+			else if (optname == SO_BINDTOIFINDEX)
+				sk->sk_bound_dev_if = ssock->sk->sk_bound_dev_if;
 		}
 		release_sock(sk);
 		return ret;
 	case SO_KEEPALIVE:
 	case SO_PRIORITY:
+	case SO_SNDBUF:
+	case SO_SNDBUFFORCE:
+	case SO_RCVBUF:
+	case SO_RCVBUFFORCE:
 		return mptcp_setsockopt_sol_socket_int(msk, optname, optval, optlen);
 	}
 
@@ -460,6 +484,7 @@ int mptcp_getsockopt(struct sock *sk, int level, int optname,
 
 static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
 {
+	static const unsigned int tx_rx_locks = SOCK_RCVBUF_LOCK | SOCK_SNDBUF_LOCK;
 	struct sock *sk = (struct sock *)msk;
 
 	if (ssk->sk_prot->keepalive) {
@@ -470,6 +495,33 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
 	}
 
 	ssk->sk_priority = sk->sk_priority;
+	ssk->sk_bound_dev_if = sk->sk_bound_dev_if;
+	ssk->sk_incoming_cpu = sk->sk_incoming_cpu;
+
+	if (sk->sk_userlocks & tx_rx_locks) {
+		ssk->sk_userlocks |= sk->sk_userlocks & tx_rx_locks;
+		if (sk->sk_userlocks & SOCK_SNDBUF_LOCK)
+			WRITE_ONCE(ssk->sk_sndbuf, sk->sk_sndbuf);
+		if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
+			WRITE_ONCE(ssk->sk_rcvbuf, sk->sk_rcvbuf);
+	}
+
+	if (sock_flag(sk, SOCK_LINGER)) {
+		ssk->sk_lingertime = sk->sk_lingertime;
+		sock_set_flag(ssk, SOCK_LINGER);
+	} else {
+		sock_reset_flag(ssk, SOCK_LINGER);
+	}
+
+	if (sk->sk_mark != ssk->sk_mark) {
+		ssk->sk_mark = sk->sk_mark;
+		sk_dst_reset(ssk);
+	}
+
+	sock_valbool_flag(ssk, SOCK_DBG, sock_flag(sk, SOCK_DBG));
+
+	if (inet_csk(sk)->icsk_ca_ops != inet_csk(ssk)->icsk_ca_ops)
+		tcp_set_congestion_control(ssk, inet_csk(sk)->icsk_ca_ops->name, false, true);
 }
 
 static void __mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH mptcp 05/10] mptcp: setsockopt: support SO_LINGER
  2021-04-08 18:49 [PATCH mptcp 00/10] mptcp: add SOL_SOCKET support Florian Westphal
                   ` (3 preceding siblings ...)
  2021-04-08 18:49 ` [PATCH mptcp 04/10] mptcp: setsockopt: handle receive/send buffer and device bind Florian Westphal
@ 2021-04-08 18:49 ` Florian Westphal
  2021-04-08 18:49 ` [PATCH mptcp 06/10] mptcp: setsockopt: add SO_MARK support Florian Westphal
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2021-04-08 18:49 UTC (permalink / raw)
  To: mptcp; +Cc: Florian Westphal

Similar to PRIORITY/KEEPALIVE: needs to be mirrored to all subflows.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/sockopt.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index bd60c335d6aa..90a0282520c6 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -143,6 +143,47 @@ static int mptcp_setsockopt_sol_socket_int(struct mptcp_sock *msk, int optname,
 	return -ENOPROTOOPT;
 }
 
+static int mptcp_setsockopt_sol_socket_linger(struct mptcp_sock *msk, sockptr_t optval,
+					      unsigned int optlen)
+{
+	struct mptcp_subflow_context *subflow;
+	struct sock *sk = (struct sock *)msk;
+	struct linger ling;
+	sockptr_t kopt;
+	int ret;
+
+	if (optlen < sizeof(ling))
+		return -EINVAL;
+
+	if (copy_from_sockptr(&ling, optval, sizeof(ling)))
+		return -EFAULT;
+
+	kopt = KERNEL_SOCKPTR(&ling);
+	ret = sock_setsockopt(sk->sk_socket, SOL_SOCKET, SO_LINGER, kopt, sizeof(ling));
+	if (ret)
+		return ret;
+
+	lock_sock(sk);
+	sockopt_seq_inc(msk);
+	mptcp_for_each_subflow(msk, subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+		bool slow = lock_sock_fast(ssk);
+
+		if (!ling.l_onoff) {
+			sock_reset_flag(ssk, SOCK_LINGER);
+		} else {
+			ssk->sk_lingertime = sk->sk_lingertime;
+			sock_set_flag(ssk, SOCK_LINGER);
+		}
+
+		subflow->setsockopt_seq = msk->setsockopt_seq;
+		unlock_sock_fast(ssk, slow);
+	}
+
+	release_sock(sk);
+	return 0;
+}
+
 static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 				       sockptr_t optval, unsigned int optlen)
 {
@@ -182,6 +223,8 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 	case SO_RCVBUF:
 	case SO_RCVBUFFORCE:
 		return mptcp_setsockopt_sol_socket_int(msk, optname, optval, optlen);
+	case SO_LINGER:
+		return mptcp_setsockopt_sol_socket_linger(msk, optval, optlen);
 	}
 
 	return sock_setsockopt(sk->sk_socket, SOL_SOCKET, optname, optval, optlen);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH mptcp 06/10] mptcp: setsockopt: add SO_MARK support
  2021-04-08 18:49 [PATCH mptcp 00/10] mptcp: add SOL_SOCKET support Florian Westphal
                   ` (4 preceding siblings ...)
  2021-04-08 18:49 ` [PATCH mptcp 05/10] mptcp: setsockopt: support SO_LINGER Florian Westphal
@ 2021-04-08 18:49 ` Florian Westphal
  2021-04-08 18:49 ` [PATCH mptcp 07/10] mptcp: setsockopt: add SO_INCOMING_CPU Florian Westphal
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2021-04-08 18:49 UTC (permalink / raw)
  To: mptcp; +Cc: Florian Westphal

Value is synced to all subflows.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/sockopt.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 90a0282520c6..effc7c1efbac 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -95,6 +95,12 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
 			ssk->sk_userlocks |= SOCK_RCVBUF_LOCK;
 			WRITE_ONCE(ssk->sk_rcvbuf, sk->sk_rcvbuf);
 			break;
+		case SO_MARK:
+			if (READ_ONCE(ssk->sk_mark) != sk->sk_mark) {
+				ssk->sk_mark = sk->sk_mark;
+				sk_dst_reset(ssk);
+			}
+			break;
 		}
 
 		subflow->setsockopt_seq = msk->setsockopt_seq;
@@ -132,6 +138,7 @@ static int mptcp_setsockopt_sol_socket_int(struct mptcp_sock *msk, int optname,
 	case SO_KEEPALIVE:
 		mptcp_sol_socket_sync_intval(msk, optname, val);
 		return 0;
+	case SO_MARK:
 	case SO_PRIORITY:
 	case SO_SNDBUF:
 	case SO_SNDBUFFORCE:
@@ -222,6 +229,7 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 	case SO_SNDBUFFORCE:
 	case SO_RCVBUF:
 	case SO_RCVBUFFORCE:
+	case SO_MARK:
 		return mptcp_setsockopt_sol_socket_int(msk, optname, optval, optlen);
 	case SO_LINGER:
 		return mptcp_setsockopt_sol_socket_linger(msk, optval, optlen);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH mptcp 07/10] mptcp: setsockopt: add SO_INCOMING_CPU
  2021-04-08 18:49 [PATCH mptcp 00/10] mptcp: add SOL_SOCKET support Florian Westphal
                   ` (5 preceding siblings ...)
  2021-04-08 18:49 ` [PATCH mptcp 06/10] mptcp: setsockopt: add SO_MARK support Florian Westphal
@ 2021-04-08 18:49 ` Florian Westphal
  2021-04-08 18:49 ` [PATCH mptcp 08/10] mptcp: setsockopt: SO_DEBUG and no-op options Florian Westphal
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2021-04-08 18:49 UTC (permalink / raw)
  To: mptcp; +Cc: Florian Westphal

Replicate to all subflows.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/sockopt.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index effc7c1efbac..6be8abbed6eb 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -101,6 +101,9 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
 				sk_dst_reset(ssk);
 			}
 			break;
+		case SO_INCOMING_CPU:
+			WRITE_ONCE(ssk->sk_incoming_cpu, val);
+			break;
 		}
 
 		subflow->setsockopt_seq = msk->setsockopt_seq;
@@ -125,6 +128,15 @@ static int mptcp_sol_socket_intval(struct mptcp_sock *msk, int optname, int val)
 	return 0;
 }
 
+static void mptcp_so_incoming_cpu(struct mptcp_sock *msk, int val)
+{
+	struct sock *sk = (struct sock *)msk;
+
+	WRITE_ONCE(sk->sk_incoming_cpu, val);
+
+	mptcp_sol_socket_sync_intval(msk, SO_INCOMING_CPU, val);
+}
+
 static int mptcp_setsockopt_sol_socket_int(struct mptcp_sock *msk, int optname,
 					   sockptr_t optval, unsigned int optlen)
 {
@@ -145,6 +157,9 @@ static int mptcp_setsockopt_sol_socket_int(struct mptcp_sock *msk, int optname,
 	case SO_RCVBUF:
 	case SO_RCVBUFFORCE:
 		return mptcp_sol_socket_intval(msk, optname, val);
+	case SO_INCOMING_CPU:
+		mptcp_so_incoming_cpu(msk, val);
+		return 0;
 	}
 
 	return -ENOPROTOOPT;
@@ -230,6 +245,7 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 	case SO_RCVBUF:
 	case SO_RCVBUFFORCE:
 	case SO_MARK:
+	case SO_INCOMING_CPU:
 		return mptcp_setsockopt_sol_socket_int(msk, optname, optval, optlen);
 	case SO_LINGER:
 		return mptcp_setsockopt_sol_socket_linger(msk, optval, optlen);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH mptcp 08/10] mptcp: setsockopt: SO_DEBUG and no-op options
  2021-04-08 18:49 [PATCH mptcp 00/10] mptcp: add SOL_SOCKET support Florian Westphal
                   ` (6 preceding siblings ...)
  2021-04-08 18:49 ` [PATCH mptcp 07/10] mptcp: setsockopt: add SO_INCOMING_CPU Florian Westphal
@ 2021-04-08 18:49 ` Florian Westphal
  2021-04-08 18:49 ` [PATCH mptcp 09/10] mptcp: sockopt: add TCP_CONGESTION and TCP_INFO Florian Westphal
  2021-04-08 18:49 ` [PATCH mptcp 10/10] selftests: mptcp: add packet mark test case Florian Westphal
  9 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2021-04-08 18:49 UTC (permalink / raw)
  To: mptcp; +Cc: Florian Westphal

Handle SO_DEBUG and set it on all subflows.
Ignore those values not implemented on TCP sockets.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/sockopt.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 6be8abbed6eb..e7a710d1d1de 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -77,6 +77,9 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
 		bool slow = lock_sock_fast(ssk);
 
 		switch (optname) {
+		case SO_DEBUG:
+			sock_valbool_flag(ssk, SOCK_DBG, !!val);
+			break;
 		case SO_KEEPALIVE:
 			if (ssk->sk_prot->keepalive)
 				ssk->sk_prot->keepalive(ssk, !!val);
@@ -150,6 +153,7 @@ static int mptcp_setsockopt_sol_socket_int(struct mptcp_sock *msk, int optname,
 	case SO_KEEPALIVE:
 		mptcp_sol_socket_sync_intval(msk, optname, val);
 		return 0;
+	case SO_DEBUG:
 	case SO_MARK:
 	case SO_PRIORITY:
 	case SO_SNDBUF:
@@ -246,9 +250,21 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 	case SO_RCVBUFFORCE:
 	case SO_MARK:
 	case SO_INCOMING_CPU:
+	case SO_DEBUG:
 		return mptcp_setsockopt_sol_socket_int(msk, optname, optval, optlen);
 	case SO_LINGER:
 		return mptcp_setsockopt_sol_socket_linger(msk, optval, optlen);
+	case SO_NO_CHECK:
+	case SO_DONTROUTE:
+	case SO_BROADCAST:
+	case SO_BSDCOMPAT:
+	case SO_PASSCRED:
+	case SO_PASSSEC:
+	case SO_RXQ_OVFL:
+	case SO_WIFI_STATUS:
+	case SO_NOFCS:
+	case SO_SELECT_ERR_QUEUE:
+		return 0;
 	}
 
 	return sock_setsockopt(sk->sk_socket, SOL_SOCKET, optname, optval, optlen);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH mptcp 09/10] mptcp: sockopt: add TCP_CONGESTION and TCP_INFO
  2021-04-08 18:49 [PATCH mptcp 00/10] mptcp: add SOL_SOCKET support Florian Westphal
                   ` (7 preceding siblings ...)
  2021-04-08 18:49 ` [PATCH mptcp 08/10] mptcp: setsockopt: SO_DEBUG and no-op options Florian Westphal
@ 2021-04-08 18:49 ` Florian Westphal
  2021-04-08 18:49 ` [PATCH mptcp 10/10] selftests: mptcp: add packet mark test case Florian Westphal
  9 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2021-04-08 18:49 UTC (permalink / raw)
  To: mptcp; +Cc: Florian Westphal

TCP_CONGESTION is set for all subflows.
The mptcp socket gains icsk_ca_ops too so it can be used to keep the
authoritative state that should be set on new/future subflows.

TCP_INFO will return first subflow only.
The out-of-tree kernel has a MPTCP_INFO getsockopt, this could be added
later on.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c |   4 ++
 net/mptcp/sockopt.c  | 101 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 26e3ffe8840e..302419e2d92c 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2390,6 +2390,8 @@ static int __mptcp_init_sock(struct sock *sk)
 	timer_setup(&msk->sk.icsk_retransmit_timer, mptcp_retransmit_timer, 0);
 	timer_setup(&sk->sk_timer, mptcp_timeout_timer, 0);
 
+	tcp_assign_congestion_control(sk);
+
 #if IS_ENABLED(CONFIG_KASAN)
 	sock_set_flag(sk, SOCK_RCU_FREE);
 #endif
@@ -2586,6 +2588,8 @@ static void __mptcp_destroy_sock(struct sock *sk)
 	WARN_ON_ONCE(msk->rmem_released);
 	sk_stream_kill_queues(sk);
 	xfrm_sk_free_policy(sk);
+
+	tcp_cleanup_congestion_control(sk);
 	sk_refcnt_debug_release(sk);
 	mptcp_dispose_initial_subflow(msk);
 	sock_put(sk);
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index e7a710d1d1de..ab9eb728e649 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -510,6 +510,62 @@ static bool mptcp_supported_sockopt(int level, int optname)
 	return false;
 }
 
+static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t optval,
+					       unsigned int optlen)
+{
+	struct mptcp_subflow_context *subflow;
+	struct sock *sk = (struct sock *)msk;
+	char name[TCP_CA_NAME_MAX];
+	bool cap_net_admin;
+	int ret;
+
+	if (optlen < 1)
+		return -EINVAL;
+
+	ret = strncpy_from_sockptr(name, optval,
+				   min_t(long, TCP_CA_NAME_MAX - 1, optlen));
+	if (ret < 0)
+		return -EFAULT;
+
+	name[ret] = 0;
+
+	cap_net_admin = ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN);
+
+	ret = 0;
+	lock_sock(sk);
+	sockopt_seq_inc(msk);
+	mptcp_for_each_subflow(msk, subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+		int err;
+
+		lock_sock(ssk);
+		err = tcp_set_congestion_control(ssk, name, true, cap_net_admin);
+		if (err < 0 && ret == 0)
+			ret = err;
+		subflow->setsockopt_seq = msk->setsockopt_seq;
+		release_sock(ssk);
+	}
+
+	if (ret == 0)
+		tcp_set_congestion_control(sk, name, false, cap_net_admin);
+
+	release_sock(sk);
+	return ret;
+}
+
+static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
+				    sockptr_t optval, unsigned int optlen)
+{
+	switch (optname) {
+	case TCP_ULP:
+		return -EOPNOTSUPP;
+	case TCP_CONGESTION:
+		return mptcp_setsockopt_sol_tcp_congestion(msk, optval, optlen);
+	}
+
+	return -EOPNOTSUPP;
+}
+
 int mptcp_setsockopt(struct sock *sk, int level, int optname,
 		     sockptr_t optval, unsigned int optlen)
 {
@@ -539,6 +595,49 @@ int mptcp_setsockopt(struct sock *sk, int level, int optname,
 	if (level == SOL_IPV6)
 		return mptcp_setsockopt_v6(msk, optname, optval, optlen);
 
+	if (level == SOL_TCP)
+		return mptcp_setsockopt_sol_tcp(msk, optname, optval, optlen);
+
+	return -EOPNOTSUPP;
+}
+
+static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int optname,
+					  char __user *optval, int __user *optlen)
+{
+	struct sock *sk = (struct sock *)msk;
+	struct socket *ssock;
+	int ret = -EINVAL;
+	struct sock *ssk;
+
+	lock_sock(sk);
+	ssk = msk->first;
+	if (ssk) {
+		ret = tcp_getsockopt(ssk, level, optname, optval, optlen);
+		goto out;
+	}
+
+	ssock = __mptcp_nmpc_socket(msk);
+	if (!ssock)
+		goto out;
+
+	ret = tcp_getsockopt(ssock->sk, level, optname, optval, optlen);
+
+out:
+	release_sock(sk);
+	return ret;
+}
+
+static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
+				    char __user *optval, int __user *optlen)
+{
+	switch (optname) {
+	case TCP_ULP:
+	case TCP_CONGESTION:
+	case TCP_INFO:
+	case TCP_CC_INFO:
+		return mptcp_getsockopt_first_sf_only(msk, SOL_TCP, optname,
+						      optval, optlen);
+	}
 	return -EOPNOTSUPP;
 }
 
@@ -562,6 +661,8 @@ int mptcp_getsockopt(struct sock *sk, int level, int optname,
 	if (ssk)
 		return tcp_getsockopt(ssk, level, optname, optval, option);
 
+	if (level == SOL_TCP)
+		return mptcp_getsockopt_sol_tcp(msk, optname, optval, option);
 	return -EOPNOTSUPP;
 }
 
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH mptcp 10/10] selftests: mptcp: add packet mark test case
  2021-04-08 18:49 [PATCH mptcp 00/10] mptcp: add SOL_SOCKET support Florian Westphal
                   ` (8 preceding siblings ...)
  2021-04-08 18:49 ` [PATCH mptcp 09/10] mptcp: sockopt: add TCP_CONGESTION and TCP_INFO Florian Westphal
@ 2021-04-08 18:49 ` Florian Westphal
  9 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2021-04-08 18:49 UTC (permalink / raw)
  To: mptcp; +Cc: Florian Westphal

Extend mptcp_connect tool with SO_MARK support (-M <value>) and
add a test case that checks that the packet mark gets copied to all
subflows.

This is done by only allowing packets with either skb->mark 1 or 2
via iptables.

DROP rule packet counter is checked; if its not zero, print an error
message and fail the test case.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 tools/testing/selftests/net/mptcp/Makefile    |   2 +-
 .../selftests/net/mptcp/mptcp_connect.c       |  23 +-
 .../selftests/net/mptcp/mptcp_sockopt.sh      | 276 ++++++++++++++++++
 3 files changed, 299 insertions(+), 2 deletions(-)
 create mode 100755 tools/testing/selftests/net/mptcp/mptcp_sockopt.sh

diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile
index 00bb158b4a5d..f1464f09b080 100644
--- a/tools/testing/selftests/net/mptcp/Makefile
+++ b/tools/testing/selftests/net/mptcp/Makefile
@@ -6,7 +6,7 @@ KSFT_KHDR_INSTALL := 1
 CFLAGS =  -Wall -Wl,--no-as-needed -O2 -g  -I$(top_srcdir)/usr/include
 
 TEST_PROGS := mptcp_connect.sh pm_netlink.sh mptcp_join.sh diag.sh \
-	      simult_flows.sh
+	      simult_flows.sh mptcp_sockopt.sh
 
 TEST_GEN_FILES = mptcp_connect pm_nl_ctl
 
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index 69d89b5d666f..2f207cf33661 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -57,6 +57,7 @@ static bool cfg_join;
 static bool cfg_remove;
 static unsigned int cfg_do_w;
 static int cfg_wait;
+static uint32_t cfg_mark;
 
 static void die_usage(void)
 {
@@ -69,6 +70,7 @@ static void die_usage(void)
 	fprintf(stderr, "\t-p num -- use port num\n");
 	fprintf(stderr, "\t-s [MPTCP|TCP] -- use mptcp(default) or tcp sockets\n");
 	fprintf(stderr, "\t-m [poll|mmap|sendfile] -- use poll(default)/mmap+write/sendfile\n");
+	fprintf(stderr, "\t-M mark -- set socket packet mark\n");
 	fprintf(stderr, "\t-u -- check mptcp ulp\n");
 	fprintf(stderr, "\t-w num -- wait num sec before closing the socket\n");
 	exit(1);
@@ -140,6 +142,17 @@ static void set_sndbuf(int fd, unsigned int size)
 	}
 }
 
+static void set_mark(int fd, uint32_t mark)
+{
+	int err;
+
+	err = setsockopt(fd, SOL_SOCKET, SO_MARK, &mark, sizeof(mark));
+	if (err) {
+		perror("set SO_MARK");
+		exit(1);
+	}
+}
+
 static int sock_listen_mptcp(const char * const listenaddr,
 			     const char * const port)
 {
@@ -248,6 +261,9 @@ static int sock_connect_mptcp(const char * const remoteaddr,
 			continue;
 		}
 
+		if (cfg_mark)
+			set_mark(sock, cfg_mark);
+
 		if (connect(sock, a->ai_addr, a->ai_addrlen) == 0)
 			break; /* success */
 
@@ -830,7 +846,7 @@ static void parse_opts(int argc, char **argv)
 {
 	int c;
 
-	while ((c = getopt(argc, argv, "6jr:lp:s:hut:m:S:R:w:")) != -1) {
+	while ((c = getopt(argc, argv, "6jr:lp:s:hut:m:S:R:w:M:")) != -1) {
 		switch (c) {
 		case 'j':
 			cfg_join = true;
@@ -880,6 +896,9 @@ static void parse_opts(int argc, char **argv)
 		case 'w':
 			cfg_wait = atoi(optarg)*1000000;
 			break;
+		case 'M':
+			cfg_mark = strtol(optarg, NULL, 0);
+			break;
 		}
 	}
 
@@ -911,6 +930,8 @@ int main(int argc, char *argv[])
 			set_rcvbuf(fd, cfg_rcvbuf);
 		if (cfg_sndbuf)
 			set_sndbuf(fd, cfg_sndbuf);
+		if (cfg_mark)
+			set_mark(fd, cfg_mark);
 
 		return main_loop_s(fd);
 	}
diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
new file mode 100755
index 000000000000..2fa13946ac04
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
@@ -0,0 +1,276 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+ret=0
+sin=""
+sout=""
+cin=""
+cout=""
+ksft_skip=4
+timeout_poll=30
+timeout_test=$((timeout_poll * 2 + 1))
+mptcp_connect=""
+do_all_tests=1
+
+add_mark_rules()
+{
+	local ns=$1
+	local m=$2
+
+	for t in iptables ip6tables; do
+		# just to debug: check we have multiple subflows connection requests
+		ip netns exec $ns $t -A OUTPUT -p tcp --syn -m mark --mark $m -j ACCEPT
+
+		# RST packets might be handled by a internal dummy socket
+		ip netns exec $ns $t -A OUTPUT -p tcp --tcp-flags RST RST -m mark --mark 0 -j ACCEPT
+
+		ip netns exec $ns $t -A OUTPUT -p tcp -m mark --mark $m -j ACCEPT
+		ip netns exec $ns $t -A OUTPUT -p tcp -m mark --mark 0 -j DROP
+	done
+}
+
+init()
+{
+	rndh=$(printf %x $sec)-$(mktemp -u XXXXXX)
+
+	ns1="ns1-$rndh"
+	ns2="ns2-$rndh"
+
+	for netns in "$ns1" "$ns2";do
+		ip netns add $netns || exit $ksft_skip
+		ip -net $netns link set lo up
+		ip netns exec $netns sysctl -q net.mptcp.enabled=1
+		ip netns exec $netns sysctl -q net.ipv4.conf.all.rp_filter=0
+		ip netns exec $netns sysctl -q net.ipv4.conf.default.rp_filter=0
+	done
+
+	for i in `seq 1 4`; do
+		ip link add ns1eth$i netns "$ns1" type veth peer name ns2eth$i netns "$ns2"
+		ip -net "$ns1" addr add 10.0.$i.1/24 dev ns1eth$i
+		ip -net "$ns1" addr add dead:beef:$i::1/64 dev ns1eth$i nodad
+		ip -net "$ns1" link set ns1eth$i up
+
+		ip -net "$ns2" addr add 10.0.$i.2/24 dev ns2eth$i
+		ip -net "$ns2" addr add dead:beef:$i::2/64 dev ns2eth$i nodad
+		ip -net "$ns2" link set ns2eth$i up
+
+		# let $ns2 reach any $ns1 address from any interface
+		ip -net "$ns2" route add default via 10.0.$i.1 dev ns2eth$i metric 10$i
+
+		ip netns exec $ns1 ./pm_nl_ctl add 10.0.$i.1 flags signal
+		ip netns exec $ns1 ./pm_nl_ctl add dead:beef:$i::1 flags signal
+
+		ip netns exec $ns2 ./pm_nl_ctl add 10.0.$i.2 flags signal
+		ip netns exec $ns2 ./pm_nl_ctl add dead:beef:$i::2 flags signal
+	done
+
+	ip netns exec $ns1 ./pm_nl_ctl limits 8 8
+	ip netns exec $ns2 ./pm_nl_ctl limits 8 8
+
+	add_mark_rules $ns1 1
+	add_mark_rules $ns2 2
+}
+
+cleanup()
+{
+	for netns in "$ns1" "$ns2"; do
+		ip netns del $netns
+	done
+	rm -f "$cin" "$cout"
+	rm -f "$sin" "$sout"
+}
+
+ip -Version > /dev/null 2>&1
+if [ $? -ne 0 ];then
+	echo "SKIP: Could not run test without ip tool"
+	exit $ksft_skip
+fi
+
+iptables -V > /dev/null 2>&1
+if [ $? -ne 0 ];then
+	echo "SKIP: Could not run all tests without iptables tool"
+	exit $ksft_skip
+fi
+
+ip6tables -V > /dev/null 2>&1
+if [ $? -ne 0 ];then
+	echo "SKIP: Could not run all tests without ip6tables tool"
+	exit $ksft_skip
+fi
+
+check_mark()
+{
+	local ns=$1
+	local af=$2
+
+	tables=iptables
+
+	if [ $af -eq 6 ];then
+		tables=ip6tables
+	fi
+
+	counters=$(ip netns exec $ns $tables -v -L OUTPUT | grep DROP)
+	values=${counters%DROP*}
+
+	for v in $values; do
+		if [ $v -ne 0 ]; then
+			echo "FAIL: got $tables $values in ns $ns , not 0 - not all expected packets marked" 1>&2
+			return 1
+		fi
+	done
+
+	return 0
+}
+
+print_file_err()
+{
+	ls -l "$1" 1>&2
+	echo "Trailing bytes are: "
+	tail -c 27 "$1"
+}
+
+check_transfer()
+{
+	in=$1
+	out=$2
+	what=$3
+
+	cmp "$in" "$out" > /dev/null 2>&1
+	if [ $? -ne 0 ] ;then
+		echo "[ FAIL ] $what does not match (in, out):"
+		print_file_err "$in"
+		print_file_err "$out"
+		ret=1
+
+		return 1
+	fi
+
+	return 0
+}
+
+# $1: IP address
+is_v6()
+{
+	[ -z "${1##*:*}" ]
+}
+
+do_transfer()
+{
+	listener_ns="$1"
+	connector_ns="$2"
+	cl_proto="$3"
+	srv_proto="$4"
+	connect_addr="$5"
+
+	port=12001
+
+	:> "$cout"
+	:> "$sout"
+
+	mptcp_connect="./mptcp_connect -r 20"
+
+	local local_addr
+	if is_v6 "${connect_addr}"; then
+		local_addr="::"
+	else
+		local_addr="0.0.0.0"
+	fi
+
+	timeout ${timeout_test} \
+		ip netns exec ${listener_ns} \
+			$mptcp_connect -t ${timeout_poll} -l -M 1 -p $port -s ${srv_proto} \
+				${local_addr} < "$sin" > "$sout" &
+	spid=$!
+
+	sleep 1
+
+	timeout ${timeout_test} \
+		ip netns exec ${connector_ns} \
+			$mptcp_connect -t ${timeout_poll} -M 2 -p $port -s ${cl_proto} \
+				$connect_addr < "$cin" > "$cout" &
+
+	cpid=$!
+
+	wait $cpid
+	retc=$?
+	wait $spid
+	rets=$?
+
+	if [ ${rets} -ne 0 ] || [ ${retc} -ne 0 ]; then
+		echo " client exit code $retc, server $rets" 1>&2
+		echo -e "\nnetns ${listener_ns} socket stat for ${port}:" 1>&2
+		ip netns exec ${listener_ns} ss -Menita 1>&2 -o "sport = :$port"
+
+		echo -e "\nnetns ${connector_ns} socket stat for ${port}:" 1>&2
+		ip netns exec ${connector_ns} ss -Menita 1>&2 -o "dport = :$port"
+
+		ret=1
+		return 1
+	fi
+
+	if [ $local_addr = "::" ];then
+		check_mark $listener_ns 6
+		check_mark $connector_ns 6
+	else
+		check_mark $listener_ns 4
+		check_mark $connector_ns 4
+	fi
+
+	check_transfer $cin $sout "file received by server"
+
+	rets=$?
+
+	if [ $retc -eq 0 ] && [ $rets -eq 0 ];then
+		return 0
+	fi
+
+	return 1
+}
+
+make_file()
+{
+	name=$1
+	who=$2
+	size=$3
+
+	dd if=/dev/urandom of="$name" bs=1024 count=$size 2> /dev/null
+	echo -e "\nMPTCP_TEST_FILE_END_MARKER" >> "$name"
+
+	echo "Created $name (size $size KB) containing data sent by $who"
+}
+
+run_tests()
+{
+	listener_ns="$1"
+	connector_ns="$2"
+	connect_addr="$3"
+	lret=0
+
+	do_transfer ${listener_ns} ${connector_ns} MPTCP MPTCP ${connect_addr}
+
+	lret=$?
+
+	if [ $lret -ne 0 ]; then
+		ret=$lret
+		return
+	fi
+}
+
+sin=$(mktemp)
+sout=$(mktemp)
+cin=$(mktemp)
+cout=$(mktemp)
+init
+make_file "$cin" "client" 1
+make_file "$sin" "server" 1
+trap cleanup EXIT
+
+run_tests $ns1 $ns2 10.0.1.1
+run_tests $ns1 $ns2 dead:beef:1::1
+
+
+if [ $ret -eq 0 ];then
+	echo "PASS: all packets had packet mark set"
+fi
+
+exit $ret
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH mptcp 02/10] mptcp: tag sequence_seq with socket state
  2021-04-08 18:49 ` [PATCH mptcp 02/10] mptcp: tag sequence_seq with socket state Florian Westphal
@ 2021-04-10  0:12   ` Mat Martineau
  2021-04-12  8:05     ` Florian Westphal
  2021-04-13  9:58   ` Paolo Abeni
  1 sibling, 1 reply; 16+ messages in thread
From: Mat Martineau @ 2021-04-10  0:12 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Paolo Abeni, mptcp

On Thu, 8 Apr 2021, Florian Westphal wrote:

> Paolo Abeni suggested to avoid re-syncing new subflows because
> they inherit options from listener. In case options were set on
> listener but are not set on mptcp-socket there is no need to
> do any synchronisation for new subflows.
>
> This change sets sockopt_seq of new mptcp sockets to the seq of
> the mptcp listener sock.
>
> Subflow sequence is set to the embedded tcp listener sk.
> Add a comment explaing why sk_state is involved in sockopt_seq
> generation.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> net/mptcp/protocol.c | 16 ++++++++++-----
> net/mptcp/protocol.h |  4 ++++
> net/mptcp/sockopt.c  | 47 ++++++++++++++++++++++++++++++++++++++++++--
> net/mptcp/subflow.c  |  4 ++++
> 4 files changed, 64 insertions(+), 7 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 2233baf03c3a..26e3ffe8840e 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -729,11 +729,14 @@ static bool mptcp_do_flush_join_list(struct mptcp_sock *msk)
> 	if (likely(list_empty(&msk->join_list)))
> 		return false;
>
> -	ret = true;
> 	spin_lock_bh(&msk->join_list_lock);
> -	list_for_each_entry(subflow, &msk->join_list, node)
> -		mptcp_propagate_sndbuf((struct sock *)msk, mptcp_subflow_tcp_sock(subflow));
> +	list_for_each_entry(subflow, &msk->join_list, node) {
> +		u32 sseq = READ_ONCE(subflow->setsockopt_seq);
>
> +		mptcp_propagate_sndbuf((struct sock *)msk, mptcp_subflow_tcp_sock(subflow));
> +		if (READ_ONCE(msk->setsockopt_seq) != sseq)
> +			ret = true;
> +	}
> 	list_splice_tail_init(&msk->join_list, &msk->conn_list);
> 	spin_unlock_bh(&msk->join_list_lock);
>
> @@ -745,7 +748,8 @@ void __mptcp_flush_join_list(struct mptcp_sock *msk)
> 	if (likely(!mptcp_do_flush_join_list(msk)))
> 		return;
>
> -	if (!test_and_set_bit(MPTCP_WORK_SYNC_SETSOCKOPT, &msk->flags))
> +	if (msk->setsockopt_seq &&
> +	    !test_and_set_bit(MPTCP_WORK_SYNC_SETSOCKOPT, &msk->flags))
> 		mptcp_schedule_work((struct sock *)msk);
> }
>
> @@ -758,7 +762,8 @@ static void mptcp_flush_join_list(struct mptcp_sock *msk)
> 	if (!mptcp_do_flush_join_list(msk) && !sync_needed)
> 		return;
>
> -	mptcp_sockopt_sync_all(msk);
> +	if (msk->setsockopt_seq)
> +		mptcp_sockopt_sync_all(msk);
> }
>
> static bool mptcp_timer_pending(struct sock *sk)
> @@ -2712,6 +2717,7 @@ struct sock *mptcp_sk_clone(const struct sock *sk,
> 	msk->snd_nxt = msk->write_seq;
> 	msk->snd_una = msk->write_seq;
> 	msk->wnd_end = msk->snd_nxt + req->rsk_rcv_wnd;
> +	msk->setsockopt_seq = mptcp_sk(sk)->setsockopt_seq;
>
> 	if (mp_opt->mp_capable) {
> 		msk->can_ack = true;
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 55fe05a40216..edc0128730df 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -256,6 +256,8 @@ struct mptcp_sock {
> 		u64	time;	/* start time of measurement window */
> 		u64	rtt_us; /* last maximum rtt of subflows */
> 	} rcvq_space;
> +
> +	u32 setsockopt_seq;
> };
>
> #define mptcp_lock_sock(___sk, cb) do {					\
> @@ -414,6 +416,8 @@ struct mptcp_subflow_context {
> 	long	delegated_status;
> 	struct	list_head delegated_node;   /* link into delegated_action, protected by local BH */
>
> +	u32 setsockopt_seq;
> +
> 	struct	sock *tcp_sock;	    /* tcp sk backpointer */
> 	struct	sock *conn;	    /* parent mptcp_sock */
> 	const	struct inet_connection_sock_af_ops *icsk_af_ops;
> diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
> index 4fdc0ad6acf7..b1950be15abe 100644
> --- a/net/mptcp/sockopt.c
> +++ b/net/mptcp/sockopt.c
> @@ -24,6 +24,27 @@ static struct sock *__mptcp_tcp_fallback(struct mptcp_sock *msk)
> 	return msk->first;
> }
>
> +static u32 sockopt_seq_reset(const struct sock *sk)
> +{
> +	sock_owned_by_me(sk);
> +
> +	/* Highbits contain state.  Allows to distinguish sockopt_seq
> +	 * of listener and established:
> +	 * s0 = new_listener()
> +	 * sockopt(s0) - seq is 1
> +	 * s1 = accept(s0) - s1 inherits seq 1 if listener sk (s0)
> +	 * sockopt(s0) - seq increments to 2 on s0
> +	 * sockopt(s1) // seq increments to 2 on s1 (different option)
> +	 * new ssk completes join, inherits options from s0 // seq 2
> +	 * Needs sync from mptcp join logic, but ssk->seq == msk->seq
> +	 *
> +	 * Set High order bits to sk_state so ssk->seq == msk->seq test
> +	 * will fail.
> +	 */
> +
> +	return sk->sk_state << 24u;

I think this works because sk->sk_state gets automatically promoted to an 
int, but maybe "(u32)sk->sk_state << 24" is the intent?

> +}
> +
> static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
> 				       sockptr_t optval, unsigned int optlen)
> {
> @@ -350,22 +371,44 @@ int mptcp_getsockopt(struct sock *sk, int level, int optname,
> 	return -EOPNOTSUPP;
> }
>
> +static void __mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
> +{
> +}
> +
> void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
> {
> +	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> +
> 	msk_owned_by_me(msk);
> +
> +	if (READ_ONCE(subflow->setsockopt_seq) != msk->setsockopt_seq) {
> +		__mptcp_sockopt_sync(msk, ssk);
> +
> +		subflow->setsockopt_seq = msk->setsockopt_seq;
> +	}
> }
>
> void mptcp_sockopt_sync_all(struct mptcp_sock *msk)
> {
> 	struct mptcp_subflow_context *subflow;
> +	struct sock *sk = (struct sock *)msk;
> +	u32 seq;
>
> -	msk_owned_by_me(msk);
> +	seq = sockopt_seq_reset(sk);
>
> 	mptcp_for_each_subflow(msk, subflow) {
> 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> +		u32 sseq = READ_ONCE(subflow->setsockopt_seq);
>
> -		mptcp_sockopt_sync(msk, ssk);
> +		if (sseq != msk->setsockopt_seq) {
> +			__mptcp_sockopt_sync(msk, ssk);
> +			WRITE_ONCE(subflow->setsockopt_seq, seq);
> +		} else if (sseq != seq) {
> +			WRITE_ONCE(subflow->setsockopt_seq, seq);
> +		}
>
> 		cond_resched();
> 	}
> +
> +	msk->setsockopt_seq = seq;

One of my comments on the first RFC of this series was a a suggestion to 
avoid setsockopt_seq overflow by resetting it in mptcp_sockopt_sync_all() 
- which this code does!

What I didn't realize at that time was that mptcp_sockopt_sync_all() might 
not be called for a very long time (or at all?) if there are no joins to 
trigger it. To really protect against overflow, mptcp_setsockopt() can 
call (or schedule) mptcp_sockopt_sync_all() if the non-sk_state bits of 
msk->setsockopt_seq exceed a threshold.

> }
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index fc7107f8d81b..82e91b00ad39 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -681,6 +681,9 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
> 			goto out;
> 		}
>
> +		/* ssk inherits options of listener sk */
> +		ctx->setsockopt_seq = listener->setsockopt_seq;
> +
> 		if (ctx->mp_capable) {
> 			/* this can't race with mptcp_close(), as the msk is
> 			 * not yet exposted to user-space
> @@ -696,6 +699,7 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
> 			 * created mptcp socket
> 			 */
> 			new_msk->sk_destruct = mptcp_sock_destruct;
> +			mptcp_sk(new_msk)->setsockopt_seq = ctx->setsockopt_seq;
> 			mptcp_pm_new_connection(mptcp_sk(new_msk), child, 1);
> 			mptcp_token_accept(subflow_req, mptcp_sk(new_msk));
> 			ctx->conn = new_msk;
> -- 
> 2.26.3
>
>
>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH mptcp 02/10] mptcp: tag sequence_seq with socket state
  2021-04-10  0:12   ` Mat Martineau
@ 2021-04-12  8:05     ` Florian Westphal
  0 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2021-04-12  8:05 UTC (permalink / raw)
  To: Mat Martineau; +Cc: Florian Westphal, Paolo Abeni, mptcp

Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
> On Thu, 8 Apr 2021, Florian Westphal wrote:
> 
> > +	return sk->sk_state << 24u;
> 
> I think this works because sk->sk_state gets automatically promoted to an
> int, but maybe "(u32)sk->sk_state << 24" is the intent?

Yes, I can add an explicit cast.

> What I didn't realize at that time was that mptcp_sockopt_sync_all() might
> not be called for a very long time (or at all?) if there are no joins to
> trigger it. To really protect against overflow, mptcp_setsockopt() can call
> (or schedule) mptcp_sockopt_sync_all() if the non-sk_state bits of
> msk->setsockopt_seq exceed a threshold.

Next patch adds sockopt_seq_inc() which is supposed to guarantee that it
never wraps to 0.  I can change it so it will wrap in the 24bit sequence
space only, so it would cycle back to sk->sk_state << 24u.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH mptcp 02/10] mptcp: tag sequence_seq with socket state
  2021-04-08 18:49 ` [PATCH mptcp 02/10] mptcp: tag sequence_seq with socket state Florian Westphal
  2021-04-10  0:12   ` Mat Martineau
@ 2021-04-13  9:58   ` Paolo Abeni
  2021-04-13 14:58     ` Florian Westphal
  1 sibling, 1 reply; 16+ messages in thread
From: Paolo Abeni @ 2021-04-13  9:58 UTC (permalink / raw)
  To: Florian Westphal, mptcp

On Thu, 2021-04-08 at 20:49 +0200, Florian Westphal wrote:
> Paolo Abeni suggested to avoid re-syncing new subflows because
> they inherit options from listener. In case options were set on
> listener but are not set on mptcp-socket there is no need to
> do any synchronisation for new subflows.
> 
> This change sets sockopt_seq of new mptcp sockets to the seq of
> the mptcp listener sock.
> 
> Subflow sequence is set to the embedded tcp listener sk.
> Add a comment explaing why sk_state is involved in sockopt_seq
> generation.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/mptcp/protocol.c | 16 ++++++++++-----
>  net/mptcp/protocol.h |  4 ++++
>  net/mptcp/sockopt.c  | 47 ++++++++++++++++++++++++++++++++++++++++++--
>  net/mptcp/subflow.c  |  4 ++++
>  4 files changed, 64 insertions(+), 7 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 2233baf03c3a..26e3ffe8840e 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -729,11 +729,14 @@ static bool mptcp_do_flush_join_list(struct mptcp_sock *msk)
>  	if (likely(list_empty(&msk->join_list)))
>  		return false;
>  
> -	ret = true;
>  	spin_lock_bh(&msk->join_list_lock);
> -	list_for_each_entry(subflow, &msk->join_list, node)
> -		mptcp_propagate_sndbuf((struct sock *)msk, mptcp_subflow_tcp_sock(subflow));
> +	list_for_each_entry(subflow, &msk->join_list, node) {
> +		u32 sseq = READ_ONCE(subflow->setsockopt_seq);
>  
> +		mptcp_propagate_sndbuf((struct sock *)msk, mptcp_subflow_tcp_sock(subflow));
> +		if (READ_ONCE(msk->setsockopt_seq) != sseq)
> +			ret = true;

I'm a bit lost WRT 'ret' scope. I don't see where it's initialized.
Does the compiler emit a warning here?

Perhaps moving the 'ret' variable definition to this patch will makes
the things clearer?!?

> +	}
>  	list_splice_tail_init(&msk->join_list, &msk->conn_list);
>  	spin_unlock_bh(&msk->join_list_lock);
>  
> @@ -745,7 +748,8 @@ void __mptcp_flush_join_list(struct mptcp_sock *msk)
>  	if (likely(!mptcp_do_flush_join_list(msk)))
>  		return;
>  
> -	if (!test_and_set_bit(MPTCP_WORK_SYNC_SETSOCKOPT, &msk->flags))
> +	if (msk->setsockopt_seq &&
> +	    !test_and_set_bit(MPTCP_WORK_SYNC_SETSOCKOPT, &msk->flags))
>  		mptcp_schedule_work((struct sock *)msk);

It's not obvious to me why the 'msk->setsockopt_seq' check is needed
here (and in the below chunk) ???

Perhaps a comment would help?

Cheers,

Paolo



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH mptcp 03/10] mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY
  2021-04-08 18:49 ` [PATCH mptcp 03/10] mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY Florian Westphal
@ 2021-04-13 10:10   ` Paolo Abeni
  0 siblings, 0 replies; 16+ messages in thread
From: Paolo Abeni @ 2021-04-13 10:10 UTC (permalink / raw)
  To: Florian Westphal, mptcp

On Thu, 2021-04-08 at 20:49 +0200, Florian Westphal wrote:
> start with something simple: both take an integer value, both
> need to be mirrored to all subflows.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/mptcp/sockopt.c | 106 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 106 insertions(+)
> 
> diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
> index b1950be15abe..43da4696336f 100644
> --- a/net/mptcp/sockopt.c
> +++ b/net/mptcp/sockopt.c
> @@ -45,6 +45,90 @@ static u32 sockopt_seq_reset(const struct sock *sk)
>  	return sk->sk_state << 24u;
>  }
>  
> +static void sockopt_seq_inc(struct mptcp_sock *msk)
> +{
> +	u32 seq = msk->setsockopt_seq & 0x00ffffff;
> +
> +	msk->setsockopt_seq = sockopt_seq_reset((struct sock *)msk) + seq + 1;

if setsockopt_seq == 0x00ffffff, than this will change the 'state'
related part of setsockopt_seq???

Should the above instead be:

	u32 seq = (msk->setsockopt_seq + 1) & 0x00ffffff;
	
	msk->setsockopt_seq = sockopt_seq_reset((struct sock *)msk) + seq;

Note: I *think* that a wrap-around is not a problem. Since the sequence
contains the 'state' tag, a wrap around could possibly cause a wrong
synchronization only if it occours in-between the subflow creation and
the next join_list - a very short period of time -> more than
reasonably impossible. In all other cases there should be mismatch of
the 'state' part.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH mptcp 02/10] mptcp: tag sequence_seq with socket state
  2021-04-13  9:58   ` Paolo Abeni
@ 2021-04-13 14:58     ` Florian Westphal
  0 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2021-04-13 14:58 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Florian Westphal, mptcp

Paolo Abeni <pabeni@redhat.com> wrote:
> > +	list_for_each_entry(subflow, &msk->join_list, node) {
> > +		u32 sseq = READ_ONCE(subflow->setsockopt_seq);
> >  
> > +		mptcp_propagate_sndbuf((struct sock *)msk, mptcp_subflow_tcp_sock(subflow));
> > +		if (READ_ONCE(msk->setsockopt_seq) != sseq)
> > +			ret = true;
> 
> I'm a bit lost WRT 'ret' scope. I don't see where it's initialized.
> Does the compiler emit a warning here?

Did not see one, but it looks broken.

> Perhaps moving the 'ret' variable definition to this patch will makes
> the things clearer?!?

Done.

> > -	if (!test_and_set_bit(MPTCP_WORK_SYNC_SETSOCKOPT, &msk->flags))
> > +	if (msk->setsockopt_seq &&
> > +	    !test_and_set_bit(MPTCP_WORK_SYNC_SETSOCKOPT, &msk->flags))
> >  		mptcp_schedule_work((struct sock *)msk);
> 
> It's not obvious to me why the 'msk->setsockopt_seq' check is needed
> here (and in the below chunk) ???
> 
> Perhaps a comment would help?

Perhaps:
+static bool setsockopt_was_called(const struct mptcp_sock *msk)
+{
+       return msk->setsockopt_seq > 0;
+}
+

if (setsockopt_was_called(msk) && ..

?

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-04-13 14:58 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-08 18:49 [PATCH mptcp 00/10] mptcp: add SOL_SOCKET support Florian Westphal
2021-04-08 18:49 ` [PATCH mptcp 01/10] mptcp: add skeleton to sync msk socket options to subflows Florian Westphal
2021-04-08 18:49 ` [PATCH mptcp 02/10] mptcp: tag sequence_seq with socket state Florian Westphal
2021-04-10  0:12   ` Mat Martineau
2021-04-12  8:05     ` Florian Westphal
2021-04-13  9:58   ` Paolo Abeni
2021-04-13 14:58     ` Florian Westphal
2021-04-08 18:49 ` [PATCH mptcp 03/10] mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY Florian Westphal
2021-04-13 10:10   ` Paolo Abeni
2021-04-08 18:49 ` [PATCH mptcp 04/10] mptcp: setsockopt: handle receive/send buffer and device bind Florian Westphal
2021-04-08 18:49 ` [PATCH mptcp 05/10] mptcp: setsockopt: support SO_LINGER Florian Westphal
2021-04-08 18:49 ` [PATCH mptcp 06/10] mptcp: setsockopt: add SO_MARK support Florian Westphal
2021-04-08 18:49 ` [PATCH mptcp 07/10] mptcp: setsockopt: add SO_INCOMING_CPU Florian Westphal
2021-04-08 18:49 ` [PATCH mptcp 08/10] mptcp: setsockopt: SO_DEBUG and no-op options Florian Westphal
2021-04-08 18:49 ` [PATCH mptcp 09/10] mptcp: sockopt: add TCP_CONGESTION and TCP_INFO Florian Westphal
2021-04-08 18:49 ` [PATCH mptcp 10/10] selftests: mptcp: add packet mark test case Florian Westphal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.