All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH mptcp-next v2 0/8] initial SOL_SOCKET support
@ 2021-03-24 13:15 Florian Westphal
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 1/8] mptcp: add skeleton to sync msk socket options to subflows Florian Westphal
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Florian Westphal @ 2021-03-24 13:15 UTC (permalink / raw)
  To: mptcp; +Cc: mptcp, Florian Westphal

This patch set improves support for several SOL_SOCKET tuneables,
addressing comments received for v1 patch set.

First patch adds skeleton synchronization functions to copy mptcp socket
settings to a subflow socket.

Notable changes:
In patch 1, work queue is only used when caller holds a spinlock that
prevents calls to sleepable functions such as lock_sock().

TCP_CONGESTION is now applied to all subflows.
TCP_INFO retrieves info from the first subflow.
It seems better to add MPTCP_INFO from out-of-tree patch set
in a future change.

I've not changed SO_LINGER either but I think doing a FASTCLOSE
is the right thing to do when linger time is 0.

Sending this now so there is a bit of review time before thursdays
meeting.

Florian Westphal (8):
  mptcp: add skeleton to sync msk socket options to subflows
  mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY
  mptcp: setsockopt: handle receive/send buffer and device bind
  mptcp: setsockopt: support SO_LINGER
  mptcp: setsockopt: add SO_MARK support
  mptcp: setsockopt: add SO_INCOMING_CPU
  mptcp: setsockopt: SO_DEBUG and no-op options
  mptcp: sockopt: add TCP_CONGESTION and TCP_INFO

 net/mptcp/protocol.c |  54 +++++--
 net/mptcp/protocol.h |  11 ++
 net/mptcp/sockopt.c  | 362 +++++++++++++++++++++++++++++++++++++++++++
 net/mptcp/subflow.c  |   1 +
 4 files changed, 419 insertions(+), 9 deletions(-)

-- 
2.26.3


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH mptcp-next v2 1/8] mptcp: add skeleton to sync msk socket options to subflows
  2021-03-24 13:15 [RFC PATCH mptcp-next v2 0/8] initial SOL_SOCKET support Florian Westphal
@ 2021-03-24 13:15 ` Florian Westphal
  2021-03-24 17:38   ` [MPTCP] " Paolo Abeni
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 2/8] mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY Florian Westphal
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Florian Westphal @ 2021-03-24 13:15 UTC (permalink / raw)
  To: mptcp; +Cc: mptcp, Florian Westphal

Handle following cases:
1. setsockopt is called with multiple subflows.
   Change might have to be mirrored to all of them.
   This is done directly in process context/setsockopt call.
2. Outgoing subflow is created after one or several setsockopt()
   calls have been made.  Old setsockopt changes should be
   synced to the new socket.
3. Incoming subflow, after setsockopt call(s).

Cases 2 and 3 are handled right after the join list is spliced to the
conn list.

Because splicing is sometimes done from non-preemptible context, the
sync action can be deferred to the work queue.

Add sequence numbers to subflow context and mptcp socket so
synchronization functions know which subflow is already updated
and which are not.

seqno is re-set to 1 in mptcp_sockopt_sync_all(), at this point
the list of subflows is up to date.

A setsockopt sequence count of 0 means that no setsockopt call
was made so no synchronization is needed.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 50 ++++++++++++++++++++++++++++++++++++--------
 net/mptcp/protocol.h | 11 ++++++++++
 net/mptcp/sockopt.c  | 30 ++++++++++++++++++++++++++
 net/mptcp/subflow.c  |  1 +
 4 files changed, 83 insertions(+), 9 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 9d7e7e13fba8..15beb99f559d 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -730,18 +730,49 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 		sk->sk_data_ready(sk);
 }
 
-void __mptcp_flush_join_list(struct mptcp_sock *msk)
+static bool mptcp_do_flush_join_list(struct mptcp_sock *msk)
 {
 	struct mptcp_subflow_context *subflow;
+	bool ret;
 
 	if (likely(list_empty(&msk->join_list)))
-		return;
+		return false;
 
 	spin_lock_bh(&msk->join_list_lock);
-	list_for_each_entry(subflow, &msk->join_list, node)
+	list_for_each_entry(subflow, &msk->join_list, node) {
+		u32 sseq = READ_ONCE(subflow->setsockopt_seq);
+
 		mptcp_propagate_sndbuf((struct sock *)msk, mptcp_subflow_tcp_sock(subflow));
+		if (READ_ONCE(msk->setsockopt_seq) != sseq)
+			ret = true;
+	}
 	list_splice_tail_init(&msk->join_list, &msk->conn_list);
 	spin_unlock_bh(&msk->join_list_lock);
+
+	return ret;
+}
+
+void __mptcp_flush_join_list(struct mptcp_sock *msk)
+{
+	if (likely(!mptcp_do_flush_join_list(msk)))
+		return;
+
+	if (msk->setsockopt_seq &&
+	    !test_and_set_bit(MPTCP_WORK_SYNC_SETSOCKOPT, &msk->flags))
+		mptcp_schedule_work((struct sock *)msk);
+}
+
+static void mptcp_flush_join_list(struct mptcp_sock *msk)
+{
+	bool sync_needed = test_and_clear_bit(MPTCP_WORK_SYNC_SETSOCKOPT, &msk->flags);
+
+	might_sleep();
+
+	if (!mptcp_do_flush_join_list(msk) && !sync_needed)
+		return;
+
+	if (msk->setsockopt_seq)
+		mptcp_sockopt_sync_all(msk);
 }
 
 static bool mptcp_timer_pending(struct sock *sk)
@@ -1457,7 +1488,7 @@ static void __mptcp_push_pending(struct sock *sk, unsigned int flags)
 			int ret = 0;
 
 			prev_ssk = ssk;
-			__mptcp_flush_join_list(msk);
+			mptcp_flush_join_list(msk);
 			ssk = mptcp_subflow_get_send(msk);
 
 			/* try to keep the subflow socket lock across
@@ -1883,7 +1914,7 @@ static bool __mptcp_move_skbs(struct mptcp_sock *msk)
 	unsigned int moved = 0;
 	bool ret, done;
 
-	__mptcp_flush_join_list(msk);
+	mptcp_flush_join_list(msk);
 	do {
 		struct sock *ssk = mptcp_subflow_recv_lookup(msk);
 		bool slowpath;
@@ -2307,7 +2338,7 @@ static void mptcp_worker(struct work_struct *work)
 		goto unlock;
 
 	mptcp_check_data_fin_ack(sk);
-	__mptcp_flush_join_list(msk);
+	mptcp_flush_join_list(msk);
 
 	mptcp_check_fastclose(msk);
 
@@ -2511,7 +2542,7 @@ static void __mptcp_check_send_data_fin(struct sock *sk)
 		}
 	}
 
-	__mptcp_flush_join_list(msk);
+	mptcp_flush_join_list(msk);
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *tcp_sk = mptcp_subflow_tcp_sock(subflow);
 
@@ -2648,7 +2679,8 @@ static int mptcp_disconnect(struct sock *sk, int flags)
 	struct mptcp_subflow_context *subflow;
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
-	__mptcp_flush_join_list(msk);
+	mptcp_do_flush_join_list(msk);
+
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 
@@ -3216,7 +3248,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 		/* set ssk->sk_socket of accept()ed flows to mptcp socket.
 		 * This is needed so NOSPACE flag can be set from tcp stack.
 		 */
-		__mptcp_flush_join_list(msk);
+		mptcp_flush_join_list(msk);
 		mptcp_for_each_subflow(msk, subflow) {
 			struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 14f0114be17a..df269c26f145 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -108,6 +108,7 @@
 #define MPTCP_CLEAN_UNA		7
 #define MPTCP_ERROR_REPORT	8
 #define MPTCP_RETRANSMIT	9
+#define MPTCP_WORK_SYNC_SETSOCKOPT 10
 
 static inline bool before64(__u64 seq1, __u64 seq2)
 {
@@ -255,6 +256,8 @@ struct mptcp_sock {
 		u64	time;	/* start time of measurement window */
 		u64	rtt_us; /* last maximum rtt of subflows */
 	} rcvq_space;
+
+	u32 setsockopt_seq;
 };
 
 #define mptcp_lock_sock(___sk, cb) do {					\
@@ -413,6 +416,8 @@ struct mptcp_subflow_context {
 	long	delegated_status;
 	struct	list_head delegated_node;   /* link into delegated_action, protected by local BH */
 
+	u32 setsockopt_seq;
+
 	struct	sock *tcp_sock;	    /* tcp sk backpointer */
 	struct	sock *conn;	    /* parent mptcp_sock */
 	const	struct inet_connection_sock_af_ops *icsk_af_ops;
@@ -735,6 +740,12 @@ unsigned int mptcp_pm_get_add_addr_accept_max(struct mptcp_sock *msk);
 unsigned int mptcp_pm_get_subflows_max(struct mptcp_sock *msk);
 unsigned int mptcp_pm_get_local_addr_max(struct mptcp_sock *msk);
 
+int mptcp_setsockopt(struct sock *sk, int level, int optname,
+		     sockptr_t optval, unsigned int optlen);
+
+void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk);
+void mptcp_sockopt_sync_all(struct mptcp_sock *msk);
+
 static inline struct mptcp_ext *mptcp_get_ext(const struct sk_buff *skb)
 {
 	return (struct mptcp_ext *)skb_ext_find(skb, SKB_EXT_MPTCP);
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index fb98fab252df..7f0cb012e3a2 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -350,3 +350,33 @@ int mptcp_getsockopt(struct sock *sk, int level, int optname,
 	return -EOPNOTSUPP;
 }
 
+void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
+{
+	struct mptcp_subflow_context *subflow;
+
+	msk_owned_by_me(msk);
+
+	subflow = mptcp_subflow_ctx(ssk);
+	if (subflow->setsockopt_seq == msk->setsockopt_seq)
+		return;
+
+	subflow->setsockopt_seq = msk->setsockopt_seq;
+}
+
+void mptcp_sockopt_sync_all(struct mptcp_sock *msk)
+{
+	struct mptcp_subflow_context *subflow;
+
+	msk_owned_by_me(msk);
+
+	mptcp_for_each_subflow(msk, subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+		mptcp_sockopt_sync(msk, ssk);
+		subflow->setsockopt_seq = 1;
+
+		cond_resched();
+	}
+
+	msk->setsockopt_seq = 1;
+}
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 5fc3cada11dd..1dea59f3ff70 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1312,6 +1312,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
 	mptcp_info2sockaddr(remote, &addr, ssk->sk_family);
 
 	mptcp_add_pending_subflow(msk, subflow);
+	mptcp_sockopt_sync(msk, ssk);
 	err = kernel_connect(sf, (struct sockaddr *)&addr, addrlen, O_NONBLOCK);
 	if (err && err != -EINPROGRESS)
 		goto failed_unlink;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH mptcp-next v2 2/8] mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY
  2021-03-24 13:15 [RFC PATCH mptcp-next v2 0/8] initial SOL_SOCKET support Florian Westphal
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 1/8] mptcp: add skeleton to sync msk socket options to subflows Florian Westphal
@ 2021-03-24 13:15 ` Florian Westphal
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 3/8] mptcp: setsockopt: handle receive/send buffer and device bind Florian Westphal
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2021-03-24 13:15 UTC (permalink / raw)
  To: mptcp; +Cc: mptcp, Florian Westphal

start with something simple: both take an integer value, both
need to be mirrored to all subflows.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/sockopt.c | 99 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 99 insertions(+)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 7f0cb012e3a2..dd9e921018f8 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -24,6 +24,82 @@ static struct sock *__mptcp_tcp_fallback(struct mptcp_sock *msk)
 	return msk->first;
 }
 
+static int mptcp_get_int_option(struct mptcp_sock *msk, sockptr_t optval, unsigned int optlen, int *val)
+{
+	if (optlen < sizeof(int))
+		return -EINVAL;
+
+	if (copy_from_sockptr(val, optval, sizeof(*val)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, int val)
+{
+	struct mptcp_subflow_context *subflow;
+	struct sock *sk = (struct sock *)msk;
+
+	lock_sock(sk);
+	msk->setsockopt_seq++;
+
+	mptcp_for_each_subflow(msk, subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+		bool slow = lock_sock_fast(ssk);
+
+		switch (optname) {
+		case SO_KEEPALIVE:
+			if (ssk->sk_prot->keepalive)
+				ssk->sk_prot->keepalive(ssk, !!val);
+			sock_valbool_flag(ssk, SOCK_KEEPOPEN, !!val);
+			break;
+		case SO_PRIORITY:
+			ssk->sk_priority = val;
+			break;
+		}
+
+		subflow->setsockopt_seq = msk->setsockopt_seq;
+		unlock_sock_fast(ssk, slow);
+	}
+
+	release_sock(sk);
+}
+
+static int mptcp_sol_socket_intval(struct mptcp_sock *msk, int optname, int val)
+{
+	sockptr_t optval = KERNEL_SOCKPTR(&val);
+	struct sock *sk = (struct sock *)msk;
+	int ret;
+
+	ret = sock_setsockopt(sk->sk_socket, SOL_SOCKET, optname,
+			      optval, sizeof(val));
+	if (ret)
+		return ret;
+
+	mptcp_sol_socket_sync_intval(msk, optname, val);
+	return 0;
+}
+
+static int mptcp_setsockopt_sol_socket_int(struct mptcp_sock *msk, int optname,
+					   sockptr_t optval, unsigned int optlen)
+{
+	int val, ret;
+
+	ret = mptcp_get_int_option(msk, optval, optlen, &val);
+	if (ret)
+		return ret;
+
+	switch (optname) {
+	case SO_KEEPALIVE:
+		mptcp_sol_socket_sync_intval(msk, optname, val);
+		return 0;
+	case SO_PRIORITY:
+		return mptcp_sol_socket_intval(msk, optname, val);
+	}
+
+	return -ENOPROTOOPT;
+}
+
 static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 				       sockptr_t optval, unsigned int optlen)
 {
@@ -50,6 +126,9 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 		}
 		release_sock(sk);
 		return ret;
+	case SO_KEEPALIVE:
+	case SO_PRIORITY:
+		return mptcp_setsockopt_sol_socket_int(msk, optname, optval, optlen);
 	}
 
 	return sock_setsockopt(sk->sk_socket, SOL_SOCKET, optname, optval, optlen);
@@ -350,9 +429,24 @@ int mptcp_getsockopt(struct sock *sk, int level, int optname,
 	return -EOPNOTSUPP;
 }
 
+static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
+{
+	struct sock *sk = (struct sock *)msk;
+
+	if (ssk->sk_prot->keepalive) {
+		if (sock_flag(sk, SOCK_KEEPOPEN))
+			ssk->sk_prot->keepalive(ssk, 1);
+		else
+			ssk->sk_prot->keepalive(ssk, 0);
+	}
+
+	ssk->sk_priority = sk->sk_priority;
+}
+
 void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow;
+	bool slow;
 
 	msk_owned_by_me(msk);
 
@@ -360,7 +454,12 @@ void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
 	if (subflow->setsockopt_seq == msk->setsockopt_seq)
 		return;
 
+	slow = lock_sock_fast(ssk);
+
+	sync_socket_options(msk, ssk);
 	subflow->setsockopt_seq = msk->setsockopt_seq;
+
+	unlock_sock_fast(ssk, slow);
 }
 
 void mptcp_sockopt_sync_all(struct mptcp_sock *msk)
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH mptcp-next v2 3/8] mptcp: setsockopt: handle receive/send buffer and device bind
  2021-03-24 13:15 [RFC PATCH mptcp-next v2 0/8] initial SOL_SOCKET support Florian Westphal
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 1/8] mptcp: add skeleton to sync msk socket options to subflows Florian Westphal
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 2/8] mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY Florian Westphal
@ 2021-03-24 13:15 ` Florian Westphal
  2021-03-24 16:34   ` [MPTCP] " Paolo Abeni
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 4/8] mptcp: setsockopt: support SO_LINGER Florian Westphal
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Florian Westphal @ 2021-03-24 13:15 UTC (permalink / raw)
  To: mptcp; +Cc: mptcp, Florian Westphal

Similar to previous patch: needs to be mirrored to all subflows.

Device bind is simpler: it is only done on the initial (listener) sk.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/sockopt.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index dd9e921018f8..f0a88753dfdf 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -56,6 +56,16 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
 		case SO_PRIORITY:
 			ssk->sk_priority = val;
 			break;
+		case SO_SNDBUF:
+		case SO_SNDBUFFORCE:
+			ssk->sk_userlocks |= SOCK_SNDBUF_LOCK;
+			WRITE_ONCE(ssk->sk_sndbuf, sk->sk_sndbuf);
+			break;
+		case SO_RCVBUF:
+		case SO_RCVBUFFORCE:
+			ssk->sk_userlocks |= SOCK_RCVBUF_LOCK;
+			WRITE_ONCE(ssk->sk_rcvbuf, sk->sk_rcvbuf);
+			break;
 		}
 
 		subflow->setsockopt_seq = msk->setsockopt_seq;
@@ -94,6 +104,10 @@ static int mptcp_setsockopt_sol_socket_int(struct mptcp_sock *msk, int optname,
 		mptcp_sol_socket_sync_intval(msk, optname, val);
 		return 0;
 	case SO_PRIORITY:
+	case SO_SNDBUF:
+	case SO_SNDBUFFORCE:
+	case SO_RCVBUF:
+	case SO_RCVBUFFORCE:
 		return mptcp_sol_socket_intval(msk, optname, val);
 	}
 
@@ -110,6 +124,8 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 	switch (optname) {
 	case SO_REUSEPORT:
 	case SO_REUSEADDR:
+	case SO_BINDTODEVICE:
+	case SO_BINDTOIFINDEX:
 		lock_sock(sk);
 		ssock = __mptcp_nmpc_socket(msk);
 		if (!ssock) {
@@ -123,11 +139,19 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 				sk->sk_reuseport = ssock->sk->sk_reuseport;
 			else if (optname == SO_REUSEADDR)
 				sk->sk_reuse = ssock->sk->sk_reuse;
+			else if (optname == SO_BINDTODEVICE)
+				sk->sk_bound_dev_if = ssock->sk->sk_bound_dev_if;
+			else if (optname == SO_BINDTOIFINDEX)
+				sk->sk_bound_dev_if = ssock->sk->sk_bound_dev_if;
 		}
 		release_sock(sk);
 		return ret;
 	case SO_KEEPALIVE:
 	case SO_PRIORITY:
+	case SO_SNDBUF:
+	case SO_SNDBUFFORCE:
+	case SO_RCVBUF:
+	case SO_RCVBUFFORCE:
 		return mptcp_setsockopt_sol_socket_int(msk, optname, optval, optlen);
 	}
 
@@ -441,6 +465,12 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
 	}
 
 	ssk->sk_priority = sk->sk_priority;
+	ssk->sk_bound_dev_if = sk->sk_bound_dev_if;
+
+	if (sk->sk_userlocks & SOCK_SNDBUF_LOCK)
+		WRITE_ONCE(ssk->sk_sndbuf, sk->sk_sndbuf);
+	if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
+		WRITE_ONCE(ssk->sk_rcvbuf, sk->sk_rcvbuf);
 }
 
 void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH mptcp-next v2 4/8] mptcp: setsockopt: support SO_LINGER
  2021-03-24 13:15 [RFC PATCH mptcp-next v2 0/8] initial SOL_SOCKET support Florian Westphal
                   ` (2 preceding siblings ...)
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 3/8] mptcp: setsockopt: handle receive/send buffer and device bind Florian Westphal
@ 2021-03-24 13:15 ` Florian Westphal
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 5/8] mptcp: setsockopt: add SO_MARK support Florian Westphal
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2021-03-24 13:15 UTC (permalink / raw)
  To: mptcp; +Cc: mptcp, Florian Westphal

Similar to PRIORITY/KEEPALIVE: needs to be mirrored to all subflows.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/sockopt.c | 50 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index f0a88753dfdf..f33a9ee12544 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -114,6 +114,47 @@ static int mptcp_setsockopt_sol_socket_int(struct mptcp_sock *msk, int optname,
 	return -ENOPROTOOPT;
 }
 
+static int mptcp_setsockopt_sol_socket_linger(struct mptcp_sock *msk, sockptr_t optval,
+					      unsigned int optlen)
+{
+	struct mptcp_subflow_context *subflow;
+	struct sock *sk = (struct sock *)msk;
+	struct linger ling;
+	sockptr_t kopt;
+	int ret;
+
+	if (optlen < sizeof(ling))
+		return -EINVAL;
+
+	if (copy_from_sockptr(&ling, optval, sizeof(ling)))
+		return -EFAULT;
+
+	kopt = KERNEL_SOCKPTR(&ling);
+	ret = sock_setsockopt(sk->sk_socket, SOL_SOCKET, SO_LINGER, kopt, sizeof(ling));
+	if (ret)
+		return ret;
+
+	lock_sock(sk);
+	msk->setsockopt_seq++;
+	mptcp_for_each_subflow(msk, subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+		bool slow = lock_sock_fast(ssk);
+
+		if (!ling.l_onoff)
+			sock_reset_flag(ssk, SOCK_LINGER);
+		else {
+			ssk->sk_lingertime = sk->sk_lingertime;
+			sock_set_flag(ssk, SOCK_LINGER);
+		}
+
+		subflow->setsockopt_seq = msk->setsockopt_seq;
+		unlock_sock_fast(ssk, slow);
+	}
+
+	release_sock(sk);
+	return 0;
+}
+
 static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 				       sockptr_t optval, unsigned int optlen)
 {
@@ -153,6 +194,8 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 	case SO_RCVBUF:
 	case SO_RCVBUFFORCE:
 		return mptcp_setsockopt_sol_socket_int(msk, optname, optval, optlen);
+	case SO_LINGER:
+		return mptcp_setsockopt_sol_socket_linger(msk, optval, optlen);
 	}
 
 	return sock_setsockopt(sk->sk_socket, SOL_SOCKET, optname, optval, optlen);
@@ -471,6 +514,13 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
 		WRITE_ONCE(ssk->sk_sndbuf, sk->sk_sndbuf);
 	if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
 		WRITE_ONCE(ssk->sk_rcvbuf, sk->sk_rcvbuf);
+
+	if (sock_flag(sk, SOCK_LINGER)) {
+		ssk->sk_lingertime = sk->sk_lingertime;
+		sock_set_flag(ssk, SOCK_LINGER);
+	} else {
+		sock_reset_flag(ssk, SOCK_LINGER);
+	}
 }
 
 void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH mptcp-next v2 5/8] mptcp: setsockopt: add SO_MARK support
  2021-03-24 13:15 [RFC PATCH mptcp-next v2 0/8] initial SOL_SOCKET support Florian Westphal
                   ` (3 preceding siblings ...)
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 4/8] mptcp: setsockopt: support SO_LINGER Florian Westphal
@ 2021-03-24 13:15 ` Florian Westphal
  2021-03-25  1:22   ` Mat Martineau
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 6/8] mptcp: setsockopt: add SO_INCOMING_CPU Florian Westphal
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Florian Westphal @ 2021-03-24 13:15 UTC (permalink / raw)
  To: mptcp; +Cc: mptcp, Florian Westphal

Value is synced to all subflows.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/sockopt.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index f33a9ee12544..cb29013d7d74 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -66,6 +66,12 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
 			ssk->sk_userlocks |= SOCK_RCVBUF_LOCK;
 			WRITE_ONCE(ssk->sk_rcvbuf, sk->sk_rcvbuf);
 			break;
+		case SO_MARK:
+			if (READ_ONCE(ssk->sk_mark) != sk->sk_mark) {
+				ssk->sk_mark = sk->sk_mark;
+				sk_dst_reset(ssk);
+			}
+			break;
 		}
 
 		subflow->setsockopt_seq = msk->setsockopt_seq;
@@ -103,6 +109,7 @@ static int mptcp_setsockopt_sol_socket_int(struct mptcp_sock *msk, int optname,
 	case SO_KEEPALIVE:
 		mptcp_sol_socket_sync_intval(msk, optname, val);
 		return 0;
+	case SO_MARK:
 	case SO_PRIORITY:
 	case SO_SNDBUF:
 	case SO_SNDBUFFORCE:
@@ -193,6 +200,7 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 	case SO_SNDBUFFORCE:
 	case SO_RCVBUF:
 	case SO_RCVBUFFORCE:
+	case SO_MARK:
 		return mptcp_setsockopt_sol_socket_int(msk, optname, optval, optlen);
 	case SO_LINGER:
 		return mptcp_setsockopt_sol_socket_linger(msk, optval, optlen);
@@ -521,6 +529,11 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
 	} else {
 		sock_reset_flag(ssk, SOCK_LINGER);
 	}
+
+	if (sk->sk_mark != ssk->sk_mark) {
+		ssk->sk_mark = sk->sk_mark;
+		sk_dst_reset(ssk);
+	}
 }
 
 void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH mptcp-next v2 6/8] mptcp: setsockopt: add SO_INCOMING_CPU
  2021-03-24 13:15 [RFC PATCH mptcp-next v2 0/8] initial SOL_SOCKET support Florian Westphal
                   ` (4 preceding siblings ...)
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 5/8] mptcp: setsockopt: add SO_MARK support Florian Westphal
@ 2021-03-24 13:15 ` Florian Westphal
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 7/8] mptcp: setsockopt: SO_DEBUG and no-op options Florian Westphal
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 8/8] mptcp: sockopt: add TCP_CONGESTION and TCP_INFO Florian Westphal
  7 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2021-03-24 13:15 UTC (permalink / raw)
  To: mptcp; +Cc: mptcp, Florian Westphal

Replicate to all subflows.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/sockopt.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index cb29013d7d74..6d59b25f3658 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -72,6 +72,9 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
 				sk_dst_reset(ssk);
 			}
 			break;
+		case SO_INCOMING_CPU:
+			WRITE_ONCE(ssk->sk_incoming_cpu, val);
+			break;
 		}
 
 		subflow->setsockopt_seq = msk->setsockopt_seq;
@@ -96,6 +99,15 @@ static int mptcp_sol_socket_intval(struct mptcp_sock *msk, int optname, int val)
 	return 0;
 }
 
+static void mptcp_so_incoming_cpu(struct mptcp_sock *msk, int val)
+{
+	struct sock *sk = (struct sock *)msk;
+
+	WRITE_ONCE(sk->sk_incoming_cpu, val);
+
+	mptcp_sol_socket_sync_intval(msk, SO_INCOMING_CPU, val);
+}
+
 static int mptcp_setsockopt_sol_socket_int(struct mptcp_sock *msk, int optname,
 					   sockptr_t optval, unsigned int optlen)
 {
@@ -116,6 +128,9 @@ static int mptcp_setsockopt_sol_socket_int(struct mptcp_sock *msk, int optname,
 	case SO_RCVBUF:
 	case SO_RCVBUFFORCE:
 		return mptcp_sol_socket_intval(msk, optname, val);
+	case SO_INCOMING_CPU:
+		mptcp_so_incoming_cpu(msk, val);
+		return 0;
 	}
 
 	return -ENOPROTOOPT;
@@ -201,6 +216,7 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 	case SO_RCVBUF:
 	case SO_RCVBUFFORCE:
 	case SO_MARK:
+	case SO_INCOMING_CPU:
 		return mptcp_setsockopt_sol_socket_int(msk, optname, optval, optlen);
 	case SO_LINGER:
 		return mptcp_setsockopt_sol_socket_linger(msk, optval, optlen);
@@ -517,6 +533,7 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
 
 	ssk->sk_priority = sk->sk_priority;
 	ssk->sk_bound_dev_if = sk->sk_bound_dev_if;
+	ssk->sk_incoming_cpu = sk->sk_incoming_cpu;
 
 	if (sk->sk_userlocks & SOCK_SNDBUF_LOCK)
 		WRITE_ONCE(ssk->sk_sndbuf, sk->sk_sndbuf);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH mptcp-next v2 7/8] mptcp: setsockopt: SO_DEBUG and no-op options
  2021-03-24 13:15 [RFC PATCH mptcp-next v2 0/8] initial SOL_SOCKET support Florian Westphal
                   ` (5 preceding siblings ...)
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 6/8] mptcp: setsockopt: add SO_INCOMING_CPU Florian Westphal
@ 2021-03-24 13:15 ` Florian Westphal
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 8/8] mptcp: sockopt: add TCP_CONGESTION and TCP_INFO Florian Westphal
  7 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2021-03-24 13:15 UTC (permalink / raw)
  To: mptcp; +Cc: mptcp, Florian Westphal

Handle SO_DEBUG and set it on all subflows.
Ignore those values not implemented on TCP sockets.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/sockopt.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 6d59b25f3658..d345ebc3947a 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -48,6 +48,9 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
 		bool slow = lock_sock_fast(ssk);
 
 		switch (optname) {
+		case SO_DEBUG:
+			sock_valbool_flag(ssk, SOCK_DBG, !!val);
+			break;
 		case SO_KEEPALIVE:
 			if (ssk->sk_prot->keepalive)
 				ssk->sk_prot->keepalive(ssk, !!val);
@@ -121,6 +124,7 @@ static int mptcp_setsockopt_sol_socket_int(struct mptcp_sock *msk, int optname,
 	case SO_KEEPALIVE:
 		mptcp_sol_socket_sync_intval(msk, optname, val);
 		return 0;
+	case SO_DEBUG:
 	case SO_MARK:
 	case SO_PRIORITY:
 	case SO_SNDBUF:
@@ -217,9 +221,21 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 	case SO_RCVBUFFORCE:
 	case SO_MARK:
 	case SO_INCOMING_CPU:
+	case SO_DEBUG:
 		return mptcp_setsockopt_sol_socket_int(msk, optname, optval, optlen);
 	case SO_LINGER:
 		return mptcp_setsockopt_sol_socket_linger(msk, optval, optlen);
+	case SO_NO_CHECK:
+	case SO_DONTROUTE:
+	case SO_BROADCAST:
+	case SO_BSDCOMPAT:
+	case SO_PASSCRED:
+	case SO_PASSSEC:
+	case SO_RXQ_OVFL:
+	case SO_WIFI_STATUS:
+	case SO_NOFCS:
+	case SO_SELECT_ERR_QUEUE:
+		return 0;
 	}
 
 	return sock_setsockopt(sk->sk_socket, SOL_SOCKET, optname, optval, optlen);
@@ -551,6 +567,8 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
 		ssk->sk_mark = sk->sk_mark;
 		sk_dst_reset(ssk);
 	}
+
+	sock_valbool_flag(ssk, SOCK_DBG, sock_flag(sk, SOCK_DBG));
 }
 
 void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH mptcp-next v2 8/8] mptcp: sockopt: add TCP_CONGESTION and TCP_INFO
  2021-03-24 13:15 [RFC PATCH mptcp-next v2 0/8] initial SOL_SOCKET support Florian Westphal
                   ` (6 preceding siblings ...)
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 7/8] mptcp: setsockopt: SO_DEBUG and no-op options Florian Westphal
@ 2021-03-24 13:15 ` Florian Westphal
  7 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2021-03-24 13:15 UTC (permalink / raw)
  To: mptcp; +Cc: mptcp, Florian Westphal

TCP_CONGESTION is set for all subflows.
The mptcp socket gains icsk_ca_ops too so it can be used to keep the
authoritative state that should be set on new/future subflows.

TCP_INFO will return first subflow only.
The out-of-tree kernel has a MPTCP_INFO getsockopt, this could be added
later on.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c |   4 ++
 net/mptcp/sockopt.c  | 105 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 109 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 15beb99f559d..3aefc52ab8f1 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2402,6 +2402,8 @@ static int __mptcp_init_sock(struct sock *sk)
 	timer_setup(&msk->sk.icsk_retransmit_timer, mptcp_retransmit_timer, 0);
 	timer_setup(&sk->sk_timer, mptcp_timeout_timer, 0);
 
+	tcp_assign_congestion_control(sk);
+
 #if IS_ENABLED(CONFIG_KASAN)
 	sock_set_flag(sk, SOCK_RCU_FREE);
 #endif
@@ -2598,6 +2600,8 @@ static void __mptcp_destroy_sock(struct sock *sk)
 	WARN_ON_ONCE(msk->rmem_released);
 	sk_stream_kill_queues(sk);
 	xfrm_sk_free_policy(sk);
+
+	tcp_cleanup_congestion_control(sk);
 	sk_refcnt_debug_release(sk);
 	mptcp_dispose_initial_subflow(msk);
 	sock_put(sk);
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index d345ebc3947a..2c9aabe631c2 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -481,6 +481,63 @@ static bool mptcp_supported_sockopt(int level, int optname)
 	return false;
 }
 
+static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t optval,
+					       unsigned int optlen)
+{
+	struct mptcp_subflow_context *subflow;
+	struct sock *sk = (struct sock *)msk;
+	char name[TCP_CA_NAME_MAX];
+	bool cap_net_admin;
+	int ret;
+
+	if (optlen < 1)
+		return -EINVAL;
+
+	ret = strncpy_from_sockptr(name, optval,
+				min_t(long, TCP_CA_NAME_MAX-1, optlen));
+	if (ret < 0)
+		return -EFAULT;
+
+	name[ret] = 0;
+
+	cap_net_admin = ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN);
+
+	ret = 0;
+	lock_sock(sk);
+	msk->setsockopt_seq++;
+	mptcp_for_each_subflow(msk, subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+		int err;
+
+		lock_sock(ssk);
+		err = tcp_set_congestion_control(ssk, name, true, cap_net_admin);
+		if (err < 0 && ret == 0)
+			ret = err;
+		subflow->setsockopt_seq = msk->setsockopt_seq;
+		release_sock(ssk);
+	}
+
+	if (ret == 0)
+		tcp_set_congestion_control(sk, name, false, cap_net_admin);
+
+	release_sock(sk);
+	return ret;
+}
+
+
+static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
+				    sockptr_t optval, unsigned int optlen)
+{
+	switch (optname) {
+	case TCP_ULP:
+		return -EOPNOTSUPP;
+	case TCP_CONGESTION:
+		return mptcp_setsockopt_sol_tcp_congestion(msk, optval, optlen);
+	}
+
+	return -EOPNOTSUPP;
+}
+
 int mptcp_setsockopt(struct sock *sk, int level, int optname,
 		     sockptr_t optval, unsigned int optlen)
 {
@@ -510,6 +567,49 @@ int mptcp_setsockopt(struct sock *sk, int level, int optname,
 	if (level == SOL_IPV6)
 		return mptcp_setsockopt_v6(msk, optname, optval, optlen);
 
+	if (level == SOL_TCP)
+		return mptcp_setsockopt_sol_tcp(msk, optname, optval, optlen);
+
+	return -EOPNOTSUPP;
+}
+
+static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int optname,
+					  char __user *optval, int __user *optlen)
+{
+	struct sock *sk = (struct sock *)msk;
+	struct socket *ssock;
+	int ret = -EINVAL;
+	struct sock *ssk;
+
+	lock_sock(sk);
+	ssk = msk->first;
+	if (ssk) {
+		ret = tcp_getsockopt(ssk, level, optname, optval, optlen);
+		goto out;
+	}
+
+	ssock = __mptcp_nmpc_socket(msk);
+	if (!ssock)
+		goto out;
+
+	ret = tcp_getsockopt(ssock->sk, level, optname, optval, optlen);
+
+out:
+	release_sock(sk);
+	return ret;
+}
+
+static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
+				    char __user *optval, int __user *optlen)
+{
+	switch (optname) {
+	case TCP_ULP:
+	case TCP_CONGESTION:
+	case TCP_INFO:
+	case TCP_CC_INFO:
+		return mptcp_getsockopt_first_sf_only(msk, SOL_TCP, optname,
+						      optval, optlen);
+	}
 	return -EOPNOTSUPP;
 }
 
@@ -533,6 +633,8 @@ int mptcp_getsockopt(struct sock *sk, int level, int optname,
 	if (ssk)
 		return tcp_getsockopt(ssk, level, optname, optval, option);
 
+	if (level == SOL_TCP)
+		return mptcp_getsockopt_sol_tcp(msk, optname, optval, option);
 	return -EOPNOTSUPP;
 }
 
@@ -569,6 +671,9 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
 	}
 
 	sock_valbool_flag(ssk, SOCK_DBG, sock_flag(sk, SOCK_DBG));
+
+	if (inet_csk(sk)->icsk_ca_ops != inet_csk(ssk)->icsk_ca_ops)
+		tcp_set_congestion_control(ssk, inet_csk(sk)->icsk_ca_ops->name, false, true);
 }
 
 void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [MPTCP] [RFC PATCH mptcp-next v2 3/8] mptcp: setsockopt: handle receive/send buffer and device bind
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 3/8] mptcp: setsockopt: handle receive/send buffer and device bind Florian Westphal
@ 2021-03-24 16:34   ` Paolo Abeni
  2021-03-24 17:15     ` Florian Westphal
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Abeni @ 2021-03-24 16:34 UTC (permalink / raw)
  To: Florian Westphal, mptcp; +Cc: mptcp

On Wed, 2021-03-24 at 14:15 +0100, Florian Westphal wrote:
> @@ -441,6 +465,12 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
>  	}
>  
>  	ssk->sk_priority = sk->sk_priority;
> +	ssk->sk_bound_dev_if = sk->sk_bound_dev_if;
> +
> +	if (sk->sk_userlocks & SOCK_SNDBUF_LOCK)
> +		WRITE_ONCE(ssk->sk_sndbuf, sk->sk_sndbuf);
> +	if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
> +		WRITE_ONCE(ssk->sk_rcvbuf, sk->sk_rcvbuf);

Don't we need also:

	ssk->sk_userlocks = sk->sk_userlocks;

?

Otherwise tcp could still autotune under the wood ?

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [MPTCP] [RFC PATCH mptcp-next v2 3/8] mptcp: setsockopt: handle receive/send buffer and device bind
  2021-03-24 16:34   ` [MPTCP] " Paolo Abeni
@ 2021-03-24 17:15     ` Florian Westphal
  0 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2021-03-24 17:15 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Florian Westphal, mptcp, mptcp

Paolo Abeni <pabeni@redhat.com> wrote:
> On Wed, 2021-03-24 at 14:15 +0100, Florian Westphal wrote:
> > @@ -441,6 +465,12 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
> >  	}
> >  
> >  	ssk->sk_priority = sk->sk_priority;
> > +	ssk->sk_bound_dev_if = sk->sk_bound_dev_if;
> > +
> > +	if (sk->sk_userlocks & SOCK_SNDBUF_LOCK)
> > +		WRITE_ONCE(ssk->sk_sndbuf, sk->sk_sndbuf);
> > +	if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
> > +		WRITE_ONCE(ssk->sk_rcvbuf, sk->sk_rcvbuf);
> 
> Don't we need also:
> 
> 	ssk->sk_userlocks = sk->sk_userlocks;

Yes, right.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [MPTCP] [RFC PATCH mptcp-next v2 1/8] mptcp: add skeleton to sync msk socket options to subflows
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 1/8] mptcp: add skeleton to sync msk socket options to subflows Florian Westphal
@ 2021-03-24 17:38   ` Paolo Abeni
  2021-03-24 20:01     ` Florian Westphal
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Abeni @ 2021-03-24 17:38 UTC (permalink / raw)
  To: Florian Westphal, mptcp; +Cc: mptcp

On Wed, 2021-03-24 at 14:15 +0100, Florian Westphal wrote:
> Handle following cases:
> 1. setsockopt is called with multiple subflows.
>    Change might have to be mirrored to all of them.
>    This is done directly in process context/setsockopt call.
> 2. Outgoing subflow is created after one or several setsockopt()
>    calls have been made.  Old setsockopt changes should be
>    synced to the new socket.
> 3. Incoming subflow, after setsockopt call(s).
> 
> Cases 2 and 3 are handled right after the join list is spliced to the
> conn list.
> 
> Because splicing is sometimes done from non-preemptible context, the
> sync action can be deferred to the work queue.
> 
> Add sequence numbers to subflow context and mptcp socket so
> synchronization functions know which subflow is already updated
> and which are not.
> 
> seqno is re-set to 1 in mptcp_sockopt_sync_all(), at this point
> the list of subflows is up to date.
> 
> A setsockopt sequence count of 0 means that no setsockopt call
> was made so no synchronization is needed.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>

If I read correctly, each incoming subflow always get setsockopt_seq =
0 at accept time, while outgoing subflow are always synced at creation
time.

It looks like we have 2 semantically different 'msk->setsockopt_seq'
values:
* 0 -> no sync required
* any value greater than 0 -> sync required for incoming subflows.

If so, setsockopt could always set msk->setsockopt_seq to 1 and no need
to reset after sync_all.

Somewhat related, in the following scenario:

socket()
sesockopt()
listen()
// msks is accepted, msk->setsockopt_seq != 0
// mpj subflow is accepted, ssk->setsockopt_seq == 0, the mpj ssk
inherited from the listener socket the same sockoptions of the msk/mpc
subflow

we will do an unneeded synchronization for the MPJ subflow.

I think we could avoid that, on top of the above. e.g. in setsockopt
setting msk->setsockopt_seq to some value depending on msk->sk_state,
and copying msk->setsockopt_seq into ssk->setsockopt_seq at accept
time.

Not sure if the latter would be overkill?!? the above scenario looks
like a common one.

Cheers,

Paolo



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [MPTCP] [RFC PATCH mptcp-next v2 1/8] mptcp: add skeleton to sync msk socket options to subflows
  2021-03-24 17:38   ` [MPTCP] " Paolo Abeni
@ 2021-03-24 20:01     ` Florian Westphal
  2021-03-25  9:51       ` Paolo Abeni
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Westphal @ 2021-03-24 20:01 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Florian Westphal, mptcp, mptcp

Paolo Abeni <pabeni@redhat.com> wrote:
> On Wed, 2021-03-24 at 14:15 +0100, Florian Westphal wrote:
> > Handle following cases:
> > 1. setsockopt is called with multiple subflows.
> >    Change might have to be mirrored to all of them.
> >    This is done directly in process context/setsockopt call.
> > 2. Outgoing subflow is created after one or several setsockopt()
> >    calls have been made.  Old setsockopt changes should be
> >    synced to the new socket.
> > 3. Incoming subflow, after setsockopt call(s).
> > 
> > Cases 2 and 3 are handled right after the join list is spliced to the
> > conn list.
> > 
> > Because splicing is sometimes done from non-preemptible context, the
> > sync action can be deferred to the work queue.
> > 
> > Add sequence numbers to subflow context and mptcp socket so
> > synchronization functions know which subflow is already updated
> > and which are not.
> > 
> > seqno is re-set to 1 in mptcp_sockopt_sync_all(), at this point
> > the list of subflows is up to date.
> > 
> > A setsockopt sequence count of 0 means that no setsockopt call
> > was made so no synchronization is needed.
> > 
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> 
> If I read correctly, each incoming subflow always get setsockopt_seq =
> 0 at accept time, while outgoing subflow are always synced at creation
> time.

Yes, outgoing are synced at creation time.

> It looks like we have 2 semantically different 'msk->setsockopt_seq'
> values:
> * 0 -> no sync required
> * any value greater than 0 -> sync required for incoming subflows.
> 
> If so, setsockopt could always set msk->setsockopt_seq to 1 and no need
> to reset after sync_all.

In the outgoing case, the subflow isn't linked to the conn_list yet.

It syncs current msk settings, but user could setsockopt() before
the connect finishes, in that case another sync is needed after
connection completes.  No idea how to handle this with single toggle
bit.

I could make it so that it always re-syncs after connect/join finishes.

> socket()
> sesockopt()
> listen()
> // msks is accepted, msk->setsockopt_seq != 0
> // mpj subflow is accepted, ssk->setsockopt_seq == 0, the mpj ssk
> inherited from the listener socket the same sockoptions of the msk/mpc
> subflow

I'm not sure all options have this 'inherited from listener' behaviour.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH mptcp-next v2 5/8] mptcp: setsockopt: add SO_MARK support
  2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 5/8] mptcp: setsockopt: add SO_MARK support Florian Westphal
@ 2021-03-25  1:22   ` Mat Martineau
  2021-03-25  9:32     ` Florian Westphal
  0 siblings, 1 reply; 23+ messages in thread
From: Mat Martineau @ 2021-03-25  1:22 UTC (permalink / raw)
  To: Florian Westphal; +Cc: mptcp, mptcp

On Wed, 24 Mar 2021, Florian Westphal wrote:

> Value is synced to all subflows.
>

The use case I remember for SO_MARK with MPTCP was to designate different 
interfaces for different subflows:

https://lore.kernel.org/mptcp/CAKD1Yr2sBCdUO48cp=rZQ6s4v13ytpPd9oPT+U=iYrdXtba5HA@mail.gmail.com/


Once we have a way to set individual subflow options, it could both be 
useful to set sk_mark on all subflows, and also to not override individual 
settings. The current sync mechanism would overwrite all supported options 
when any single option changes.

There's no way for these options to diverge yet, so we could wait on 
solving that problem. Do you think it's better to stick with the current 
syncing method for now, or to do more detailed tracking of which options 
need to be synced?


Mat


> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> net/mptcp/sockopt.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
> index f33a9ee12544..cb29013d7d74 100644
> --- a/net/mptcp/sockopt.c
> +++ b/net/mptcp/sockopt.c
> @@ -66,6 +66,12 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
> 			ssk->sk_userlocks |= SOCK_RCVBUF_LOCK;
> 			WRITE_ONCE(ssk->sk_rcvbuf, sk->sk_rcvbuf);
> 			break;
> +		case SO_MARK:
> +			if (READ_ONCE(ssk->sk_mark) != sk->sk_mark) {
> +				ssk->sk_mark = sk->sk_mark;
> +				sk_dst_reset(ssk);
> +			}
> +			break;
> 		}
>
> 		subflow->setsockopt_seq = msk->setsockopt_seq;
> @@ -103,6 +109,7 @@ static int mptcp_setsockopt_sol_socket_int(struct mptcp_sock *msk, int optname,
> 	case SO_KEEPALIVE:
> 		mptcp_sol_socket_sync_intval(msk, optname, val);
> 		return 0;
> +	case SO_MARK:
> 	case SO_PRIORITY:
> 	case SO_SNDBUF:
> 	case SO_SNDBUFFORCE:
> @@ -193,6 +200,7 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
> 	case SO_SNDBUFFORCE:
> 	case SO_RCVBUF:
> 	case SO_RCVBUFFORCE:
> +	case SO_MARK:
> 		return mptcp_setsockopt_sol_socket_int(msk, optname, optval, optlen);
> 	case SO_LINGER:
> 		return mptcp_setsockopt_sol_socket_linger(msk, optval, optlen);
> @@ -521,6 +529,11 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
> 	} else {
> 		sock_reset_flag(ssk, SOCK_LINGER);
> 	}
> +
> +	if (sk->sk_mark != ssk->sk_mark) {
> +		ssk->sk_mark = sk->sk_mark;
> +		sk_dst_reset(ssk);
> +	}
> }
>
> void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
> -- 
> 2.26.3
>
>
>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH mptcp-next v2 5/8] mptcp: setsockopt: add SO_MARK support
  2021-03-25  1:22   ` Mat Martineau
@ 2021-03-25  9:32     ` Florian Westphal
  2021-03-26  0:14       ` Mat Martineau
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Westphal @ 2021-03-25  9:32 UTC (permalink / raw)
  To: Mat Martineau; +Cc: Florian Westphal, mptcp, mptcp

Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
> On Wed, 24 Mar 2021, Florian Westphal wrote:
> 
> > Value is synced to all subflows.
> > 
> 
> The use case I remember for SO_MARK with MPTCP was to designate different
> interfaces for different subflows:
> 
> https://lore.kernel.org/mptcp/CAKD1Yr2sBCdUO48cp=rZQ6s4v13ytpPd9oPT+U=iYrdXtba5HA@mail.gmail.com/
> 
> 
> Once we have a way to set individual subflow options, it could both be
> useful to set sk_mark on all subflows, and also to not override individual
> settings. The current sync mechanism would overwrite all supported options
> when any single option changes.
> 
> There's no way for these options to diverge yet, so we could wait on solving
> that problem. Do you think it's better to stick with the current syncing
> method for now, or to do more detailed tracking of which options need to be
> synced?

Looks like same issue as with TCP_CONGESTION.
Q is how we can expose the individual subflows.

I see
https://tools.ietf.org/html/draft-hesmans-mptcp-socket-03

Should that be implemented instead of this?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [MPTCP] [RFC PATCH mptcp-next v2 1/8] mptcp: add skeleton to sync msk socket options to subflows
  2021-03-24 20:01     ` Florian Westphal
@ 2021-03-25  9:51       ` Paolo Abeni
  2021-03-25 12:49         ` Florian Westphal
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Abeni @ 2021-03-25  9:51 UTC (permalink / raw)
  To: Florian Westphal; +Cc: mptcp, mptcp

On Wed, 2021-03-24 at 21:01 +0100, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
> > On Wed, 2021-03-24 at 14:15 +0100, Florian Westphal wrote:
> > > Handle following cases:
> > > 1. setsockopt is called with multiple subflows.
> > >    Change might have to be mirrored to all of them.
> > >    This is done directly in process context/setsockopt call.
> > > 2. Outgoing subflow is created after one or several setsockopt()
> > >    calls have been made.  Old setsockopt changes should be
> > >    synced to the new socket.
> > > 3. Incoming subflow, after setsockopt call(s).
> > > 
> > > Cases 2 and 3 are handled right after the join list is spliced to the
> > > conn list.
> > > 
> > > Because splicing is sometimes done from non-preemptible context, the
> > > sync action can be deferred to the work queue.
> > > 
> > > Add sequence numbers to subflow context and mptcp socket so
> > > synchronization functions know which subflow is already updated
> > > and which are not.
> > > 
> > > seqno is re-set to 1 in mptcp_sockopt_sync_all(), at this point
> > > the list of subflows is up to date.
> > > 
> > > A setsockopt sequence count of 0 means that no setsockopt call
> > > was made so no synchronization is needed.
> > > 
> > > Signed-off-by: Florian Westphal <fw@strlen.de>
> > 
> > If I read correctly, each incoming subflow always get setsockopt_seq =
> > 0 at accept time, while outgoing subflow are always synced at creation
> > time.
> 
> Yes, outgoing are synced at creation time.
> 
> > It looks like we have 2 semantically different 'msk->setsockopt_seq'
> > values:
> > * 0 -> no sync required
> > * any value greater than 0 -> sync required for incoming subflows.
> > 
> > If so, setsockopt could always set msk->setsockopt_seq to 1 and no need
> > to reset after sync_all.
> 
> In the outgoing case, the subflow isn't linked to the conn_list yet.

My bad! I did not recall we use join_list even for outgoing subflows...
I'm wondering if we could move them directly in conn_list
inside __mptcp_subflow_connect()? I think it should not be problematic.

> It syncs current msk settings, but user could setsockopt() before
> the connect finishes, in that case another sync is needed after
> connection completes.  No idea how to handle this with single toggle
> bit.

Yep, with outgoing subflow in join_list the thing I suggested above is
not usable.

> > socket()
> > sesockopt()
> > listen()
> > // msks is accepted, msk->setsockopt_seq != 0
> > // mpj subflow is accepted, ssk->setsockopt_seq == 0, the mpj ssk
> > inherited from the listener socket the same sockoptions of the msk/mpc
> > subflow
> 
> I'm not sure all options have this 'inherited from listener' behaviour.

Why not? the subflow socket is cloned from the listener TCP subflow. 

If tcp_create_openreq_child() does not propagate some socket option,
then the expected behaviour is probably not propagating it.

WDYT?

Thanks!

Paolo





^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [MPTCP] [RFC PATCH mptcp-next v2 1/8] mptcp: add skeleton to sync msk socket options to subflows
  2021-03-25  9:51       ` Paolo Abeni
@ 2021-03-25 12:49         ` Florian Westphal
  2021-03-25 13:12           ` Paolo Abeni
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Westphal @ 2021-03-25 12:49 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Florian Westphal, mptcp, mptcp

Paolo Abeni <pabeni@redhat.com> wrote:
> > In the outgoing case, the subflow isn't linked to the conn_list yet.
> 
> My bad! I did not recall we use join_list even for outgoing subflows...
> I'm wondering if we could move them directly in conn_list
> inside __mptcp_subflow_connect()? I think it should not be problematic.

I thought conn_list was only allowed to contain full subflows, but I see
that scheduler skips subflows with incomplete join.

But such change doesn't appear to be related to this one.

> > I'm not sure all options have this 'inherited from listener' behaviour.
> 
> Why not? the subflow socket is cloned from the listener TCP subflow. 
> 
> If tcp_create_openreq_child() does not propagate some socket option,
> then the expected behaviour is probably not propagating it.

Ok, so you propose to init ssk->sockopt_seq to msk->sockopt_seq in
mptcp_accept()?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [MPTCP] [RFC PATCH mptcp-next v2 1/8] mptcp: add skeleton to sync msk socket options to subflows
  2021-03-25 12:49         ` Florian Westphal
@ 2021-03-25 13:12           ` Paolo Abeni
  2021-03-25 14:06             ` Florian Westphal
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Abeni @ 2021-03-25 13:12 UTC (permalink / raw)
  To: Florian Westphal; +Cc: mptcp, mptcp

On Thu, 2021-03-25 at 13:49 +0100, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
> > > In the outgoing case, the subflow isn't linked to the conn_list yet.
> > 
> > My bad! I did not recall we use join_list even for outgoing subflows...
> > I'm wondering if we could move them directly in conn_list
> > inside __mptcp_subflow_connect()? I think it should not be problematic.
> 
> I thought conn_list was only allowed to contain full subflows, but I see
> that scheduler skips subflows with incomplete join.
> 
> But such change doesn't appear to be related to this one.

Yep.

> > > I'm not sure all options have this 'inherited from listener' behaviour.
> > 
> > Why not? the subflow socket is cloned from the listener TCP subflow. 
> > 
> > If tcp_create_openreq_child() does not propagate some socket option,
> > then the expected behaviour is probably not propagating it.
> 
> Ok, so you propose to init ssk->sockopt_seq to msk->sockopt_seq in
> mptcp_accept()?

in subflow_syn_recv_sock(), but only if we can avoid incrementing  msk-
>sockopt_seq on per setsockopt basis, otherwise things will be
crippled:

socket(s1)
setsockopt(s1)
listen(s1)

s2 = accept()

setsockopt(s2)
setsockopt(s1, <a different sockopt from the above>)

At this point, if we increment sockopt_seq after everu sockopt, s1-
>sockopt_seq and s2->sockopt_seq will be equal (and we will not sync
any mpj subflow accepted later if propagate sockopt_seq to ssk). 

Instead, if we set sockopt_seq to some socket status related value
after every setsockopt(), s2->sockopt_seq and s1->sockopt_seq will be
different and we will sync later MPJ subflows.

Perhaps we can keep the things as-is to and ev change/improve in a
second time?

Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [MPTCP] [RFC PATCH mptcp-next v2 1/8] mptcp: add skeleton to sync msk socket options to subflows
  2021-03-25 13:12           ` Paolo Abeni
@ 2021-03-25 14:06             ` Florian Westphal
  2021-03-25 14:33               ` Paolo Abeni
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Westphal @ 2021-03-25 14:06 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Florian Westphal, mptcp, mptcp

Paolo Abeni <pabeni@redhat.com> wrote:
> > Ok, so you propose to init ssk->sockopt_seq to msk->sockopt_seq in
> > mptcp_accept()?
> 
> in subflow_syn_recv_sock(), but only if we can avoid incrementing  msk-
> >sockopt_seq on per setsockopt basis, otherwise things will be
> crippled:
> 
> socket(s1)
> setsockopt(s1)   // s1->seq 1
> listen(s1)
> 
> s2 = accept() s2->seq == s1->seq == ssk->seq
> 
> setsockopt(s2)   s2->seq = 2
> setsockopt(s1, <a different sockopt from the above>) s1->seq = 2
> 
> At this point, if we increment sockopt_seq after everu sockopt, s1-
> >sockopt_seq and s2->sockopt_seq will be equal (and we will not sync
> any mpj subflow accepted later if propagate sockopt_seq to ssk). 

Sorry, I am not following.

The ssk (s2 msk->first) has ssk->seq 1, s2 msk has 2, so this would be
synced again on completion.

I don't understand how s1 and s2 are related.
This is too complicated for me :(

> Instead, if we set sockopt_seq to some socket status related value
> after every setsockopt(), s2->sockopt_seq and s1->sockopt_seq will be
> different and we will sync later MPJ subflows.

What is a 'socket status related value'?

> Perhaps we can keep the things as-is to and ev change/improve in a
> second time?

I am confused, you mean leave it as-is, i.e. scratch this patch set
and retry (new per-subflow sockopt api for instance)?

Or leave this RFC v2 as-is and see if we can improve the sync strategy
later?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [MPTCP] [RFC PATCH mptcp-next v2 1/8] mptcp: add skeleton to sync msk socket options to subflows
  2021-03-25 14:06             ` Florian Westphal
@ 2021-03-25 14:33               ` Paolo Abeni
  2021-03-25 14:45                 ` Florian Westphal
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Abeni @ 2021-03-25 14:33 UTC (permalink / raw)
  To: Florian Westphal; +Cc: mptcp, mptcp

On Thu, 2021-03-25 at 15:06 +0100, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
> > > Ok, so you propose to init ssk->sockopt_seq to msk->sockopt_seq in
> > > mptcp_accept()?
> > 
> > in subflow_syn_recv_sock(), but only if we can avoid incrementing  msk-
> > > sockopt_seq on per setsockopt basis, otherwise things will be
> > crippled:
> > 
> > socket(s1)
> > setsockopt(s1)   // s1->seq 1
> > listen(s1)
> > 
> > s2 = accept() s2->seq == s1->seq == ssk->seq
> > 
> > setsockopt(s2)   s2->seq = 2
> > setsockopt(s1, <a different sockopt from the above>) s1->seq = 2
> > 
> > At this point, if we increment sockopt_seq after everu sockopt, s1-
> > > sockopt_seq and s2->sockopt_seq will be equal (and we will not sync
> > any mpj subflow accepted later if propagate sockopt_seq to ssk). 
> 
> Sorry, I am not following.
> 
> The ssk (s2 msk->first) has ssk->seq 1, s2 msk has 2, so this would be
> synced again on completion.
> 
> I don't understand how s1 and s2 are related.
> This is too complicated for me :(

s1 is an msk listener socket. It will get 2 setsockopt() syscall, so
s1->sockopt_seq == 2.

Even s2->sockopt_seq == 2, as you noted.

If we copy msk->sockopt_seq into ssk->sockopt_seq, any additional MPJ
subflow joining s2 created after the above syscalls sequence, will
get sockopt_seq == 2, (inherited by s1->sockopt_seq), so will not be
synched, even if synchronization will be needed.

Not sure if the above is less confusing ?!?

> > Instead, if we set sockopt_seq to some socket status related value
> > after every setsockopt(), s2->sockopt_seq and s1->sockopt_seq will be
> > different and we will sync later MPJ subflows.
> 
> What is a 'socket status related value'?

I mean something that changes when 'sk->sk_state' changes. Even plain
'sk->sk_state'.

> > Perhaps we can keep the things as-is to and ev change/improve in a
> > second time?
> 
> I am confused, you mean leave it as-is, i.e. scratch this patch set
> and retry (new per-subflow sockopt api for instance)?
> 
> Or leave this RFC v2 as-is and see if we can improve the sync strategy
> later?

the latter, of course! I mean, keep this v2 as is.

Sorry for the lack of clarity,

/P




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [MPTCP] [RFC PATCH mptcp-next v2 1/8] mptcp: add skeleton to sync msk socket options to subflows
  2021-03-25 14:33               ` Paolo Abeni
@ 2021-03-25 14:45                 ` Florian Westphal
  2021-03-26  9:59                   ` Paolo Abeni
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Westphal @ 2021-03-25 14:45 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Florian Westphal, mptcp, mptcp

Paolo Abeni <pabeni@redhat.com> wrote:
> On Thu, 2021-03-25 at 15:06 +0100, Florian Westphal wrote:
> > Sorry, I am not following.
> > 
> > The ssk (s2 msk->first) has ssk->seq 1, s2 msk has 2, so this would be
> > synced again on completion.
> > 
> > I don't understand how s1 and s2 are related.
> > This is too complicated for me :(
> 
> s1 is an msk listener socket. It will get 2 setsockopt() syscall, so
> s1->sockopt_seq == 2.
> 
> Even s2->sockopt_seq == 2, as you noted.
> 
> If we copy msk->sockopt_seq into ssk->sockopt_seq, any additional MPJ
> subflow joining s2 created after the above syscalls sequence, will
> get sockopt_seq == 2, (inherited by s1->sockopt_seq), so will not be
> synched, even if synchronization will be needed.
> 
> Not sure if the above is less confusing ?!?

I would expect the new join ssk to have a 0 sequence number.

> > > Instead, if we set sockopt_seq to some socket status related value
> > > after every setsockopt(), s2->sockopt_seq and s1->sockopt_seq will be
> > > different and we will sync later MPJ subflows.
> > 
> > What is a 'socket status related value'?
> 
> I mean something that changes when 'sk->sk_state' changes. Even plain
> 'sk->sk_state'.

So instead of sync if msk->seq != ssk->seq we sync if
sk->sk_state != ssk->sk_state?

I don't understand how that works.

We might try to discuss this in the meeting, perhaps that would help?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH mptcp-next v2 5/8] mptcp: setsockopt: add SO_MARK support
  2021-03-25  9:32     ` Florian Westphal
@ 2021-03-26  0:14       ` Mat Martineau
  0 siblings, 0 replies; 23+ messages in thread
From: Mat Martineau @ 2021-03-26  0:14 UTC (permalink / raw)
  To: Florian Westphal; +Cc: mptcp, mptcp

On Thu, 25 Mar 2021, Florian Westphal wrote:

> Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
>> On Wed, 24 Mar 2021, Florian Westphal wrote:
>>
>>> Value is synced to all subflows.
>>>
>>
>> The use case I remember for SO_MARK with MPTCP was to designate different
>> interfaces for different subflows:
>>
>> https://lore.kernel.org/mptcp/CAKD1Yr2sBCdUO48cp=rZQ6s4v13ytpPd9oPT+U=iYrdXtba5HA@mail.gmail.com/
>>
>>
>> Once we have a way to set individual subflow options, it could both be
>> useful to set sk_mark on all subflows, and also to not override individual
>> settings. The current sync mechanism would overwrite all supported options
>> when any single option changes.
>>
>> There's no way for these options to diverge yet, so we could wait on solving
>> that problem. Do you think it's better to stick with the current syncing
>> method for now, or to do more detailed tracking of which options need to be
>> synced?
>
> Looks like same issue as with TCP_CONGESTION.
> Q is how we can expose the individual subflows.
>
> I see
> https://tools.ietf.org/html/draft-hesmans-mptcp-socket-03
>
> Should that be implemented instead of this?
>

I had a chance to look at the linked draft, and I think where we ended up 
in the discussion earlier today was this:

  * per-subflow options are useful (SO_MASK, etc)

  * overall settings like this patch set does are also helpful, I think 
especially to make MPTCP sockets behave more like TCP sockets.


The two should be compatible, and the concern I'm pointing out is that the 
proposed sync would clobber individual subflow options even if the 
matching MPTCP-level sockopts had never been set. That may be addressible 
by only syncing "new" subflows rather than all of them in the list.

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [MPTCP] [RFC PATCH mptcp-next v2 1/8] mptcp: add skeleton to sync msk socket options to subflows
  2021-03-25 14:45                 ` Florian Westphal
@ 2021-03-26  9:59                   ` Paolo Abeni
  0 siblings, 0 replies; 23+ messages in thread
From: Paolo Abeni @ 2021-03-26  9:59 UTC (permalink / raw)
  To: Florian Westphal; +Cc: mptcp, mptcp

Hello,

On Thu, 2021-03-25 at 15:45 +0100, Florian Westphal wrote:
> We might try to discuss this in the meeting, perhaps that would help?

I'm sorry, I had a conflicting (long) mtg, and later it was too late :(

I'm also sorry for the lack of clarity on my side.

Perhaps we can have a few mins mtg early next week, so can try to be
less confusing? If so, please bug me whenever it fits you - possibly
_not_ in the 13:15 range, as they are the rush hours for me.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-03-26  9:59 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-24 13:15 [RFC PATCH mptcp-next v2 0/8] initial SOL_SOCKET support Florian Westphal
2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 1/8] mptcp: add skeleton to sync msk socket options to subflows Florian Westphal
2021-03-24 17:38   ` [MPTCP] " Paolo Abeni
2021-03-24 20:01     ` Florian Westphal
2021-03-25  9:51       ` Paolo Abeni
2021-03-25 12:49         ` Florian Westphal
2021-03-25 13:12           ` Paolo Abeni
2021-03-25 14:06             ` Florian Westphal
2021-03-25 14:33               ` Paolo Abeni
2021-03-25 14:45                 ` Florian Westphal
2021-03-26  9:59                   ` Paolo Abeni
2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 2/8] mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY Florian Westphal
2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 3/8] mptcp: setsockopt: handle receive/send buffer and device bind Florian Westphal
2021-03-24 16:34   ` [MPTCP] " Paolo Abeni
2021-03-24 17:15     ` Florian Westphal
2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 4/8] mptcp: setsockopt: support SO_LINGER Florian Westphal
2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 5/8] mptcp: setsockopt: add SO_MARK support Florian Westphal
2021-03-25  1:22   ` Mat Martineau
2021-03-25  9:32     ` Florian Westphal
2021-03-26  0:14       ` Mat Martineau
2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 6/8] mptcp: setsockopt: add SO_INCOMING_CPU Florian Westphal
2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 7/8] mptcp: setsockopt: SO_DEBUG and no-op options Florian Westphal
2021-03-24 13:15 ` [RFC PATCH mptcp-next v2 8/8] mptcp: sockopt: add TCP_CONGESTION and TCP_INFO Florian Westphal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.