All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/5] mptcp: improve mptcp-level window tracking
@ 2022-04-15 14:48 Paolo Abeni
  2022-04-15 14:48 ` [PATCH net-next 1/5] mptcp: really share subflow snd_wnd Paolo Abeni
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Paolo Abeni @ 2022-04-15 14:48 UTC (permalink / raw)
  To: mptcp

I've been chasing bad/unstable performances with multiple subflows
on very high speed links.

It looks like the root cause is due to the current mptcp-level
congestion window handling. There are apparently a few different
sub-issues:

- the rcv_wnd is not effectively shared on the tx side, as each
  subflow takes in account only the value received by the underlaying
  TCP connection. This is addressed in patch 1/4

- The mptcp-level offered wnd right edge is currently allowed to shrink.
  Reading section 3.3.4.:

"""
   The receive window is relative to the DATA_ACK.  As in TCP, a
   receiver MUST NOT shrink the right edge of the receive window (i.e.,
   DATA_ACK + receive window).  The receiver will use the data sequence
   number to tell if a packet should be accepted at the connection
   level.
"""

I read the above as we need to reflect window right-edge tracking
on the wire, see patch 3/4.

- The offered window right edge tracking can happen concurrently on
  multiple subflows, but there is no mutex protection. We need an
  additional atomic operation - still patch 3/4

This series additionally bump a few new MIBs to track all the above
(ensure/observe that the suspected races actually take place).

I could not access again the host where the issue was su much
noticeable, still in the current setup the tput changes from
[6-18] Gbps to 19Gbps very stable.

RFC -> v1:
 - added patch 3/5 to address Mat's comment, and rebased the
   following on top of it - I hope Eric may tolerate that, it's
   more an hope than guess ;)

Paolo Abeni (5):
  mptcp: really share subflow snd_wnd
  mptcp: add mib for xmit window sharing
  tcp: allow MPTCP to update the announced window.
  mptcp: never shrink offered window
  mptcp: add more offered MIBs counter.

 include/net/mptcp.h   |  2 +-
 net/ipv4/tcp_output.c | 13 +++++-----
 net/mptcp/mib.c       |  4 +++
 net/mptcp/mib.h       |  6 +++++
 net/mptcp/options.c   | 58 +++++++++++++++++++++++++++++++++++++------
 net/mptcp/protocol.c  | 24 ++++++++++++------
 6 files changed, 84 insertions(+), 23 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH net-next 1/5] mptcp: really share subflow snd_wnd
  2022-04-15 14:48 [PATCH net-next 0/5] mptcp: improve mptcp-level window tracking Paolo Abeni
@ 2022-04-15 14:48 ` Paolo Abeni
  2022-04-15 14:48 ` [PATCH net-next 2/5] mptcp: add mib for xmit window sharing Paolo Abeni
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Paolo Abeni @ 2022-04-15 14:48 UTC (permalink / raw)
  To: mptcp

As per RFC, mptcp subflows use a "shared" snd_wnd: the effective
window is the maximum among the current values received on all
subflows. Without such feature a data transfer using multiple
subflows could block.

Window sharing is currently implemented in the RX side:
__tcp_select_window uses the mptcp-level receive buffer to compute
the announced window.

That is not enough: the TCP stack will stick to the window size
received on the given subflow; we need to propagate the msk window
value on each subflow at xmit time.

Change the packet scheduler to ignore the subflow level window
and use instead the msk level one

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 093e084f3ac0..923fdb26214e 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1141,19 +1141,20 @@ struct mptcp_sendmsg_info {
 	bool data_lock_held;
 };
 
-static int mptcp_check_allowed_size(struct mptcp_sock *msk, u64 data_seq,
-				    int avail_size)
+static int mptcp_check_allowed_size(const struct mptcp_sock *msk, struct sock *ssk,
+				    u64 data_seq, int avail_size)
 {
 	u64 window_end = mptcp_wnd_end(msk);
+	u64 mptcp_snd_wnd;
 
 	if (__mptcp_check_fallback(msk))
 		return avail_size;
 
-	if (!before64(data_seq + avail_size, window_end)) {
-		u64 allowed_size = window_end - data_seq;
+	mptcp_snd_wnd = window_end - data_seq;
+	avail_size = min_t(unsigned int, mptcp_snd_wnd, avail_size);
 
-		return min_t(unsigned int, allowed_size, avail_size);
-	}
+	if (unlikely(tcp_sk(ssk)->snd_wnd < mptcp_snd_wnd))
+		tcp_sk(ssk)->snd_wnd = min_t(u64, U32_MAX, mptcp_snd_wnd);
 
 	return avail_size;
 }
@@ -1305,7 +1306,7 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 	}
 
 	/* Zero window and all data acked? Probe. */
-	copy = mptcp_check_allowed_size(msk, data_seq, copy);
+	copy = mptcp_check_allowed_size(msk, ssk, data_seq, copy);
 	if (copy == 0) {
 		u64 snd_una = READ_ONCE(msk->snd_una);
 
@@ -1498,11 +1499,16 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
 	 * to check that subflow has a non empty cwin.
 	 */
 	ssk = send_info[SSK_MODE_ACTIVE].ssk;
-	if (!ssk || !sk_stream_memory_free(ssk) || !tcp_sk(ssk)->snd_wnd)
+	if (!ssk || !sk_stream_memory_free(ssk))
 		return NULL;
 
-	burst = min_t(int, MPTCP_SEND_BURST_SIZE, tcp_sk(ssk)->snd_wnd);
+	burst = min_t(int, MPTCP_SEND_BURST_SIZE, mptcp_wnd_end(msk) - msk->snd_nxt);
 	wmem = READ_ONCE(ssk->sk_wmem_queued);
+	if (!burst) {
+		msk->last_snd = NULL;
+		return ssk;
+	}
+
 	subflow = mptcp_subflow_ctx(ssk);
 	subflow->avg_pacing_rate = div_u64((u64)subflow->avg_pacing_rate * wmem +
 					   READ_ONCE(ssk->sk_pacing_rate) * burst,
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 2/5] mptcp: add mib for xmit window sharing
  2022-04-15 14:48 [PATCH net-next 0/5] mptcp: improve mptcp-level window tracking Paolo Abeni
  2022-04-15 14:48 ` [PATCH net-next 1/5] mptcp: really share subflow snd_wnd Paolo Abeni
@ 2022-04-15 14:48 ` Paolo Abeni
  2022-04-15 14:48 ` [PATCH net-next 3/5] tcp: allow MPTCP to update the announced window Paolo Abeni
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Paolo Abeni @ 2022-04-15 14:48 UTC (permalink / raw)
  To: mptcp

Bump a counter for counter when snd_wnd is shared among subflow,
for observability's sake.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/mib.c      | 1 +
 net/mptcp/mib.h      | 1 +
 net/mptcp/protocol.c | 4 +++-
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index d93a8c9996fd..6a6f8151375a 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -56,6 +56,7 @@ static const struct snmp_mib mptcp_snmp_list[] = {
 	SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
 	SNMP_MIB_ITEM("SubflowStale", MPTCP_MIB_SUBFLOWSTALE),
 	SNMP_MIB_ITEM("SubflowRecover", MPTCP_MIB_SUBFLOWRECOVER),
+	SNMP_MIB_ITEM("SndWndShared", MPTCP_MIB_SNDWNDSHARED),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index 529d07af9e14..2411510bef66 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -49,6 +49,7 @@ enum linux_mptcp_mib_field {
 	MPTCP_MIB_RCVPRUNED,		/* Incoming packet dropped due to memory limit */
 	MPTCP_MIB_SUBFLOWSTALE,		/* Subflows entered 'stale' status */
 	MPTCP_MIB_SUBFLOWRECOVER,	/* Subflows returned to active status after being stale */
+	MPTCP_MIB_SNDWNDSHARED,		/* Subflow snd wnd is overridden by msk's one */
 	__MPTCP_MIB_MAX
 };
 
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 923fdb26214e..8f159d535fed 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1153,8 +1153,10 @@ static int mptcp_check_allowed_size(const struct mptcp_sock *msk, struct sock *s
 	mptcp_snd_wnd = window_end - data_seq;
 	avail_size = min_t(unsigned int, mptcp_snd_wnd, avail_size);
 
-	if (unlikely(tcp_sk(ssk)->snd_wnd < mptcp_snd_wnd))
+	if (unlikely(tcp_sk(ssk)->snd_wnd < mptcp_snd_wnd)) {
 		tcp_sk(ssk)->snd_wnd = min_t(u64, U32_MAX, mptcp_snd_wnd);
+		MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_SNDWNDSHARED);
+	}
 
 	return avail_size;
 }
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 3/5] tcp: allow MPTCP to update the announced window.
  2022-04-15 14:48 [PATCH net-next 0/5] mptcp: improve mptcp-level window tracking Paolo Abeni
  2022-04-15 14:48 ` [PATCH net-next 1/5] mptcp: really share subflow snd_wnd Paolo Abeni
  2022-04-15 14:48 ` [PATCH net-next 2/5] mptcp: add mib for xmit window sharing Paolo Abeni
@ 2022-04-15 14:48 ` Paolo Abeni
  2022-04-19  0:11   ` Mat Martineau
  2022-04-15 14:48 ` [PATCH net-next 4/5] mptcp: never shrink offered window Paolo Abeni
  2022-04-15 14:48 ` [PATCH net-next 5/5] mptcp: add more offered MIBs counter Paolo Abeni
  4 siblings, 1 reply; 17+ messages in thread
From: Paolo Abeni @ 2022-04-15 14:48 UTC (permalink / raw)
  To: mptcp

The MPTCP RFC requires that the MPTCP-level receive window's
right edge never moves backward. Currently the MPTCP code
enforces such constraint while tracking the right edge, but it
does not reflects it back on the wire, as lacks a suitable hook
to update accordingly the TCP header.

This change modifiy the existing mptcp_write_options() hook,
providing the current packet's TCP header up to the MPTCP protocol
level, so that the next patch could implement the above mentioned
constraint.

No functional changes intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/mptcp.h   |  2 +-
 net/ipv4/tcp_output.c | 13 +++++++------
 net/mptcp/options.c   |  2 +-
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 877077b53200..6b07011c060d 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -125,7 +125,7 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
 			       struct mptcp_out_options *opts);
 bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
 
-void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
+void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
 			 struct mptcp_out_options *opts);
 
 void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c221f3bce975..27deec41a1f4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -444,12 +444,13 @@ struct tcp_out_options {
 	struct mptcp_out_options mptcp;
 };
 
-static void mptcp_options_write(__be32 *ptr, const struct tcp_sock *tp,
+static void mptcp_options_write(struct tcphdr *th, __be32 *ptr,
+				struct tcp_sock *tp,
 				struct tcp_out_options *opts)
 {
 #if IS_ENABLED(CONFIG_MPTCP)
 	if (unlikely(OPTION_MPTCP & opts->options))
-		mptcp_write_options(ptr, tp, &opts->mptcp);
+		mptcp_write_options(th, ptr, tp, &opts->mptcp);
 #endif
 }
 
@@ -605,7 +606,7 @@ static void bpf_skops_write_hdr_opt(struct sock *sk, struct sk_buff *skb,
  * At least SACK_PERM as the first option is known to lead to a disaster
  * (but it may well be that other scenarios fail similarly).
  */
-static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
+static void tcp_options_write(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
 			      struct tcp_out_options *opts)
 {
 	u16 options = opts->options;	/* mungable copy */
@@ -701,7 +702,7 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
 
 	smc_options_write(ptr, &options);
 
-	mptcp_options_write(ptr, tp, opts);
+	mptcp_options_write(th, ptr, tp, opts);
 }
 
 static void smc_set_option(const struct tcp_sock *tp,
@@ -1354,7 +1355,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 		th->window	= htons(min(tp->rcv_wnd, 65535U));
 	}
 
-	tcp_options_write((__be32 *)(th + 1), tp, &opts);
+	tcp_options_write(th, (__be32 *)(th + 1), tp, &opts);
 
 #ifdef CONFIG_TCP_MD5SIG
 	/* Calculate the MD5 hash, as we have all we need now */
@@ -3590,7 +3591,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 
 	/* RFC1323: The window in SYN & SYN/ACK segments is never scaled. */
 	th->window = htons(min(req->rsk_rcv_wnd, 65535U));
-	tcp_options_write((__be32 *)(th + 1), NULL, &opts);
+	tcp_options_write(th, (__be32 *)(th + 1), NULL, &opts);
 	th->doff = (tcp_header_size >> 2);
 	__TCP_INC_STATS(sock_net(sk), TCP_MIB_OUTSEGS);
 
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index e05d9458a025..2570911735ab 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1265,7 +1265,7 @@ static u16 mptcp_make_csum(const struct mptcp_ext *mpext)
 				 ~csum_unfold(mpext->csum));
 }
 
-void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
+void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
 			 struct mptcp_out_options *opts)
 {
 	const struct sock *ssk = (const struct sock *)tp;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 4/5] mptcp: never shrink offered window
  2022-04-15 14:48 [PATCH net-next 0/5] mptcp: improve mptcp-level window tracking Paolo Abeni
                   ` (2 preceding siblings ...)
  2022-04-15 14:48 ` [PATCH net-next 3/5] tcp: allow MPTCP to update the announced window Paolo Abeni
@ 2022-04-15 14:48 ` Paolo Abeni
  2022-04-19  0:29   ` Mat Martineau
  2022-04-15 14:48 ` [PATCH net-next 5/5] mptcp: add more offered MIBs counter Paolo Abeni
  4 siblings, 1 reply; 17+ messages in thread
From: Paolo Abeni @ 2022-04-15 14:48 UTC (permalink / raw)
  To: mptcp

As per RFC, the offered MPTCP-level window should never shrink.
While we currently track the right edge, we don't enforce the
above constraint on the wire.
Additionally, concurrent xmit on different subflows can end-up in
erroneous right edge update.
Address the above explicitly updating the announced window and
protecting the update with an additional atomic operation (sic)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
RFC -> v1:
 - rebased on previous patch
---
 net/mptcp/options.c | 52 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 45 insertions(+), 7 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 2570911735ab..9c76a171af1e 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1224,20 +1224,58 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 	return true;
 }
 
-static void mptcp_set_rwin(const struct tcp_sock *tp)
+static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
 {
 	const struct sock *ssk = (const struct sock *)tp;
-	const struct mptcp_subflow_context *subflow;
+	struct mptcp_subflow_context *subflow;
+	u64 ack_seq, rcv_wnd_old, rcv_wnd_new;
 	struct mptcp_sock *msk;
-	u64 ack_seq;
+	u32 new_win;
+	u64 win;
 
 	subflow = mptcp_subflow_ctx(ssk);
 	msk = mptcp_sk(subflow->conn);
 
-	ack_seq = READ_ONCE(msk->ack_seq) + tp->rcv_wnd;
+	ack_seq = READ_ONCE(msk->ack_seq);
+	rcv_wnd_new = ack_seq + tp->rcv_wnd;
+
+	rcv_wnd_old = READ_ONCE(msk->rcv_wnd_sent);
+	if (after64(rcv_wnd_new, rcv_wnd_old)) {
+		u64 rcv_wnd;
+
+		for (;;) {
+			rcv_wnd = cmpxchg64(&msk->rcv_wnd_sent, rcv_wnd_old, rcv_wnd_new);
+
+			if (rcv_wnd == rcv_wnd_old)
+				break;
+			if (before64(rcv_wnd_new, rcv_wnd))
+				goto raise_win;
+			rcv_wnd_old = rcv_wnd;
+		};
+		return;
+	}
+
+	if (rcv_wnd_new != rcv_wnd_old) {
+raise_win:
+		win = rcv_wnd_old - ack_seq;
+		tp->rcv_wnd = min_t(u64, win, U32_MAX);
+		new_win = tp->rcv_wnd;
 
-	if (after64(ack_seq, READ_ONCE(msk->rcv_wnd_sent)))
-		WRITE_ONCE(msk->rcv_wnd_sent, ack_seq);
+		/* Make sure we do not exceed the maximum possible
+		 * scaled window.
+		 */
+		if (unlikely(th->syn))
+			new_win = min(new_win, 65535U) << tp->rx_opt.rcv_wscale;
+		if (!tp->rx_opt.rcv_wscale &&
+		    sock_net(ssk)->ipv4.sysctl_tcp_workaround_signed_windows)
+			new_win = min(new_win, MAX_TCP_WINDOW);
+		else
+			new_win = min(new_win, (65535U << tp->rx_opt.rcv_wscale));
+
+		/* RFC1323 scaling applied */
+		new_win >>= tp->rx_opt.rcv_wscale;
+		th->window = htons(new_win);
+	}
 }
 
 u16 __mptcp_make_csum(u64 data_seq, u32 subflow_seq, u16 data_len, __wsum sum)
@@ -1554,7 +1592,7 @@ void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
 	}
 
 	if (tp)
-		mptcp_set_rwin(tp);
+		mptcp_set_rwin(tp, th);
 }
 
 __be32 mptcp_get_reset_option(const struct sk_buff *skb)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 5/5] mptcp: add more offered MIBs counter.
  2022-04-15 14:48 [PATCH net-next 0/5] mptcp: improve mptcp-level window tracking Paolo Abeni
                   ` (3 preceding siblings ...)
  2022-04-15 14:48 ` [PATCH net-next 4/5] mptcp: never shrink offered window Paolo Abeni
@ 2022-04-15 14:48 ` Paolo Abeni
  2022-04-15 15:10   ` mptcp: add more offered MIBs counter.: Build Failure MPTCP CI
  2022-04-15 16:54   ` mptcp: add more offered MIBs counter.: Tests Results MPTCP CI
  4 siblings, 2 replies; 17+ messages in thread
From: Paolo Abeni @ 2022-04-15 14:48 UTC (permalink / raw)
  To: mptcp

Track the exceptional handling of MPTCP-level offered window
with a few more counters for observability.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/mib.c     | 3 +++
 net/mptcp/mib.h     | 5 +++++
 net/mptcp/options.c | 6 +++++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index 6a6f8151375a..0dac2863c6e1 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -57,6 +57,9 @@ static const struct snmp_mib mptcp_snmp_list[] = {
 	SNMP_MIB_ITEM("SubflowStale", MPTCP_MIB_SUBFLOWSTALE),
 	SNMP_MIB_ITEM("SubflowRecover", MPTCP_MIB_SUBFLOWRECOVER),
 	SNMP_MIB_ITEM("SndWndShared", MPTCP_MIB_SNDWNDSHARED),
+	SNMP_MIB_ITEM("RcvWndShared", MPTCP_MIB_RCVWNDSHARED),
+	SNMP_MIB_ITEM("RcvWndConflictUpdate", MPTCP_MIB_RCVWNDCONFLICTUPDATE),
+	SNMP_MIB_ITEM("RcvWndConflict", MPTCP_MIB_RCVWNDCONFLICT),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index 2411510bef66..2be3596374f4 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -50,6 +50,11 @@ enum linux_mptcp_mib_field {
 	MPTCP_MIB_SUBFLOWSTALE,		/* Subflows entered 'stale' status */
 	MPTCP_MIB_SUBFLOWRECOVER,	/* Subflows returned to active status after being stale */
 	MPTCP_MIB_SNDWNDSHARED,		/* Subflow snd wnd is overridden by msk's one */
+	MPTCP_MIB_RCVWNDSHARED,		/* Subflow rcv wnd is overridden by msk's one */
+	MPTCP_MIB_RCVWNDCONFLICTUPDATE,	/* subflow rcv wnd is overridden by msk's one due to
+					 * conflict with another subflow while updating msk rcv wnd
+					 */
+	MPTCP_MIB_RCVWNDCONFLICT,	/* Conflict with while updating msk rcv wnd */
 	__MPTCP_MIB_MAX
 };
 
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 9c76a171af1e..559ff6ca6e66 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1248,8 +1248,11 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
 
 			if (rcv_wnd == rcv_wnd_old)
 				break;
-			if (before64(rcv_wnd_new, rcv_wnd))
+			if (before64(rcv_wnd_new, rcv_wnd)) {
+				MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDCONFLICTUPDATE);
 				goto raise_win;
+			}
+			MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDCONFLICT);
 			rcv_wnd_old = rcv_wnd;
 		};
 		return;
@@ -1275,6 +1278,7 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
 		/* RFC1323 scaling applied */
 		new_win >>= tp->rx_opt.rcv_wscale;
 		th->window = htons(new_win);
+		MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDSHARED);
 	}
 }
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: mptcp: add more offered MIBs counter.: Build Failure
  2022-04-15 14:48 ` [PATCH net-next 5/5] mptcp: add more offered MIBs counter Paolo Abeni
@ 2022-04-15 15:10   ` MPTCP CI
  2022-04-15 16:39     ` Matthieu Baerts
  2022-04-15 16:54   ` mptcp: add more offered MIBs counter.: Tests Results MPTCP CI
  1 sibling, 1 reply; 17+ messages in thread
From: MPTCP CI @ 2022-04-15 15:10 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

Thank you for your modifications, that's great!

But sadly, our CI spotted some issues with it when trying to build it.

You can find more details there:

  https://patchwork.kernel.org/project/mptcp/patch/43cbe1db36c118468bc6e88ced16079d05f30643.1650034062.git.pabeni@redhat.com/
  https://github.com/multipath-tcp/mptcp_net-next/actions/runs/2173091628

Status: failure
Initiator: MPTCPimporter
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/40eef4f35b96

Feel free to reply to this email if you cannot access logs, if you need
some support to fix the error, if this doesn't seem to be caused by your
modifications or if the error is a false positive one.

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (Tessares)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mptcp: add more offered MIBs counter.: Build Failure
  2022-04-15 15:10   ` mptcp: add more offered MIBs counter.: Build Failure MPTCP CI
@ 2022-04-15 16:39     ` Matthieu Baerts
  0 siblings, 0 replies; 17+ messages in thread
From: Matthieu Baerts @ 2022-04-15 16:39 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

On 15/04/2022 17:10, MPTCP CI wrote:
> Hi Paolo,
> 
> Thank you for your modifications, that's great!
> 
> But sadly, our CI spotted some issues with it when trying to build it.
> 
> You can find more details there:
> 
>   https://patchwork.kernel.org/project/mptcp/patch/43cbe1db36c118468bc6e88ced16079d05f30643.1650034062.git.pabeni@redhat.com/
>   https://github.com/multipath-tcp/mptcp_net-next/actions/runs/2173091628

Please ignore this, that was introduced by a patch I recently applied in
our tree. The issue has just been fixed:

https://patchwork.kernel.org/project/mptcp/patch/20220415162801.1675860-1-matthieu.baerts@tessares.net/

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mptcp: add more offered MIBs counter.: Tests Results
  2022-04-15 14:48 ` [PATCH net-next 5/5] mptcp: add more offered MIBs counter Paolo Abeni
  2022-04-15 15:10   ` mptcp: add more offered MIBs counter.: Build Failure MPTCP CI
@ 2022-04-15 16:54   ` MPTCP CI
  1 sibling, 0 replies; 17+ messages in thread
From: MPTCP CI @ 2022-04-15 16:54 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal:
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/4509118336598016
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/4509118336598016/summary/summary.txt

- KVM Validation: debug:
  - Unstable: 2 failed test(s): selftest_diag selftest_mptcp_join - Critical: Global Timeout ❌:
  - Task: https://cirrus-ci.com/task/5635018243440640
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5635018243440640/summary/summary.txt

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/40eef4f35b96


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (Tessares)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 3/5] tcp: allow MPTCP to update the announced window.
  2022-04-15 14:48 ` [PATCH net-next 3/5] tcp: allow MPTCP to update the announced window Paolo Abeni
@ 2022-04-19  0:11   ` Mat Martineau
  2022-04-19 10:42     ` Paolo Abeni
  0 siblings, 1 reply; 17+ messages in thread
From: Mat Martineau @ 2022-04-19  0:11 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Fri, 15 Apr 2022, Paolo Abeni wrote:

> The MPTCP RFC requires that the MPTCP-level receive window's
> right edge never moves backward. Currently the MPTCP code
> enforces such constraint while tracking the right edge, but it
> does not reflects it back on the wire, as lacks a suitable hook
> to update accordingly the TCP header.
>
> This change modifiy the existing mptcp_write_options() hook,
> providing the current packet's TCP header up to the MPTCP protocol
> level, so that the next patch could implement the above mentioned
> constraint.
>
> No functional changes intended.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> include/net/mptcp.h   |  2 +-
> net/ipv4/tcp_output.c | 13 +++++++------
> net/mptcp/options.c   |  2 +-
> 3 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> index 877077b53200..6b07011c060d 100644
> --- a/include/net/mptcp.h
> +++ b/include/net/mptcp.h
> @@ -125,7 +125,7 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
> 			       struct mptcp_out_options *opts);
> bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
>
> -void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
> +void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
> 			 struct mptcp_out_options *opts);
>
> void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info);
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index c221f3bce975..27deec41a1f4 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -444,12 +444,13 @@ struct tcp_out_options {
> 	struct mptcp_out_options mptcp;
> };
>
> -static void mptcp_options_write(__be32 *ptr, const struct tcp_sock *tp,
> +static void mptcp_options_write(struct tcphdr *th, __be32 *ptr,
> +				struct tcp_sock *tp,
> 				struct tcp_out_options *opts)
> {
> #if IS_ENABLED(CONFIG_MPTCP)
> 	if (unlikely(OPTION_MPTCP & opts->options))
> -		mptcp_write_options(ptr, tp, &opts->mptcp);
> +		mptcp_write_options(th, ptr, tp, &opts->mptcp);
> #endif
> }
>
> @@ -605,7 +606,7 @@ static void bpf_skops_write_hdr_opt(struct sock *sk, struct sk_buff *skb,
>  * At least SACK_PERM as the first option is known to lead to a disaster
>  * (but it may well be that other scenarios fail similarly).
>  */
> -static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
> +static void tcp_options_write(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
> 			      struct tcp_out_options *opts)

Having both th and ptr seems redundant to my eyes. I'd rather just have a 
th parameter and move the "ptr = (__be32 *)(th + 1);" code inside 
tcp_options_write().

If you're thinking/hoping that Eric finds the additional parameter more 
acceptible, we can go with this patch as-is.

- Mat

> {
> 	u16 options = opts->options;	/* mungable copy */
> @@ -701,7 +702,7 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
>
> 	smc_options_write(ptr, &options);
>
> -	mptcp_options_write(ptr, tp, opts);
> +	mptcp_options_write(th, ptr, tp, opts);
> }
>
> static void smc_set_option(const struct tcp_sock *tp,
> @@ -1354,7 +1355,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
> 		th->window	= htons(min(tp->rcv_wnd, 65535U));
> 	}
>
> -	tcp_options_write((__be32 *)(th + 1), tp, &opts);
> +	tcp_options_write(th, (__be32 *)(th + 1), tp, &opts);
>
> #ifdef CONFIG_TCP_MD5SIG
> 	/* Calculate the MD5 hash, as we have all we need now */
> @@ -3590,7 +3591,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
>
> 	/* RFC1323: The window in SYN & SYN/ACK segments is never scaled. */
> 	th->window = htons(min(req->rsk_rcv_wnd, 65535U));
> -	tcp_options_write((__be32 *)(th + 1), NULL, &opts);
> +	tcp_options_write(th, (__be32 *)(th + 1), NULL, &opts);
> 	th->doff = (tcp_header_size >> 2);
> 	__TCP_INC_STATS(sock_net(sk), TCP_MIB_OUTSEGS);
>
> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> index e05d9458a025..2570911735ab 100644
> --- a/net/mptcp/options.c
> +++ b/net/mptcp/options.c
> @@ -1265,7 +1265,7 @@ static u16 mptcp_make_csum(const struct mptcp_ext *mpext)
> 				 ~csum_unfold(mpext->csum));
> }
>
> -void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
> +void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
> 			 struct mptcp_out_options *opts)
> {
> 	const struct sock *ssk = (const struct sock *)tp;
> -- 
> 2.35.1
>
>
>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 4/5] mptcp: never shrink offered window
  2022-04-15 14:48 ` [PATCH net-next 4/5] mptcp: never shrink offered window Paolo Abeni
@ 2022-04-19  0:29   ` Mat Martineau
  2022-04-19 10:40     ` Paolo Abeni
  0 siblings, 1 reply; 17+ messages in thread
From: Mat Martineau @ 2022-04-19  0:29 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Fri, 15 Apr 2022, Paolo Abeni wrote:

> As per RFC, the offered MPTCP-level window should never shrink.
> While we currently track the right edge, we don't enforce the
> above constraint on the wire.
> Additionally, concurrent xmit on different subflows can end-up in
> erroneous right edge update.
> Address the above explicitly updating the announced window and
> protecting the update with an additional atomic operation (sic)

By "(sic)", are you saying the new atomic operation is consistent with the 
use of an atomic write in the previously written code?

I'm accustomed to "sic" meaning "there's a misspelling in the previous few 
words that have been preserved from some copied text" when I encounter it 
in English text, so I figure the Latin word may have slightly different 
meaning here!


But, aside from the commit message, the code seems to build & test out 
fine - so just the one code question in patch 3.


- Mat

>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> RFC -> v1:
> - rebased on previous patch
> ---
> net/mptcp/options.c | 52 +++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 45 insertions(+), 7 deletions(-)
>
> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> index 2570911735ab..9c76a171af1e 100644
> --- a/net/mptcp/options.c
> +++ b/net/mptcp/options.c
> @@ -1224,20 +1224,58 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
> 	return true;
> }
>
> -static void mptcp_set_rwin(const struct tcp_sock *tp)
> +static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
> {
> 	const struct sock *ssk = (const struct sock *)tp;
> -	const struct mptcp_subflow_context *subflow;
> +	struct mptcp_subflow_context *subflow;
> +	u64 ack_seq, rcv_wnd_old, rcv_wnd_new;
> 	struct mptcp_sock *msk;
> -	u64 ack_seq;
> +	u32 new_win;
> +	u64 win;
>
> 	subflow = mptcp_subflow_ctx(ssk);
> 	msk = mptcp_sk(subflow->conn);
>
> -	ack_seq = READ_ONCE(msk->ack_seq) + tp->rcv_wnd;
> +	ack_seq = READ_ONCE(msk->ack_seq);
> +	rcv_wnd_new = ack_seq + tp->rcv_wnd;
> +
> +	rcv_wnd_old = READ_ONCE(msk->rcv_wnd_sent);
> +	if (after64(rcv_wnd_new, rcv_wnd_old)) {
> +		u64 rcv_wnd;
> +
> +		for (;;) {
> +			rcv_wnd = cmpxchg64(&msk->rcv_wnd_sent, rcv_wnd_old, rcv_wnd_new);
> +
> +			if (rcv_wnd == rcv_wnd_old)
> +				break;
> +			if (before64(rcv_wnd_new, rcv_wnd))
> +				goto raise_win;
> +			rcv_wnd_old = rcv_wnd;
> +		};
> +		return;
> +	}
> +
> +	if (rcv_wnd_new != rcv_wnd_old) {
> +raise_win:
> +		win = rcv_wnd_old - ack_seq;
> +		tp->rcv_wnd = min_t(u64, win, U32_MAX);
> +		new_win = tp->rcv_wnd;
>
> -	if (after64(ack_seq, READ_ONCE(msk->rcv_wnd_sent)))
> -		WRITE_ONCE(msk->rcv_wnd_sent, ack_seq);
> +		/* Make sure we do not exceed the maximum possible
> +		 * scaled window.
> +		 */
> +		if (unlikely(th->syn))
> +			new_win = min(new_win, 65535U) << tp->rx_opt.rcv_wscale;
> +		if (!tp->rx_opt.rcv_wscale &&
> +		    sock_net(ssk)->ipv4.sysctl_tcp_workaround_signed_windows)
> +			new_win = min(new_win, MAX_TCP_WINDOW);
> +		else
> +			new_win = min(new_win, (65535U << tp->rx_opt.rcv_wscale));
> +
> +		/* RFC1323 scaling applied */
> +		new_win >>= tp->rx_opt.rcv_wscale;
> +		th->window = htons(new_win);
> +	}
> }
>
> u16 __mptcp_make_csum(u64 data_seq, u32 subflow_seq, u16 data_len, __wsum sum)
> @@ -1554,7 +1592,7 @@ void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
> 	}
>
> 	if (tp)
> -		mptcp_set_rwin(tp);
> +		mptcp_set_rwin(tp, th);
> }
>
> __be32 mptcp_get_reset_option(const struct sk_buff *skb)
> -- 
> 2.35.1
>
>
>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 4/5] mptcp: never shrink offered window
  2022-04-19  0:29   ` Mat Martineau
@ 2022-04-19 10:40     ` Paolo Abeni
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Abeni @ 2022-04-19 10:40 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Mon, 2022-04-18 at 17:29 -0700, Mat Martineau wrote:
> On Fri, 15 Apr 2022, Paolo Abeni wrote:
> 
> > As per RFC, the offered MPTCP-level window should never shrink.
> > While we currently track the right edge, we don't enforce the
> > above constraint on the wire.
> > Additionally, concurrent xmit on different subflows can end-up in
> > erroneous right edge update.
> > Address the above explicitly updating the announced window and
> > protecting the update with an additional atomic operation (sic)
> 
> By "(sic)", are you saying the new atomic operation is consistent with the 
> use of an atomic write in the previously written code?
> 
> I'm accustomed to "sic" meaning "there's a misspelling in the previous few 
> words that have been preserved from some copied text" when I encounter it 
> in English text, so I figure the Latin word may have slightly different 
> meaning here!

TL;DR: I guess we can drop '(sic)' from the commit message.

"sic" is the short form of "Sic et simpliciter", literally meaning
"it's exactly like that", often used as you noted to declare that an
apparent typo is actually intended to be there.

Here the above usage is extented to generally remark we do not like
whatever stated proviously, even if it's exactly like what we just
wrote: "Unfortunally it's like that".

The latter usage is somewhat e neologism due to the assonance with a
comics book onomatopoeia ("sigh")


:)

/P


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 3/5] tcp: allow MPTCP to update the announced window.
  2022-04-19  0:11   ` Mat Martineau
@ 2022-04-19 10:42     ` Paolo Abeni
  2022-04-20 16:18       ` Paolo Abeni
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Abeni @ 2022-04-19 10:42 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Mon, 2022-04-18 at 17:11 -0700, Mat Martineau wrote:
> On Fri, 15 Apr 2022, Paolo Abeni wrote:
> 
> > The MPTCP RFC requires that the MPTCP-level receive window's
> > right edge never moves backward. Currently the MPTCP code
> > enforces such constraint while tracking the right edge, but it
> > does not reflects it back on the wire, as lacks a suitable hook
> > to update accordingly the TCP header.
> > 
> > This change modifiy the existing mptcp_write_options() hook,
> > providing the current packet's TCP header up to the MPTCP protocol
> > level, so that the next patch could implement the above mentioned
> > constraint.
> > 
> > No functional changes intended.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > include/net/mptcp.h   |  2 +-
> > net/ipv4/tcp_output.c | 13 +++++++------
> > net/mptcp/options.c   |  2 +-
> > 3 files changed, 9 insertions(+), 8 deletions(-)
> > 
> > diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> > index 877077b53200..6b07011c060d 100644
> > --- a/include/net/mptcp.h
> > +++ b/include/net/mptcp.h
> > @@ -125,7 +125,7 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
> > 			       struct mptcp_out_options *opts);
> > bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
> > 
> > -void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
> > +void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
> > 			 struct mptcp_out_options *opts);
> > 
> > void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info);
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index c221f3bce975..27deec41a1f4 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -444,12 +444,13 @@ struct tcp_out_options {
> > 	struct mptcp_out_options mptcp;
> > };
> > 
> > -static void mptcp_options_write(__be32 *ptr, const struct tcp_sock *tp,
> > +static void mptcp_options_write(struct tcphdr *th, __be32 *ptr,
> > +				struct tcp_sock *tp,
> > 				struct tcp_out_options *opts)
> > {
> > #if IS_ENABLED(CONFIG_MPTCP)
> > 	if (unlikely(OPTION_MPTCP & opts->options))
> > -		mptcp_write_options(ptr, tp, &opts->mptcp);
> > +		mptcp_write_options(th, ptr, tp, &opts->mptcp);
> > #endif
> > }
> > 
> > @@ -605,7 +606,7 @@ static void bpf_skops_write_hdr_opt(struct sock *sk, struct sk_buff *skb,
> >  * At least SACK_PERM as the first option is known to lead to a disaster
> >  * (but it may well be that other scenarios fail similarly).
> >  */
> > -static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
> > +static void tcp_options_write(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
> > 			      struct tcp_out_options *opts)
> 
> Having both th and ptr seems redundant to my eyes. I'd rather just have a 
> th parameter and move the "ptr = (__be32 *)(th + 1);" code inside 
> tcp_options_write().
> 
> If you're thinking/hoping that Eric finds the additional parameter more 
> acceptible, we can go with this patch as-is.

Yep, the main concerns here is obtaining the buy-in from Eric. I hoped
that less differences would help, but likely the code you suggested is
better. I'll send an RFC on netdev with this and the next patch to get
some direct feedback from him.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 3/5] tcp: allow MPTCP to update the announced window.
  2022-04-19 10:42     ` Paolo Abeni
@ 2022-04-20 16:18       ` Paolo Abeni
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Abeni @ 2022-04-20 16:18 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Tue, 2022-04-19 at 12:42 +0200, Paolo Abeni wrote:
> On Mon, 2022-04-18 at 17:11 -0700, Mat Martineau wrote:
> > On Fri, 15 Apr 2022, Paolo Abeni wrote:
> > > @@ -605,7 +606,7 @@ static void bpf_skops_write_hdr_opt(struct sock *sk, struct sk_buff *skb,
> > >  * At least SACK_PERM as the first option is known to lead to a disaster
> > >  * (but it may well be that other scenarios fail similarly).
> > >  */
> > > -static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
> > > +static void tcp_options_write(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
> > > 			      struct tcp_out_options *opts)
> > 
> > Having both th and ptr seems redundant to my eyes. I'd rather just have a 
> > th parameter and move the "ptr = (__be32 *)(th + 1);" code inside 
> > tcp_options_write().
> > 
> > If you're thinking/hoping that Eric finds the additional parameter more 
> > acceptible, we can go with this patch as-is.
> 
> Yep, the main concerns here is obtaining the buy-in from Eric. I hoped
> that less differences would help, but likely the code you suggested is
> better. I'll send an RFC on netdev with this and the next patch to get
> some direct feedback from him.

Sharing the patches on netdev at least pointed out a build issue on
some 32bit arches lacking cmpxchg64. I'll move to atomic64_cmpxchg(),
which instead has a generic implementation.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 0/5] mptcp: Improve MPTCP-level window tracking
  2022-05-04 21:54 [PATCH net-next 0/5] mptcp: Improve MPTCP-level window tracking Mat Martineau
  2022-05-04 22:05 ` Mat Martineau
@ 2022-05-06  2:20 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 17+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-05-06  2:20 UTC (permalink / raw)
  To: Mat Martineau
  Cc: netdev, davem, kuba, pabeni, edumazet, matthieu.baerts, mptcp

Hello:

This series was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Wed,  4 May 2022 14:54:03 -0700 you wrote:
> This series improves MPTCP receive window compliance with RFC 8684 and
> helps increase throughput on high-speed links. Note that patch 3 makes a
> change in tcp_output.c
> 
> For the details, Paolo says:
> 
> I've been chasing bad/unstable performance with multiple subflows
> on very high speed links.
> 
> [...]

Here is the summary with links:
  - [net-next,1/5] mptcp: really share subflow snd_wnd
    https://git.kernel.org/netdev/net-next/c/b713d0067574
  - [net-next,2/5] mptcp: add mib for xmit window sharing
    https://git.kernel.org/netdev/net-next/c/92be2f522777
  - [net-next,3/5] tcp: allow MPTCP to update the announced window
    https://git.kernel.org/netdev/net-next/c/ea66758c1795
  - [net-next,4/5] mptcp: never shrink offered window
    https://git.kernel.org/netdev/net-next/c/f3589be0c420
  - [net-next,5/5] mptcp: add more offered MIBs counter
    https://git.kernel.org/netdev/net-next/c/38acb6260f60

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 0/5] mptcp: Improve MPTCP-level window tracking
  2022-05-04 21:54 [PATCH net-next 0/5] mptcp: Improve MPTCP-level window tracking Mat Martineau
@ 2022-05-04 22:05 ` Mat Martineau
  2022-05-06  2:20 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 17+ messages in thread
From: Mat Martineau @ 2022-05-04 22:05 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, pabeni, edumazet, matthieu.baerts, mptcp


On Wed, 4 May 2022, Mat Martineau wrote:

> This series improves MPTCP receive window compliance with RFC 8684 and
> helps increase throughput on high-speed links. Note that patch 3 makes a
> change in tcp_output.c
>
> For the details, Paolo says:
>
> I've been chasing bad/unstable performance with multiple subflows
> on very high speed links.
>
> It looks like the root cause is due to the current mptcp-level
> congestion window handling. There are apparently a few different
> sub-issues:
>
> - the rcv_wnd is not effectively shared on the tx side, as each
>  subflow takes in account only the value received by the underlaying
>  TCP connection. This is addressed in patch 1/4

I missed a couple of small edits - this should be "patch 1/5"...

>
> - The mptcp-level offered wnd right edge is currently allowed to shrink.
>  Reading section 3.3.4.:
>
> """
>   The receive window is relative to the DATA_ACK.  As in TCP, a
>   receiver MUST NOT shrink the right edge of the receive window (i.e.,
>   DATA_ACK + receive window).  The receiver will use the data sequence
>   number to tell if a packet should be accepted at the connection
>   level.
> """
>
> I read the above as we need to reflect window right-edge tracking
> on the wire, see patch 4/5.
>
> - The offered window right edge tracking can happen concurrently on
>  multiple subflows, but there is no mutex protection. We need an
>  additional atomic operation - still patch 3/4

...and this "patch 4/5".

- Mat

>
> This series additionally bumps a few new MIBs to track all the above
> (ensure/observe that the suspected races actually take place).
>
> I could not access again the host where the issue was so
> noticeable, still in the current setup the tput changes from
> [6-18] Gbps to 19Gbps very stable.
>
>
> Paolo Abeni (5):
>  mptcp: really share subflow snd_wnd
>  mptcp: add mib for xmit window sharing
>  tcp: allow MPTCP to update the announced window
>  mptcp: never shrink offered window
>  mptcp: add more offered MIBs counter
>
> include/net/mptcp.h   |  2 +-
> net/ipv4/tcp_output.c | 14 ++++++-----
> net/mptcp/mib.c       |  4 +++
> net/mptcp/mib.h       |  6 +++++
> net/mptcp/options.c   | 58 +++++++++++++++++++++++++++++++++++++------
> net/mptcp/protocol.c  | 32 +++++++++++++++---------
> net/mptcp/protocol.h  |  2 +-
> 7 files changed, 90 insertions(+), 28 deletions(-)
>
>
> base-commit: a37f37a2e7f5ea3ae2a1278f552aa21a8e32c221
> -- 
> 2.36.0
>
>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH net-next 0/5] mptcp: Improve MPTCP-level window tracking
@ 2022-05-04 21:54 Mat Martineau
  2022-05-04 22:05 ` Mat Martineau
  2022-05-06  2:20 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 17+ messages in thread
From: Mat Martineau @ 2022-05-04 21:54 UTC (permalink / raw)
  To: netdev
  Cc: Mat Martineau, davem, kuba, pabeni, edumazet, matthieu.baerts, mptcp

This series improves MPTCP receive window compliance with RFC 8684 and
helps increase throughput on high-speed links. Note that patch 3 makes a
change in tcp_output.c

For the details, Paolo says:

I've been chasing bad/unstable performance with multiple subflows
on very high speed links.

It looks like the root cause is due to the current mptcp-level
congestion window handling. There are apparently a few different
sub-issues:

- the rcv_wnd is not effectively shared on the tx side, as each
  subflow takes in account only the value received by the underlaying
  TCP connection. This is addressed in patch 1/4

- The mptcp-level offered wnd right edge is currently allowed to shrink.
  Reading section 3.3.4.:

"""
   The receive window is relative to the DATA_ACK.  As in TCP, a
   receiver MUST NOT shrink the right edge of the receive window (i.e.,
   DATA_ACK + receive window).  The receiver will use the data sequence
   number to tell if a packet should be accepted at the connection
   level.
"""

I read the above as we need to reflect window right-edge tracking
on the wire, see patch 4/5.

- The offered window right edge tracking can happen concurrently on
  multiple subflows, but there is no mutex protection. We need an
  additional atomic operation - still patch 3/4

This series additionally bumps a few new MIBs to track all the above
(ensure/observe that the suspected races actually take place).

I could not access again the host where the issue was so
noticeable, still in the current setup the tput changes from
[6-18] Gbps to 19Gbps very stable.


Paolo Abeni (5):
  mptcp: really share subflow snd_wnd
  mptcp: add mib for xmit window sharing
  tcp: allow MPTCP to update the announced window
  mptcp: never shrink offered window
  mptcp: add more offered MIBs counter

 include/net/mptcp.h   |  2 +-
 net/ipv4/tcp_output.c | 14 ++++++-----
 net/mptcp/mib.c       |  4 +++
 net/mptcp/mib.h       |  6 +++++
 net/mptcp/options.c   | 58 +++++++++++++++++++++++++++++++++++++------
 net/mptcp/protocol.c  | 32 +++++++++++++++---------
 net/mptcp/protocol.h  |  2 +-
 7 files changed, 90 insertions(+), 28 deletions(-)


base-commit: a37f37a2e7f5ea3ae2a1278f552aa21a8e32c221
-- 
2.36.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-05-06  2:20 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-15 14:48 [PATCH net-next 0/5] mptcp: improve mptcp-level window tracking Paolo Abeni
2022-04-15 14:48 ` [PATCH net-next 1/5] mptcp: really share subflow snd_wnd Paolo Abeni
2022-04-15 14:48 ` [PATCH net-next 2/5] mptcp: add mib for xmit window sharing Paolo Abeni
2022-04-15 14:48 ` [PATCH net-next 3/5] tcp: allow MPTCP to update the announced window Paolo Abeni
2022-04-19  0:11   ` Mat Martineau
2022-04-19 10:42     ` Paolo Abeni
2022-04-20 16:18       ` Paolo Abeni
2022-04-15 14:48 ` [PATCH net-next 4/5] mptcp: never shrink offered window Paolo Abeni
2022-04-19  0:29   ` Mat Martineau
2022-04-19 10:40     ` Paolo Abeni
2022-04-15 14:48 ` [PATCH net-next 5/5] mptcp: add more offered MIBs counter Paolo Abeni
2022-04-15 15:10   ` mptcp: add more offered MIBs counter.: Build Failure MPTCP CI
2022-04-15 16:39     ` Matthieu Baerts
2022-04-15 16:54   ` mptcp: add more offered MIBs counter.: Tests Results MPTCP CI
2022-05-04 21:54 [PATCH net-next 0/5] mptcp: Improve MPTCP-level window tracking Mat Martineau
2022-05-04 22:05 ` Mat Martineau
2022-05-06  2:20 ` patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.