MPTCP Linux Development
 help / color / Atom feed
* [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup
@ 2021-07-13 21:13 Paolo Abeni
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 1/8] mptcp: more accurate timeout Paolo Abeni
                   ` (9 more replies)
  0 siblings, 10 replies; 14+ messages in thread
From: Paolo Abeni @ 2021-07-13 21:13 UTC (permalink / raw)
  To: mptcp

This series addresses a bunch of issues somewhat related to active
backup handling. The most visible functial issue addressed here is:

https://github.com/multipath-tcp/mptcp_net-next/issues/191

This series also add some specific self-tests, to cover both
active-backup switch-over and proper usage of backup link.

A new netns parameter is introduced:

stale_loss_cnt - the max amount of mptcp rtx timeouts with no progresses
        and outstanding data over a single subflow needed to declare
        such subflow 'stale': no more data will be queue until some ack
        is observed.

Changes from v1:
 - stale_loss_cnt is now configurable via sysctl
 - addressed buglet noted by Mat on v1
 - fixed timeout update issues in patches 1-2
 - fixed splat in __mptcp_clean_una() on reinection
   - the new self-tests are now less unstable

Paolo Abeni (8):
  mptcp: more accurate timeout
  mptcp: less aggressive retransmission stragegy
  mptcp: handle pending data on closed subflow
  mptcp: cleanup sysctl data and helpers
  mptcp: faster active backup recovery
  mptcp: add mibs for stale subflows processing
  mptcp: backup flag from incoming MPJ ack option
  selftests: mptcp: add testcase for active-back

 Documentation/networking/mptcp-sysctl.rst     |  12 +
 net/mptcp/ctrl.c                              |  26 ++-
 net/mptcp/mib.c                               |   2 +
 net/mptcp/mib.h                               |   2 +
 net/mptcp/options.c                           |   8 +-
 net/mptcp/pm.c                                |  21 ++
 net/mptcp/pm_netlink.c                        |  37 +++
 net/mptcp/protocol.c                          | 187 +++++++++++----
 net/mptcp/protocol.h                          |  31 ++-
 net/mptcp/subflow.c                           |   6 +-
 .../testing/selftests/net/mptcp/mptcp_join.sh | 215 +++++++++++++++---
 11 files changed, 459 insertions(+), 88 deletions(-)

-- 
2.26.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 mptcp-next 1/8] mptcp: more accurate timeout
  2021-07-13 21:13 [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup Paolo Abeni
@ 2021-07-13 21:13 ` Paolo Abeni
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 2/8] mptcp: less aggressive retransmission stragegy Paolo Abeni
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2021-07-13 21:13 UTC (permalink / raw)
  To: mptcp

As reported by Maxim, we have a lot of MPTCP-level
retransmissions when multilple links with different latencies
are in use.

This patch refactor the mptcp-level timeout accounting so that
the maximum of all the active subflow timeout is used. To avoid
traversing the subflow list multiple times, the update is
perfomed inside the packet scheduler.

Additionally clean-up a bit timeout handling.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
 - be sure to always set timer in get_send()
 - mptcp_timeout_from_ssk -> mptcp_timeout_from_subflow, will simplfy
   the next patch
 - always rearm the timeout in __mptcp_*push_pending(), to catch
   link failure even when no data are send
---
 net/mptcp/protocol.c | 60 +++++++++++++++++++++++++++-----------------
 1 file changed, 37 insertions(+), 23 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 7ae80c1d7f79..19d734825928 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -411,16 +411,28 @@ static void mptcp_set_datafin_timeout(const struct sock *sk)
 				       TCP_RTO_MIN << icsk->icsk_retransmits);
 }
 
-static void mptcp_set_timeout(const struct sock *sk, const struct sock *ssk)
+static void __mptcp_set_timeout(struct sock *sk, long tout)
 {
-	long tout = ssk && inet_csk(ssk)->icsk_pending ?
-				      inet_csk(ssk)->icsk_timeout - jiffies : 0;
-
-	if (tout <= 0)
-		tout = mptcp_sk(sk)->timer_ival;
 	mptcp_sk(sk)->timer_ival = tout > 0 ? tout : TCP_RTO_MIN;
 }
 
+static long mptcp_timeout_from_subflow(const struct mptcp_subflow_context *subflow)
+{
+	const struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+	return inet_csk(ssk)->icsk_pending ? inet_csk(ssk)->icsk_timeout - jiffies : 0;
+}
+
+static void mptcp_set_timeout(struct sock *sk)
+{
+	struct mptcp_subflow_context *subflow;
+	long tout = 0;
+
+	mptcp_for_each_subflow(mptcp_sk(sk), subflow)
+		tout = max(tout, mptcp_timeout_from_subflow(subflow));
+	__mptcp_set_timeout(sk, tout);
+}
+
 static bool tcp_can_send_ack(const struct sock *ssk)
 {
 	return !((1 << inet_sk_state_load(ssk)) &
@@ -531,7 +543,6 @@ static bool mptcp_check_data_fin(struct sock *sk)
 		}
 
 		ret = true;
-		mptcp_set_timeout(sk, NULL);
 		mptcp_send_ack(msk);
 		mptcp_close_wake_up(sk);
 	}
@@ -791,10 +802,7 @@ static void mptcp_reset_timer(struct sock *sk)
 	if (unlikely(inet_sk_state_load(sk) == TCP_CLOSE))
 		return;
 
-	/* should never be called with mptcp level timer cleared */
-	tout = READ_ONCE(mptcp_sk(sk)->timer_ival);
-	if (WARN_ON_ONCE(!tout))
-		tout = TCP_RTO_MIN;
+	tout = mptcp_sk(sk)->timer_ival;
 	sk_reset_timer(sk, &icsk->icsk_retransmit_timer, jiffies + tout);
 }
 
@@ -1077,7 +1085,7 @@ static void __mptcp_clean_una(struct sock *sk)
 	}
 
 	if (snd_una == READ_ONCE(msk->snd_nxt)) {
-		if (msk->timer_ival && !mptcp_data_fin_enabled(msk))
+		if (mptcp_timer_pending(sk) && !mptcp_data_fin_enabled(msk))
 			mptcp_stop_timer(sk);
 	} else {
 		mptcp_reset_timer(sk);
@@ -1366,16 +1374,22 @@ struct subflow_send_info {
 	u64 ratio;
 };
 
+/* implement the mptcp packet scheduler;
+ * returns the subflow that will transmit the next DSS
+ * additionally updates the rtx timeout
+ */
 static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
 {
 	struct subflow_send_info send_info[2];
 	struct mptcp_subflow_context *subflow;
+	struct sock *sk = (struct sock *)msk;
 	int i, nr_active = 0;
 	struct sock *ssk;
+	long tout = 0;
 	u64 ratio;
 	u32 pace;
 
-	sock_owned_by_me((struct sock *)msk);
+	sock_owned_by_me(sk);
 
 	if (__mptcp_check_fallback(msk)) {
 		if (!msk->first)
@@ -1386,8 +1400,10 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
 	/* re-use last subflow, if the burst allow that */
 	if (msk->last_snd && msk->snd_burst > 0 &&
 	    sk_stream_memory_free(msk->last_snd) &&
-	    mptcp_subflow_active(mptcp_subflow_ctx(msk->last_snd)))
+	    mptcp_subflow_active(mptcp_subflow_ctx(msk->last_snd))) {
+		mptcp_set_timeout(sk);
 		return msk->last_snd;
+	}
 
 	/* pick the subflow with the lower wmem/wspace ratio */
 	for (i = 0; i < 2; ++i) {
@@ -1400,6 +1416,7 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
 		if (!mptcp_subflow_active(subflow))
 			continue;
 
+		tout = max(tout, mptcp_timeout_from_subflow(subflow));
 		nr_active += !subflow->backup;
 		if (!sk_stream_memory_free(subflow->tcp_sock) || !tcp_sk(ssk)->snd_wnd)
 			continue;
@@ -1415,6 +1432,7 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
 			send_info[subflow->backup].ratio = ratio;
 		}
 	}
+	__mptcp_set_timeout(sk, tout);
 
 	/* pick the best backup if no other subflow is active */
 	if (!nr_active)
@@ -1433,7 +1451,6 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
 static void mptcp_push_release(struct sock *sk, struct sock *ssk,
 			       struct mptcp_sendmsg_info *info)
 {
-	mptcp_set_timeout(sk, ssk);
 	tcp_push(ssk, 0, info->mss_now, tcp_sk(ssk)->nonagle, info->size_goal);
 	release_sock(ssk);
 }
@@ -1501,12 +1518,11 @@ static void __mptcp_push_pending(struct sock *sk, unsigned int flags)
 		mptcp_push_release(sk, ssk, &info);
 
 out:
-	if (copied) {
-		/* start the timer, if it's not pending */
-		if (!mptcp_timer_pending(sk))
-			mptcp_reset_timer(sk);
+	/* ensure the rtx timer is running */
+	if (!mptcp_timer_pending(sk))
+		mptcp_reset_timer(sk);
+	if (copied)
 		__mptcp_check_send_data_fin(sk);
-	}
 }
 
 static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk)
@@ -1567,7 +1583,6 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk)
 	 */
 	__mptcp_update_wmem(sk);
 	if (copied) {
-		mptcp_set_timeout(sk, ssk);
 		tcp_push(ssk, 0, info.mss_now, tcp_sk(ssk)->nonagle,
 			 info.size_goal);
 		if (!mptcp_timer_pending(sk))
@@ -2313,7 +2328,6 @@ static void __mptcp_retrans(struct sock *sk)
 			 info.size_goal);
 	}
 
-	mptcp_set_timeout(sk, ssk);
 	release_sock(ssk);
 
 reset_timer:
@@ -2384,6 +2398,7 @@ static int __mptcp_init_sock(struct sock *sk)
 	msk->wmem_reserved = 0;
 	WRITE_ONCE(msk->rmem_released, 0);
 	msk->tx_pending_data = 0;
+	msk->timer_ival = TCP_RTO_MIN;
 
 	msk->first = NULL;
 	inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss;
@@ -2476,7 +2491,6 @@ void mptcp_subflow_shutdown(struct sock *sk, struct sock *ssk, int how)
 			tcp_shutdown(ssk, how);
 		} else {
 			pr_debug("Sending DATA_FIN on subflow %p", ssk);
-			mptcp_set_timeout(sk, ssk);
 			tcp_send_ack(ssk);
 			if (!mptcp_timer_pending(sk))
 				mptcp_reset_timer(sk);
-- 
2.26.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 mptcp-next 2/8] mptcp: less aggressive retransmission stragegy
  2021-07-13 21:13 [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup Paolo Abeni
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 1/8] mptcp: more accurate timeout Paolo Abeni
@ 2021-07-13 21:13 ` Paolo Abeni
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 3/8] mptcp: handle pending data on closed subflow Paolo Abeni
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2021-07-13 21:13 UTC (permalink / raw)
  To: mptcp

The current mptcp re-inject strategy is very aggressive,
we have mptcp-level retransmissions even on single subflow
connection, if the link in-use is lossy.

Let's be a little more conservative: we do retransmit
only if at least a subflow has write and rtx queue empty.

Additionally use the backup subflows only if the active
subflows are stale - no progresses in at least an rtx period
and ignore stale subflows for rtx timeout update

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/207
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
 - skip subflow with stale_count > 0 in rtx time update
---
 net/mptcp/pm.c       | 17 +++++++++++++++++
 net/mptcp/protocol.c | 25 ++++++++++++++++---------
 net/mptcp/protocol.h |  5 ++++-
 3 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 639271e09604..9ff17c5205ce 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -308,6 +308,23 @@ int mptcp_pm_get_local_id(struct mptcp_sock *msk, struct sock_common *skc)
 	return mptcp_pm_nl_get_local_id(msk, skc);
 }
 
+void mptcp_pm_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk)
+{
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	u32 rcv_tstamp = READ_ONCE(tcp_sk(ssk)->rcv_tstamp);
+
+	/* keep track of rtx periods with no progress */
+	if (!subflow->stale_count) {
+		subflow->stale_rcv_tstamp = rcv_tstamp;
+		subflow->stale_count++;
+	} else if (subflow->stale_rcv_tstamp == rcv_tstamp) {
+		if (subflow->stale_count < U8_MAX)
+			subflow->stale_count++;
+	} else {
+		subflow->stale_count = 0;
+	}
+}
+
 void mptcp_pm_data_init(struct mptcp_sock *msk)
 {
 	msk->pm.add_addr_signaled = 0;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 19d734825928..9000ca326225 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -420,7 +420,8 @@ static long mptcp_timeout_from_subflow(const struct mptcp_subflow_context *subfl
 {
 	const struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 
-	return inet_csk(ssk)->icsk_pending ? inet_csk(ssk)->icsk_timeout - jiffies : 0;
+	return inet_csk(ssk)->icsk_pending && !subflow->stale_count ?
+	       inet_csk(ssk)->icsk_timeout - jiffies : 0;
 }
 
 static void mptcp_set_timeout(struct sock *sk)
@@ -2100,8 +2101,9 @@ static void mptcp_timeout_timer(struct timer_list *t)
  */
 static struct sock *mptcp_subflow_get_retrans(const struct mptcp_sock *msk)
 {
+	struct sock *backup = NULL, *pick = NULL;
 	struct mptcp_subflow_context *subflow;
-	struct sock *backup = NULL;
+	int min_stale_count = INT_MAX;
 
 	sock_owned_by_me((const struct sock *)msk);
 
@@ -2114,11 +2116,11 @@ static struct sock *mptcp_subflow_get_retrans(const struct mptcp_sock *msk)
 		if (!mptcp_subflow_active(subflow))
 			continue;
 
-		/* still data outstanding at TCP level?  Don't retransmit. */
-		if (!tcp_write_queue_empty(ssk)) {
-			if (inet_csk(ssk)->icsk_ca_state >= TCP_CA_Loss)
-				continue;
-			return NULL;
+		/* still data outstanding at TCP level? skip this */
+		if (!tcp_rtx_and_write_queues_empty(ssk)) {
+			mptcp_pm_subflow_chk_stale(msk, ssk);
+			min_stale_count = min_t(int, min_stale_count, subflow->stale_count);
+			continue;
 		}
 
 		if (subflow->backup) {
@@ -2127,10 +2129,15 @@ static struct sock *mptcp_subflow_get_retrans(const struct mptcp_sock *msk)
 			continue;
 		}
 
-		return ssk;
+		if (!pick)
+			pick = ssk;
 	}
 
-	return backup;
+	if (pick)
+		return pick;
+
+	/* use backup only if there are no progresses anywhere */
+	return min_stale_count > 1 ? backup : NULL;
 }
 
 static void mptcp_dispose_initial_subflow(struct mptcp_sock *msk)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 0f0c026c5f8b..6a3cbdb597e2 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -439,11 +439,13 @@ struct mptcp_subflow_context {
 	u8	reset_seen:1;
 	u8	reset_transient:1;
 	u8	reset_reason:4;
+	u8	stale_count;
 
 	long	delegated_status;
 	struct	list_head delegated_node;   /* link into delegated_action, protected by local BH */
 
-	u32 setsockopt_seq;
+	u32	setsockopt_seq;
+	u32	stale_rcv_tstamp;
 
 	struct	sock *tcp_sock;	    /* tcp sk backpointer */
 	struct	sock *conn;	    /* parent mptcp_sock */
@@ -690,6 +692,7 @@ void mptcp_crypto_hmac_sha(u64 key1, u64 key2, u8 *msg, int len, void *hmac);
 
 void __init mptcp_pm_init(void);
 void mptcp_pm_data_init(struct mptcp_sock *msk);
+void mptcp_pm_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk);
 void mptcp_pm_new_connection(struct mptcp_sock *msk, const struct sock *ssk, int server_side);
 void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk, gfp_t gfp);
 bool mptcp_pm_allow_new_subflow(struct mptcp_sock *msk);
-- 
2.26.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 mptcp-next 3/8] mptcp: handle pending data on closed subflow
  2021-07-13 21:13 [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup Paolo Abeni
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 1/8] mptcp: more accurate timeout Paolo Abeni
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 2/8] mptcp: less aggressive retransmission stragegy Paolo Abeni
@ 2021-07-13 21:13 ` Paolo Abeni
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 4/8] mptcp: cleanup sysctl data and helpers Paolo Abeni
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2021-07-13 21:13 UTC (permalink / raw)
  To: mptcp

The PM can close active subflow, e.g. due to ingress RM_ADDR
option. Such subflow could carry data still unacked at the
MPTCP-level, both in the write and the rtx_queue, which has
never reached the other peer.

Currently the mptcp-level retransmission will deliver such data,
but at a very low rate (at most 1 DSM for each MPTCP rtx interval).

We can speed-up the recovery a lot, moving all the unacked in the
tcp write_queue, so that it will be pushed again via other
subflows, at the speed allowed by them.

Also make available the new helper for later patches.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/207
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
 - drop ssk argument from __mptcp_retransmit_pending_data() - Mat
 - in recovery mode clean_una() must accept moving after first_pending
 - move 'recovery' under data_lock protection, to avoid a bunch of
   ONCE annotation
---
 net/mptcp/options.c  |  8 +++--
 net/mptcp/protocol.c | 76 +++++++++++++++++++++++++++++++++++++++++---
 net/mptcp/protocol.h |  6 ++++
 3 files changed, 82 insertions(+), 8 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 4452455aef7f..e37b6f2fb514 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -975,9 +975,11 @@ static void ack_update_msk(struct mptcp_sock *msk,
 	old_snd_una = msk->snd_una;
 	new_snd_una = mptcp_expand_seq(old_snd_una, mp_opt->data_ack, mp_opt->ack64);
 
-	/* ACK for data not even sent yet? Ignore. */
-	if (after64(new_snd_una, snd_nxt))
-		new_snd_una = old_snd_una;
+	/* ACK for data not even sent yet and even above recovery bound? Ignore.*/
+	if (unlikely(after64(new_snd_una, snd_nxt))) {
+		if (!msk->recovery || after64(new_snd_una, msk->recovery_snd_nxt))
+			new_snd_una = old_snd_una;
+	}
 
 	new_wnd_end = new_snd_una + tcp_sk(ssk)->snd_wnd;
 
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 9000ca326225..7fe4c17fb4f5 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1055,8 +1055,14 @@ static void __mptcp_clean_una(struct sock *sk)
 		if (after64(dfrag->data_seq + dfrag->data_len, snd_una))
 			break;
 
-		if (WARN_ON_ONCE(dfrag == msk->first_pending))
-			break;
+		if (unlikely(dfrag == msk->first_pending)) {
+			/* in recovery mode can see ack after the current snd head */
+			if (WARN_ON_ONCE(!msk->recovery))
+				break;
+
+			WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
+		}
+
 		dfrag_clear(sk, dfrag);
 		cleaned = true;
 	}
@@ -1065,8 +1071,14 @@ static void __mptcp_clean_una(struct sock *sk)
 	if (dfrag && after64(snd_una, dfrag->data_seq)) {
 		u64 delta = snd_una - dfrag->data_seq;
 
-		if (WARN_ON_ONCE(delta > dfrag->already_sent))
-			goto out;
+		/* prevent wrap around in recovery mode */
+		if (unlikely(delta > dfrag->already_sent)) {
+			if (WARN_ON_ONCE(!msk->recovery))
+				goto out;
+			if (WARN_ON_ONCE(delta > dfrag->data_len))
+				goto out;
+			dfrag->already_sent += delta - dfrag->already_sent;
+		}
 
 		dfrag->data_seq += delta;
 		dfrag->offset += delta;
@@ -1077,6 +1089,10 @@ static void __mptcp_clean_una(struct sock *sk)
 		cleaned = true;
 	}
 
+	/* all retransmitted data acked, recovery completed */
+	if (unlikely(msk->recovery) && after64(msk->snd_una, msk->recovery_snd_nxt))
+		msk->recovery = false;
+
 out:
 	if (cleaned) {
 		if (tcp_under_memory_pressure(sk)) {
@@ -1085,7 +1101,7 @@ static void __mptcp_clean_una(struct sock *sk)
 		}
 	}
 
-	if (snd_una == READ_ONCE(msk->snd_nxt)) {
+	if (snd_una == READ_ONCE(msk->snd_nxt) && !msk->recovery) {
 		if (mptcp_timer_pending(sk) && !mptcp_data_fin_enabled(msk))
 			mptcp_stop_timer(sk);
 	} else {
@@ -2148,6 +2164,50 @@ static void mptcp_dispose_initial_subflow(struct mptcp_sock *msk)
 	}
 }
 
+bool __mptcp_retransmit_pending_data(struct sock *sk)
+{
+	struct mptcp_data_frag *cur, *rtx_head;
+	struct mptcp_sock *msk = mptcp_sk(sk);
+
+	if (__mptcp_check_fallback(mptcp_sk(sk)))
+		return false;
+
+	if (tcp_rtx_and_write_queues_empty(sk))
+		return false;
+
+	/* the closing socket has some data untransmitted and/or unacked:
+	 * some data in the mptcp rtx queue has not really xmitted yet.
+	 * keep it simple and re-inject the whole mptcp level rtx queue
+	 */
+	mptcp_data_lock(sk);
+	__mptcp_clean_una_wakeup(sk);
+	rtx_head = mptcp_rtx_head(sk);
+	if (!rtx_head) {
+		mptcp_data_unlock(sk);
+		return false;
+	}
+
+	/* will accept ack for reijected data before re-sending them */
+	if (!msk->recovery || after64(msk->snd_nxt, msk->recovery_snd_nxt))
+		msk->recovery_snd_nxt = msk->snd_nxt;
+	msk->recovery = true;
+	mptcp_data_unlock(sk);
+
+	msk->first_pending = rtx_head;
+	msk->tx_pending_data += msk->snd_nxt - rtx_head->data_seq;
+	msk->snd_nxt = rtx_head->data_seq;
+	msk->snd_burst = 0;
+
+	/* be sure to clear the "sent status" on all re-injected fragments */
+	list_for_each_entry(cur, &msk->rtx_queue, list) {
+		if (!cur->already_sent)
+			break;
+		cur->already_sent = 0;
+	}
+
+	return true;
+}
+
 /* subflow sockets can be either outgoing (connect) or incoming
  * (accept).
  *
@@ -2160,6 +2220,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 			      struct mptcp_subflow_context *subflow)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
+	bool need_push;
 
 	list_del(&subflow->node);
 
@@ -2171,6 +2232,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 	if (ssk->sk_socket)
 		sock_orphan(ssk);
 
+	need_push = __mptcp_retransmit_pending_data(sk);
 	subflow->disposable = 1;
 
 	/* if ssk hit tcp_done(), tcp_cleanup_ulp() cleared the related ops
@@ -2198,6 +2260,9 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 
 	if (msk->subflow && ssk == msk->subflow->sk)
 		mptcp_dispose_initial_subflow(msk);
+
+	if (need_push)
+		__mptcp_push_pending(sk, 0);
 }
 
 void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
@@ -2410,6 +2475,7 @@ static int __mptcp_init_sock(struct sock *sk)
 	msk->first = NULL;
 	inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss;
 	WRITE_ONCE(msk->csum_enabled, mptcp_is_checksum_enabled(sock_net(sk)));
+	msk->recovery = false;
 
 	mptcp_pm_data_init(msk);
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 6a3cbdb597e2..6f55784a2efd 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -230,12 +230,17 @@ struct mptcp_sock {
 	struct sock	*last_snd;
 	int		snd_burst;
 	int		old_wspace;
+	u64		recovery_snd_nxt;	/* in recovery mode accept up to this seq;
+						 * recovery related fields are under data_lock
+						 * protection
+						 */
 	u64		snd_una;
 	u64		wnd_end;
 	unsigned long	timer_ival;
 	u32		token;
 	int		rmem_released;
 	unsigned long	flags;
+	bool		recovery;		/* closing subflow write queue reinjected */
 	bool		can_ack;
 	bool		fully_established;
 	bool		rcv_data_fin;
@@ -557,6 +562,7 @@ int mptcp_is_checksum_enabled(struct net *net);
 int mptcp_allow_join_id0(struct net *net);
 void mptcp_subflow_fully_established(struct mptcp_subflow_context *subflow,
 				     struct mptcp_options_received *mp_opt);
+bool __mptcp_retransmit_pending_data(struct sock *sk);
 bool mptcp_subflow_data_available(struct sock *sk);
 void __init mptcp_subflow_init(void);
 void mptcp_subflow_shutdown(struct sock *sk, struct sock *ssk, int how);
-- 
2.26.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 mptcp-next 4/8] mptcp: cleanup sysctl data and helpers
  2021-07-13 21:13 [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup Paolo Abeni
                   ` (2 preceding siblings ...)
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 3/8] mptcp: handle pending data on closed subflow Paolo Abeni
@ 2021-07-13 21:13 ` Paolo Abeni
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 5/8] mptcp: faster active backup recovery Paolo Abeni
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2021-07-13 21:13 UTC (permalink / raw)
  To: mptcp

Reorder the data in mptcp_pernet to avoid wasting space
with no reasons and constify the access helpers.

No functional changes intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/ctrl.c     | 12 ++++++------
 net/mptcp/protocol.h |  8 ++++----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
index 7d738bd06f2c..63bba9d8e289 100644
--- a/net/mptcp/ctrl.c
+++ b/net/mptcp/ctrl.c
@@ -21,33 +21,33 @@ struct mptcp_pernet {
 	struct ctl_table_header *ctl_table_hdr;
 #endif
 
-	u8 mptcp_enabled;
 	unsigned int add_addr_timeout;
+	u8 mptcp_enabled;
 	u8 checksum_enabled;
 	u8 allow_join_initial_addr_port;
 };
 
-static struct mptcp_pernet *mptcp_get_pernet(struct net *net)
+static struct mptcp_pernet *mptcp_get_pernet(const struct net *net)
 {
 	return net_generic(net, mptcp_pernet_id);
 }
 
-int mptcp_is_enabled(struct net *net)
+int mptcp_is_enabled(const struct net *net)
 {
 	return mptcp_get_pernet(net)->mptcp_enabled;
 }
 
-unsigned int mptcp_get_add_addr_timeout(struct net *net)
+unsigned int mptcp_get_add_addr_timeout(const struct net *net)
 {
 	return mptcp_get_pernet(net)->add_addr_timeout;
 }
 
-int mptcp_is_checksum_enabled(struct net *net)
+int mptcp_is_checksum_enabled(const struct net *net)
 {
 	return mptcp_get_pernet(net)->checksum_enabled;
 }
 
-int mptcp_allow_join_id0(struct net *net)
+int mptcp_allow_join_id0(const struct net *net)
 {
 	return mptcp_get_pernet(net)->allow_join_initial_addr_port;
 }
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 6f55784a2efd..43ff6c5baddc 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -556,10 +556,10 @@ static inline void mptcp_subflow_delegated_done(struct mptcp_subflow_context *su
 	clear_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status);
 }
 
-int mptcp_is_enabled(struct net *net);
-unsigned int mptcp_get_add_addr_timeout(struct net *net);
-int mptcp_is_checksum_enabled(struct net *net);
-int mptcp_allow_join_id0(struct net *net);
+int mptcp_is_enabled(const struct net *net);
+unsigned int mptcp_get_add_addr_timeout(const struct net *net);
+int mptcp_is_checksum_enabled(const struct net *net);
+int mptcp_allow_join_id0(const struct net *net);
 void mptcp_subflow_fully_established(struct mptcp_subflow_context *subflow,
 				     struct mptcp_options_received *mp_opt);
 bool __mptcp_retransmit_pending_data(struct sock *sk);
-- 
2.26.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 mptcp-next 5/8] mptcp: faster active backup recovery
  2021-07-13 21:13 [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup Paolo Abeni
                   ` (3 preceding siblings ...)
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 4/8] mptcp: cleanup sysctl data and helpers Paolo Abeni
@ 2021-07-13 21:13 ` Paolo Abeni
  2021-07-14  0:46   ` kernel test robot
  2021-07-14  8:36   ` Dan Carpenter
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 6/8] mptcp: add mibs for stale subflows processing Paolo Abeni
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 14+ messages in thread
From: Paolo Abeni @ 2021-07-13 21:13 UTC (permalink / raw)
  To: mptcp

the msk can use backup subflows to transmit in-sequence data
only if there are no other active subflow. On active backup
scenario, the MPTCP connection can do forward progress only
due to MPTCP retransmissions - rtx can pick backup subflows.

This patch introduces a new flag flow MPTCP subflows: if the
underlaying TCP connection made no progresses for long time,
and there are other less problematic subflows available, the
given subflow become stale.

Stale subflows are not considered active: if all non backup
subflows become stale, the MPTCP scheduler can pick backup
subflows for plain transmissions.

Stale subflows can return in active state, as soon as any reply
from the peer is observed.

Active backup scenarios can now leverage the available b/w
with no restrinction.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/207
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
 - moved stale_loss_cnt definition in mptcp_pernet
 - added sysctl documentation
---
 Documentation/networking/mptcp-sysctl.rst | 12 ++++++++
 net/mptcp/ctrl.c                          | 14 +++++++++
 net/mptcp/pm.c                            |  2 ++
 net/mptcp/pm_netlink.c                    | 36 +++++++++++++++++++++++
 net/mptcp/protocol.c                      | 27 +++++++++++++++--
 net/mptcp/protocol.h                      | 12 ++++++--
 6 files changed, 98 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst
index 76d939e688b8..45fa8b2aefa8 100644
--- a/Documentation/networking/mptcp-sysctl.rst
+++ b/Documentation/networking/mptcp-sysctl.rst
@@ -45,3 +45,15 @@ allow_join_initial_addr_port - BOOLEAN
 	This is a per-namespace sysctl.
 
 	Default: 1
+
+stale_loss_cnt - INTEGER
+        The number of MPTCP-level retransmission intervals with no traffic and
+        pending outstanding data on a given subflow required to declare it stale.
+        The packet scheduler ignores stale subflows.
+        A low stale_loss_cnt  value allows for fast active-backup switch-over,
+        an high value maximixe links utilization on edge scenarios e.g. lossy
+        link with high BER or peer pausing the data processing.
+
+	This is a per-namespace sysctl.
+
+	Default: 4
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
index 63bba9d8e289..8b235468c88f 100644
--- a/net/mptcp/ctrl.c
+++ b/net/mptcp/ctrl.c
@@ -22,6 +22,7 @@ struct mptcp_pernet {
 #endif
 
 	unsigned int add_addr_timeout;
+	unsigned int stale_loss_cnt;
 	u8 mptcp_enabled;
 	u8 checksum_enabled;
 	u8 allow_join_initial_addr_port;
@@ -52,12 +53,18 @@ int mptcp_allow_join_id0(const struct net *net)
 	return mptcp_get_pernet(net)->allow_join_initial_addr_port;
 }
 
+unsigned int mptcp_stale_loss_cnt(const struct net *net)
+{
+	return mptcp_get_pernet(net)->stale_loss_cnt;
+}
+
 static void mptcp_pernet_set_defaults(struct mptcp_pernet *pernet)
 {
 	pernet->mptcp_enabled = 1;
 	pernet->add_addr_timeout = TCP_RTO_MAX;
 	pernet->checksum_enabled = 0;
 	pernet->allow_join_initial_addr_port = 1;
+	pernet->stale_loss_cnt = 4;
 }
 
 #ifdef CONFIG_SYSCTL
@@ -95,6 +102,12 @@ static struct ctl_table mptcp_sysctl_table[] = {
 		.extra1       = SYSCTL_ZERO,
 		.extra2       = SYSCTL_ONE
 	},
+	{
+		.procname = "stale_loss_cnt",
+		.maxlen = sizeof(unsigned int),
+		.mode = 0644,
+		.proc_handler = proc_douintvec_minmax,
+	},
 	{}
 };
 
@@ -114,6 +127,7 @@ static int mptcp_pernet_new_table(struct net *net, struct mptcp_pernet *pernet)
 	table[1].data = &pernet->add_addr_timeout;
 	table[2].data = &pernet->checksum_enabled;
 	table[3].data = &pernet->allow_join_initial_addr_port;
+	table[4].data = &pernet->stale_loss_cnt;
 
 	hdr = register_net_sysctl(net, MPTCP_SYSCTL_PATH, table);
 	if (!hdr)
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 9ff17c5205ce..d8a85fe92360 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -320,8 +320,10 @@ void mptcp_pm_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk)
 	} else if (subflow->stale_rcv_tstamp == rcv_tstamp) {
 		if (subflow->stale_count < U8_MAX)
 			subflow->stale_count++;
+		mptcp_pm_nl_subflow_chk_stale(msk, ssk);
 	} else {
 		subflow->stale_count = 0;
+		mptcp_subflow_set_active(subflow);
 	}
 }
 
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index d2591ebf01d9..eb1f5cf89423 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -47,6 +47,7 @@ struct pm_nl_pernet {
 	spinlock_t		lock;
 	struct list_head	local_addr_list;
 	unsigned int		addrs;
+	unsigned int		stale_loss_cnt;
 	unsigned int		add_addr_signal_max;
 	unsigned int		add_addr_accept_max;
 	unsigned int		local_addr_max;
@@ -900,6 +901,40 @@ static const struct nla_policy mptcp_pm_policy[MPTCP_PM_ATTR_MAX + 1] = {
 	[MPTCP_PM_ATTR_SUBFLOWS]	= { .type	= NLA_U32,	},
 };
 
+void mptcp_pm_nl_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk)
+{
+	struct mptcp_subflow_context *iter, *subflow = mptcp_subflow_ctx(ssk);
+	struct sock *sk = (struct sock *)msk;
+	unsigned int active_max_loss_cnt;
+	struct net *net = sock_net(sk);
+	unsigned int stale_loss_cnt;
+	bool slow, push;
+
+	stale_loss_cnt = mptcp_stale_loss_cnt(net);
+	if (subflow->stale || !stale_loss_cnt || subflow->stale_count <= stale_loss_cnt)
+		return;
+
+	/* look for another available subflow not in loss state */
+	active_max_loss_cnt = max_t(int, stale_loss_cnt - 1, 1);
+	mptcp_for_each_subflow(msk, iter) {
+		if (iter != subflow && mptcp_subflow_active(iter) &&
+		    iter->stale_count < active_max_loss_cnt) {
+			/* we have some alteratives, try to mark this subflow as idle ...*/
+			slow = lock_sock_fast(ssk);
+			if (!tcp_rtx_and_write_queues_empty(ssk)) {
+				subflow->stale = 1;
+				push = __mptcp_retransmit_pending_data(sk);
+			}
+			unlock_sock_fast(ssk, slow);
+
+			/* pending data on the idle subflow: retransmit */
+			if (push)
+				__mptcp_push_pending(sk, 0);
+			return;
+		}
+	}
+}
+
 static int mptcp_pm_family_to_addr(int family)
 {
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
@@ -1923,6 +1958,7 @@ static int __net_init pm_nl_init_net(struct net *net)
 
 	INIT_LIST_HEAD_RCU(&pernet->local_addr_list);
 	pernet->next_id = 1;
+	pernet->stale_loss_cnt = 4;
 	spin_lock_init(&pernet->lock);
 
 	/* No need to initialize other pernet fields, the struct is zeroed at
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 7fe4c17fb4f5..e936b88d340d 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1391,6 +1391,27 @@ struct subflow_send_info {
 	u64 ratio;
 };
 
+void mptcp_subflow_set_active(struct mptcp_subflow_context *subflow)
+{
+	if (!subflow->stale)
+		return;
+
+	subflow->stale = 0;
+}
+
+bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
+{
+	if (unlikely(subflow->stale)) {
+		u32 rcv_tstamp = READ_ONCE(tcp_sk(mptcp_subflow_tcp_sock(subflow))->rcv_tstamp);
+
+		if (subflow->stale_rcv_tstamp == rcv_tstamp)
+			return false;
+
+		mptcp_subflow_set_active(subflow);
+	}
+	return __mptcp_subflow_active(subflow);
+}
+
 /* implement the mptcp packet scheduler;
  * returns the subflow that will transmit the next DSS
  * additionally updates the rtx timeout
@@ -1472,7 +1493,7 @@ static void mptcp_push_release(struct sock *sk, struct sock *ssk,
 	release_sock(ssk);
 }
 
-static void __mptcp_push_pending(struct sock *sk, unsigned int flags)
+void __mptcp_push_pending(struct sock *sk, unsigned int flags)
 {
 	struct sock *prev_ssk = NULL, *ssk = NULL;
 	struct mptcp_sock *msk = mptcp_sk(sk);
@@ -2115,7 +2136,7 @@ static void mptcp_timeout_timer(struct timer_list *t)
  *
  * A backup subflow is returned only if that is the only kind available.
  */
-static struct sock *mptcp_subflow_get_retrans(const struct mptcp_sock *msk)
+static struct sock *mptcp_subflow_get_retrans(struct mptcp_sock *msk)
 {
 	struct sock *backup = NULL, *pick = NULL;
 	struct mptcp_subflow_context *subflow;
@@ -2129,7 +2150,7 @@ static struct sock *mptcp_subflow_get_retrans(const struct mptcp_sock *msk)
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 
-		if (!mptcp_subflow_active(subflow))
+		if (!__mptcp_subflow_active(subflow))
 			continue;
 
 		/* still data outstanding at TCP level? skip this */
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 43ff6c5baddc..8bdd038def38 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -432,7 +432,8 @@ struct mptcp_subflow_context {
 		send_mp_prio : 1,
 		rx_eof : 1,
 		can_ack : 1,        /* only after processing the remote a key */
-		disposable : 1;	    /* ctx can be free at ulp release time */
+		disposable : 1,	    /* ctx can be free at ulp release time */
+		stale : 1;	    /* unable to snd/rcv data, do not use for xmit */
 	enum mptcp_data_avail data_avail;
 	u32	remote_nonce;
 	u64	thmac;
@@ -560,9 +561,11 @@ int mptcp_is_enabled(const struct net *net);
 unsigned int mptcp_get_add_addr_timeout(const struct net *net);
 int mptcp_is_checksum_enabled(const struct net *net);
 int mptcp_allow_join_id0(const struct net *net);
+unsigned int mptcp_stale_loss_cnt(const struct net *net);
 void mptcp_subflow_fully_established(struct mptcp_subflow_context *subflow,
 				     struct mptcp_options_received *mp_opt);
 bool __mptcp_retransmit_pending_data(struct sock *sk);
+void __mptcp_push_pending(struct sock *sk, unsigned int flags);
 bool mptcp_subflow_data_available(struct sock *sk);
 void __init mptcp_subflow_init(void);
 void mptcp_subflow_shutdown(struct sock *sk, struct sock *ssk, int how);
@@ -581,7 +584,7 @@ void mptcp_info2sockaddr(const struct mptcp_addr_info *info,
 			 struct sockaddr_storage *addr,
 			 unsigned short family);
 
-static inline bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
+static inline bool __mptcp_subflow_active(struct mptcp_subflow_context *subflow)
 {
 	struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 
@@ -593,6 +596,10 @@ static inline bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
 	return ((1 << ssk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT));
 }
 
+void mptcp_subflow_set_active(struct mptcp_subflow_context *subflow);
+
+bool mptcp_subflow_active(struct mptcp_subflow_context *subflow);
+
 static inline void mptcp_subflow_tcp_fallback(struct sock *sk,
 					      struct mptcp_subflow_context *ctx)
 {
@@ -699,6 +706,7 @@ void mptcp_crypto_hmac_sha(u64 key1, u64 key2, u8 *msg, int len, void *hmac);
 void __init mptcp_pm_init(void);
 void mptcp_pm_data_init(struct mptcp_sock *msk);
 void mptcp_pm_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk);
+void mptcp_pm_nl_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk);
 void mptcp_pm_new_connection(struct mptcp_sock *msk, const struct sock *ssk, int server_side);
 void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk, gfp_t gfp);
 bool mptcp_pm_allow_new_subflow(struct mptcp_sock *msk);
-- 
2.26.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 mptcp-next 6/8] mptcp: add mibs for stale subflows processing
  2021-07-13 21:13 [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup Paolo Abeni
                   ` (4 preceding siblings ...)
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 5/8] mptcp: faster active backup recovery Paolo Abeni
@ 2021-07-13 21:13 ` Paolo Abeni
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 7/8] mptcp: backup flag from incoming MPJ ack option Paolo Abeni
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2021-07-13 21:13 UTC (permalink / raw)
  To: mptcp

This allows monitoring exceptional events like
active backup scenarios.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/mib.c        | 2 ++
 net/mptcp/mib.h        | 2 ++
 net/mptcp/pm.c         | 2 ++
 net/mptcp/pm_netlink.c | 1 +
 net/mptcp/protocol.c   | 1 +
 5 files changed, 8 insertions(+)

diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index ff2cc0e3273d..3a7c4e7b2d79 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -45,6 +45,8 @@ static const struct snmp_mib mptcp_snmp_list[] = {
 	SNMP_MIB_ITEM("MPPrioTx", MPTCP_MIB_MPPRIOTX),
 	SNMP_MIB_ITEM("MPPrioRx", MPTCP_MIB_MPPRIORX),
 	SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
+	SNMP_MIB_ITEM("SubflowStale", MPTCP_MIB_SUBFLOWSTALE),
+	SNMP_MIB_ITEM("SubflowRecover", MPTCP_MIB_SUBFLOWRECOVER),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index 0663cb12b448..8ec16c991aac 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -38,6 +38,8 @@ enum linux_mptcp_mib_field {
 	MPTCP_MIB_MPPRIOTX,		/* Transmit a MP_PRIO */
 	MPTCP_MIB_MPPRIORX,		/* Received a MP_PRIO */
 	MPTCP_MIB_RCVPRUNED,		/* Incoming packet dropped due to memory limit */
+	MPTCP_MIB_SUBFLOWSTALE,		/* Subflows entered 'stale' status */
+	MPTCP_MIB_SUBFLOWRECOVER,	/* Subflows returned to active status after being stale */
 	__MPTCP_MIB_MAX
 };
 
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index d8a85fe92360..0ed3e565f8f8 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -10,6 +10,8 @@
 #include <net/mptcp.h>
 #include "protocol.h"
 
+#include "mib.h"
+
 /* path manager command handlers */
 
 int mptcp_pm_announce_addr(struct mptcp_sock *msk,
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index eb1f5cf89423..3d4fa2dd2cea 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -924,6 +924,7 @@ void mptcp_pm_nl_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ss
 			if (!tcp_rtx_and_write_queues_empty(ssk)) {
 				subflow->stale = 1;
 				push = __mptcp_retransmit_pending_data(sk);
+				MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_SUBFLOWSTALE);
 			}
 			unlock_sock_fast(ssk, slow);
 
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index e936b88d340d..8eb2626503d7 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1397,6 +1397,7 @@ void mptcp_subflow_set_active(struct mptcp_subflow_context *subflow)
 		return;
 
 	subflow->stale = 0;
+	MPTCP_INC_STATS(sock_net(mptcp_subflow_tcp_sock(subflow)), MPTCP_MIB_SUBFLOWRECOVER);
 }
 
 bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
-- 
2.26.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 mptcp-next 7/8] mptcp: backup flag from incoming MPJ ack option
  2021-07-13 21:13 [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup Paolo Abeni
                   ` (5 preceding siblings ...)
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 6/8] mptcp: add mibs for stale subflows processing Paolo Abeni
@ 2021-07-13 21:13 ` Paolo Abeni
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 8/8] selftests: mptcp: add testcase for active-back Paolo Abeni
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2021-07-13 21:13 UTC (permalink / raw)
  To: mptcp

the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/191
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/subflow.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 966f777d35ce..1151926d335b 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -435,10 +435,12 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
 			goto do_reset;
 		}
 
+		subflow->backup = mp_opt.backup;
 		subflow->thmac = mp_opt.thmac;
 		subflow->remote_nonce = mp_opt.nonce;
-		pr_debug("subflow=%p, thmac=%llu, remote_nonce=%u", subflow,
-			 subflow->thmac, subflow->remote_nonce);
+		pr_debug("subflow=%p, thmac=%llu, remote_nonce=%u backup=%d",
+			 subflow, subflow->thmac, subflow->remote_nonce,
+			 subflow->backup);
 
 		if (!subflow_thmac_valid(subflow)) {
 			MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_JOINACKMAC);
-- 
2.26.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 mptcp-next 8/8] selftests: mptcp: add testcase for active-back
  2021-07-13 21:13 [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup Paolo Abeni
                   ` (6 preceding siblings ...)
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 7/8] mptcp: backup flag from incoming MPJ ack option Paolo Abeni
@ 2021-07-13 21:13 ` Paolo Abeni
  2021-07-15  1:20 ` [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup Mat Martineau
  2021-07-16 10:01 ` Matthieu Baerts
  9 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2021-07-13 21:13 UTC (permalink / raw)
  To: mptcp

Add more test-case for link failures scenario,
including recovery from link failure using only
backup subflows and bi-directional transfer.

Additionally explicitly check for stale count

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 v1 -> v2:
 -  address recurrent self-tests failures:
    must be less stringent for bidirectionl tests
 -  add shapers, to allow the scheduler using
    all the virtual links
 - reduce the link failure test file size, to
   keep the test run-time almost reasonable
---
 .../testing/selftests/net/mptcp/mptcp_join.sh | 215 +++++++++++++++---
 1 file changed, 184 insertions(+), 31 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index f02f4de2f3a0..220154cb92a7 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -3,8 +3,10 @@
 
 ret=0
 sin=""
+sinfail=""
 sout=""
 cin=""
+cinfail=""
 cinsent=""
 cout=""
 ksft_skip=4
@@ -76,6 +78,14 @@ init()
 	done
 }
 
+init_shapers()
+{
+	for i in `seq 1 4`; do
+		tc -n $ns1 qdisc add dev ns1eth$i root netem rate 20mbit delay 1
+		tc -n $ns2 qdisc add dev ns2eth$i root netem rate 20mbit delay 1
+	done
+}
+
 cleanup_partial()
 {
 	rm -f "$capout"
@@ -88,8 +98,8 @@ cleanup_partial()
 
 cleanup()
 {
-	rm -f "$cin" "$cout"
-	rm -f "$sin" "$sout" "$cinsent"
+	rm -f "$cin" "$cout" "$sinfail"
+	rm -f "$sin" "$sout" "$cinsent" "$cinfail"
 	cleanup_partial
 }
 
@@ -211,11 +221,16 @@ link_failure()
 {
 	ns="$1"
 
-	l=$((RANDOM%4))
-	l=$((l+1))
+	if [ -z "$FAILING_LINKS" ]; then
+		l=$((RANDOM%4))
+		FAILING_LINKS=$((l+1))
+	fi
 
-	veth="ns1eth$l"
-	ip -net "$ns" link set "$veth" down
+	for l in $FAILING_LINKS; do
+		veth="ns1eth$l"
+		echo "link $veth down" 1>&2
+		ip -net "$ns" link set "$veth" down
+	done
 }
 
 # $1: IP address
@@ -280,10 +295,17 @@ do_transfer()
 		local_addr="0.0.0.0"
 	fi
 
-	timeout ${timeout_test} \
-		ip netns exec ${listener_ns} \
-			$mptcp_connect -t ${timeout_poll} -l -p $port -s ${srv_proto} \
-				${local_addr} < "$sin" > "$sout" &
+	if [ "$test_link_fail" -eq 2 ];then
+		timeout ${timeout_test} \
+			ip netns exec ${listener_ns} \
+				$mptcp_connect -t ${timeout_poll} -l -p $port -s ${cl_proto} \
+					${local_addr} < "$sinfail" > "$sout" &
+	else
+		timeout ${timeout_test} \
+			ip netns exec ${listener_ns} \
+				$mptcp_connect -t ${timeout_poll} -l -p $port -s ${srv_proto} \
+					${local_addr} < "$sin" > "$sout" &
+	fi
 	spid=$!
 
 	sleep 1
@@ -294,7 +316,7 @@ do_transfer()
 				$mptcp_connect -t ${timeout_poll} -p $port -s ${cl_proto} \
 					$connect_addr < "$cin" > "$cout" &
 	else
-		( cat "$cin" ; sleep 2; link_failure $listener_ns ; cat "$cin" ) | \
+		( cat "$cinfail" ; sleep 2; link_failure $listener_ns ; cat "$cinfail" ) | \
 			tee "$cinsent" | \
 			timeout ${timeout_test} \
 				ip netns exec ${connector_ns} \
@@ -434,7 +456,11 @@ do_transfer()
 		return 1
 	fi
 
-	check_transfer $sin $cout "file received by client"
+	if [ "$test_link_fail" -eq 2 ];then
+		check_transfer $sinfail $cout "file received by client"
+	else
+		check_transfer $sin $cout "file received by client"
+	fi
 	retc=$?
 	if [ "$test_link_fail" -eq 0 ];then
 		check_transfer $cin $sout "file received by server"
@@ -477,29 +503,29 @@ run_tests()
 	lret=0
 	oldin=""
 
-	if [ "$test_linkfail" -eq 1 ];then
-		size=$((RANDOM%1024))
+	# create the input file for the failure test when
+	# the first failure test run
+	if [ "$test_linkfail" -ne 0 -a -z "$cinfail" ]; then
+		size=$((RANDOM%8))
 		size=$((size+1))
-		size=$((size*128))
+		size=$((size*2048))
 
-		oldin=$(mktemp)
-		cp "$cin" "$oldin"
-		make_file "$cin" "client" $size
+		cinfail=$(mktemp)
+		make_file "$cinfail" "client" $size
 	fi
 
-	do_transfer ${listener_ns} ${connector_ns} MPTCP MPTCP ${connect_addr} \
-		${test_linkfail} ${addr_nr_ns1} ${addr_nr_ns2} ${speed} ${bkup}
-	lret=$?
+	if [ "$test_linkfail" -eq 2 -a -z "$sinfail" ]; then
+		size=$((RANDOM%16))
+		size=$((size+1))
+		size=$((size*2048))
 
-	if [ "$test_linkfail" -eq 1 ];then
-		cp "$oldin" "$cin"
-		rm -f "$oldin"
+		sinfail=$(mktemp)
+		make_file "$sinfail" "server" $size
 	fi
 
-	if [ $lret -ne 0 ]; then
-		ret=$lret
-		return
-	fi
+	do_transfer ${listener_ns} ${connector_ns} MPTCP MPTCP ${connect_addr} \
+		${test_linkfail} ${addr_nr_ns1} ${addr_nr_ns2} ${speed} ${bkup}
+	lret=$?
 }
 
 chk_csum_nr()
@@ -593,6 +619,46 @@ chk_join_nr()
 	fi
 }
 
+# a negative value for 'stale_max' means no upper bound:
+# for bidirectional transfer, if one peer sleep for a while
+# - as these tests do - we can have a quite high number of
+# stale/recover conversions, proportional to
+# sleep duration/ MPTCP-level RTX interval.
+chk_stale_nr()
+{
+	local ns=$1
+	local stale_min=$2
+	local stale_max=$3
+	local stale_delta=$4
+	local dump_stats
+	local stale_nr
+	local recover_nr
+
+	printf "%-39s %s" " " "stale"
+	stale_nr=`ip netns exec $ns nstat -as | grep MPTcpExtSubflowStale | awk '{print $2}'`
+	[ -z "$stale_nr" ] && stale_nr=0
+	recover_nr=`ip netns exec $ns nstat -as | grep MPTcpExtSubflowRecover | awk '{print $2}'`
+	[ -z "$recover_nr" ] && recover_nr=0
+
+	if [ $stale_nr -lt $stale_min ] ||
+	   [ $stale_max -gt 0 -a $stale_nr -gt $stale_max ] ||
+	   [ $((stale_nr - $recover_nr)) -ne $stale_delta ]; then
+		echo "[fail] got $stale_nr stale[s] $recover_nr recover[s], " \
+		     " expected stale in range [$stale_min..$stale_max]," \
+		     " stale-recover delta $stale_delta "
+		ret=1
+		dump_stats=1
+	else
+		echo "[ ok ]"
+	fi
+
+	if [ "${dump_stats}" = 1 ]; then
+		echo $ns stats
+		ip netns exec $ns ip -s link show
+		ip netns exec $ns nstat -as | grep MPTcp
+	fi
+}
+
 chk_add_nr()
 {
 	local add_nr=$1
@@ -801,6 +867,27 @@ chk_prio_nr()
 	fi
 }
 
+chk_link_usage()
+{
+	local ns=$1
+	local link=$2
+	local out=$3
+	local expected_rate=$4
+	local tx_link=`ip netns exec $ns cat /sys/class/net/$link/statistics/tx_bytes`
+	local tx_total=`ls -l $out | awk '{print $5}'`
+	local tx_rate=$((tx_link * 100 / $tx_total))
+	local tolerance=5
+
+	printf "%-39s %s" " " "link usage "
+	if [ $tx_rate -lt $((expected_rate - $tolerance)) -o \
+	     $tx_rate -gt $((expected_rate + $tolerance)) ]; then
+		echo "[fail] got $tx_rate% usage, expected $expected_rate%"
+		ret=1
+	else
+		echo "[ ok ]"
+	fi
+}
+
 subflows_tests()
 {
 	reset
@@ -924,14 +1011,80 @@ link_failure_tests()
 {
 	# accept and use add_addr with additional subflows and link loss
 	reset
+
+	# without any b/w limit each veth could spool the packets and get
+	# them acked at xmit time, so that the corresponding subflow will
+	# have almost always no outstanding pkts, the scheduler will pick
+	# always the first subflow and we will have hard time testing
+	# active backup and link switch-over.
+	# Let's set some arbitrary (low) virtual link limits.
+	init_shapers
 	ip netns exec $ns1 ./pm_nl_ctl limits 0 3
-	ip netns exec $ns1 ./pm_nl_ctl add 10.0.2.1 flags signal
+	ip netns exec $ns1 ./pm_nl_ctl add 10.0.2.1 dev ns1eth2 flags signal
 	ip netns exec $ns2 ./pm_nl_ctl limits 1 3
-	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
-	ip netns exec $ns2 ./pm_nl_ctl add 10.0.4.2 flags subflow
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 dev ns2eth3 flags subflow
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.4.2 dev ns2eth4 flags subflow
 	run_tests $ns1 $ns2 10.0.1.1 1
 	chk_join_nr "multiple flows, signal, link failure" 3 3 3
 	chk_add_nr 1 1
+	chk_stale_nr $ns2 1 5 1
+
+	# accept and use add_addr with additional subflows and link loss
+	# for bidirectional transfer
+	reset
+	init_shapers
+	ip netns exec $ns1 ./pm_nl_ctl limits 0 3
+	ip netns exec $ns1 ./pm_nl_ctl add 10.0.2.1 dev ns1eth2 flags signal
+	ip netns exec $ns2 ./pm_nl_ctl limits 1 3
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 dev ns2eth3 flags subflow
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.4.2 dev ns2eth4 flags subflow
+	run_tests $ns1 $ns2 10.0.1.1 2
+	chk_join_nr "multiple flows, signal, bidirectional, link failure" 3 3 3
+	chk_add_nr 1 1
+	chk_stale_nr $ns2 1 -1 1
+
+	# 2 subflows plus 1 backup subflow with a lossy link, backup
+	# will never be used
+	reset
+	init_shapers
+	ip netns exec $ns1 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns1 ./pm_nl_ctl add 10.0.2.1 dev ns1eth2 flags signal
+	ip netns exec $ns2 ./pm_nl_ctl limits 1 2
+	export FAILING_LINKS="1"
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 dev ns2eth3 flags subflow,backup
+	run_tests $ns1 $ns2 10.0.1.1 1
+	chk_join_nr "backup subflow unused with link failure" 2 2 2
+	chk_add_nr 1 1
+	chk_link_usage $ns2 ns2eth3 $cinsent 0
+
+	# 2 lossy links after half transfer, backup will get half of
+	# the traffic
+	reset
+	init_shapers
+	ip netns exec $ns1 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns1 ./pm_nl_ctl add 10.0.2.1 dev ns1eth2 flags signal
+	ip netns exec $ns2 ./pm_nl_ctl limits 1 2
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 dev ns2eth3 flags subflow,backup
+	export FAILING_LINKS="1 2"
+	run_tests $ns1 $ns2 10.0.1.1 1
+	chk_join_nr "backup subflow used due to multiple links failure" 2 2 2
+	chk_add_nr 1 1
+	chk_stale_nr $ns2 2 4 2
+	chk_link_usage $ns2 ns2eth3 $cinsent 50
+
+	# use a backup subflow with the first subflow on a lossy link
+	# for bidirectional transfer
+	reset
+	init_shapers
+	ip netns exec $ns1 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns1 ./pm_nl_ctl add 10.0.2.1 dev ns1eth2 flags signal
+	ip netns exec $ns2 ./pm_nl_ctl limits 1 3
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 dev ns2eth3 flags subflow,backup
+	run_tests $ns1 $ns2 10.0.1.1 2
+	chk_join_nr "backup subflow in use, bidirectional, link failure" 2 2 2
+	chk_add_nr 1 1
+	chk_stale_nr $ns2 1 -1 2
+	chk_link_usage $ns2 ns2eth3 $cinsent 50
 }
 
 add_addr_timeout_tests()
-- 
2.26.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 mptcp-next 5/8] mptcp: faster active backup recovery
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 5/8] mptcp: faster active backup recovery Paolo Abeni
@ 2021-07-14  0:46   ` kernel test robot
  2021-07-14  8:36   ` Dan Carpenter
  1 sibling, 0 replies; 14+ messages in thread
From: kernel test robot @ 2021-07-14  0:46 UTC (permalink / raw)
  To: Paolo Abeni, mptcp; +Cc: clang-built-linux, kbuild-all


[-- Attachment #1: Type: text/plain, Size: 3959 bytes --]

Hi Paolo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on mptcp/export]
[also build test WARNING on kselftest/next linus/master v5.14-rc1 next-20210713]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Paolo-Abeni/mptcp-refactor-active-backup/20210714-051541
base:   https://github.com/multipath-tcp/mptcp_net-next.git export
config: x86_64-randconfig-r006-20210713 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 8d69635ed9ecf36fd0ca85906bfde17949671cbe)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/bc57ec20ec19b4d0b6de45ef1df8c8c3685c7fe2
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Paolo-Abeni/mptcp-refactor-active-backup/20210714-051541
        git checkout bc57ec20ec19b4d0b6de45ef1df8c8c3685c7fe2
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> net/mptcp/pm_netlink.c:924:8: warning: variable 'push' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                           if (!tcp_rtx_and_write_queues_empty(ssk)) {
                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   net/mptcp/pm_netlink.c:931:8: note: uninitialized use occurs here
                           if (push)
                               ^~~~
   net/mptcp/pm_netlink.c:924:4: note: remove the 'if' if its condition is always true
                           if (!tcp_rtx_and_write_queues_empty(ssk)) {
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   net/mptcp/pm_netlink.c:911:17: note: initialize the variable 'push' to silence this warning
           bool slow, push;
                          ^
                           = 0
   1 warning generated.


vim +924 net/mptcp/pm_netlink.c

   903	
   904	void mptcp_pm_nl_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk)
   905	{
   906		struct mptcp_subflow_context *iter, *subflow = mptcp_subflow_ctx(ssk);
   907		struct sock *sk = (struct sock *)msk;
   908		unsigned int active_max_loss_cnt;
   909		struct net *net = sock_net(sk);
   910		unsigned int stale_loss_cnt;
   911		bool slow, push;
   912	
   913		stale_loss_cnt = mptcp_stale_loss_cnt(net);
   914		if (subflow->stale || !stale_loss_cnt || subflow->stale_count <= stale_loss_cnt)
   915			return;
   916	
   917		/* look for another available subflow not in loss state */
   918		active_max_loss_cnt = max_t(int, stale_loss_cnt - 1, 1);
   919		mptcp_for_each_subflow(msk, iter) {
   920			if (iter != subflow && mptcp_subflow_active(iter) &&
   921			    iter->stale_count < active_max_loss_cnt) {
   922				/* we have some alteratives, try to mark this subflow as idle ...*/
   923				slow = lock_sock_fast(ssk);
 > 924				if (!tcp_rtx_and_write_queues_empty(ssk)) {
   925					subflow->stale = 1;
   926					push = __mptcp_retransmit_pending_data(sk);
   927				}
   928				unlock_sock_fast(ssk, slow);
   929	
   930				/* pending data on the idle subflow: retransmit */
   931				if (push)
   932					__mptcp_push_pending(sk, 0);
   933				return;
   934			}
   935		}
   936	}
   937	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34185 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 mptcp-next 5/8] mptcp: faster active backup recovery
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 5/8] mptcp: faster active backup recovery Paolo Abeni
  2021-07-14  0:46   ` kernel test robot
@ 2021-07-14  8:36   ` Dan Carpenter
  2021-07-14 10:15     ` Matthieu Baerts
  1 sibling, 1 reply; 14+ messages in thread
From: Dan Carpenter @ 2021-07-14  8:36 UTC (permalink / raw)
  To: kbuild, Paolo Abeni, mptcp; +Cc: lkp, kbuild-all

Hi Paolo,

url:    https://github.com/0day-ci/linux/commits/Paolo-Abeni/mptcp-refactor-active-backup/20210714-051541
base:   https://github.com/multipath-tcp/mptcp_net-next.git export
config: x86_64-randconfig-m001-20210713 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
net/mptcp/pm_netlink.c:931 mptcp_pm_nl_subflow_chk_stale() error: uninitialized symbol 'push'.

vim +/push +931 net/mptcp/pm_netlink.c

bc57ec20ec19b4d Paolo Abeni 2021-07-13  904  void mptcp_pm_nl_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk)
bc57ec20ec19b4d Paolo Abeni 2021-07-13  905  {
bc57ec20ec19b4d Paolo Abeni 2021-07-13  906  	struct mptcp_subflow_context *iter, *subflow = mptcp_subflow_ctx(ssk);
bc57ec20ec19b4d Paolo Abeni 2021-07-13  907  	struct sock *sk = (struct sock *)msk;
bc57ec20ec19b4d Paolo Abeni 2021-07-13  908  	unsigned int active_max_loss_cnt;
bc57ec20ec19b4d Paolo Abeni 2021-07-13  909  	struct net *net = sock_net(sk);
bc57ec20ec19b4d Paolo Abeni 2021-07-13  910  	unsigned int stale_loss_cnt;
bc57ec20ec19b4d Paolo Abeni 2021-07-13  911  	bool slow, push;
bc57ec20ec19b4d Paolo Abeni 2021-07-13  912  
bc57ec20ec19b4d Paolo Abeni 2021-07-13  913  	stale_loss_cnt = mptcp_stale_loss_cnt(net);
bc57ec20ec19b4d Paolo Abeni 2021-07-13  914  	if (subflow->stale || !stale_loss_cnt || subflow->stale_count <= stale_loss_cnt)
bc57ec20ec19b4d Paolo Abeni 2021-07-13  915  		return;
bc57ec20ec19b4d Paolo Abeni 2021-07-13  916  
bc57ec20ec19b4d Paolo Abeni 2021-07-13  917  	/* look for another available subflow not in loss state */
bc57ec20ec19b4d Paolo Abeni 2021-07-13  918  	active_max_loss_cnt = max_t(int, stale_loss_cnt - 1, 1);
bc57ec20ec19b4d Paolo Abeni 2021-07-13  919  	mptcp_for_each_subflow(msk, iter) {
bc57ec20ec19b4d Paolo Abeni 2021-07-13  920  		if (iter != subflow && mptcp_subflow_active(iter) &&
bc57ec20ec19b4d Paolo Abeni 2021-07-13  921  		    iter->stale_count < active_max_loss_cnt) {
bc57ec20ec19b4d Paolo Abeni 2021-07-13  922  			/* we have some alteratives, try to mark this subflow as idle ...*/
bc57ec20ec19b4d Paolo Abeni 2021-07-13  923  			slow = lock_sock_fast(ssk);
bc57ec20ec19b4d Paolo Abeni 2021-07-13  924  			if (!tcp_rtx_and_write_queues_empty(ssk)) {
bc57ec20ec19b4d Paolo Abeni 2021-07-13  925  				subflow->stale = 1;
bc57ec20ec19b4d Paolo Abeni 2021-07-13  926  				push = __mptcp_retransmit_pending_data(sk);
bc57ec20ec19b4d Paolo Abeni 2021-07-13  927  			}

"push" uninitialized on else path.

bc57ec20ec19b4d Paolo Abeni 2021-07-13  928  			unlock_sock_fast(ssk, slow);
bc57ec20ec19b4d Paolo Abeni 2021-07-13  929  
bc57ec20ec19b4d Paolo Abeni 2021-07-13  930  			/* pending data on the idle subflow: retransmit */
bc57ec20ec19b4d Paolo Abeni 2021-07-13 @931  			if (push)
bc57ec20ec19b4d Paolo Abeni 2021-07-13  932  				__mptcp_push_pending(sk, 0);
bc57ec20ec19b4d Paolo Abeni 2021-07-13  933  			return;
bc57ec20ec19b4d Paolo Abeni 2021-07-13  934  		}
bc57ec20ec19b4d Paolo Abeni 2021-07-13  935  	}
bc57ec20ec19b4d Paolo Abeni 2021-07-13  936  }

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 mptcp-next 5/8] mptcp: faster active backup recovery
  2021-07-14  8:36   ` Dan Carpenter
@ 2021-07-14 10:15     ` Matthieu Baerts
  0 siblings, 0 replies; 14+ messages in thread
From: Matthieu Baerts @ 2021-07-14 10:15 UTC (permalink / raw)
  To: Dan Carpenter, kbuild, Paolo Abeni, mptcp; +Cc: lkp, kbuild-all

Hi Dan,

On 14/07/2021 10:36, Dan Carpenter wrote:
> Hi Paolo,
> 
> url:    https://github.com/0day-ci/linux/commits/Paolo-Abeni/mptcp-refactor-active-backup/20210714-051541
> base:   https://github.com/multipath-tcp/mptcp_net-next.git export
> config: x86_64-randconfig-m001-20210713 (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> 
> New smatch warnings:
> net/mptcp/pm_netlink.c:931 mptcp_pm_nl_subflow_chk_stale() error: uninitialized symbol 'push'.


Thank you for your email!



It looks like Intel's kernel test robot spot the same issue a few hours
ago [1] (that's good, it was very quick!).

Paolo already sent a fix [2].


[1] https://lore.kernel.org/mptcp/202107140830.4kwYyqkO-lkp@intel.com/
[2]
https://lore.kernel.org/mptcp/1d91265cca09bb69516907f44b5e4b72c4efabe1.1626252886.git.pabeni@redhat.com/T/#u


Cheers,

Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup
  2021-07-13 21:13 [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup Paolo Abeni
                   ` (7 preceding siblings ...)
  2021-07-13 21:13 ` [PATCH v2 mptcp-next 8/8] selftests: mptcp: add testcase for active-back Paolo Abeni
@ 2021-07-15  1:20 ` Mat Martineau
  2021-07-16 10:01 ` Matthieu Baerts
  9 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2021-07-15  1:20 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Tue, 13 Jul 2021, Paolo Abeni wrote:

> This series addresses a bunch of issues somewhat related to active
> backup handling. The most visible functial issue addressed here is:
>
> https://github.com/multipath-tcp/mptcp_net-next/issues/191
>
> This series also add some specific self-tests, to cover both
> active-backup switch-over and proper usage of backup link.
>
> A new netns parameter is introduced:
>
> stale_loss_cnt - the max amount of mptcp rtx timeouts with no progresses
>        and outstanding data over a single subflow needed to declare
>        such subflow 'stale': no more data will be queue until some ack
>        is observed.
>
> Changes from v1:
> - stale_loss_cnt is now configurable via sysctl
> - addressed buglet noted by Mat on v1
> - fixed timeout update issues in patches 1-2
> - fixed splat in __mptcp_clean_una() on reinection
>   - the new self-tests are now less unstable

The "less unstable" tests pass reliably for me!

Ok with me to put these on the export branch, with the squash-to that 
addresses the kbuild warning.

Regarding the "stale vs. stall" discussion on irc, I'm fine with referring 
to the subflows as "stale".

Thanks!

Mat


>
> Paolo Abeni (8):
>  mptcp: more accurate timeout
>  mptcp: less aggressive retransmission stragegy
>  mptcp: handle pending data on closed subflow
>  mptcp: cleanup sysctl data and helpers
>  mptcp: faster active backup recovery
>  mptcp: add mibs for stale subflows processing
>  mptcp: backup flag from incoming MPJ ack option
>  selftests: mptcp: add testcase for active-back
>
> Documentation/networking/mptcp-sysctl.rst     |  12 +
> net/mptcp/ctrl.c                              |  26 ++-
> net/mptcp/mib.c                               |   2 +
> net/mptcp/mib.h                               |   2 +
> net/mptcp/options.c                           |   8 +-
> net/mptcp/pm.c                                |  21 ++
> net/mptcp/pm_netlink.c                        |  37 +++
> net/mptcp/protocol.c                          | 187 +++++++++++----
> net/mptcp/protocol.h                          |  31 ++-
> net/mptcp/subflow.c                           |   6 +-
> .../testing/selftests/net/mptcp/mptcp_join.sh | 215 +++++++++++++++---
> 11 files changed, 459 insertions(+), 88 deletions(-)
>
> -- 
> 2.26.3
>
>
>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup
  2021-07-13 21:13 [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup Paolo Abeni
                   ` (8 preceding siblings ...)
  2021-07-15  1:20 ` [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup Mat Martineau
@ 2021-07-16 10:01 ` Matthieu Baerts
  9 siblings, 0 replies; 14+ messages in thread
From: Matthieu Baerts @ 2021-07-16 10:01 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp, Mat Martineau

Hi Paolo, Mat,

On 13/07/2021 23:13, Paolo Abeni wrote:
> This series addresses a bunch of issues somewhat related to active 
> backup handling. The most visible functial issue addressed here is:
> 
> https://github.com/multipath-tcp/mptcp_net-next/issues/191
> 
> This series also add some specific self-tests, to cover both 
> active-backup switch-over and proper usage of backup link.

Thank you for the patches and the reviews!

I just added them in our tree without a few typo spot by 'checkpatch.pl
--codespell' and Mat's RvB tags:

- ff602fc7b2be: mptcp: more accurate timeout
- bf507fd9b493: mptcp: less aggressive retransmission strategy
- 9e0347b01fcd: mptcp: handle pending data on closed subflow
- d3b26c3a88d3: mptcp: cleanup sysctl data and helpers
- fb36094314d3: mptcp: faster active backup recovery
- ba4708d59438: mptcp: add mibs for stale subflows processing
- d44276b1443d: mptcp: backup flag from incoming MPJ ack option
- 9975dc21bcc4: selftests: mptcp: add testcase for active-back
- Results: 1187a6e1e0e7..0cff4b614c2d

- dc374a0e2daa: "squashed" (with conflicts) in "mptcp: faster active
backup recovery"
- 58258485522d: conflict in t/mptcp-add-mibs-for-stale-subflows-processing
- dc374a0e2daa: Squash-to: "mptcp: faster active backup recovery"
- Results: 0cff4b614c2d..047cd75a9837

Builds and tests are now in progress:

https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20210716T100127
https://github.com/multipath-tcp/mptcp_net-next/actions/workflows/build-validation.yml?query=branch:export/20210716T100127

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, back to index

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-13 21:13 [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup Paolo Abeni
2021-07-13 21:13 ` [PATCH v2 mptcp-next 1/8] mptcp: more accurate timeout Paolo Abeni
2021-07-13 21:13 ` [PATCH v2 mptcp-next 2/8] mptcp: less aggressive retransmission stragegy Paolo Abeni
2021-07-13 21:13 ` [PATCH v2 mptcp-next 3/8] mptcp: handle pending data on closed subflow Paolo Abeni
2021-07-13 21:13 ` [PATCH v2 mptcp-next 4/8] mptcp: cleanup sysctl data and helpers Paolo Abeni
2021-07-13 21:13 ` [PATCH v2 mptcp-next 5/8] mptcp: faster active backup recovery Paolo Abeni
2021-07-14  0:46   ` kernel test robot
2021-07-14  8:36   ` Dan Carpenter
2021-07-14 10:15     ` Matthieu Baerts
2021-07-13 21:13 ` [PATCH v2 mptcp-next 6/8] mptcp: add mibs for stale subflows processing Paolo Abeni
2021-07-13 21:13 ` [PATCH v2 mptcp-next 7/8] mptcp: backup flag from incoming MPJ ack option Paolo Abeni
2021-07-13 21:13 ` [PATCH v2 mptcp-next 8/8] selftests: mptcp: add testcase for active-back Paolo Abeni
2021-07-15  1:20 ` [PATCH v2 mptcp-next 0/8] mptcp: refactor active backup Mat Martineau
2021-07-16 10:01 ` Matthieu Baerts

MPTCP Linux Development

Archives are clonable:
	git clone --mirror https://lore.kernel.org/mptcp/0 mptcp/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 mptcp mptcp/ https://lore.kernel.org/mptcp \
		mptcp@lists.linux.dev
	public-inbox-index mptcp

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/dev.linux.lists.mptcp


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git