mptcp.lists.linux.dev archive mirror
* [PATCH mptcp-next v11 0/5] BPF redundant scheduler
@ 2022-09-09  9:11 Geliang Tang
  2022-09-09  9:11 ` [PATCH mptcp-next v11 1/5] Squash to "mptcp: add get_subflow wrappers" Geliang Tang
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Geliang Tang @ 2022-09-09  9:11 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

v11:
 - address Mat's comments on v10.
 - rebase to export/20220908T063452

v10:
 - send multiple dfrags in __mptcp_push_pending().

v9:
 - drop the extra *err parameter of mptcp_sched_get_send() as Florian
   suggested.

v8:
 - update __mptcp_push_pending() to send the same data on each subflow.
 - update __mptcp_retrans() to track the max sent data.
 - add a new patch.

v7:
 - drop redundant flag in v6
 - drop __mptcp_subflows_push_pending in v6
 - update redundant subflows support in __mptcp_push_pending
 - update redundant subflows support in __mptcp_retrans

v6:
 - Add redundant flag for struct mptcp_sched_ops.
 - add a dedicated function __mptcp_subflows_push_pending() to handle
   pushing pending data on redundant subflows.

v5:
 - address Paolo's comment: keep the optimization in
   mptcp_subflow_get_send() for the non-eBPF case.
 - merge mptcp_sched_get_send() and __mptcp_sched_get_send() from v4 into one.
 - depends on "cleanups for bpf sched selftests".

v4:
 - small cleanups in patch 1, 2.
 - add TODO in patch 3.
 - rebase patch 5 on 'cleanups for bpf sched selftests'.

v3:
 - use new API.
 - fix the link failure tests issue mentioned in
   https://patchwork.kernel.org/project/mptcp/cover/cover.1653033459.git.geliang.tang@suse.com/

v2:
 - add MPTCP_SUBFLOWS_MAX limit to avoid infinite loops when the
   scheduler always sets call_again to true.
 - track the largest copied amount.
 - deal with __mptcp_subflow_push_pending() and the retransmit loop.
 - depends on "BPF round-robin scheduler" v14.

v1:

Implements the redundant BPF MPTCP scheduler, which sends all packets
redundantly on all available subflows.
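
At its core, such a scheduler just marks every available subflow context
as scheduled. A minimal sketch of the get_subflow() callback (the complete
bpf_red scheduler is added as a selftest in patch 4):

    void BPF_STRUCT_OPS(bpf_red_get_subflow, const struct mptcp_sock *msk,
                        struct mptcp_sched_data *data)
    {
        for (int i = 0; i < MPTCP_SUBFLOWS_MAX; i++) {
            /* contexts[] is filled by the kernel before the callback
             * runs; a NULL entry ends the array.
             */
            if (!data->contexts[i])
                break;

            /* pick this subflow for the current transmit */
            mptcp_subflow_set_scheduled(data->contexts[i], true);
        }
    }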

Geliang Tang (5):
  Squash to "mptcp: add get_subflow wrappers"
  mptcp: redundant subflows push pending
  mptcp: redundant subflows retrans support
  selftests/bpf: Add bpf_red scheduler
  selftests/bpf: Add bpf_red test

 net/mptcp/protocol.c                          | 158 +++++++++++-------
 net/mptcp/protocol.h                          |  12 +-
 net/mptcp/sched.c                             |  59 ++++---
 .../testing/selftests/bpf/prog_tests/mptcp.c  |  34 ++++
 .../selftests/bpf/progs/mptcp_bpf_red.c       |  36 ++++
 5 files changed, 209 insertions(+), 90 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_red.c

-- 
2.35.3



* [PATCH mptcp-next v11 1/5] Squash to "mptcp: add get_subflow wrappers"
  2022-09-09  9:11 [PATCH mptcp-next v11 0/5] BPF redundant scheduler Geliang Tang
@ 2022-09-09  9:11 ` Geliang Tang
  2022-09-09  9:11 ` [PATCH mptcp-next v11 2/5] mptcp: redundant subflows push pending Geliang Tang
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2022-09-09  9:11 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

Please update the commit log:

'''
This patch defines two new wrappers, mptcp_sched_get_send() and
mptcp_sched_get_retrans(), which invoke get_subflow() of msk->sched. Use
them instead of calling mptcp_subflow_get_send() or
mptcp_subflow_get_retrans() directly.

Set the subflow pointers array in struct mptcp_sched_data before invoking
get_subflow(), so that it can be used by get_subflow() in the BPF contexts.

Check the subflows' scheduled flags to see which subflow or subflows were
picked by the scheduler.

Move sock_owned_by_me() and the fallback check code from
mptcp_subflow_get_send/retrans() into the wrappers.

Redundant subflows are not supported in __mptcp_subflow_push_pending()
yet. This patch adds a placeholder in mptcp_sched_get_send() to pick the
first subflow for the redundant subflows case.
'''
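
With the wrappers in place, callers follow the same pattern: call the
wrapper, then walk the subflow list looking for the scheduled flag. A
simplified sketch of the send-path usage (patch 2 converts
__mptcp_push_pending() along these lines):

    if (!mptcp_sched_get_send(msk))
        goto out;

    mptcp_for_each_subflow(msk, subflow) {
        if (!READ_ONCE(subflow->scheduled))
            continue;

        /* send pending data on mptcp_subflow_tcp_sock(subflow),
         * then clear the flag for the next round
         */
        mptcp_subflow_set_scheduled(subflow, false);
    }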

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 net/mptcp/protocol.c |  4 +--
 net/mptcp/protocol.h |  2 +-
 net/mptcp/sched.c    | 59 +++++++++++++++++++++++++-------------------
 3 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 55442df8fb97..8888de922475 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1547,7 +1547,7 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
 			int ret = 0;
 
 			prev_ssk = ssk;
-			ssk = mptcp_sched_get_send(msk);
+			ssk = mptcp_subflow_get_send(msk);
 
 			/* First check. If the ssk has changed since
 			 * the last round, release prev_ssk
@@ -2425,7 +2425,7 @@ static void __mptcp_retrans(struct sock *sk)
 	mptcp_clean_una_wakeup(sk);
 
 	/* first check ssk: need to kick "stale" logic */
-	ssk = mptcp_sched_get_retrans(msk);
+	ssk = mptcp_subflow_get_retrans(msk);
 	dfrag = mptcp_rtx_head(sk);
 	if (!dfrag) {
 		if (mptcp_data_fin_enabled(msk)) {
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 1bc9f6e77ddd..b6411285a0b2 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -639,7 +639,7 @@ void mptcp_subflow_set_scheduled(struct mptcp_subflow_context *subflow,
 struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk);
 struct sock *mptcp_subflow_get_retrans(struct mptcp_sock *msk);
 struct sock *mptcp_sched_get_send(struct mptcp_sock *msk);
-struct sock *mptcp_sched_get_retrans(struct mptcp_sock *msk);
+int mptcp_sched_get_retrans(struct mptcp_sock *msk);
 
 static inline bool __tcp_can_send(const struct sock *ssk)
 {
diff --git a/net/mptcp/sched.c b/net/mptcp/sched.c
index 044c5ec8bbfb..cff538a9e187 100644
--- a/net/mptcp/sched.c
+++ b/net/mptcp/sched.c
@@ -114,35 +114,48 @@ static int mptcp_sched_data_init(struct mptcp_sock *msk, bool reinject,
 	for (; i < MPTCP_SUBFLOWS_MAX; i++)
 		data->contexts[i] = NULL;
 
+	msk->snd_burst = 0;
+
 	return 0;
 }
 
 struct sock *mptcp_sched_get_send(struct mptcp_sock *msk)
 {
+	struct mptcp_subflow_context *subflow;
 	struct mptcp_sched_data data;
 	struct sock *ssk = NULL;
-	int i;
 
 	sock_owned_by_me((struct sock *)msk);
 
 	/* the following check is moved out of mptcp_subflow_get_send */
 	if (__mptcp_check_fallback(msk)) {
-		if (!msk->first)
-			return NULL;
-		return __tcp_can_send(msk->first) &&
-		       sk_stream_memory_free(msk->first) ? msk->first : NULL;
+		if (msk->first &&
+		    __tcp_can_send(msk->first) &&
+		    sk_stream_memory_free(msk->first)) {
+			mptcp_subflow_set_scheduled(mptcp_subflow_ctx(msk->first), true);
+			return msk->first;
+		}
+		return NULL;
 	}
 
-	if (!msk->sched)
-		return mptcp_subflow_get_send(msk);
+	if (!msk->sched) {
+		ssk = mptcp_subflow_get_send(msk);
+		if (ssk)
+			mptcp_subflow_set_scheduled(mptcp_subflow_ctx(ssk), true);
+		return ssk;
+	}
 
 	mptcp_sched_data_init(msk, false, &data);
 	msk->sched->get_subflow(msk, &data);
 
-	for (i = 0; i < MPTCP_SUBFLOWS_MAX; i++) {
-		if (data.contexts[i] && READ_ONCE(data.contexts[i]->scheduled)) {
-			ssk = data.contexts[i]->tcp_sock;
-			msk->last_snd = ssk;
+	mptcp_for_each_subflow(msk, subflow) {
+		if (READ_ONCE(subflow->scheduled)) {
+			/* TODO: Redundant subflows are not supported in
+			 * __mptcp_subflow_push_pending() yet. Here's a
+			 * placeholder to pick the first subflow for the
+			 * redundant subflows case.
+			 */
+			ssk = subflow->tcp_sock;
 			break;
 		}
 	}
@@ -150,31 +163,27 @@ struct sock *mptcp_sched_get_send(struct mptcp_sock *msk)
 	return ssk;
 }
 
-struct sock *mptcp_sched_get_retrans(struct mptcp_sock *msk)
+int mptcp_sched_get_retrans(struct mptcp_sock *msk)
 {
 	struct mptcp_sched_data data;
 	struct sock *ssk = NULL;
-	int i;
 
 	sock_owned_by_me((const struct sock *)msk);
 
 	/* the following check is moved out of mptcp_subflow_get_retrans */
 	if (__mptcp_check_fallback(msk))
-		return NULL;
+		return -EINVAL;
 
-	if (!msk->sched)
-		return mptcp_subflow_get_retrans(msk);
+	if (!msk->sched) {
+		ssk = mptcp_subflow_get_retrans(msk);
+		if (!ssk)
+			return -EINVAL;
+		mptcp_subflow_set_scheduled(mptcp_subflow_ctx(ssk), true);
+		return 0;
+	}
 
 	mptcp_sched_data_init(msk, true, &data);
 	msk->sched->get_subflow(msk, &data);
 
-	for (i = 0; i < MPTCP_SUBFLOWS_MAX; i++) {
-		if (data.contexts[i] && READ_ONCE(data.contexts[i]->scheduled)) {
-			ssk = data.contexts[i]->tcp_sock;
-			msk->last_snd = ssk;
-			break;
-		}
-	}
-
-	return ssk;
+	return 0;
 }
-- 
2.35.3



* [PATCH mptcp-next v11 2/5] mptcp: redundant subflows push pending
  2022-09-09  9:11 [PATCH mptcp-next v11 0/5] BPF redundant scheduler Geliang Tang
  2022-09-09  9:11 ` [PATCH mptcp-next v11 1/5] Squash to "mptcp: add get_subflow wrappers" Geliang Tang
@ 2022-09-09  9:11 ` Geliang Tang
  2022-09-09  9:11 ` [PATCH mptcp-next v11 3/5] mptcp: redundant subflows retrans support Geliang Tang
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2022-09-09  9:11 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

This patch adds redundant subflows support to __mptcp_push_pending(). Use
the mptcp_sched_get_send() wrapper there instead of
mptcp_subflow_get_send().

Check the subflows' scheduled flags to see which subflow or subflows were
picked by the scheduler, and use them to send data.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 net/mptcp/protocol.c | 96 +++++++++++++++++++++++++-------------------
 net/mptcp/protocol.h | 10 +++++
 2 files changed, 65 insertions(+), 41 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 8888de922475..be62126d7c63 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1530,61 +1530,75 @@ void mptcp_check_and_set_pending(struct sock *sk)
 
 void __mptcp_push_pending(struct sock *sk, unsigned int flags)
 {
-	struct sock *prev_ssk = NULL, *ssk = NULL;
+	struct mptcp_data_frag *dfrag, *last_dfrag = NULL;
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	struct mptcp_sendmsg_info info = {
-				.flags = flags,
-	};
+	struct mptcp_subflow_context *subflow;
 	bool do_check_data_fin = false;
-	struct mptcp_data_frag *dfrag;
+	int last_sent = 0;
 	int len;
 
-	while ((dfrag = mptcp_send_head(sk))) {
-		info.sent = dfrag->already_sent;
-		info.limit = dfrag->data_len;
-		len = dfrag->data_len - dfrag->already_sent;
-		while (len > 0) {
-			int ret = 0;
-
-			prev_ssk = ssk;
-			ssk = mptcp_subflow_get_send(msk);
+	if (!mptcp_sched_get_send(msk))
+		goto out;
 
-			/* First check. If the ssk has changed since
-			 * the last round, release prev_ssk
-			 */
-			if (ssk != prev_ssk && prev_ssk)
-				mptcp_push_release(prev_ssk, &info);
-			if (!ssk)
-				goto out;
+	mptcp_for_each_subflow(msk, subflow) {
+		if (!__mptcp_subflow_active(subflow))
+			continue;
 
-			/* Need to lock the new subflow only if different
-			 * from the previous one, otherwise we are still
-			 * helding the relevant lock
-			 */
-			if (ssk != prev_ssk)
-				lock_sock(ssk);
+		if (READ_ONCE(subflow->scheduled)) {
+			struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+			struct mptcp_sendmsg_info info = {
+				.flags = flags,
+			};
 
-			ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
-			if (ret <= 0) {
-				if (ret == -EAGAIN)
-					continue;
-				mptcp_push_release(ssk, &info);
+			dfrag = mptcp_send_head(sk);
+			if (!dfrag)
 				goto out;
-			}
 
-			do_check_data_fin = true;
-			info.sent += ret;
-			len -= ret;
+			lock_sock(ssk);
+
+			do {
+				info.sent = dfrag->already_sent;
+				info.limit = dfrag->data_len;
+				len = dfrag->data_len - dfrag->already_sent;
+				last_dfrag = dfrag;
+				last_sent = 0;
+				while (len > 0) {
+					int ret = 0;
+
+					ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
+					if (ret <= 0) {
+						if (ret == -EAGAIN)
+							continue;
+						mptcp_push_release(ssk, &info);
+						goto out;
+					}
+
+					do_check_data_fin = true;
+					info.sent += ret;
+					last_sent += ret;
+					len -= ret;
+				}
+			} while ((dfrag = mptcp_next_frag(sk, dfrag)));
+
+			mptcp_push_release(ssk, &info);
+			msk->last_snd = ssk;
+			mptcp_subflow_set_scheduled(subflow, false);
+		}
+	}
 
-			mptcp_update_post_push(msk, dfrag, ret);
+	while((dfrag = mptcp_send_head(sk))) {
+		if (!last_dfrag || !last_sent)
+			goto out;
+
+		if (dfrag == last_dfrag) {
+			mptcp_update_post_push(msk, dfrag, last_sent);
+			WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
+			break;
 		}
+		dfrag->already_sent = dfrag->data_len;
 		WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
 	}
 
-	/* at this point we held the socket lock for the last subflow we used */
-	if (ssk)
-		mptcp_push_release(ssk, &info);
-
 out:
 	/* ensure the rtx timer is running */
 	if (!mptcp_timer_pending(sk))
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index b6411285a0b2..149c1e007d5b 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -358,6 +358,16 @@ static inline struct mptcp_data_frag *mptcp_send_next(struct sock *sk)
 						     list_next_entry(cur, list);
 }
 
+static inline struct mptcp_data_frag *mptcp_next_frag(struct sock *sk,
+						      struct mptcp_data_frag *cur)
+{
+	if (!cur)
+		return NULL;
+
+	return list_is_last(&cur->list, &mptcp_sk(sk)->rtx_queue) ? NULL :
+						     list_next_entry(cur, list);
+}
+
 static inline struct mptcp_data_frag *mptcp_pending_tail(const struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
-- 
2.35.3



* [PATCH mptcp-next v11 3/5] mptcp: redundant subflows retrans support
  2022-09-09  9:11 [PATCH mptcp-next v11 0/5] BPF redundant scheduler Geliang Tang
  2022-09-09  9:11 ` [PATCH mptcp-next v11 1/5] Squash to "mptcp: add get_subflow wrappers" Geliang Tang
  2022-09-09  9:11 ` [PATCH mptcp-next v11 2/5] mptcp: redundant subflows push pending Geliang Tang
@ 2022-09-09  9:11 ` Geliang Tang
  2022-09-09  9:11 ` [PATCH mptcp-next v11 4/5] selftests/bpf: Add bpf_red scheduler Geliang Tang
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2022-09-09  9:11 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

This patch adds redundant subflows support to __mptcp_retrans(). Use the
mptcp_sched_get_retrans() wrapper there instead of
mptcp_subflow_get_retrans().

Iterate over each subflow of the msk and check its scheduled flag to see
whether it was picked by the scheduler. If so, use it to send data.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 net/mptcp/protocol.c | 62 ++++++++++++++++++++++++++++----------------
 1 file changed, 39 insertions(+), 23 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index be62126d7c63..bbf55a0fe314 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2430,16 +2430,17 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
 static void __mptcp_retrans(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct mptcp_subflow_context *subflow;
 	struct mptcp_sendmsg_info info = {};
 	struct mptcp_data_frag *dfrag;
-	size_t copied = 0;
 	struct sock *ssk;
-	int ret;
+	int ret, err;
+	u16 len = 0;
 
 	mptcp_clean_una_wakeup(sk);
 
 	/* first check ssk: need to kick "stale" logic */
-	ssk = mptcp_subflow_get_retrans(msk);
+	err = mptcp_sched_get_retrans(msk);
 	dfrag = mptcp_rtx_head(sk);
 	if (!dfrag) {
 		if (mptcp_data_fin_enabled(msk)) {
@@ -2458,31 +2459,46 @@ static void __mptcp_retrans(struct sock *sk)
 		goto reset_timer;
 	}
 
-	if (!ssk)
+	if (err)
 		goto reset_timer;
 
-	lock_sock(ssk);
+	mptcp_for_each_subflow(msk, subflow) {
+		if (READ_ONCE(subflow->scheduled)) {
+			u16 copied = 0;
 
-	/* limit retransmission to the bytes already sent on some subflows */
-	info.sent = 0;
-	info.limit = READ_ONCE(msk->csum_enabled) ? dfrag->data_len : dfrag->already_sent;
-	while (info.sent < info.limit) {
-		ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
-		if (ret <= 0)
-			break;
+			ssk = mptcp_subflow_tcp_sock(subflow);
+			if (!ssk)
+				goto reset_timer;
 
-		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RETRANSSEGS);
-		copied += ret;
-		info.sent += ret;
-	}
-	if (copied) {
-		dfrag->already_sent = max(dfrag->already_sent, info.sent);
-		tcp_push(ssk, 0, info.mss_now, tcp_sk(ssk)->nonagle,
-			 info.size_goal);
-		WRITE_ONCE(msk->allow_infinite_fallback, false);
-	}
+			lock_sock(ssk);
 
-	release_sock(ssk);
+			/* limit retransmission to the bytes already sent on some subflows */
+			info.sent = 0;
+			info.limit = READ_ONCE(msk->csum_enabled) ? dfrag->data_len :
+								    dfrag->already_sent;
+			while (info.sent < info.limit) {
+				ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
+				if (ret <= 0)
+					break;
+
+				MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RETRANSSEGS);
+				copied += ret;
+				info.sent += ret;
+			}
+			if (copied) {
+				len = max(copied, len);
+				tcp_push(ssk, 0, info.mss_now, tcp_sk(ssk)->nonagle,
+					 info.size_goal);
+				WRITE_ONCE(msk->allow_infinite_fallback, false);
+			}
+
+			release_sock(ssk);
+
+			msk->last_snd = ssk;
+			mptcp_subflow_set_scheduled(subflow, false);
+		}
+	}
+	dfrag->already_sent = max(dfrag->already_sent, len);
 
 reset_timer:
 	mptcp_check_and_set_pending(sk);
-- 
2.35.3



* [PATCH mptcp-next v11 4/5] selftests/bpf: Add bpf_red scheduler
  2022-09-09  9:11 [PATCH mptcp-next v11 0/5] BPF redundant scheduler Geliang Tang
                   ` (2 preceding siblings ...)
  2022-09-09  9:11 ` [PATCH mptcp-next v11 3/5] mptcp: redundant subflows retrans support Geliang Tang
@ 2022-09-09  9:11 ` Geliang Tang
  2022-09-09  9:11 ` [PATCH mptcp-next v11 5/5] selftests/bpf: Add bpf_red test Geliang Tang
  2022-09-13  0:01 ` [PATCH mptcp-next v11 0/5] BPF redundant scheduler Mat Martineau
  5 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2022-09-09  9:11 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

This patch implements the redundant BPF MPTCP scheduler, named bpf_red,
which sends all packets redundantly on all available subflows.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 .../selftests/bpf/progs/mptcp_bpf_red.c       | 36 +++++++++++++++++++
 1 file changed, 36 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_red.c

diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_red.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_red.c
new file mode 100644
index 000000000000..58f90473e495
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_red.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022, SUSE. */
+
+#include <linux/bpf.h>
+#include "bpf_tcp_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+
+SEC("struct_ops/mptcp_sched_red_init")
+void BPF_PROG(mptcp_sched_red_init, const struct mptcp_sock *msk)
+{
+}
+
+SEC("struct_ops/mptcp_sched_red_release")
+void BPF_PROG(mptcp_sched_red_release, const struct mptcp_sock *msk)
+{
+}
+
+void BPF_STRUCT_OPS(bpf_red_get_subflow, const struct mptcp_sock *msk,
+		    struct mptcp_sched_data *data)
+{
+	for (int i = 0; i < MPTCP_SUBFLOWS_MAX; i++) {
+		if (!data->contexts[i])
+			break;
+
+		mptcp_subflow_set_scheduled(data->contexts[i], true);
+	}
+}
+
+SEC(".struct_ops")
+struct mptcp_sched_ops red = {
+	.init		= (void *)mptcp_sched_red_init,
+	.release	= (void *)mptcp_sched_red_release,
+	.get_subflow	= (void *)bpf_red_get_subflow,
+	.name		= "bpf_red",
+};
-- 
2.35.3



* [PATCH mptcp-next v11 5/5] selftests/bpf: Add bpf_red test
  2022-09-09  9:11 [PATCH mptcp-next v11 0/5] BPF redundant scheduler Geliang Tang
                   ` (3 preceding siblings ...)
  2022-09-09  9:11 ` [PATCH mptcp-next v11 4/5] selftests/bpf: Add bpf_red scheduler Geliang Tang
@ 2022-09-09  9:11 ` Geliang Tang
  2022-09-09 10:54   ` selftests/bpf: Add bpf_red test: Tests Results MPTCP CI
  2022-09-13  0:01 ` [PATCH mptcp-next v11 0/5] BPF redundant scheduler Mat Martineau
  5 siblings, 1 reply; 9+ messages in thread
From: Geliang Tang @ 2022-09-09  9:11 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

This patch adds the redundant BPF MPTCP scheduler test: test_red(). Use
sysctl to set net.mptcp.scheduler to this scheduler. Add two veth net
devices to simulate the multiple-addresses case. Use the 'ip mptcp endpoint'
command to add the new endpoint ADDR_2 to the PM netlink. Send data, then
check the bytes_sent counters in the 'ss' output to make sure the data has
been sent redundantly on both net devices.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 34 +++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index 647d313475bc..8426a5aba721 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -9,6 +9,7 @@
 #include "mptcp_bpf_first.skel.h"
 #include "mptcp_bpf_bkup.skel.h"
 #include "mptcp_bpf_rr.skel.h"
+#include "mptcp_bpf_red.skel.h"
 
 #ifndef TCP_CA_NAME_MAX
 #define TCP_CA_NAME_MAX	16
@@ -381,6 +382,37 @@ static void test_rr(void)
 	mptcp_bpf_rr__destroy(rr_skel);
 }
 
+static void test_red(void)
+{
+	struct mptcp_bpf_red *red_skel;
+	int server_fd, client_fd;
+	struct bpf_link *link;
+
+	red_skel = mptcp_bpf_red__open_and_load();
+	if (!ASSERT_OK_PTR(red_skel, "bpf_red__open_and_load"))
+		return;
+
+	link = bpf_map__attach_struct_ops(red_skel->maps.red);
+	if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) {
+		mptcp_bpf_red__destroy(red_skel);
+		return;
+	}
+
+	sched_init("subflow", "bpf_red");
+	server_fd = start_mptcp_server(AF_INET, ADDR_1, 0, 0);
+	client_fd = connect_to_fd(server_fd, 0);
+
+	send_data(server_fd, client_fd);
+	ASSERT_OK(has_bytes_sent(ADDR_1), "has_bytes_sent addr 1");
+	ASSERT_OK(has_bytes_sent(ADDR_2), "has_bytes_sent addr 2");
+
+	close(client_fd);
+	close(server_fd);
+	sched_cleanup();
+	bpf_link__destroy(link);
+	mptcp_bpf_red__destroy(red_skel);
+}
+
 void test_mptcp(void)
 {
 	if (test__start_subtest("base"))
@@ -391,4 +423,6 @@ void test_mptcp(void)
 		test_bkup();
 	if (test__start_subtest("rr"))
 		test_rr();
+	if (test__start_subtest("red"))
+		test_red();
 }
-- 
2.35.3



* Re: selftests/bpf: Add bpf_red test: Tests Results
  2022-09-09  9:11 ` [PATCH mptcp-next v11 5/5] selftests/bpf: Add bpf_red test Geliang Tang
@ 2022-09-09 10:54   ` MPTCP CI
  0 siblings, 0 replies; 9+ messages in thread
From: MPTCP CI @ 2022-09-09 10:54 UTC (permalink / raw)
  To: Geliang Tang; +Cc: mptcp

Hi Geliang,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal:
  - Unstable: 4 failed test(s): mptcp_connect_mmap selftest_mptcp_connect selftest_mptcp_join selftest_simult_flows
  - Critical: 2 Call Trace(s) ❌:
  - Task: https://cirrus-ci.com/task/5145384333869056
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5145384333869056/summary/summary.txt

- KVM Validation: debug:
  - Unstable: 4 failed test(s): mptcp_connect_mmap selftest_mptcp_connect selftest_mptcp_join selftest_simult_flows
  - Critical: 2 Call Trace(s) ❌:
  - Task: https://cirrus-ci.com/task/4656858278395904
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/4656858278395904/summary/summary.txt

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/1f51144b7bf6


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (Tessares)


* Re: [PATCH mptcp-next v11 0/5] BPF redundant scheduler
  2022-09-09  9:11 [PATCH mptcp-next v11 0/5] BPF redundant scheduler Geliang Tang
                   ` (4 preceding siblings ...)
  2022-09-09  9:11 ` [PATCH mptcp-next v11 5/5] selftests/bpf: Add bpf_red test Geliang Tang
@ 2022-09-13  0:01 ` Mat Martineau
  2022-09-23  3:25   ` Geliang Tang
  5 siblings, 1 reply; 9+ messages in thread
From: Mat Martineau @ 2022-09-13  0:01 UTC (permalink / raw)
  To: Geliang Tang; +Cc: mptcp

On Fri, 9 Sep 2022, Geliang Tang wrote:

> v11:
> - address to Mat's comments in v10.
> - rebase to export/20220908T063452
>

Hi Geliang -

Thanks for the updates to this series.

I get slightly different kernel splats than the CI. For example, here's my 
kmsg output with the first test in mptcp_connect.sh:

[ 3102.670021] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[ 3102.885448] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
[ 3103.112575] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
[ 3103.463347] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
[ 3107.580236] ------------[ cut here ]------------
[ 3107.581325] WARNING: CPU: 2 PID: 1112 at net/mptcp/protocol.c:1306 mptcp_sendmsg_frag (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1306 (discriminator 1)) 
[ 3107.583192] Modules linked in:
[ 3107.585317] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
[ 3107.587250] RIP: 0010:mptcp_sendmsg_frag (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1306 (discriminator 1)) 
[ 3107.588421] Code: 0f 85 21 fd ff ff 48 c7 c2 a0 c1 1a 83 be 78 00 00 00 48 c7 c7 80 c7 1a 83 c6 05 9e 31 f2 01 01 e8 03 83 03 00 e9 fd fc ff ff <0f> 0b 0f b6 44 24 63 88 44 24 30 e9 dc f4 ff ff 8b 74 24 18 48 89
All code
========
    0:	0f 85 21 fd ff ff    	jne    0xfffffffffffffd27
    6:	48 c7 c2 a0 c1 1a 83 	mov    $0xffffffff831ac1a0,%rdx
    d:	be 78 00 00 00       	mov    $0x78,%esi
   12:	48 c7 c7 80 c7 1a 83 	mov    $0xffffffff831ac780,%rdi
   19:	c6 05 9e 31 f2 01 01 	movb   $0x1,0x1f2319e(%rip)        # 0x1f231be
   20:	e8 03 83 03 00       	call   0x38328
   25:	e9 fd fc ff ff       	jmp    0xfffffffffffffd27
   2a:*	0f 0b                	ud2    		<-- trapping instruction
   2c:	0f b6 44 24 63       	movzbl 0x63(%rsp),%eax
   31:	88 44 24 30          	mov    %al,0x30(%rsp)
   35:	e9 dc f4 ff ff       	jmp    0xfffffffffffff516
   3a:	8b 74 24 18          	mov    0x18(%rsp),%esi
   3e:	48                   	rex.W
   3f:	89                   	.byte 0x89

Code starting with the faulting instruction
===========================================
    0:	0f 0b                	ud2
    2:	0f b6 44 24 63       	movzbl 0x63(%rsp),%eax
    7:	88 44 24 30          	mov    %al,0x30(%rsp)
    b:	e9 dc f4 ff ff       	jmp    0xfffffffffffff4ec
   10:	8b 74 24 18          	mov    0x18(%rsp),%esi
   14:	48                   	rex.W
   15:	89                   	.byte 0x89
[ 3107.592031] RSP: 0018:ffff888010f67910 EFLAGS: 00010202
[ 3107.593172] RAX: e585171f95821f87 RBX: ffff888113b04f00 RCX: ffffffff8260bb19
[ 3107.594458] RDX: 0000000000000001 RSI: ffffffff8260bac4 RDI: ffff88800fa84848
[ 3107.596326] RBP: e585171f95821f88 R08: 0000000000000000 R09: ffff88800fa848af
[ 3107.597665] R10: ffffed1001f50915 R11: 0000000000000000 R12: ffff888010f67a78
[ 3107.599903] R13: 0000000000000001 R14: ffff888107691800 R15: ffff8880357a0000
[ 3107.601366] FS:  00007f668bdb0740(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000
[ 3107.603201] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3107.604495] CR2: 00007ffe77890328 CR3: 0000000115318005 CR4: 0000000000370ee0
[ 3107.606212] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3107.607854] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3107.610532] Call Trace:
[ 3107.611229]  <TASK>
[ 3107.611746] ? mptcp_init_sock (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1230) 
[ 3107.612654] ? lockdep_hardirqs_on_prepare (/home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4252 /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4319 
/home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4271) 
[ 3107.613890] ? __local_bh_enable_ip (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45 /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80 
/home/mjmartin/work/mptcp-nn/kernel/softirq.c:401) 
[ 3107.614952] __mptcp_push_pending (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1569) 
[ 3107.615948] ? mptcp_close (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1532) 
[ 3107.616850] ? __sk_mem_raise_allocated (/home/mjmartin/work/mptcp-nn/net/core/sock.c:2810 /home/mjmartin/work/mptcp-nn/net/core/sock.c:2981) 
[ 3107.617982] ? copy_page_from_iter (/home/mjmartin/work/mptcp-nn/lib/iov_iter.c:751 /home/mjmartin/work/mptcp-nn/lib/iov_iter.c:738) 
[ 3107.618991] mptcp_sendmsg (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1789) 
[ 3107.619896] ? __mptcp_push_pending (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1682) 
[ 3107.621006] ? inet_send_prepare (/home/mjmartin/work/mptcp-nn/net/ipv4/af_inet.c:807) 
[ 3107.622087] ? inet_send_prepare (/home/mjmartin/work/mptcp-nn/net/ipv4/af_inet.c:816) 
[ 3107.623082] sock_sendmsg (/home/mjmartin/work/mptcp-nn/net/socket.c:717 /home/mjmartin/work/mptcp-nn/net/socket.c:734) 
[ 3107.624372] sock_write_iter (/home/mjmartin/work/mptcp-nn/net/socket.c:1109) 
[ 3107.625361] ? sock_sendmsg (/home/mjmartin/work/mptcp-nn/net/socket.c:1092) 
[ 3107.626261] ? file_has_perm (/home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:1724) 
[ 3107.627185] ? selinux_file_permission (/home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:3570 /home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:3590) 
[ 3107.628345] vfs_write (/home/mjmartin/work/mptcp-nn/./include/linux/fs.h:2187 /home/mjmartin/work/mptcp-nn/fs/read_write.c:491 /home/mjmartin/work/mptcp-nn/fs/read_write.c:578) 
[ 3107.629171] ? __ia32_sys_pread64 (/home/mjmartin/work/mptcp-nn/fs/read_write.c:559) 
[ 3107.630159] ? bit_wait_io_timeout (/home/mjmartin/work/mptcp-nn/kernel/locking/mutex.c:902) 
[ 3107.631177] ? __fget_light (/home/mjmartin/work/mptcp-nn/fs/file.c:1007 (discriminator 1)) 
[ 3107.632005] ksys_write (/home/mjmartin/work/mptcp-nn/fs/read_write.c:631) 
[ 3107.632957] ? __ia32_sys_read (/home/mjmartin/work/mptcp-nn/fs/read_write.c:621) 
[ 3107.633859] ? lockdep_hardirqs_on_prepare (/home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:466 /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4320 
/home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4271) 
[ 3107.635041] ? syscall_enter_from_user_mode (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45 /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80 
/home/mjmartin/work/mptcp-nn/kernel/entry/common.c:109) 
[ 3107.636398] do_syscall_64 (/home/mjmartin/work/mptcp-nn/arch/x86/entry/common.c:50 /home/mjmartin/work/mptcp-nn/arch/x86/entry/common.c:80) 
[ 3107.637254] entry_SYSCALL_64_after_hwframe (/home/mjmartin/work/mptcp-nn/arch/x86/entry/entry_64.S:120) 
[ 3107.638351] RIP: 0033:0x7f668beb48f7
[ 3107.639336] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
All code
========
    0:	0f 00                	(bad)
    2:	f7 d8                	neg    %eax
    4:	64 89 02             	mov    %eax,%fs:(%rdx)
    7:	48 c7 c0 ff ff ff ff 	mov    $0xffffffffffffffff,%rax
    e:	eb b7                	jmp    0xffffffffffffffc7
   10:	0f 1f 00             	nopl   (%rax)
   13:	f3 0f 1e fa          	endbr64
   17:	64 8b 04 25 18 00 00 	mov    %fs:0x18,%eax
   1e:	00
   1f:	85 c0                	test   %eax,%eax
   21:	75 10                	jne    0x33
   23:	b8 01 00 00 00       	mov    $0x1,%eax
   28:	0f 05                	syscall
   2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
   30:	77 51                	ja     0x83
   32:	c3                   	ret
   33:	48 83 ec 28          	sub    $0x28,%rsp
   37:	48 89 54 24 18       	mov    %rdx,0x18(%rsp)
   3c:	48                   	rex.W
   3d:	89                   	.byte 0x89
   3e:	74 24                	je     0x64

Code starting with the faulting instruction
===========================================
    0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
    6:	77 51                	ja     0x59
    8:	c3                   	ret
    9:	48 83 ec 28          	sub    $0x28,%rsp
    d:	48 89 54 24 18       	mov    %rdx,0x18(%rsp)
   12:	48                   	rex.W
   13:	89                   	.byte 0x89
   14:	74 24                	je     0x3a
[ 3107.643426] RSP: 002b:00007ffe778943c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 3107.645335] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f668beb48f7
[ 3107.647101] RDX: 0000000000002000 RSI: 00007ffe778943f0 RDI: 0000000000000003
[ 3107.648756] RBP: 0000000000000000 R08: 00007f668bfab214 R09: 00007f668bfab280
[ 3107.650320] R10: 00007f668bdba140 R11: 0000000000000246 R12: 0000000000001500
[ 3107.651883] R13: 0000000000002000 R14: 0000000000000000 R15: 0000000000002000
[ 3107.653497]  </TASK>
[ 3107.654005] irq event stamp: 17761
[ 3107.654780] hardirqs last enabled at (17773): __up_console_sem (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45 (discriminator 1) /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80 
(discriminator 1) /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:138 (discriminator 1) /home/mjmartin/work/mptcp-nn/kernel/printk/printk.c:264 (discriminator 1)) 
[ 3107.656996] hardirqs last disabled at (17788): __schedule (/home/mjmartin/work/mptcp-nn/kernel/sched/core.c:6393 (discriminator 1)) 
[ 3107.658933] softirqs last enabled at (17806): __irq_exit_rcu (/home/mjmartin/work/mptcp-nn/kernel/softirq.c:445 /home/mjmartin/work/mptcp-nn/kernel/softirq.c:650) 
[ 3107.660848] softirqs last disabled at (17819): __irq_exit_rcu (/home/mjmartin/work/mptcp-nn/kernel/softirq.c:445 /home/mjmartin/work/mptcp-nn/kernel/softirq.c:650) 
[ 3107.662940] ---[ end trace 0000000000000000 ]---


Do you see anything similar in your testing? This was on a 4-cpu VM for 
me.


Line 1306 of protocol.c that caused the splat is:

 		WARN_ON_ONCE(reuse_skb);

and it looks like that is expected to happen with a zero window and all 
data acked. Sounds like a condition that wasn't expected with previous 
schedulers (that only sent on one subflow at a time), but could happen 
with redundant schedulers when msk->snd_una is updated by another subflow.

If you can't reproduce this, let me know and I can investigate some more. 
It's reproducible on my system.

--
Mat Martineau
Intel


* Re: [PATCH mptcp-next v11 0/5] BPF redundant scheduler
  2022-09-13  0:01 ` [PATCH mptcp-next v11 0/5] BPF redundant scheduler Mat Martineau
@ 2022-09-23  3:25   ` Geliang Tang
  0 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2022-09-23  3:25 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

Hi Mat,

Sorry for the late reply.

On Mon, Sep 12, 2022 at 05:01:39PM -0700, Mat Martineau wrote:
> On Fri, 9 Sep 2022, Geliang Tang wrote:
> 
> > v11:
> > - address to Mat's comments in v10.
> > - rebase to export/20220908T063452
> > 
> 
> Hi Geliang -
> 
> Thanks for the updates to this series.
> 
> I get slightly different kernel splats than the CI. For example, here's my
> kmsg output with the first test in mptcp_connect.sh:
> 
> [ 3102.670021] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
> [ 3102.885448] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
> [ 3103.112575] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
> [ 3103.463347] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
> [ 3107.580236] ------------[ cut here ]------------
> [ 3107.581325] WARNING: CPU: 2 PID: 1112 at net/mptcp/protocol.c:1306
> mptcp_sendmsg_frag (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1306
> (discriminator 1)) [ 3107.583192] Modules linked in:
> [ 3107.585317] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
> [ 3107.587250] RIP: 0010:mptcp_sendmsg_frag
> (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1306 (discriminator 1)) [
> 3107.588421] Code: 0f 85 21 fd ff ff 48 c7 c2 a0 c1 1a 83 be 78 00 00 00 48
> c7 c7 80 c7 1a 83 c6 05 9e 31 f2 01 01 e8 03 83 03 00 e9 fd fc ff ff <0f> 0b
> 0f b6 44 24 63 88 44 24 30 e9 dc f4 ff ff 8b 74 24 18 48 89
> All code
> ========
>    0:	0f 85 21 fd ff ff    	jne    0xfffffffffffffd27
>    6:	48 c7 c2 a0 c1 1a 83 	mov    $0xffffffff831ac1a0,%rdx
>    d:	be 78 00 00 00       	mov    $0x78,%esi
>   12:	48 c7 c7 80 c7 1a 83 	mov    $0xffffffff831ac780,%rdi
>   19:	c6 05 9e 31 f2 01 01 	movb   $0x1,0x1f2319e(%rip)        # 0x1f231be
>   20:	e8 03 83 03 00       	call   0x38328
>   25:	e9 fd fc ff ff       	jmp    0xfffffffffffffd27
>   2a:*	0f 0b                	ud2    		<-- trapping instruction
>   2c:	0f b6 44 24 63       	movzbl 0x63(%rsp),%eax
>   31:	88 44 24 30          	mov    %al,0x30(%rsp)
>   35:	e9 dc f4 ff ff       	jmp    0xfffffffffffff516
>   3a:	8b 74 24 18          	mov    0x18(%rsp),%esi
>   3e:	48                   	rex.W
>   3f:	89                   	.byte 0x89
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	0f 0b                	ud2
>    2:	0f b6 44 24 63       	movzbl 0x63(%rsp),%eax
>    7:	88 44 24 30          	mov    %al,0x30(%rsp)
>    b:	e9 dc f4 ff ff       	jmp    0xfffffffffffff4ec
>   10:	8b 74 24 18          	mov    0x18(%rsp),%esi
>   14:	48                   	rex.W
>   15:	89                   	.byte 0x89
> [ 3107.592031] RSP: 0018:ffff888010f67910 EFLAGS: 00010202
> [ 3107.593172] RAX: e585171f95821f87 RBX: ffff888113b04f00 RCX: ffffffff8260bb19
> [ 3107.594458] RDX: 0000000000000001 RSI: ffffffff8260bac4 RDI: ffff88800fa84848
> [ 3107.596326] RBP: e585171f95821f88 R08: 0000000000000000 R09: ffff88800fa848af
> [ 3107.597665] R10: ffffed1001f50915 R11: 0000000000000000 R12: ffff888010f67a78
> [ 3107.599903] R13: 0000000000000001 R14: ffff888107691800 R15: ffff8880357a0000
> [ 3107.601366] FS:  00007f668bdb0740(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000
> [ 3107.603201] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3107.604495] CR2: 00007ffe77890328 CR3: 0000000115318005 CR4: 0000000000370ee0
> [ 3107.606212] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 3107.607854] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 3107.610532] Call Trace:
> [ 3107.611229]  <TASK>
> [ 3107.611746] ? mptcp_init_sock
> (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1230) [ 3107.612654] ?
> lockdep_hardirqs_on_prepare
> (/home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4252
> /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4319
> /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4271) [ 3107.613890] ?
> __local_bh_enable_ip
> (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45
> /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80
> /home/mjmartin/work/mptcp-nn/kernel/softirq.c:401) [ 3107.614952]
> __mptcp_push_pending
> (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1569) [ 3107.615948] ?
> mptcp_close (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1532) [
> 3107.616850] ? __sk_mem_raise_allocated
> (/home/mjmartin/work/mptcp-nn/net/core/sock.c:2810
> /home/mjmartin/work/mptcp-nn/net/core/sock.c:2981) [ 3107.617982] ?
> copy_page_from_iter (/home/mjmartin/work/mptcp-nn/lib/iov_iter.c:751
> /home/mjmartin/work/mptcp-nn/lib/iov_iter.c:738) [ 3107.618991]
> mptcp_sendmsg (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1789) [
> 3107.619896] ? __mptcp_push_pending
> (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1682) [ 3107.621006] ?
> inet_send_prepare (/home/mjmartin/work/mptcp-nn/net/ipv4/af_inet.c:807) [
> 3107.622087] ? inet_send_prepare
> (/home/mjmartin/work/mptcp-nn/net/ipv4/af_inet.c:816) [ 3107.623082]
> sock_sendmsg (/home/mjmartin/work/mptcp-nn/net/socket.c:717
> /home/mjmartin/work/mptcp-nn/net/socket.c:734) [ 3107.624372]
> sock_write_iter (/home/mjmartin/work/mptcp-nn/net/socket.c:1109) [
> 3107.625361] ? sock_sendmsg (/home/mjmartin/work/mptcp-nn/net/socket.c:1092)
> [ 3107.626261] ? file_has_perm
> (/home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:1724) [ 3107.627185]
> ? selinux_file_permission
> (/home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:3570
> /home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:3590) [ 3107.628345]
> vfs_write (/home/mjmartin/work/mptcp-nn/./include/linux/fs.h:2187
> /home/mjmartin/work/mptcp-nn/fs/read_write.c:491
> /home/mjmartin/work/mptcp-nn/fs/read_write.c:578) [ 3107.629171] ?
> __ia32_sys_pread64 (/home/mjmartin/work/mptcp-nn/fs/read_write.c:559) [
> 3107.630159] ? bit_wait_io_timeout
> (/home/mjmartin/work/mptcp-nn/kernel/locking/mutex.c:902) [ 3107.631177] ?
> __fget_light (/home/mjmartin/work/mptcp-nn/fs/file.c:1007 (discriminator 1))
> [ 3107.632005] ksys_write (/home/mjmartin/work/mptcp-nn/fs/read_write.c:631)
> [ 3107.632957] ? __ia32_sys_read
> (/home/mjmartin/work/mptcp-nn/fs/read_write.c:621) [ 3107.633859] ?
> lockdep_hardirqs_on_prepare
> (/home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:466
> /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4320
> /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4271) [ 3107.635041] ?
> syscall_enter_from_user_mode
> (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45
> /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80
> /home/mjmartin/work/mptcp-nn/kernel/entry/common.c:109) [ 3107.636398]
> do_syscall_64 (/home/mjmartin/work/mptcp-nn/arch/x86/entry/common.c:50
> /home/mjmartin/work/mptcp-nn/arch/x86/entry/common.c:80) [ 3107.637254]
> entry_SYSCALL_64_after_hwframe
> (/home/mjmartin/work/mptcp-nn/arch/x86/entry/entry_64.S:120) [ 3107.638351]
> RIP: 0033:0x7f668beb48f7
> [ 3107.639336] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> All code
> ========
>    0:	0f 00                	(bad)
>    2:	f7 d8                	neg    %eax
>    4:	64 89 02             	mov    %eax,%fs:(%rdx)
>    7:	48 c7 c0 ff ff ff ff 	mov    $0xffffffffffffffff,%rax
>    e:	eb b7                	jmp    0xffffffffffffffc7
>   10:	0f 1f 00             	nopl   (%rax)
>   13:	f3 0f 1e fa          	endbr64
>   17:	64 8b 04 25 18 00 00 	mov    %fs:0x18,%eax
>   1e:	00
>   1f:	85 c0                	test   %eax,%eax
>   21:	75 10                	jne    0x33
>   23:	b8 01 00 00 00       	mov    $0x1,%eax
>   28:	0f 05                	syscall
>   2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
>   30:	77 51                	ja     0x83
>   32:	c3                   	ret
>   33:	48 83 ec 28          	sub    $0x28,%rsp
>   37:	48 89 54 24 18       	mov    %rdx,0x18(%rsp)
>   3c:	48                   	rex.W
>   3d:	89                   	.byte 0x89
>   3e:	74 24                	je     0x64
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
>    6:	77 51                	ja     0x59
>    8:	c3                   	ret
>    9:	48 83 ec 28          	sub    $0x28,%rsp
>    d:	48 89 54 24 18       	mov    %rdx,0x18(%rsp)
>   12:	48                   	rex.W
>   13:	89                   	.byte 0x89
>   14:	74 24                	je     0x3a
> [ 3107.643426] RSP: 002b:00007ffe778943c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 3107.645335] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f668beb48f7
> [ 3107.647101] RDX: 0000000000002000 RSI: 00007ffe778943f0 RDI: 0000000000000003
> [ 3107.648756] RBP: 0000000000000000 R08: 00007f668bfab214 R09: 00007f668bfab280
> [ 3107.650320] R10: 00007f668bdba140 R11: 0000000000000246 R12: 0000000000001500
> [ 3107.651883] R13: 0000000000002000 R14: 0000000000000000 R15: 0000000000002000
> [ 3107.653497]  </TASK>
> [ 3107.654005] irq event stamp: 17761
> [ 3107.654780] hardirqs last enabled at (17773): __up_console_sem
> (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45
> (discriminator 1)
> /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80
> (discriminator 1)
> /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:138
> (discriminator 1) /home/mjmartin/work/mptcp-nn/kernel/printk/printk.c:264
> (discriminator 1)) [ 3107.656996] hardirqs last disabled at (17788):
> __schedule (/home/mjmartin/work/mptcp-nn/kernel/sched/core.c:6393
> (discriminator 1)) [ 3107.658933] softirqs last enabled at (17806):
> __irq_exit_rcu (/home/mjmartin/work/mptcp-nn/kernel/softirq.c:445
> /home/mjmartin/work/mptcp-nn/kernel/softirq.c:650) [ 3107.660848] softirqs
> last disabled at (17819): __irq_exit_rcu
> (/home/mjmartin/work/mptcp-nn/kernel/softirq.c:445
> /home/mjmartin/work/mptcp-nn/kernel/softirq.c:650) [ 3107.662940] ---[ end
> trace 0000000000000000 ]---
> 
> 
> Do you see anything similar in your testing? This was on a 4-cpu VM for me.
> 

Yes, I got both this error (1306 WARN_ON_ONCE(reuse_skb)) and another error
(1010 WARN_ON_ONCE(!msk->recovery)) in my tests.

> 
> Line 1306 of protocol.c that caused the splat is:
> 
> 		WARN_ON_ONCE(reuse_skb);
> 
> and it looks like that is expected to happen with a zero window and all data
> acked. Sounds like a condition that wasn't expected with previous schedulers
> (that only sent on one subflow at a time), but could happen with redundant
> schedulers when msk->snd_una is updated by another subflow.
>

The original code updates dfrag->already_sent immediately after invoking
mptcp_sendmsg_frag(). But in this series we delay updating
dfrag->already_sent until all frags are sent. Then
mptcp_check_allowed_size() will sometimes return 0 in this case, and we hit
the (1306 WARN_ON_ONCE(reuse_skb)) warning here:

1291         if (copy == 0) {
1292                 u64 snd_una = READ_ONCE(msk->snd_una);
1293 
1294                 if (snd_una != msk->snd_nxt) {
1295                         tcp_remove_empty_skb(ssk);
1296                         return 0;
1297                 }
1298 
1299                 zero_window_probe = true;
1300                 data_seq = snd_una - 1;
1301                 copy = 1;
1302 
1303                 /* all mptcp-level data is acked, no skbs should be present into the
1304                  * ssk write queue
1305                  */
1306                 WARN_ON_ONCE(reuse_skb);
1307         }

The original code updates msk->first_pending immediately after every frag is
sent, but we delay updating it until all frags are sent. In this way, the
code can reach the (dfrag == msk->first_pending) case, and we hit the
(1010 WARN_ON_ONCE(!msk->recovery)) warning here:

1008                 if (unlikely(dfrag == msk->first_pending)) {
1009                         /* in recovery mode can see ack after the current snd head */
1010                         if (WARN_ON_ONCE(!msk->recovery))
1011                                 break;
1012 
1013                         WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
1014                 }

I'm trying to fix these two errors, but I haven't made much progress. So I
want to hear your suggestions.

Thanks,
-Geliang

> If you can't reproduce this, let me know and I can investigate some more.
> It's reproducible on my system.
> 
> --
> Mat Martineau
> Intel


end of thread, other threads:[~2022-09-23  3:24 UTC | newest]

Thread overview: 9+ messages
2022-09-09  9:11 [PATCH mptcp-next v11 0/5] BPF redundant scheduler Geliang Tang
2022-09-09  9:11 ` [PATCH mptcp-next v11 1/5] Squash to "mptcp: add get_subflow wrappers" Geliang Tang
2022-09-09  9:11 ` [PATCH mptcp-next v11 2/5] mptcp: redundant subflows push pending Geliang Tang
2022-09-09  9:11 ` [PATCH mptcp-next v11 3/5] mptcp: redundant subflows retrans support Geliang Tang
2022-09-09  9:11 ` [PATCH mptcp-next v11 4/5] selftests/bpf: Add bpf_red scheduler Geliang Tang
2022-09-09  9:11 ` [PATCH mptcp-next v11 5/5] selftests/bpf: Add bpf_red test Geliang Tang
2022-09-09 10:54   ` selftests/bpf: Add bpf_red test: Tests Results MPTCP CI
2022-09-13  0:01 ` [PATCH mptcp-next v11 0/5] BPF redundant scheduler Mat Martineau
2022-09-23  3:25   ` Geliang Tang
