* [PATCH mptcp-next v25 1/5] mptcp: add scheduler wrappers
2022-12-15 12:32 [PATCH mptcp-next v25 0/5] BPF redundant scheduler, part 2 Geliang Tang
@ 2022-12-15 12:32 ` Geliang Tang
2022-12-15 12:32 ` [PATCH mptcp-next v25 2/5] mptcp: use get_send wrapper Geliang Tang
` (5 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Geliang Tang @ 2022-12-15 12:32 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang
This patch defines two packet scheduler wrappers, mptcp_sched_get_send()
and mptcp_sched_get_retrans(), which invoke the data_init() and
get_subflow() callbacks of msk->sched.
Set data->reinject to true in mptcp_sched_get_retrans() and to false in
mptcp_sched_get_send().
If msk->sched is NULL, fall back to the default functions
mptcp_subflow_get_send() and mptcp_subflow_get_retrans() to pick the
subflow used to send data.
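For context, callers are expected to use the wrapper roughly as follows;
this is only an illustrative sketch of the pattern used by the later
patches in this series, not part of the patch itself:

	struct mptcp_subflow_context *subflow;
	struct sock *ssk;

	/* ask the scheduler to mark the subflow(s) to send on */
	if (mptcp_sched_get_send(msk))
		return;	/* no subflow available */

	/* walk the subflows and push data on each scheduled one */
	mptcp_for_each_subflow(msk, subflow) {
		if (READ_ONCE(subflow->scheduled)) {
			mptcp_subflow_set_scheduled(subflow, false);
			ssk = mptcp_subflow_tcp_sock(subflow);
			/* ... lock ssk and push pending data on it ... */
		}
	}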
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
net/mptcp/protocol.c | 4 ++--
net/mptcp/protocol.h | 4 ++++
net/mptcp/sched.c | 50 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 56 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1ee39a16d9a8..14c69a519898 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1396,7 +1396,7 @@ bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
* returns the subflow that will transmit the next DSS
* additionally updates the rtx timeout
*/
-static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
+struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
{
struct subflow_send_info send_info[SSK_MODE_MAX];
struct mptcp_subflow_context *subflow;
@@ -2216,7 +2216,7 @@ static void mptcp_timeout_timer(struct timer_list *t)
*
* A backup subflow is returned only if that is the only kind available.
*/
-static struct sock *mptcp_subflow_get_retrans(struct mptcp_sock *msk)
+struct sock *mptcp_subflow_get_retrans(struct mptcp_sock *msk)
{
struct sock *backup = NULL, *pick = NULL;
struct mptcp_subflow_context *subflow;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index a9ff7028fad8..ecb94ce68ea4 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -657,6 +657,10 @@ void mptcp_subflow_set_scheduled(struct mptcp_subflow_context *subflow,
bool scheduled);
void mptcp_sched_data_set_contexts(const struct mptcp_sock *msk,
struct mptcp_sched_data *data);
+struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk);
+struct sock *mptcp_subflow_get_retrans(struct mptcp_sock *msk);
+int mptcp_sched_get_send(struct mptcp_sock *msk);
+int mptcp_sched_get_retrans(struct mptcp_sock *msk);
static inline bool __tcp_can_send(const struct sock *ssk)
{
diff --git a/net/mptcp/sched.c b/net/mptcp/sched.c
index 0d7c73e9562e..c4006f142f10 100644
--- a/net/mptcp/sched.c
+++ b/net/mptcp/sched.c
@@ -112,3 +112,53 @@ void mptcp_sched_data_set_contexts(const struct mptcp_sock *msk,
for (; i < MPTCP_SUBFLOWS_MAX; i++)
data->contexts[i] = NULL;
}
+
+int mptcp_sched_get_send(struct mptcp_sock *msk)
+{
+ struct mptcp_subflow_context *subflow;
+ struct mptcp_sched_data data;
+
+ mptcp_for_each_subflow(msk, subflow) {
+ if (READ_ONCE(subflow->scheduled))
+ return 0;
+ }
+
+ if (!msk->sched) {
+ struct sock *ssk;
+
+ ssk = mptcp_subflow_get_send(msk);
+ if (!ssk)
+ return -EINVAL;
+ mptcp_subflow_set_scheduled(mptcp_subflow_ctx(ssk), true);
+ return 0;
+ }
+
+ data.reinject = false;
+ msk->sched->data_init(msk, &data);
+ return msk->sched->get_subflow(msk, &data);
+}
+
+int mptcp_sched_get_retrans(struct mptcp_sock *msk)
+{
+ struct mptcp_subflow_context *subflow;
+ struct mptcp_sched_data data;
+
+ mptcp_for_each_subflow(msk, subflow) {
+ if (READ_ONCE(subflow->scheduled))
+ return 0;
+ }
+
+ if (!msk->sched) {
+ struct sock *ssk;
+
+ ssk = mptcp_subflow_get_retrans(msk);
+ if (!ssk)
+ return -EINVAL;
+ mptcp_subflow_set_scheduled(mptcp_subflow_ctx(ssk), true);
+ return 0;
+ }
+
+ data.reinject = true;
+ msk->sched->data_init(msk, &data);
+ return msk->sched->get_subflow(msk, &data);
+}
--
2.35.3
* [PATCH mptcp-next v25 2/5] mptcp: use get_send wrapper
2022-12-15 12:32 [PATCH mptcp-next v25 0/5] BPF redundant scheduler, part 2 Geliang Tang
2022-12-15 12:32 ` [PATCH mptcp-next v25 1/5] mptcp: add scheduler wrappers Geliang Tang
@ 2022-12-15 12:32 ` Geliang Tang
2022-12-20 1:44 ` Mat Martineau
2022-12-15 12:32 ` [PATCH mptcp-next v25 3/5] mptcp: use get_retrans wrapper Geliang Tang
` (4 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: Geliang Tang @ 2022-12-15 12:32 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang
This patch adds multiple-subflow support to __mptcp_push_pending()
and __mptcp_subflow_push_pending(). Use the get_send() wrapper instead of
mptcp_subflow_get_send() in them.
Check the subflows' scheduled flags to see which subflow or subflows the
scheduler picked, and use them to send data.
Move sock_owned_by_me() check and fallback check into get_send() wrapper
from mptcp_subflow_get_send().
This commit allows the scheduler to set the subflow->scheduled bit in
multiple subflows, but it does not allow for sending redundant data.
Multiple scheduled subflows will send sequential data on each subflow.
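For reference, a scheduler marks its picks by calling
mptcp_subflow_set_scheduled() from its get_subflow() callback. A minimal
single-subflow callback (an illustrative sketch only, modelled on the
existing bpf_first selftest scheduler) could look like this:

	int BPF_STRUCT_OPS(bpf_first_get_subflow, const struct mptcp_sock *msk,
			   struct mptcp_sched_data *data)
	{
		/* schedule only the first subflow context */
		mptcp_subflow_set_scheduled(data->contexts[0], true);
		return 0;
	}

A callback may mark several contexts the same way (as the bpf_red
scheduler does later in this series); with this patch those subflows
still carry different, sequential portions of the data.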
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
net/mptcp/protocol.c | 117 ++++++++++++++++++++++++++-----------------
net/mptcp/sched.c | 13 +++++
2 files changed, 85 insertions(+), 45 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 14c69a519898..57967438e70f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1407,15 +1407,6 @@ struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
u64 linger_time;
long tout = 0;
- msk_owned_by_me(msk);
-
- if (__mptcp_check_fallback(msk)) {
- if (!msk->first)
- return NULL;
- return __tcp_can_send(msk->first) &&
- sk_stream_memory_free(msk->first) ? msk->first : NULL;
- }
-
/* pick the subflow with the lower wmem/wspace ratio */
for (i = 0; i < SSK_MODE_MAX; ++i) {
send_info[i].ssk = NULL;
@@ -1566,43 +1557,57 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
.flags = flags,
};
bool do_check_data_fin = false;
+ int push_count = 1;
- while (mptcp_send_head(sk)) {
+ while (mptcp_send_head(sk) && (push_count > 0)) {
+ struct mptcp_subflow_context *subflow;
int ret = 0;
- prev_ssk = ssk;
- ssk = mptcp_subflow_get_send(msk);
+ if (mptcp_sched_get_send(msk))
+ break;
- /* First check. If the ssk has changed since
- * the last round, release prev_ssk
- */
- if (ssk != prev_ssk && prev_ssk)
- mptcp_push_release(prev_ssk, &info);
- if (!ssk)
- goto out;
+ push_count = 0;
- /* Need to lock the new subflow only if different
- * from the previous one, otherwise we are still
- * helding the relevant lock
- */
- if (ssk != prev_ssk)
- lock_sock(ssk);
+ mptcp_for_each_subflow(msk, subflow) {
+ if (READ_ONCE(subflow->scheduled)) {
+ mptcp_subflow_set_scheduled(subflow, false);
- ret = __subflow_push_pending(sk, ssk, &info);
- if (ret <= 0) {
- if (ret == -EAGAIN)
- continue;
- mptcp_push_release(ssk, &info);
- goto out;
+ prev_ssk = ssk;
+ ssk = mptcp_subflow_tcp_sock(subflow);
+ if (ssk != prev_ssk) {
+ /* First check. If the ssk has changed since
+ * the last round, release prev_ssk
+ */
+ if (prev_ssk)
+ mptcp_push_release(prev_ssk, &info);
+
+ /* Need to lock the new subflow only if different
+ * from the previous one, otherwise we are still
+ * helding the relevant lock
+ */
+ lock_sock(ssk);
+ }
+
+ push_count++;
+
+ ret = __subflow_push_pending(sk, ssk, &info);
+ if (ret <= 0) {
+ if (ret != -EAGAIN ||
+ (1 << ssk->sk_state) &
+ (TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 | TCPF_CLOSE))
+ push_count--;
+ continue;
+ }
+ do_check_data_fin = true;
+ msk->last_snd = ssk;
+ }
}
- do_check_data_fin = true;
}
/* at this point we held the socket lock for the last subflow we used */
if (ssk)
mptcp_push_release(ssk, &info);
-out:
/* ensure the rtx timer is running */
if (!mptcp_timer_pending(sk))
mptcp_reset_timer(sk);
@@ -1616,30 +1621,52 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk, bool
struct mptcp_sendmsg_info info = {
.data_lock_held = true,
};
+ bool keep_pushing = true;
struct sock *xmit_ssk;
int copied = 0;
info.flags = 0;
- while (mptcp_send_head(sk)) {
+ while (mptcp_send_head(sk) && keep_pushing) {
+ struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
int ret = 0;
/* check for a different subflow usage only after
* spooling the first chunk of data
*/
- xmit_ssk = first ? ssk : mptcp_subflow_get_send(msk);
- if (!xmit_ssk)
- goto out;
- if (xmit_ssk != ssk) {
- mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk),
- MPTCP_DELEGATE_SEND);
+ if (first) {
+ mptcp_subflow_set_scheduled(subflow, false);
+ ret = __subflow_push_pending(sk, ssk, &info);
+ first = false;
+ if (ret <= 0)
+ break;
+ copied += ret;
+ msk->last_snd = ssk;
+ continue;
+ }
+
+ if (mptcp_sched_get_send(msk))
goto out;
+
+ if (READ_ONCE(subflow->scheduled)) {
+ mptcp_subflow_set_scheduled(subflow, false);
+ ret = __subflow_push_pending(sk, ssk, &info);
+ if (ret <= 0)
+ keep_pushing = false;
+ copied += ret;
+ msk->last_snd = ssk;
}
- ret = __subflow_push_pending(sk, ssk, &info);
- first = false;
- if (ret <= 0)
- break;
- copied += ret;
+ mptcp_for_each_subflow(msk, subflow) {
+ if (READ_ONCE(subflow->scheduled)) {
+ xmit_ssk = mptcp_subflow_tcp_sock(subflow);
+ if (xmit_ssk != ssk) {
+ mptcp_subflow_delegate(subflow,
+ MPTCP_DELEGATE_SEND);
+ msk->last_snd = xmit_ssk;
+ keep_pushing = false;
+ }
+ }
+ }
}
out:
diff --git a/net/mptcp/sched.c b/net/mptcp/sched.c
index c4006f142f10..6428323d8c7f 100644
--- a/net/mptcp/sched.c
+++ b/net/mptcp/sched.c
@@ -118,6 +118,19 @@ int mptcp_sched_get_send(struct mptcp_sock *msk)
struct mptcp_subflow_context *subflow;
struct mptcp_sched_data data;
+ msk_owned_by_me(msk);
+
+ /* the following check is moved out of mptcp_subflow_get_send */
+ if (__mptcp_check_fallback(msk)) {
+ if (msk->first &&
+ __tcp_can_send(msk->first) &&
+ sk_stream_memory_free(msk->first)) {
+ mptcp_subflow_set_scheduled(mptcp_subflow_ctx(msk->first), true);
+ return 0;
+ }
+ return -EINVAL;
+ }
+
mptcp_for_each_subflow(msk, subflow) {
if (READ_ONCE(subflow->scheduled))
return 0;
--
2.35.3
* Re: [PATCH mptcp-next v25 2/5] mptcp: use get_send wrapper
2022-12-15 12:32 ` [PATCH mptcp-next v25 2/5] mptcp: use get_send wrapper Geliang Tang
@ 2022-12-20 1:44 ` Mat Martineau
0 siblings, 0 replies; 11+ messages in thread
From: Mat Martineau @ 2022-12-20 1:44 UTC (permalink / raw)
To: Geliang Tang; +Cc: mptcp
On Thu, 15 Dec 2022, Geliang Tang wrote:
> This patch adds multiple-subflow support to __mptcp_push_pending()
> and __mptcp_subflow_push_pending(). Use the get_send() wrapper instead of
> mptcp_subflow_get_send() in them.
>
> Check the subflows' scheduled flags to see which subflow or subflows the
> scheduler picked, and use them to send data.
>
> Move sock_owned_by_me() check and fallback check into get_send() wrapper
> from mptcp_subflow_get_send().
>
> This commit allows the scheduler to set the subflow->scheduled bit in
> multiple subflows, but it does not allow for sending redundant data.
> Multiple scheduled subflows will send sequential data on each subflow.
>
> Signed-off-by: Geliang Tang <geliang.tang@suse.com>
> ---
> net/mptcp/protocol.c | 117 ++++++++++++++++++++++++++-----------------
> net/mptcp/sched.c | 13 +++++
> 2 files changed, 85 insertions(+), 45 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 14c69a519898..57967438e70f 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1407,15 +1407,6 @@ struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
> u64 linger_time;
> long tout = 0;
>
> - msk_owned_by_me(msk);
> -
> - if (__mptcp_check_fallback(msk)) {
> - if (!msk->first)
> - return NULL;
> - return __tcp_can_send(msk->first) &&
> - sk_stream_memory_free(msk->first) ? msk->first : NULL;
> - }
> -
> /* pick the subflow with the lower wmem/wspace ratio */
> for (i = 0; i < SSK_MODE_MAX; ++i) {
> send_info[i].ssk = NULL;
> @@ -1566,43 +1557,57 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
> .flags = flags,
> };
> bool do_check_data_fin = false;
> + int push_count = 1;
>
> - while (mptcp_send_head(sk)) {
> + while (mptcp_send_head(sk) && (push_count > 0)) {
> + struct mptcp_subflow_context *subflow;
> int ret = 0;
>
> - prev_ssk = ssk;
> - ssk = mptcp_subflow_get_send(msk);
> + if (mptcp_sched_get_send(msk))
> + break;
>
> - /* First check. If the ssk has changed since
> - * the last round, release prev_ssk
> - */
> - if (ssk != prev_ssk && prev_ssk)
> - mptcp_push_release(prev_ssk, &info);
> - if (!ssk)
> - goto out;
> + push_count = 0;
>
> - /* Need to lock the new subflow only if different
> - * from the previous one, otherwise we are still
> - * helding the relevant lock
> - */
> - if (ssk != prev_ssk)
> - lock_sock(ssk);
> + mptcp_for_each_subflow(msk, subflow) {
> + if (READ_ONCE(subflow->scheduled)) {
> + mptcp_subflow_set_scheduled(subflow, false);
>
> - ret = __subflow_push_pending(sk, ssk, &info);
> - if (ret <= 0) {
> - if (ret == -EAGAIN)
> - continue;
> - mptcp_push_release(ssk, &info);
> - goto out;
> + prev_ssk = ssk;
> + ssk = mptcp_subflow_tcp_sock(subflow);
> + if (ssk != prev_ssk) {
> + /* First check. If the ssk has changed since
> + * the last round, release prev_ssk
> + */
> + if (prev_ssk)
> + mptcp_push_release(prev_ssk, &info);
> +
> + /* Need to lock the new subflow only if different
> + * from the previous one, otherwise we are still
> + * helding the relevant lock
> + */
> + lock_sock(ssk);
> + }
> +
> + push_count++;
> +
> + ret = __subflow_push_pending(sk, ssk, &info);
> + if (ret <= 0) {
> + if (ret != -EAGAIN ||
> + (1 << ssk->sk_state) &
> + (TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 | TCPF_CLOSE))
> + push_count--;
> + continue;
> + }
> + do_check_data_fin = true;
> + msk->last_snd = ssk;
> + }
> }
> - do_check_data_fin = true;
> }
>
> /* at this point we held the socket lock for the last subflow we used */
> if (ssk)
> mptcp_push_release(ssk, &info);
>
> -out:
> /* ensure the rtx timer is running */
> if (!mptcp_timer_pending(sk))
> mptcp_reset_timer(sk);
> @@ -1616,30 +1621,52 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk, bool
> struct mptcp_sendmsg_info info = {
> .data_lock_held = true,
> };
> + bool keep_pushing = true;
> struct sock *xmit_ssk;
> int copied = 0;
>
> info.flags = 0;
> - while (mptcp_send_head(sk)) {
> + while (mptcp_send_head(sk) && keep_pushing) {
> + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> int ret = 0;
>
> /* check for a different subflow usage only after
> * spooling the first chunk of data
> */
> - xmit_ssk = first ? ssk : mptcp_subflow_get_send(msk);
> - if (!xmit_ssk)
> - goto out;
> - if (xmit_ssk != ssk) {
> - mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk),
> - MPTCP_DELEGATE_SEND);
> + if (first) {
> + mptcp_subflow_set_scheduled(subflow, false);
> + ret = __subflow_push_pending(sk, ssk, &info);
> + first = false;
> + if (ret <= 0)
> + break;
> + copied += ret;
> + msk->last_snd = ssk;
> + continue;
> + }
> +
> + if (mptcp_sched_get_send(msk))
> goto out;
> +
> + if (READ_ONCE(subflow->scheduled)) {
> + mptcp_subflow_set_scheduled(subflow, false);
> + ret = __subflow_push_pending(sk, ssk, &info);
> + if (ret <= 0)
> + keep_pushing = false;
> + copied += ret;
> + msk->last_snd = ssk;
> }
>
> - ret = __subflow_push_pending(sk, ssk, &info);
> - first = false;
> - if (ret <= 0)
> - break;
> - copied += ret;
> + mptcp_for_each_subflow(msk, subflow) {
> + if (READ_ONCE(subflow->scheduled)) {
> + xmit_ssk = mptcp_subflow_tcp_sock(subflow);
> + if (xmit_ssk != ssk) {
> + mptcp_subflow_delegate(subflow,
> + MPTCP_DELEGATE_SEND);
> + msk->last_snd = xmit_ssk;
Hi Geliang/Matthieu -
Following up on my earlier reply about this series, this is the line to
delete (the assignment to msk->last_snd).
As with the removal of the mptcp_subflow_set_scheduled(subflow, false) call
in this block of code, the subflow hasn't been used to send yet, so last_snd
shouldn't be updated here. It will be updated when __subflow_push_pending()
is called later.
- Mat
> + keep_pushing = false;
> + }
> + }
> + }
> }
>
> out:
> diff --git a/net/mptcp/sched.c b/net/mptcp/sched.c
> index c4006f142f10..6428323d8c7f 100644
> --- a/net/mptcp/sched.c
> +++ b/net/mptcp/sched.c
> @@ -118,6 +118,19 @@ int mptcp_sched_get_send(struct mptcp_sock *msk)
> struct mptcp_subflow_context *subflow;
> struct mptcp_sched_data data;
>
> + msk_owned_by_me(msk);
> +
> + /* the following check is moved out of mptcp_subflow_get_send */
> + if (__mptcp_check_fallback(msk)) {
> + if (msk->first &&
> + __tcp_can_send(msk->first) &&
> + sk_stream_memory_free(msk->first)) {
> + mptcp_subflow_set_scheduled(mptcp_subflow_ctx(msk->first), true);
> + return 0;
> + }
> + return -EINVAL;
> + }
> +
> mptcp_for_each_subflow(msk, subflow) {
> if (READ_ONCE(subflow->scheduled))
> return 0;
> --
> 2.35.3
>
>
>
--
Mat Martineau
Intel
* [PATCH mptcp-next v25 3/5] mptcp: use get_retrans wrapper
2022-12-15 12:32 [PATCH mptcp-next v25 0/5] BPF redundant scheduler, part 2 Geliang Tang
2022-12-15 12:32 ` [PATCH mptcp-next v25 1/5] mptcp: add scheduler wrappers Geliang Tang
2022-12-15 12:32 ` [PATCH mptcp-next v25 2/5] mptcp: use get_send wrapper Geliang Tang
@ 2022-12-15 12:32 ` Geliang Tang
2022-12-15 12:32 ` [PATCH mptcp-next v25 4/5] selftests/bpf: Add bpf_red scheduler Geliang Tang
` (3 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Geliang Tang @ 2022-12-15 12:32 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang
This patch adds multiple-subflow support to __mptcp_retrans(). Use the
get_retrans() wrapper instead of mptcp_subflow_get_retrans() in it.
Check the subflows' scheduled flags to see which subflow or subflows the
scheduler picked, and use them to retransmit data. Track the largest
amount retransmitted across the scheduled subflows and use it to update
dfrag->already_sent.
Move sock_owned_by_me() check and fallback check into get_retrans()
wrapper from mptcp_subflow_get_retrans().
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
net/mptcp/protocol.c | 66 +++++++++++++++++++++++++-------------------
net/mptcp/sched.c | 6 ++++
2 files changed, 44 insertions(+), 28 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 57967438e70f..91b84321885c 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2249,11 +2249,6 @@ struct sock *mptcp_subflow_get_retrans(struct mptcp_sock *msk)
struct mptcp_subflow_context *subflow;
int min_stale_count = INT_MAX;
- msk_owned_by_me(msk);
-
- if (__mptcp_check_fallback(msk))
- return NULL;
-
mptcp_for_each_subflow(msk, subflow) {
struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
@@ -2523,16 +2518,17 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
static void __mptcp_retrans(struct sock *sk)
{
struct mptcp_sock *msk = mptcp_sk(sk);
+ struct mptcp_subflow_context *subflow;
struct mptcp_sendmsg_info info = {};
struct mptcp_data_frag *dfrag;
- size_t copied = 0;
struct sock *ssk;
- int ret;
+ int ret, err;
+ u16 len = 0;
mptcp_clean_una_wakeup(sk);
/* first check ssk: need to kick "stale" logic */
- ssk = mptcp_subflow_get_retrans(msk);
+ err = mptcp_sched_get_retrans(msk);
dfrag = mptcp_rtx_head(sk);
if (!dfrag) {
if (mptcp_data_fin_enabled(msk)) {
@@ -2551,31 +2547,45 @@ static void __mptcp_retrans(struct sock *sk)
goto reset_timer;
}
- if (!ssk)
+ if (err)
goto reset_timer;
- lock_sock(ssk);
+ mptcp_for_each_subflow(msk, subflow) {
+ if (READ_ONCE(subflow->scheduled)) {
+ u16 copied = 0;
- /* limit retransmission to the bytes already sent on some subflows */
- info.sent = 0;
- info.limit = READ_ONCE(msk->csum_enabled) ? dfrag->data_len : dfrag->already_sent;
- while (info.sent < info.limit) {
- ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
- if (ret <= 0)
- break;
+ mptcp_subflow_set_scheduled(subflow, false);
- MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RETRANSSEGS);
- copied += ret;
- info.sent += ret;
- }
- if (copied) {
- dfrag->already_sent = max(dfrag->already_sent, info.sent);
- tcp_push(ssk, 0, info.mss_now, tcp_sk(ssk)->nonagle,
- info.size_goal);
- WRITE_ONCE(msk->allow_infinite_fallback, false);
- }
+ ssk = mptcp_subflow_tcp_sock(subflow);
- release_sock(ssk);
+ lock_sock(ssk);
+
+ /* limit retransmission to the bytes already sent on some subflows */
+ info.sent = 0;
+ info.limit = READ_ONCE(msk->csum_enabled) ? dfrag->data_len :
+ dfrag->already_sent;
+ while (info.sent < info.limit) {
+ ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
+ if (ret <= 0)
+ break;
+
+ MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RETRANSSEGS);
+ copied += ret;
+ info.sent += ret;
+ }
+ if (copied) {
+ len = max(copied, len);
+ tcp_push(ssk, 0, info.mss_now, tcp_sk(ssk)->nonagle,
+ info.size_goal);
+ WRITE_ONCE(msk->allow_infinite_fallback, false);
+ }
+
+ release_sock(ssk);
+
+ msk->last_snd = ssk;
+ }
+ }
+ dfrag->already_sent = max(dfrag->already_sent, len);
reset_timer:
mptcp_check_and_set_pending(sk);
diff --git a/net/mptcp/sched.c b/net/mptcp/sched.c
index 6428323d8c7f..c7c167e48d72 100644
--- a/net/mptcp/sched.c
+++ b/net/mptcp/sched.c
@@ -156,6 +156,12 @@ int mptcp_sched_get_retrans(struct mptcp_sock *msk)
struct mptcp_subflow_context *subflow;
struct mptcp_sched_data data;
+ msk_owned_by_me(msk);
+
+ /* the following check is moved out of mptcp_subflow_get_retrans */
+ if (__mptcp_check_fallback(msk))
+ return -EINVAL;
+
mptcp_for_each_subflow(msk, subflow) {
if (READ_ONCE(subflow->scheduled))
return 0;
--
2.35.3
* [PATCH mptcp-next v25 4/5] selftests/bpf: Add bpf_red scheduler
2022-12-15 12:32 [PATCH mptcp-next v25 0/5] BPF redundant scheduler, part 2 Geliang Tang
` (2 preceding siblings ...)
2022-12-15 12:32 ` [PATCH mptcp-next v25 3/5] mptcp: use get_retrans wrapper Geliang Tang
@ 2022-12-15 12:32 ` Geliang Tang
2022-12-15 12:32 ` [PATCH mptcp-next v25 5/5] selftests/bpf: Add bpf_red test Geliang Tang
` (2 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Geliang Tang @ 2022-12-15 12:32 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang
This patch implements the redundant BPF MPTCP scheduler, named bpf_red,
which sends all packets redundantly on all available subflows.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
.../selftests/bpf/progs/mptcp_bpf_red.c | 45 +++++++++++++++++++
1 file changed, 45 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_red.c
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_red.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_red.c
new file mode 100644
index 000000000000..30dd6f521b7f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_red.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022, SUSE. */
+
+#include <linux/bpf.h>
+#include "bpf_tcp_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+
+SEC("struct_ops/mptcp_sched_red_init")
+void BPF_PROG(mptcp_sched_red_init, const struct mptcp_sock *msk)
+{
+}
+
+SEC("struct_ops/mptcp_sched_red_release")
+void BPF_PROG(mptcp_sched_red_release, const struct mptcp_sock *msk)
+{
+}
+
+void BPF_STRUCT_OPS(bpf_red_data_init, const struct mptcp_sock *msk,
+ struct mptcp_sched_data *data)
+{
+ mptcp_sched_data_set_contexts(msk, data);
+}
+
+int BPF_STRUCT_OPS(bpf_red_get_subflow, const struct mptcp_sock *msk,
+ struct mptcp_sched_data *data)
+{
+ for (int i = 0; i < MPTCP_SUBFLOWS_MAX; i++) {
+ if (!data->contexts[i])
+ break;
+
+ mptcp_subflow_set_scheduled(data->contexts[i], true);
+ }
+
+ return 0;
+}
+
+SEC(".struct_ops")
+struct mptcp_sched_ops red = {
+ .init = (void *)mptcp_sched_red_init,
+ .release = (void *)mptcp_sched_red_release,
+ .data_init = (void *)bpf_red_data_init,
+ .get_subflow = (void *)bpf_red_get_subflow,
+ .name = "bpf_red",
+};
--
2.35.3
* [PATCH mptcp-next v25 5/5] selftests/bpf: Add bpf_red test
2022-12-15 12:32 [PATCH mptcp-next v25 0/5] BPF redundant scheduler, part 2 Geliang Tang
` (3 preceding siblings ...)
2022-12-15 12:32 ` [PATCH mptcp-next v25 4/5] selftests/bpf: Add bpf_red scheduler Geliang Tang
@ 2022-12-15 12:32 ` Geliang Tang
2022-12-20 2:52 ` selftests/bpf: Add bpf_red test: Tests Results MPTCP CI
2022-12-20 1:39 ` [PATCH mptcp-next v25 0/5] BPF redundant scheduler, part 2 Mat Martineau
2022-12-21 14:07 ` Matthieu Baerts
6 siblings, 1 reply; 11+ messages in thread
From: Geliang Tang @ 2022-12-15 12:32 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang
This patch adds the redundant BPF MPTCP scheduler test: test_red(). Use
sysctl to set net.mptcp.scheduler to this scheduler. Add two veth net
devices to simulate the multiple-address case. Use the 'ip mptcp endpoint'
command to add the new endpoint ADDR_2 to the PM netlink. Send data, then
check the bytes_sent counter in the 'ss' output to make sure the data has
been sent redundantly on both net devices.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
.../testing/selftests/bpf/prog_tests/mptcp.c | 34 +++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index 647d313475bc..8426a5aba721 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -9,6 +9,7 @@
#include "mptcp_bpf_first.skel.h"
#include "mptcp_bpf_bkup.skel.h"
#include "mptcp_bpf_rr.skel.h"
+#include "mptcp_bpf_red.skel.h"
#ifndef TCP_CA_NAME_MAX
#define TCP_CA_NAME_MAX 16
@@ -381,6 +382,37 @@ static void test_rr(void)
mptcp_bpf_rr__destroy(rr_skel);
}
+static void test_red(void)
+{
+ struct mptcp_bpf_red *red_skel;
+ int server_fd, client_fd;
+ struct bpf_link *link;
+
+ red_skel = mptcp_bpf_red__open_and_load();
+ if (!ASSERT_OK_PTR(red_skel, "bpf_red__open_and_load"))
+ return;
+
+ link = bpf_map__attach_struct_ops(red_skel->maps.red);
+ if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) {
+ mptcp_bpf_red__destroy(red_skel);
+ return;
+ }
+
+ sched_init("subflow", "bpf_red");
+ server_fd = start_mptcp_server(AF_INET, ADDR_1, 0, 0);
+ client_fd = connect_to_fd(server_fd, 0);
+
+ send_data(server_fd, client_fd);
+ ASSERT_OK(has_bytes_sent(ADDR_1), "has_bytes_sent addr 1");
+ ASSERT_OK(has_bytes_sent(ADDR_2), "has_bytes_sent addr 2");
+
+ close(client_fd);
+ close(server_fd);
+ sched_cleanup();
+ bpf_link__destroy(link);
+ mptcp_bpf_red__destroy(red_skel);
+}
+
void test_mptcp(void)
{
if (test__start_subtest("base"))
@@ -391,4 +423,6 @@ void test_mptcp(void)
test_bkup();
if (test__start_subtest("rr"))
test_rr();
+ if (test__start_subtest("red"))
+ test_red();
}
--
2.35.3
* Re: [PATCH mptcp-next v25 0/5] BPF redundant scheduler, part 2
2022-12-15 12:32 [PATCH mptcp-next v25 0/5] BPF redundant scheduler, part 2 Geliang Tang
` (4 preceding siblings ...)
2022-12-15 12:32 ` [PATCH mptcp-next v25 5/5] selftests/bpf: Add bpf_red test Geliang Tang
@ 2022-12-20 1:39 ` Mat Martineau
2022-12-20 3:49 ` Geliang Tang
2022-12-21 14:07 ` Matthieu Baerts
6 siblings, 1 reply; 11+ messages in thread
From: Mat Martineau @ 2022-12-20 1:39 UTC (permalink / raw)
To: Geliang Tang; +Cc: mptcp
On Thu, 15 Dec 2022, Geliang Tang wrote:
> v25:
> - update calls to mptcp_subflow_set_scheduled(subflow, false) in
> __mptcp_subflow_push_pending().
> - rebased on "tag: export/20221215T054923"
>
Thanks for all the work on this Geliang. I think it's ok to add this to
the export branch for further testing, and any fixes/updates can be
squashed if needed.
I do have feedback on one line to delete in patch 2 (see that reply). Not
sure if Matthieu needs a squash-to patch for that or can update when
applying.
Should this be applied after the bpf_rr patches, or earlier in the
"features other trees" section of the export branch?
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
- Mat
> v24:
> - rename push to keep_pushing
> - check the scheduled bit on ssk first
> - drop delegate flag
> - depends on "mptcp: use msk_owned_by_me helper"
>
> v23:
> - patch 2 and 3: clear subflow->scheduled flag on the error paths.
>
> v22:
> - update patch 2 as Mat suggested.
> - patch 4 and 5 in v21 will be sent later.
>
> v21:
> - address Mat's comments in v20.
> - redundant sends on retransmit code path.
>
> v20:
> - rebased on "Squash to "mptcp: refactor push_pending logic" v19"
>
> v19:
> - patch 1, use 'continue' instead of 'goto again'.
>
> v18:
> - some cleanups
> - update commit logs.
>
> v17:
> - address to Mat's comments in v16
> - rebase to export/20221108T055508.
>
> v16:
> - keep last_snd and snd_burst in struct mptcp_sock.
> - drop "mptcp: register default scheduler".
> - drop "mptcp: add scheduler wrappers", move it into "mptcp: use
> get_send wrapper" and "mptcp: use get_retrans wrapper".
> - depends on 'v2, Revert "mptcp: add get_subflow wrappers" - fix
> divide error in mptcp_subflow_get_send'
>
> v15:
> 1: "refactor push pending" v10
> 2-11: "register default scheduler" v3
> - move last_snd and snd_burst into struct mptcp_sched_ops
> 12-19: "BPF redundant scheduler" v15
> - split "use get_send wrapper" into two patches
> - rebase to export/20221021T061837.
>
> v14:
> - add "mptcp: refactor push_pending logic" v10 as patch 1
> - drop update_first_pending in patch 4
> - drop update_already_sent in patch 5
>
> v13:
> - depends on "refactor push pending" v9.
> - Simply 'goto out' after invoking mptcp_subflow_delegate in patch 1.
> - All selftests (mptcp_connect.sh, mptcp_join.sh and simult_flows.sh) passed.
>
> v12:
> - fix WARN_ON_ONCE(reuse_skb) and WARN_ON_ONCE(!msk->recovery) errors
> in kernel logs.
>
> v11:
> - address to Mat's comments in v10.
> - rebase to export/20220908T063452
>
> v10:
> - send multiple dfrags in __mptcp_push_pending().
>
> v9:
> - drop the extra *err parameter of mptcp_sched_get_send() as Florian
> suggested.
>
> v8:
> - update __mptcp_push_pending(), send the same data on each subflow.
> - update __mptcp_retrans, track the max sent data.
> - add a new patch.
>
> v7:
> - drop redundant flag in v6
> - drop __mptcp_subflows_push_pending in v6
> - update redundant subflows support in __mptcp_push_pending
> - update redundant subflows support in __mptcp_retrans
>
> v6:
> - Add redundant flag for struct mptcp_sched_ops.
> - add a dedicated function __mptcp_subflows_push_pending() to deal with
> redundant subflows push pending.
>
> v5:
> - address to Paolo's comment, keep the optimization to
> mptcp_subflow_get_send() for the non eBPF case.
> - merge mptcp_sched_get_send() and __mptcp_sched_get_send() in v4 into one.
> - depends on "cleanups for bpf sched selftests".
>
> v4:
> - small cleanups in patch 1, 2.
> - add TODO in patch 3.
> - rebase patch 5 on 'cleanups for bpf sched selftests'.
>
> v3:
> - use new API.
> - fix the link failure tests issue mentioned in ("https://patchwork.kernel.org/project/mptcp/cover/cover.1653033459.git.geliang.tang@suse.com/").
>
> v2:
> - add MPTCP_SUBFLOWS_MAX limit to avoid infinite loops when the
> scheduler always sets call_again to true.
> - track the largest copied amount.
> - deal with __mptcp_subflow_push_pending() and the retransmit loop.
> - depends on "BPF round-robin scheduler" v14.
>
> v1:
>
> Implements the redundant BPF MPTCP scheduler, which sends all packets
> redundantly on all available subflows.
>
> Geliang Tang (5):
> mptcp: add scheduler wrappers
> mptcp: use get_send wrapper
> mptcp: use get_retrans wrapper
> selftests/bpf: Add bpf_red scheduler
> selftests/bpf: Add bpf_red test
>
> net/mptcp/protocol.c | 187 +++++++++++-------
> net/mptcp/protocol.h | 4 +
> net/mptcp/sched.c | 69 +++++++
> .../testing/selftests/bpf/prog_tests/mptcp.c | 34 ++++
> .../selftests/bpf/progs/mptcp_bpf_red.c | 45 +++++
> 5 files changed, 264 insertions(+), 75 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_red.c
>
> --
> 2.35.3
>
>
>
--
Mat Martineau
Intel
* Re: [PATCH mptcp-next v25 0/5] BPF redundant scheduler, part 2
2022-12-20 1:39 ` [PATCH mptcp-next v25 0/5] BPF redundant scheduler, part 2 Mat Martineau
@ 2022-12-20 3:49 ` Geliang Tang
0 siblings, 0 replies; 11+ messages in thread
From: Geliang Tang @ 2022-12-20 3:49 UTC (permalink / raw)
To: Matthieu Baerts, Mat Martineau; +Cc: mptcp
Hi Mat, Matt,
On Mon, Dec 19, 2022 at 05:39:59PM -0800, Mat Martineau wrote:
> On Thu, 15 Dec 2022, Geliang Tang wrote:
>
> > v25:
> > - update calls to mptcp_subflow_set_scheduled(subflow, false) in
> > __mptcp_subflow_push_pending().
> > - rebased on "tag: export/20221215T054923"
> >
>
> Thanks for all the work on this Geliang. I think it's ok to add this to the
> export branch for further testing, and any fixes/updates can be squashed if
> needed.
>
> I do have feedback on one line to delete in patch 2 (see that reply). Not
> sure if Matthieu needs a squash-to patch for that or can update when
> applying.
I just sent a squash-to patch to the ML to delete this line.
>
> Should this be applied after the bpf_rr patches, or earlier in the "features
> other trees" section of the export branch?
The first three patches should be inserted between the commits "mptcp: add
sched_data_set_contexts helper" and "bpf: Add bpf_mptcp_sched_ops". The
last two should be applied after the commit "selftests/bpf: Add bpf_rr
test".
>
> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
>
>
> - Mat
>
> > Geliang Tang (5):
> > mptcp: add scheduler wrappers
> > mptcp: use get_send wrapper
> > mptcp: use get_retrans wrapper
In addition, the commit logs of these two patches need to be updated
too, since sock_owned_by_me() has now been replaced by msk_owned_by_me().
The first one:
Move sock_owned_by_me() check and fallback check into get_send() wrapper
from mptcp_subflow_get_send().
->
Move owned_by_me() and fallback checks into get_send() wrapper from
mptcp_subflow_get_send().
The second:
Move sock_owned_by_me() check and fallback check into get_retrans()
wrapper from mptcp_subflow_get_retrans().
->
Move owned_by_me() and fallback checks into get_retrans() wrapper from
mptcp_subflow_get_retrans().
Thanks,
-Geliang
> > selftests/bpf: Add bpf_red scheduler
> > selftests/bpf: Add bpf_red test
> >
> > net/mptcp/protocol.c | 187 +++++++++++-------
> > net/mptcp/protocol.h | 4 +
> > net/mptcp/sched.c | 69 +++++++
> > .../testing/selftests/bpf/prog_tests/mptcp.c | 34 ++++
> > .../selftests/bpf/progs/mptcp_bpf_red.c | 45 +++++
> > 5 files changed, 264 insertions(+), 75 deletions(-)
> > create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_red.c
> >
> > --
> > 2.35.3
> >
> >
> >
>
> --
> Mat Martineau
> Intel
* Re: [PATCH mptcp-next v25 0/5] BPF redundant scheduler, part 2
2022-12-15 12:32 [PATCH mptcp-next v25 0/5] BPF redundant scheduler, part 2 Geliang Tang
` (5 preceding siblings ...)
2022-12-20 1:39 ` [PATCH mptcp-next v25 0/5] BPF redundant scheduler, part 2 Mat Martineau
@ 2022-12-21 14:07 ` Matthieu Baerts
6 siblings, 0 replies; 11+ messages in thread
From: Matthieu Baerts @ 2022-12-21 14:07 UTC (permalink / raw)
To: Geliang Tang, Mat Martineau; +Cc: mptcp
Hi Geliang, Mat,
On 15/12/2022 13:32, Geliang Tang wrote:
> Implements the redundant BPF MPTCP scheduler, which sends all packets
> redundantly on all available subflows.
Thank you for the hard work with the patches and the reviews!
I just applied them in our tree in 2 different places as requested by
Geliang. I also integrated the Squash-to patch sent by Geliang and added
Mat's RvB tag.
New patches for t/upstream:
- 324b3cbe75ae: mptcp: add scheduler wrappers
- 2e6f3e6c3604: mptcp: use get_send wrapper
- 52f8f6af934c: mptcp: use get_retrans wrapper
- Results: 6c58a377c113..4254b6598308 (export)
- 61c95bbb18c7: selftests/bpf: Add bpf_red scheduler
- 46cf70243ce8: selftests/bpf: Add bpf_red test
- Results: 4254b6598308..224f92634d88 (export)
Tests are now in progress:
https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20221221T140648
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net