* [mptcp-next 0/2] Fix mptcp connection hangs after link failover
@ 2021-09-06 13:10 Florian Westphal
2021-09-06 13:10 ` [PATCH mptcp-next v2 1/2] mptcp: remove tx_pending_data Florian Westphal
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Florian Westphal @ 2021-09-06 13:10 UTC (permalink / raw)
To: mptcp; +Cc: Florian Westphal
First patch is preparation work: tx_pending_data counter is unreliable.
Second patch fixes premature stop of the retransmit timer.
Florian Westphal (2):
mptcp: remove tx_pending_data
mptcp: re-set push-pending bit on retransmit failure
net/mptcp/protocol.c | 32 +++++++++++++++++++++++++-------
net/mptcp/protocol.h | 1 -
2 files changed, 25 insertions(+), 8 deletions(-)
--
2.32.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH mptcp-next v2 1/2] mptcp: remove tx_pending_data
2021-09-06 13:10 [mptcp-next 0/2] Fix mptcp connection hangs after link failover Florian Westphal
@ 2021-09-06 13:10 ` Florian Westphal
2021-09-06 13:10 ` [PATCH mptcp-next v2 2/2] mptcp: re-arm retransmit timer if data is pending Florian Westphal
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2021-09-06 13:10 UTC (permalink / raw)
To: mptcp; +Cc: Florian Westphal, Paolo Abeni
The update on recovery is not correct.
msk->tx_pending_data += msk->snd_nxt - rtx_head->data_seq;
will update tx_pending_data multiple times when a subflow is declared
stale while earlier recovery is still in progress.
This means that tx_pending_data will still be positive even after
all data as has been transmitted.
Rather than fix it, remove this field: there are no consumers.
The outstanding data byte count can be computed either via
"msk->write_seq - rtx_head->data_seq" or
"msk->write_seq - msk->snd_una".
The latter is more recent/accurate estimate as rtx_head adjustment
is deferred until mptcp lock can be acquired.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Paolo Abeni <pabeni@redhat.com>
---
No changes in v2.
net/mptcp/protocol.c | 4 ----
net/mptcp/protocol.h | 1 -
2 files changed, 5 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 2a525c7ae920..c0e0ee4cb24f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1531,7 +1531,6 @@ static void mptcp_update_post_push(struct mptcp_sock *msk,
dfrag->already_sent += sent;
msk->snd_burst -= sent;
- msk->tx_pending_data -= sent;
snd_nxt_new += dfrag->already_sent;
@@ -1761,7 +1760,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
frag_truesize += psize;
pfrag->offset += frag_truesize;
WRITE_ONCE(msk->write_seq, msk->write_seq + psize);
- msk->tx_pending_data += psize;
/* charge data on mptcp pending queue to the msk socket
* Note: we charge such data both to sk and ssk
@@ -2254,7 +2252,6 @@ bool __mptcp_retransmit_pending_data(struct sock *sk)
mptcp_data_unlock(sk);
msk->first_pending = rtx_head;
- msk->tx_pending_data += msk->snd_nxt - rtx_head->data_seq;
msk->snd_burst = 0;
/* be sure to clear the "sent status" on all re-injected fragments */
@@ -2525,7 +2522,6 @@ static int __mptcp_init_sock(struct sock *sk)
msk->first_pending = NULL;
msk->wmem_reserved = 0;
WRITE_ONCE(msk->rmem_released, 0);
- msk->tx_pending_data = 0;
msk->timer_ival = TCP_RTO_MIN;
msk->first = NULL;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 99a23fff7b03..8416810afa8e 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -254,7 +254,6 @@ struct mptcp_sock {
struct sk_buff *ooo_last_skb;
struct rb_root out_of_order_queue;
struct sk_buff_head receive_queue;
- int tx_pending_data;
struct list_head conn_list;
struct list_head rtx_queue;
struct mptcp_data_frag *first_pending;
--
2.32.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH mptcp-next v2 2/2] mptcp: re-arm retransmit timer if data is pending
2021-09-06 13:10 [mptcp-next 0/2] Fix mptcp connection hangs after link failover Florian Westphal
2021-09-06 13:10 ` [PATCH mptcp-next v2 1/2] mptcp: remove tx_pending_data Florian Westphal
@ 2021-09-06 13:10 ` Florian Westphal
2021-09-06 13:42 ` Paolo Abeni
2021-09-08 1:06 ` [mptcp-next 0/2] Fix mptcp connection hangs after link failover Mat Martineau
2021-09-11 5:56 ` Matthieu Baerts
3 siblings, 1 reply; 9+ messages in thread
From: Florian Westphal @ 2021-09-06 13:10 UTC (permalink / raw)
To: mptcp; +Cc: Florian Westphal
The retransmit head will be NULL in case there is no in-flight data
(meaning all data injected into network has been acked).
In that case the retransmit timer is stopped.
This is only correct if there is no more pending, not-yet-sent data.
If there is, the retransmit timer needs to set the PENDING bit again so
that mptcp tries to send the remaining (new) data once a subflow can accept
more data.
Also, mptcp_subflow_get_retrans() has to be called unconditionally.
This function checks for subflows that have become unresponsive and marks
them as stale, so in the case where the rtx queue is empty, subflows
will never be marked stale which prevents available backup subflows from
becoming eligible for transmit.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/226
Signed-off-by: Florian Westphal <fw@strlen.de>
---
Changes in v2:
- drop reset-pending conditional in __mptcp_push_pending,
retrans timer already takes care of this (Paolo)
- rename patch subject
- add a 'closes' tag to commit message
net/mptcp/protocol.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index c0e0ee4cb24f..73cb5d053785 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1105,7 +1105,8 @@ static void __mptcp_clean_una(struct sock *sk)
if (cleaned && tcp_under_memory_pressure(sk))
__mptcp_mem_reclaim_partial(sk);
- if (snd_una == READ_ONCE(msk->snd_nxt) && !msk->recovery) {
+ if (snd_una == READ_ONCE(msk->snd_nxt) &&
+ snd_una == READ_ONCE(msk->write_seq)) {
if (mptcp_timer_pending(sk) && !mptcp_data_fin_enabled(msk))
mptcp_stop_timer(sk);
} else {
@@ -1547,6 +1548,13 @@ static void mptcp_update_post_push(struct mptcp_sock *msk,
msk->snd_nxt = snd_nxt_new;
}
+static void mptcp_check_and_set_pending(struct sock *sk)
+{
+ if (mptcp_send_head(sk) &&
+ !test_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags))
+ set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags);
+}
+
void __mptcp_push_pending(struct sock *sk, unsigned int flags)
{
struct sock *prev_ssk = NULL, *ssk = NULL;
@@ -2414,6 +2422,9 @@ static void __mptcp_retrans(struct sock *sk)
int ret;
mptcp_clean_una_wakeup(sk);
+
+ /* first check ssk: need to kick "stale" logic */
+ ssk = mptcp_subflow_get_retrans(msk);
dfrag = mptcp_rtx_head(sk);
if (!dfrag) {
if (mptcp_data_fin_enabled(msk)) {
@@ -2426,10 +2437,12 @@ static void __mptcp_retrans(struct sock *sk)
goto reset_timer;
}
- return;
+ if (!mptcp_send_head(sk))
+ return;
+
+ goto reset_timer;
}
- ssk = mptcp_subflow_get_retrans(msk);
if (!ssk)
goto reset_timer;
@@ -2456,6 +2469,8 @@ static void __mptcp_retrans(struct sock *sk)
release_sock(ssk);
reset_timer:
+ mptcp_check_and_set_pending(sk);
+
if (!mptcp_timer_pending(sk))
mptcp_reset_timer(sk);
}
--
2.32.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH mptcp-next v2 2/2] mptcp: re-arm retransmit timer if data is pending
2021-09-06 13:10 ` [PATCH mptcp-next v2 2/2] mptcp: re-arm retransmit timer if data is pending Florian Westphal
@ 2021-09-06 13:42 ` Paolo Abeni
0 siblings, 0 replies; 9+ messages in thread
From: Paolo Abeni @ 2021-09-06 13:42 UTC (permalink / raw)
To: Florian Westphal, mptcp
On Mon, 2021-09-06 at 15:10 +0200, Florian Westphal wrote:
> The retransmit head will be NULL in case there is no in-flight data
> (meaning all data injected into network has been acked).
>
> In that case the retransmit timer is stopped.
>
> This is only correct if there is no more pending, not-yet-sent data.
>
> If there is, the retransmit timer needs to set the PENDING bit again so
> that mptcp tries to send the remaining (new) data once a subflow can accept
> more data.
>
> Also, mptcp_subflow_get_retrans() has to be called unconditionally.
>
> This function checks for subflows that have become unresponsive and marks
> them as stale, so in the case where the rtx queue is empty, subflows
> will never be marked stale which prevents available backup subflows from
> becoming eligible for transmit.
>
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/226
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> Changes in v2:
> - drop reset-pending conditional in __mptcp_push_pending,
> retrans timer already takes care of this (Paolo)
> - rename patch subject
> - add a 'closes' tag to commit message
>
> net/mptcp/protocol.c | 21 ++++++++++++++++++---
> 1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index c0e0ee4cb24f..73cb5d053785 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1105,7 +1105,8 @@ static void __mptcp_clean_una(struct sock *sk)
> if (cleaned && tcp_under_memory_pressure(sk))
> __mptcp_mem_reclaim_partial(sk);
>
> - if (snd_una == READ_ONCE(msk->snd_nxt) && !msk->recovery) {
> + if (snd_una == READ_ONCE(msk->snd_nxt) &&
> + snd_una == READ_ONCE(msk->write_seq)) {
> if (mptcp_timer_pending(sk) && !mptcp_data_fin_enabled(msk))
> mptcp_stop_timer(sk);
> } else {
> @@ -1547,6 +1548,13 @@ static void mptcp_update_post_push(struct mptcp_sock *msk,
> msk->snd_nxt = snd_nxt_new;
> }
>
> +static void mptcp_check_and_set_pending(struct sock *sk)
> +{
> + if (mptcp_send_head(sk) &&
> + !test_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags))
> + set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags);
> +}
> +
> void __mptcp_push_pending(struct sock *sk, unsigned int flags)
> {
> struct sock *prev_ssk = NULL, *ssk = NULL;
> @@ -2414,6 +2422,9 @@ static void __mptcp_retrans(struct sock *sk)
> int ret;
>
> mptcp_clean_una_wakeup(sk);
> +
> + /* first check ssk: need to kick "stale" logic */
> + ssk = mptcp_subflow_get_retrans(msk);
> dfrag = mptcp_rtx_head(sk);
> if (!dfrag) {
> if (mptcp_data_fin_enabled(msk)) {
> @@ -2426,10 +2437,12 @@ static void __mptcp_retrans(struct sock *sk)
> goto reset_timer;
> }
>
> - return;
> + if (!mptcp_send_head(sk))
> + return;
> +
> + goto reset_timer;
> }
>
> - ssk = mptcp_subflow_get_retrans(msk);
> if (!ssk)
> goto reset_timer;
>
> @@ -2456,6 +2469,8 @@ static void __mptcp_retrans(struct sock *sk)
> release_sock(ssk);
>
> reset_timer:
> + mptcp_check_and_set_pending(sk);
> +
> if (!mptcp_timer_pending(sk))
> mptcp_reset_timer(sk);
> }
LGTM, thanks Florian!
Acked-by: Paolo Abeni <pabeni@redhat.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [mptcp-next 0/2] Fix mptcp connection hangs after link failover
2021-09-06 13:10 [mptcp-next 0/2] Fix mptcp connection hangs after link failover Florian Westphal
2021-09-06 13:10 ` [PATCH mptcp-next v2 1/2] mptcp: remove tx_pending_data Florian Westphal
2021-09-06 13:10 ` [PATCH mptcp-next v2 2/2] mptcp: re-arm retransmit timer if data is pending Florian Westphal
@ 2021-09-08 1:06 ` Mat Martineau
2021-09-08 8:20 ` Florian Westphal
2021-09-11 5:56 ` Matthieu Baerts
3 siblings, 1 reply; 9+ messages in thread
From: Mat Martineau @ 2021-09-08 1:06 UTC (permalink / raw)
To: Florian Westphal; +Cc: mptcp
On Mon, 6 Sep 2021, Florian Westphal wrote:
> First patch is preparation work: tx_pending_data counter is unreliable.
> Second patch fixes premature stop of the retransmit timer.
>
> Florian Westphal (2):
> mptcp: remove tx_pending_data
> mptcp: re-set push-pending bit on retransmit failure
>
> net/mptcp/protocol.c | 32 +++++++++++++++++++++++++-------
> net/mptcp/protocol.h | 1 -
> 2 files changed, 25 insertions(+), 8 deletions(-)
>
> --
> 2.32.0
The code changes look ok, and I don't see any copyfd_io_poll errors. But I
do get consistent failures in the same group of self tests related to
stale links and recovery:
15 multiple flows, signal, link failure syn[ ok ] - synack[ ok ] - ack[ ok
]
add[ ok ] - echo [ ok ]
stale [fail] got 0
stale[s] 0 recover[s], expected stale in range [1..5], stale-recover
delta 1
ns2-0-eBlFRK stats
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
RX: bytes packets errors dropped missed mcast
1728 3 0 0 0 0
TX: bytes packets errors dropped carrier collsns
1728 3 0 0 0 0
2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group
default qlen 1000
link/sit 0.0.0.0 brd 0.0.0.0
RX: bytes packets errors dropped missed mcast
0 0 0 0 0 0
TX: bytes packets errors dropped carrier collsns
0 0 0 0 0 0
3: ns2eth1@if3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc netem
state DOWN mode DEFAULT group default qlen 1000
link/ether 1a:ff:34:f1:09:0c brd ff:ff:ff:ff:ff:ff link-netns
ns1-0-eBlFRK
RX: bytes packets errors dropped missed mcast
93900 1188 0 0 0 0
TX: bytes packets errors dropped carrier collsns
3625894 1491 0 0 0 0
4: ns2eth2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem
state UP mode DEFAULT group default qlen 1000
link/ether 86:8e:f2:62:4d:ab brd ff:ff:ff:ff:ff:ff link-netns
ns1-0-eBlFRK
RX: bytes packets errors dropped missed mcast
186852 2394 0 0 0 0
TX: bytes packets errors dropped carrier collsns
8018037 2641 0 0 0 0
5: ns2eth3@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem
state UP mode DEFAULT group default qlen 1000
link/ether 6e:bb:bc:a6:6c:e4 brd ff:ff:ff:ff:ff:ff link-netns
ns1-0-eBlFRK
RX: bytes packets errors dropped missed mcast
192608 2468 0 0 0 0
TX: bytes packets errors dropped carrier collsns
8732438 2874 0 0 0 0
6: ns2eth4@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem
state UP mode DEFAULT group default qlen 1000
link/ether a2:c9:68:39:8c:68 brd ff:ff:ff:ff:ff:ff link-netns
ns1-0-eBlFRK
RX: bytes packets errors dropped missed mcast
178072 2282 0 0 0 0
TX: bytes packets errors dropped carrier collsns
7742481 2508 0 0 0 0
MPTcpExtMPCapableSYNTX 1 0.0
MPTcpExtMPCapableSYNACKRX 1 0.0
MPTcpExtMPTCPRetrans 84 0.0
MPTcpExtMPJoinSynAckRx 3 0.0
MPTcpExtAddAddr 1 0.0
Created /tmp/tmp.VNjv8beZVJ (size 4096 KB) containing data sent by server
16 multi flows, signal, bidi, link fail syn[ ok ] - synack[ ok ] - ack[ ok
]
add[ ok ] - echo [ ok ]
stale [fail] got 0
stale[s] 0 recover[s], expected stale in range [1..-1], stale-recover
delta 1
ns2-0-McIAzy stats
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
RX: bytes packets errors dropped missed mcast
2304 4 0 0 0 0
TX: bytes packets errors dropped carrier collsns
2304 4 0 0 0 0
2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group
default qlen 1000
link/sit 0.0.0.0 brd 0.0.0.0
RX: bytes packets errors dropped missed mcast
0 0 0 0 0 0
TX: bytes packets errors dropped carrier collsns
0 0 0 0 0 0
3: ns2eth1@if3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc netem
state DOWN mode DEFAULT group default qlen 1000
link/ether 5a:55:55:d0:62:02 brd ff:ff:ff:ff:ff:ff link-netns
ns1-0-McIAzy
RX: bytes packets errors dropped missed mcast
1462092 1435 0 0 0 0
TX: bytes packets errors dropped carrier collsns
3174798 1449 0 0 0 0
4: ns2eth2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem
state UP mode DEFAULT group default qlen 1000
link/ether ee:c5:e3:b4:56:91 brd ff:ff:ff:ff:ff:ff link-netns
ns1-0-McIAzy
RX: bytes packets errors dropped missed mcast
1212944 2864 0 0 0 0
TX: bytes packets errors dropped carrier collsns
9191887 3021 0 0 0 0
5: ns2eth3@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem
state UP mode DEFAULT group default qlen 1000
link/ether ce:6f:56:5f:4a:85 brd ff:ff:ff:ff:ff:ff link-netns
ns1-0-McIAzy
RX: bytes packets errors dropped missed mcast
896684 2618 0 0 0 0
TX: bytes packets errors dropped carrier collsns
8202792 2745 0 0 0 0
6: ns2eth4@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem
state UP mode DEFAULT group default qlen 1000
link/ether 56:7a:df:33:db:f9 brd ff:ff:ff:ff:ff:ff link-netns
ns1-0-McIAzy
RX: bytes packets errors dropped missed mcast
1405836 2366 0 0 0 0
TX: bytes packets errors dropped carrier collsns
7566286 2745 0 0 0 0
MPTcpExtMPCapableSYNTX 1 0.0
MPTcpExtMPCapableSYNACKRX 1 0.0
MPTcpExtMPTCPRetrans 105 0.0
MPTcpExtMPJoinSynAckRx 3 0.0
MPTcpExtOFOQueueTail 319 0.0
MPTcpExtOFOQueue 1012 0.0
MPTcpExtOFOMerge 322 0.0
MPTcpExtDuplicateData 5 0.0
MPTcpExtAddAddr 1 0.0
17 backup subflow unused, link failure syn[ ok ] - synack[ ok ] - ack[ ok
]
add[ ok ] - echo [ ok ]
link usage [fail] got 39%
usage, expected 0%
18 backup flow used, multi links fail syn[ ok ] - synack[ ok ] - ack[ ok
]
add[ ok ] - echo [ ok ]
stale [fail] got 0
stale[s] 0 recover[s], expected stale in range [2..4], stale-recover
delta 2
ns2-0-N0wvlQ stats
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
RX: bytes packets errors dropped missed mcast
11052 20 0 0 0 0
TX: bytes packets errors dropped carrier collsns
11052 20 0 0 0 0
2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group
default qlen 1000
link/sit 0.0.0.0 brd 0.0.0.0
RX: bytes packets errors dropped missed mcast
0 0 0 0 0 0
TX: bytes packets errors dropped carrier collsns
0 0 0 0 0 0
3: ns2eth1@if3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc netem
state DOWN mode DEFAULT group default qlen 1000
link/ether 2a:6f:3d:be:d2:84 brd ff:ff:ff:ff:ff:ff link-netns
ns1-0-N0wvlQ
RX: bytes packets errors dropped missed mcast
128512 1632 0 0 0 0
TX: bytes packets errors dropped carrier collsns
4726279 1678 0 0 0 0
4: ns2eth2@if4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc netem
state DOWN mode DEFAULT group default qlen 1000
link/ether 52:e2:d2:54:cd:18 brd ff:ff:ff:ff:ff:ff link-netns
ns1-0-N0wvlQ
RX: bytes packets errors dropped missed mcast
126524 1620 0 0 0 0
TX: bytes packets errors dropped carrier collsns
4682855 1665 0 7 0 0
5: ns2eth3@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem
state UP mode DEFAULT group default qlen 1000
link/ether fe:a5:89:cc:e8:8e brd ff:ff:ff:ff:ff:ff link-netns
ns1-0-N0wvlQ
RX: bytes packets errors dropped missed mcast
378726 4855 0 0 0 0
TX: bytes packets errors dropped carrier collsns
18615840 5084 0 0 0 0
6: ns2eth4@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem
state UP mode DEFAULT group default qlen 1000
link/ether 4e:90:1f:7e:d2:50 brd ff:ff:ff:ff:ff:ff link-netns
ns1-0-N0wvlQ
RX: bytes packets errors dropped missed mcast
806 9 0 0 0 0
TX: bytes packets errors dropped carrier collsns
1026 11 0 0 0 0
MPTcpExtMPCapableSYNTX 1 0.0
MPTcpExtMPCapableSYNACKRX 1 0.0
MPTcpExtMPTCPRetrans 430 0.0
MPTcpExtMPJoinSynAckRx 2 0.0
MPTcpExtAddAddr 1 0.0
link usage [fail] got 68%
usage, expected 50%
19 backup flow used, bidi, link failure syn[ ok ] - synack[ ok ] - ack[ ok
]
add[ ok ] - echo [ ok ]
stale [fail] got 0
stale[s] 0 recover[s], expected stale in range [1..-1], stale-recover
delta 2
ns2-0-Fvgunu stats
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
RX: bytes packets errors dropped missed mcast
10476 19 0 0 0 0
TX: bytes packets errors dropped carrier collsns
10476 19 0 0 0 0
2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group
default qlen 1000
link/sit 0.0.0.0 brd 0.0.0.0
RX: bytes packets errors dropped missed mcast
0 0 0 0 0 0
TX: bytes packets errors dropped carrier collsns
0 0 0 0 0 0
3: ns2eth1@if3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc netem
state DOWN mode DEFAULT group default qlen 1000
link/ether 42:07:fd:15:b9:5d brd ff:ff:ff:ff:ff:ff link-netns
ns1-0-Fvgunu
RX: bytes packets errors dropped missed mcast
2251816 1945 0 0 0 0
TX: bytes packets errors dropped carrier collsns
4705486 2003 0 0 0 0
4: ns2eth2@if4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc netem
state DOWN mode DEFAULT group default qlen 1000
link/ether 26:6f:86:e8:ba:8a brd ff:ff:ff:ff:ff:ff link-netns
ns1-0-Fvgunu
RX: bytes packets errors dropped missed mcast
2272210 1944 0 0 0 0
TX: bytes packets errors dropped carrier collsns
4195868 1994 0 8 0 0
5: ns2eth3@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem
state UP mode DEFAULT group default qlen 1000
link/ether 76:9f:8a:36:57:d0 brd ff:ff:ff:ff:ff:ff link-netns
ns1-0-Fvgunu
RX: bytes packets errors dropped missed mcast
404006 5169 0 0 0 0
TX: bytes packets errors dropped carrier collsns
19168550 5433 0 0 0 0
6: ns2eth4@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem
state UP mode DEFAULT group default qlen 1000
link/ether ca:9e:ce:5e:8f:d0 brd ff:ff:ff:ff:ff:ff link-netns
ns1-0-Fvgunu
RX: bytes packets errors dropped missed mcast
806 9 0 0 0 0
TX: bytes packets errors dropped carrier collsns
1026 11 0 0 0 0
MPTcpExtMPCapableSYNTX 1 0.0
MPTcpExtMPCapableSYNACKRX 1 0.0
MPTcpExtMPTCPRetrans 419 0.0
MPTcpExtMPJoinSynAckRx 2 0.0
MPTcpExtOFOQueueTail 677 0.0
MPTcpExtOFOQueue 733 0.0
MPTcpExtOFOMerge 513 0.0
MPTcpExtAddAddr 1 0.0
link usage [fail] got 70%
usage, expected 50%
--
Mat Martineau
Intel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [mptcp-next 0/2] Fix mptcp connection hangs after link failover
2021-09-08 1:06 ` [mptcp-next 0/2] Fix mptcp connection hangs after link failover Mat Martineau
@ 2021-09-08 8:20 ` Florian Westphal
2021-09-08 15:42 ` Florian Westphal
0 siblings, 1 reply; 9+ messages in thread
From: Florian Westphal @ 2021-09-08 8:20 UTC (permalink / raw)
To: Mat Martineau; +Cc: Florian Westphal, mptcp
Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
> On Mon, 6 Sep 2021, Florian Westphal wrote:
>
> > First patch is preparation work: tx_pending_data counter is unreliable.
> > Second patch fixes premature stop of the retransmit timer.
> >
> > Florian Westphal (2):
> > mptcp: remove tx_pending_data
> > mptcp: re-set push-pending bit on retransmit failure
> >
> > net/mptcp/protocol.c | 32 +++++++++++++++++++++++++-------
> > net/mptcp/protocol.h | 1 -
> > 2 files changed, 25 insertions(+), 8 deletions(-)
> >
> > --
> > 2.32.0
>
> The code changes look ok, and I don't see any copyfd_io_poll errors. But I
> do get consistent failures in the same group of self tests related to stale
> links and recovery:
>
> 15 multiple flows, signal, link failure syn[ ok ] - synack[ ok ] - ack[ ok ]
> add[ ok ] - echo [ ok ]
> stale [fail] got 0
> stale[s] 0 recover[s], expected stale in range [1..5], stale-recover
> delta 1
I'm on export, 9c7d1b9a14eba479466423d64f99c8a4e29b66f4, with these two
patches and I don't see this error.
Running mptcp_join -l in a loop now, no luck so far.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [mptcp-next 0/2] Fix mptcp connection hangs after link failover
2021-09-08 8:20 ` Florian Westphal
@ 2021-09-08 15:42 ` Florian Westphal
2021-09-08 18:35 ` Mat Martineau
0 siblings, 1 reply; 9+ messages in thread
From: Florian Westphal @ 2021-09-08 15:42 UTC (permalink / raw)
To: Florian Westphal; +Cc: Mat Martineau, mptcp
Florian Westphal <fw@strlen.de> wrote:
> Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
> > On Mon, 6 Sep 2021, Florian Westphal wrote:
> >
> > > First patch is preparation work: tx_pending_data counter is unreliable.
> > > Second patch fixes premature stop of the retransmit timer.
> > >
> > > Florian Westphal (2):
> > > mptcp: remove tx_pending_data
> > > mptcp: re-set push-pending bit on retransmit failure
> > >
> > > net/mptcp/protocol.c | 32 +++++++++++++++++++++++++-------
> > > net/mptcp/protocol.h | 1 -
> > > 2 files changed, 25 insertions(+), 8 deletions(-)
> > >
> > > --
> > > 2.32.0
> >
> > The code changes look ok, and I don't see any copyfd_io_poll errors. But I
> > do get consistent failures in the same group of self tests related to stale
> > links and recovery:
> >
> > 15 multiple flows, signal, link failure syn[ ok ] - synack[ ok ] - ack[ ok ]
> > add[ ok ] - echo [ ok ]
> > stale [fail] got 0
> > stale[s] 0 recover[s], expected stale in range [1..5], stale-recover
> > delta 1
>
> I'm on export, 9c7d1b9a14eba479466423d64f99c8a4e29b66f4, with these two
> patches and I don't see this error.
>
> Running mptcp_join -l in a loop now, no luck so far.
Gave up, its not triggering for me.
Any hints on reproducing this?
Does it pass for you without my changes?
I don't see how they would cause this; if anything this patch makes
the stale detection reliable because the retrans timer is not stopped
too early anymore and it makes sure that mptcp_subflow_get_retrans() is
called once for each retrans timer call.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [mptcp-next 0/2] Fix mptcp connection hangs after link failover
2021-09-08 15:42 ` Florian Westphal
@ 2021-09-08 18:35 ` Mat Martineau
0 siblings, 0 replies; 9+ messages in thread
From: Mat Martineau @ 2021-09-08 18:35 UTC (permalink / raw)
To: Florian Westphal; +Cc: mptcp
On Wed, 8 Sep 2021, Florian Westphal wrote:
> Florian Westphal <fw@strlen.de> wrote:
>> Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
>>> On Mon, 6 Sep 2021, Florian Westphal wrote:
>>>
>>>> First patch is preparation work: tx_pending_data counter is unreliable.
>>>> Second patch fixes premature stop of the retransmit timer.
>>>>
>>>> Florian Westphal (2):
>>>> mptcp: remove tx_pending_data
>>>> mptcp: re-set push-pending bit on retransmit failure
>>>>
>>>> net/mptcp/protocol.c | 32 +++++++++++++++++++++++++-------
>>>> net/mptcp/protocol.h | 1 -
>>>> 2 files changed, 25 insertions(+), 8 deletions(-)
>>>>
>>>> --
>>>> 2.32.0
>>>
>>> The code changes look ok, and I don't see any copyfd_io_poll errors. But I
>>> do get consistent failures in the same group of self tests related to stale
>>> links and recovery:
>>>
>>> 15 multiple flows, signal, link failure syn[ ok ] - synack[ ok ] - ack[ ok ]
>>> add[ ok ] - echo [ ok ]
>>> stale [fail] got 0
>>> stale[s] 0 recover[s], expected stale in range [1..5], stale-recover
>>> delta 1
>>
>> I'm on export, 9c7d1b9a14eba479466423d64f99c8a4e29b66f4, with these two
>> patches and I don't see this error.
>>
>> Running mptcp_join -l in a loop now, no luck so far.
>
> Gave up, its not triggering for me.
>
> Any hints on reproducing this?
>
> Does it pass for you without my changes?
>
> I don't see how they would cause this; if anything this patch makes
> the stale detection reliable because the retrans timer is not stopped
> too early anymore and it makes sure that mptcp_subflow_get_retrans() is
> called once for each retrans timer call.
>
Hi Florian -
My apologies for wasting your time on this, it was 100% user error on my
side. With everything set up right on my end, the tests are passing.
So, the changes look good:
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
--
Mat Martineau
Intel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [mptcp-next 0/2] Fix mptcp connection hangs after link failover
2021-09-06 13:10 [mptcp-next 0/2] Fix mptcp connection hangs after link failover Florian Westphal
` (2 preceding siblings ...)
2021-09-08 1:06 ` [mptcp-next 0/2] Fix mptcp connection hangs after link failover Mat Martineau
@ 2021-09-11 5:56 ` Matthieu Baerts
3 siblings, 0 replies; 9+ messages in thread
From: Matthieu Baerts @ 2021-09-11 5:56 UTC (permalink / raw)
To: Florian Westphal, Mat Martineau, Paolo Abeni; +Cc: mptcp
Hi Florian, Paolo, Mat,
On 06/09/2021 15:10, Florian Westphal wrote:
> First patch is preparation work: tx_pending_data counter is unreliable.
> Second patch fixes premature stop of the retransmit timer.
>
> Florian Westphal (2):
> mptcp: remove tx_pending_data
> mptcp: re-set push-pending bit on retransmit failure
Thank you for the patches, reviews and testing!
It is now in our tree (net-next features) with Paolo's Ack and Mat's RvB
tag:
- 4c18199dfdd0: mptcp: remove tx_pending_data
- 7fdafa1b84f9: mptcp: re-arm retransmit timer if data is pending
- Results: 84da6403b775..fecd1f9bf19f
Builds and tests are now in progress:
https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20210911T055615
https://github.com/multipath-tcp/mptcp_net-next/actions/workflows/build-validation.yml?query=branch:export/20210911T055615
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2021-09-11 5:56 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-06 13:10 [mptcp-next 0/2] Fix mptcp connection hangs after link failover Florian Westphal
2021-09-06 13:10 ` [PATCH mptcp-next v2 1/2] mptcp: remove tx_pending_data Florian Westphal
2021-09-06 13:10 ` [PATCH mptcp-next v2 2/2] mptcp: re-arm retransmit timer if data is pending Florian Westphal
2021-09-06 13:42 ` Paolo Abeni
2021-09-08 1:06 ` [mptcp-next 0/2] Fix mptcp connection hangs after link failover Mat Martineau
2021-09-08 8:20 ` Florian Westphal
2021-09-08 15:42 ` Florian Westphal
2021-09-08 18:35 ` Mat Martineau
2021-09-11 5:56 ` Matthieu Baerts
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.