All of lore.kernel.org
 help / color / mirror / Atom feed
* [mptcp-next 0/2] Fix mptcp connection hangs after link failover
@ 2021-09-06 13:10 Florian Westphal
  2021-09-06 13:10 ` [PATCH mptcp-next v2 1/2] mptcp: remove tx_pending_data Florian Westphal
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Florian Westphal @ 2021-09-06 13:10 UTC (permalink / raw)
  To: mptcp; +Cc: Florian Westphal

First patch is preparation work: tx_pending_data counter is unreliable.
Second patch fixes premature stop of the retransmit timer.

Florian Westphal (2):
  mptcp: remove tx_pending_data
  mptcp: re-set push-pending bit on retransmit failure

 net/mptcp/protocol.c | 32 +++++++++++++++++++++++++-------
 net/mptcp/protocol.h |  1 -
 2 files changed, 25 insertions(+), 8 deletions(-)

-- 
2.32.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH mptcp-next v2 1/2] mptcp: remove tx_pending_data
  2021-09-06 13:10 [mptcp-next 0/2] Fix mptcp connection hangs after link failover Florian Westphal
@ 2021-09-06 13:10 ` Florian Westphal
  2021-09-06 13:10 ` [PATCH mptcp-next v2 2/2] mptcp: re-arm retransmit timer if data is pending Florian Westphal
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2021-09-06 13:10 UTC (permalink / raw)
  To: mptcp; +Cc: Florian Westphal, Paolo Abeni

The update on recovery is not correct.

msk->tx_pending_data += msk->snd_nxt - rtx_head->data_seq;

will update tx_pending_data multiple times when a subflow is declared
stale while earlier recovery is still in progress.
This means that tx_pending_data will still be positive even after
all data as has been transmitted.

Rather than fix it, remove this field: there are no consumers.
The outstanding data byte count can be computed either via

 "msk->write_seq - rtx_head->data_seq" or
 "msk->write_seq - msk->snd_una".

The latter is more recent/accurate estimate as rtx_head adjustment
is deferred until mptcp lock can be acquired.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Paolo Abeni <pabeni@redhat.com>
---
 No changes in v2.

 net/mptcp/protocol.c | 4 ----
 net/mptcp/protocol.h | 1 -
 2 files changed, 5 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 2a525c7ae920..c0e0ee4cb24f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1531,7 +1531,6 @@ static void mptcp_update_post_push(struct mptcp_sock *msk,
 	dfrag->already_sent += sent;
 
 	msk->snd_burst -= sent;
-	msk->tx_pending_data -= sent;
 
 	snd_nxt_new += dfrag->already_sent;
 
@@ -1761,7 +1760,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		frag_truesize += psize;
 		pfrag->offset += frag_truesize;
 		WRITE_ONCE(msk->write_seq, msk->write_seq + psize);
-		msk->tx_pending_data += psize;
 
 		/* charge data on mptcp pending queue to the msk socket
 		 * Note: we charge such data both to sk and ssk
@@ -2254,7 +2252,6 @@ bool __mptcp_retransmit_pending_data(struct sock *sk)
 	mptcp_data_unlock(sk);
 
 	msk->first_pending = rtx_head;
-	msk->tx_pending_data += msk->snd_nxt - rtx_head->data_seq;
 	msk->snd_burst = 0;
 
 	/* be sure to clear the "sent status" on all re-injected fragments */
@@ -2525,7 +2522,6 @@ static int __mptcp_init_sock(struct sock *sk)
 	msk->first_pending = NULL;
 	msk->wmem_reserved = 0;
 	WRITE_ONCE(msk->rmem_released, 0);
-	msk->tx_pending_data = 0;
 	msk->timer_ival = TCP_RTO_MIN;
 
 	msk->first = NULL;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 99a23fff7b03..8416810afa8e 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -254,7 +254,6 @@ struct mptcp_sock {
 	struct sk_buff  *ooo_last_skb;
 	struct rb_root  out_of_order_queue;
 	struct sk_buff_head receive_queue;
-	int		tx_pending_data;
 	struct list_head conn_list;
 	struct list_head rtx_queue;
 	struct mptcp_data_frag *first_pending;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH mptcp-next v2 2/2] mptcp: re-arm retransmit timer if data is pending
  2021-09-06 13:10 [mptcp-next 0/2] Fix mptcp connection hangs after link failover Florian Westphal
  2021-09-06 13:10 ` [PATCH mptcp-next v2 1/2] mptcp: remove tx_pending_data Florian Westphal
@ 2021-09-06 13:10 ` Florian Westphal
  2021-09-06 13:42   ` Paolo Abeni
  2021-09-08  1:06 ` [mptcp-next 0/2] Fix mptcp connection hangs after link failover Mat Martineau
  2021-09-11  5:56 ` Matthieu Baerts
  3 siblings, 1 reply; 9+ messages in thread
From: Florian Westphal @ 2021-09-06 13:10 UTC (permalink / raw)
  To: mptcp; +Cc: Florian Westphal

The retransmit head will be NULL in case there is no in-flight data
(meaning all data injected into network has been acked).

In that case the retransmit timer is stopped.

This is only correct if there is no more pending, not-yet-sent data.

If there is, the retransmit timer needs to set the PENDING bit again so
that mptcp tries to send the remaining (new) data once a subflow can accept
more data.

Also, mptcp_subflow_get_retrans() has to be called unconditionally.

This function checks for subflows that have become unresponsive and marks
them as stale, so in the case where the rtx queue is empty, subflows
will never be marked stale which prevents available backup subflows from
becoming eligible for transmit.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/226
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Changes in v2:
  - drop reset-pending conditional in __mptcp_push_pending,
    retrans timer already takes care of this (Paolo)
  - rename patch subject
  - add a 'closes' tag to commit message

 net/mptcp/protocol.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index c0e0ee4cb24f..73cb5d053785 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1105,7 +1105,8 @@ static void __mptcp_clean_una(struct sock *sk)
 	if (cleaned && tcp_under_memory_pressure(sk))
 		__mptcp_mem_reclaim_partial(sk);
 
-	if (snd_una == READ_ONCE(msk->snd_nxt) && !msk->recovery) {
+	if (snd_una == READ_ONCE(msk->snd_nxt) &&
+	    snd_una == READ_ONCE(msk->write_seq)) {
 		if (mptcp_timer_pending(sk) && !mptcp_data_fin_enabled(msk))
 			mptcp_stop_timer(sk);
 	} else {
@@ -1547,6 +1548,13 @@ static void mptcp_update_post_push(struct mptcp_sock *msk,
 		msk->snd_nxt = snd_nxt_new;
 }
 
+static void mptcp_check_and_set_pending(struct sock *sk)
+{
+	if (mptcp_send_head(sk) &&
+	    !test_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags))
+		set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags);
+}
+
 void __mptcp_push_pending(struct sock *sk, unsigned int flags)
 {
 	struct sock *prev_ssk = NULL, *ssk = NULL;
@@ -2414,6 +2422,9 @@ static void __mptcp_retrans(struct sock *sk)
 	int ret;
 
 	mptcp_clean_una_wakeup(sk);
+
+	/* first check ssk: need to kick "stale" logic */
+	ssk = mptcp_subflow_get_retrans(msk);
 	dfrag = mptcp_rtx_head(sk);
 	if (!dfrag) {
 		if (mptcp_data_fin_enabled(msk)) {
@@ -2426,10 +2437,12 @@ static void __mptcp_retrans(struct sock *sk)
 			goto reset_timer;
 		}
 
-		return;
+		if (!mptcp_send_head(sk))
+			return;
+
+		goto reset_timer;
 	}
 
-	ssk = mptcp_subflow_get_retrans(msk);
 	if (!ssk)
 		goto reset_timer;
 
@@ -2456,6 +2469,8 @@ static void __mptcp_retrans(struct sock *sk)
 	release_sock(ssk);
 
 reset_timer:
+	mptcp_check_and_set_pending(sk);
+
 	if (!mptcp_timer_pending(sk))
 		mptcp_reset_timer(sk);
 }
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH mptcp-next v2 2/2] mptcp: re-arm retransmit timer if data is pending
  2021-09-06 13:10 ` [PATCH mptcp-next v2 2/2] mptcp: re-arm retransmit timer if data is pending Florian Westphal
@ 2021-09-06 13:42   ` Paolo Abeni
  0 siblings, 0 replies; 9+ messages in thread
From: Paolo Abeni @ 2021-09-06 13:42 UTC (permalink / raw)
  To: Florian Westphal, mptcp

On Mon, 2021-09-06 at 15:10 +0200, Florian Westphal wrote:
> The retransmit head will be NULL in case there is no in-flight data
> (meaning all data injected into network has been acked).
> 
> In that case the retransmit timer is stopped.
> 
> This is only correct if there is no more pending, not-yet-sent data.
> 
> If there is, the retransmit timer needs to set the PENDING bit again so
> that mptcp tries to send the remaining (new) data once a subflow can accept
> more data.
> 
> Also, mptcp_subflow_get_retrans() has to be called unconditionally.
> 
> This function checks for subflows that have become unresponsive and marks
> them as stale, so in the case where the rtx queue is empty, subflows
> will never be marked stale which prevents available backup subflows from
> becoming eligible for transmit.
> 
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/226
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  Changes in v2:
>   - drop reset-pending conditional in __mptcp_push_pending,
>     retrans timer already takes care of this (Paolo)
>   - rename patch subject
>   - add a 'closes' tag to commit message
> 
>  net/mptcp/protocol.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index c0e0ee4cb24f..73cb5d053785 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1105,7 +1105,8 @@ static void __mptcp_clean_una(struct sock *sk)
>  	if (cleaned && tcp_under_memory_pressure(sk))
>  		__mptcp_mem_reclaim_partial(sk);
>  
> -	if (snd_una == READ_ONCE(msk->snd_nxt) && !msk->recovery) {
> +	if (snd_una == READ_ONCE(msk->snd_nxt) &&
> +	    snd_una == READ_ONCE(msk->write_seq)) {
>  		if (mptcp_timer_pending(sk) && !mptcp_data_fin_enabled(msk))
>  			mptcp_stop_timer(sk);
>  	} else {
> @@ -1547,6 +1548,13 @@ static void mptcp_update_post_push(struct mptcp_sock *msk,
>  		msk->snd_nxt = snd_nxt_new;
>  }
>  
> +static void mptcp_check_and_set_pending(struct sock *sk)
> +{
> +	if (mptcp_send_head(sk) &&
> +	    !test_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags))
> +		set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags);
> +}
> +
>  void __mptcp_push_pending(struct sock *sk, unsigned int flags)
>  {
>  	struct sock *prev_ssk = NULL, *ssk = NULL;
> @@ -2414,6 +2422,9 @@ static void __mptcp_retrans(struct sock *sk)
>  	int ret;
>  
>  	mptcp_clean_una_wakeup(sk);
> +
> +	/* first check ssk: need to kick "stale" logic */
> +	ssk = mptcp_subflow_get_retrans(msk);
>  	dfrag = mptcp_rtx_head(sk);
>  	if (!dfrag) {
>  		if (mptcp_data_fin_enabled(msk)) {
> @@ -2426,10 +2437,12 @@ static void __mptcp_retrans(struct sock *sk)
>  			goto reset_timer;
>  		}
>  
> -		return;
> +		if (!mptcp_send_head(sk))
> +			return;
> +
> +		goto reset_timer;
>  	}
>  
> -	ssk = mptcp_subflow_get_retrans(msk);
>  	if (!ssk)
>  		goto reset_timer;
>  
> @@ -2456,6 +2469,8 @@ static void __mptcp_retrans(struct sock *sk)
>  	release_sock(ssk);
>  
>  reset_timer:
> +	mptcp_check_and_set_pending(sk);
> +
>  	if (!mptcp_timer_pending(sk))
>  		mptcp_reset_timer(sk);
>  }

LGTM, thanks Florian!

Acked-by: Paolo Abeni <pabeni@redhat.com>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [mptcp-next 0/2] Fix mptcp connection hangs after link failover
  2021-09-06 13:10 [mptcp-next 0/2] Fix mptcp connection hangs after link failover Florian Westphal
  2021-09-06 13:10 ` [PATCH mptcp-next v2 1/2] mptcp: remove tx_pending_data Florian Westphal
  2021-09-06 13:10 ` [PATCH mptcp-next v2 2/2] mptcp: re-arm retransmit timer if data is pending Florian Westphal
@ 2021-09-08  1:06 ` Mat Martineau
  2021-09-08  8:20   ` Florian Westphal
  2021-09-11  5:56 ` Matthieu Baerts
  3 siblings, 1 reply; 9+ messages in thread
From: Mat Martineau @ 2021-09-08  1:06 UTC (permalink / raw)
  To: Florian Westphal; +Cc: mptcp

On Mon, 6 Sep 2021, Florian Westphal wrote:

> First patch is preparation work: tx_pending_data counter is unreliable.
> Second patch fixes premature stop of the retransmit timer.
>
> Florian Westphal (2):
>  mptcp: remove tx_pending_data
>  mptcp: re-set push-pending bit on retransmit failure
>
> net/mptcp/protocol.c | 32 +++++++++++++++++++++++++-------
> net/mptcp/protocol.h |  1 -
> 2 files changed, 25 insertions(+), 8 deletions(-)
>
> -- 
> 2.32.0

The code changes look ok, and I don't see any copyfd_io_poll errors. But I 
do get consistent failures in the same group of self tests related to 
stale links and recovery:

15 multiple flows, signal, link failure syn[ ok ] - synack[ ok ] - ack[ ok 
]
                                         add[ ok ] - echo  [ ok ]
                                         stale             [fail] got 0 
stale[s] 0 recover[s],   expected stale in range [1..5],  stale-recover 
delta 1
ns2-0-eBlFRK stats
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode 
DEFAULT group default qlen 1000
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     RX: bytes  packets  errors  dropped missed  mcast
     1728       3        0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     1728       3        0       0       0       0
2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group 
default qlen 1000
     link/sit 0.0.0.0 brd 0.0.0.0
     RX: bytes  packets  errors  dropped missed  mcast
     0          0        0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     0          0        0       0       0       0
3: ns2eth1@if3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc netem 
state DOWN mode DEFAULT group default qlen 1000
     link/ether 1a:ff:34:f1:09:0c brd ff:ff:ff:ff:ff:ff link-netns 
ns1-0-eBlFRK
     RX: bytes  packets  errors  dropped missed  mcast
     93900      1188     0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     3625894    1491     0       0       0       0
4: ns2eth2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem 
state UP mode DEFAULT group default qlen 1000
     link/ether 86:8e:f2:62:4d:ab brd ff:ff:ff:ff:ff:ff link-netns 
ns1-0-eBlFRK
     RX: bytes  packets  errors  dropped missed  mcast
     186852     2394     0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     8018037    2641     0       0       0       0
5: ns2eth3@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem 
state UP mode DEFAULT group default qlen 1000
     link/ether 6e:bb:bc:a6:6c:e4 brd ff:ff:ff:ff:ff:ff link-netns 
ns1-0-eBlFRK
     RX: bytes  packets  errors  dropped missed  mcast
     192608     2468     0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     8732438    2874     0       0       0       0
6: ns2eth4@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem 
state UP mode DEFAULT group default qlen 1000
     link/ether a2:c9:68:39:8c:68 brd ff:ff:ff:ff:ff:ff link-netns 
ns1-0-eBlFRK
     RX: bytes  packets  errors  dropped missed  mcast
     178072     2282     0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     7742481    2508     0       0       0       0
MPTcpExtMPCapableSYNTX          1                  0.0
MPTcpExtMPCapableSYNACKRX       1                  0.0
MPTcpExtMPTCPRetrans            84                 0.0
MPTcpExtMPJoinSynAckRx          3                  0.0
MPTcpExtAddAddr                 1                  0.0
Created /tmp/tmp.VNjv8beZVJ (size 4096 KB) containing data sent by server
16 multi flows, signal, bidi, link fail syn[ ok ] - synack[ ok ] - ack[ ok 
]
                                         add[ ok ] - echo  [ ok ]
                                         stale             [fail] got 0 
stale[s] 0 recover[s],   expected stale in range [1..-1],  stale-recover 
delta 1
ns2-0-McIAzy stats
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode 
DEFAULT group default qlen 1000
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     RX: bytes  packets  errors  dropped missed  mcast
     2304       4        0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     2304       4        0       0       0       0
2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group 
default qlen 1000
     link/sit 0.0.0.0 brd 0.0.0.0
     RX: bytes  packets  errors  dropped missed  mcast
     0          0        0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     0          0        0       0       0       0
3: ns2eth1@if3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc netem 
state DOWN mode DEFAULT group default qlen 1000
     link/ether 5a:55:55:d0:62:02 brd ff:ff:ff:ff:ff:ff link-netns 
ns1-0-McIAzy
     RX: bytes  packets  errors  dropped missed  mcast
     1462092    1435     0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     3174798    1449     0       0       0       0
4: ns2eth2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem 
state UP mode DEFAULT group default qlen 1000
     link/ether ee:c5:e3:b4:56:91 brd ff:ff:ff:ff:ff:ff link-netns 
ns1-0-McIAzy
     RX: bytes  packets  errors  dropped missed  mcast
     1212944    2864     0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     9191887    3021     0       0       0       0
5: ns2eth3@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem 
state UP mode DEFAULT group default qlen 1000
     link/ether ce:6f:56:5f:4a:85 brd ff:ff:ff:ff:ff:ff link-netns 
ns1-0-McIAzy
     RX: bytes  packets  errors  dropped missed  mcast
     896684     2618     0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     8202792    2745     0       0       0       0
6: ns2eth4@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem 
state UP mode DEFAULT group default qlen 1000
     link/ether 56:7a:df:33:db:f9 brd ff:ff:ff:ff:ff:ff link-netns 
ns1-0-McIAzy
     RX: bytes  packets  errors  dropped missed  mcast
     1405836    2366     0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     7566286    2745     0       0       0       0
MPTcpExtMPCapableSYNTX          1                  0.0
MPTcpExtMPCapableSYNACKRX       1                  0.0
MPTcpExtMPTCPRetrans            105                0.0
MPTcpExtMPJoinSynAckRx          3                  0.0
MPTcpExtOFOQueueTail            319                0.0
MPTcpExtOFOQueue                1012               0.0
MPTcpExtOFOMerge                322                0.0
MPTcpExtDuplicateData           5                  0.0
MPTcpExtAddAddr                 1                  0.0
17 backup subflow unused, link failure  syn[ ok ] - synack[ ok ] - ack[ ok 
]
                                         add[ ok ] - echo  [ ok ]
                                         link usage        [fail] got 39% 
usage, expected 0%
18 backup flow used, multi links fail   syn[ ok ] - synack[ ok ] - ack[ ok 
]
                                         add[ ok ] - echo  [ ok ]
                                         stale             [fail] got 0 
stale[s] 0 recover[s],   expected stale in range [2..4],  stale-recover 
delta 2
ns2-0-N0wvlQ stats
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode 
DEFAULT group default qlen 1000
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     RX: bytes  packets  errors  dropped missed  mcast
     11052      20       0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     11052      20       0       0       0       0
2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group 
default qlen 1000
     link/sit 0.0.0.0 brd 0.0.0.0
     RX: bytes  packets  errors  dropped missed  mcast
     0          0        0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     0          0        0       0       0       0
3: ns2eth1@if3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc netem 
state DOWN mode DEFAULT group default qlen 1000
     link/ether 2a:6f:3d:be:d2:84 brd ff:ff:ff:ff:ff:ff link-netns 
ns1-0-N0wvlQ
     RX: bytes  packets  errors  dropped missed  mcast
     128512     1632     0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     4726279    1678     0       0       0       0
4: ns2eth2@if4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc netem 
state DOWN mode DEFAULT group default qlen 1000
     link/ether 52:e2:d2:54:cd:18 brd ff:ff:ff:ff:ff:ff link-netns 
ns1-0-N0wvlQ
     RX: bytes  packets  errors  dropped missed  mcast
     126524     1620     0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     4682855    1665     0       7       0       0
5: ns2eth3@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem 
state UP mode DEFAULT group default qlen 1000
     link/ether fe:a5:89:cc:e8:8e brd ff:ff:ff:ff:ff:ff link-netns 
ns1-0-N0wvlQ
     RX: bytes  packets  errors  dropped missed  mcast
     378726     4855     0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     18615840   5084     0       0       0       0
6: ns2eth4@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem 
state UP mode DEFAULT group default qlen 1000
     link/ether 4e:90:1f:7e:d2:50 brd ff:ff:ff:ff:ff:ff link-netns 
ns1-0-N0wvlQ
     RX: bytes  packets  errors  dropped missed  mcast
     806        9        0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     1026       11       0       0       0       0
MPTcpExtMPCapableSYNTX          1                  0.0
MPTcpExtMPCapableSYNACKRX       1                  0.0
MPTcpExtMPTCPRetrans            430                0.0
MPTcpExtMPJoinSynAckRx          2                  0.0
MPTcpExtAddAddr                 1                  0.0
                                         link usage        [fail] got 68% 
usage, expected 50%
19 backup flow used, bidi, link failure syn[ ok ] - synack[ ok ] - ack[ ok 
]
                                         add[ ok ] - echo  [ ok ]
                                         stale             [fail] got 0 
stale[s] 0 recover[s],   expected stale in range [1..-1],  stale-recover 
delta 2
ns2-0-Fvgunu stats
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode 
DEFAULT group default qlen 1000
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     RX: bytes  packets  errors  dropped missed  mcast
     10476      19       0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     10476      19       0       0       0       0
2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group 
default qlen 1000
     link/sit 0.0.0.0 brd 0.0.0.0
     RX: bytes  packets  errors  dropped missed  mcast
     0          0        0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     0          0        0       0       0       0
3: ns2eth1@if3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc netem 
state DOWN mode DEFAULT group default qlen 1000
     link/ether 42:07:fd:15:b9:5d brd ff:ff:ff:ff:ff:ff link-netns 
ns1-0-Fvgunu
     RX: bytes  packets  errors  dropped missed  mcast
     2251816    1945     0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     4705486    2003     0       0       0       0
4: ns2eth2@if4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc netem 
state DOWN mode DEFAULT group default qlen 1000
     link/ether 26:6f:86:e8:ba:8a brd ff:ff:ff:ff:ff:ff link-netns 
ns1-0-Fvgunu
     RX: bytes  packets  errors  dropped missed  mcast
     2272210    1944     0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     4195868    1994     0       8       0       0
5: ns2eth3@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem 
state UP mode DEFAULT group default qlen 1000
     link/ether 76:9f:8a:36:57:d0 brd ff:ff:ff:ff:ff:ff link-netns 
ns1-0-Fvgunu
     RX: bytes  packets  errors  dropped missed  mcast
     404006     5169     0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     19168550   5433     0       0       0       0
6: ns2eth4@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem 
state UP mode DEFAULT group default qlen 1000
     link/ether ca:9e:ce:5e:8f:d0 brd ff:ff:ff:ff:ff:ff link-netns 
ns1-0-Fvgunu
     RX: bytes  packets  errors  dropped missed  mcast
     806        9        0       0       0       0
     TX: bytes  packets  errors  dropped carrier collsns
     1026       11       0       0       0       0
MPTcpExtMPCapableSYNTX          1                  0.0
MPTcpExtMPCapableSYNACKRX       1                  0.0
MPTcpExtMPTCPRetrans            419                0.0
MPTcpExtMPJoinSynAckRx          2                  0.0
MPTcpExtOFOQueueTail            677                0.0
MPTcpExtOFOQueue                733                0.0
MPTcpExtOFOMerge                513                0.0
MPTcpExtAddAddr                 1                  0.0
                                         link usage        [fail] got 70% 
usage, expected 50%


--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [mptcp-next 0/2] Fix mptcp connection hangs after link failover
  2021-09-08  1:06 ` [mptcp-next 0/2] Fix mptcp connection hangs after link failover Mat Martineau
@ 2021-09-08  8:20   ` Florian Westphal
  2021-09-08 15:42     ` Florian Westphal
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Westphal @ 2021-09-08  8:20 UTC (permalink / raw)
  To: Mat Martineau; +Cc: Florian Westphal, mptcp

Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
> On Mon, 6 Sep 2021, Florian Westphal wrote:
> 
> > First patch is preparation work: tx_pending_data counter is unreliable.
> > Second patch fixes premature stop of the retransmit timer.
> > 
> > Florian Westphal (2):
> >  mptcp: remove tx_pending_data
> >  mptcp: re-set push-pending bit on retransmit failure
> > 
> > net/mptcp/protocol.c | 32 +++++++++++++++++++++++++-------
> > net/mptcp/protocol.h |  1 -
> > 2 files changed, 25 insertions(+), 8 deletions(-)
> > 
> > -- 
> > 2.32.0
> 
> The code changes look ok, and I don't see any copyfd_io_poll errors. But I
> do get consistent failures in the same group of self tests related to stale
> links and recovery:
> 
> 15 multiple flows, signal, link failure syn[ ok ] - synack[ ok ] - ack[ ok ]
>                                         add[ ok ] - echo  [ ok ]
>                                         stale             [fail] got 0
> stale[s] 0 recover[s],   expected stale in range [1..5],  stale-recover
> delta 1

I'm on export, 9c7d1b9a14eba479466423d64f99c8a4e29b66f4, with these two
patches and I don't see this error.

Running mptcp_join -l in a loop now, no luck so far.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [mptcp-next 0/2] Fix mptcp connection hangs after link failover
  2021-09-08  8:20   ` Florian Westphal
@ 2021-09-08 15:42     ` Florian Westphal
  2021-09-08 18:35       ` Mat Martineau
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Westphal @ 2021-09-08 15:42 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Mat Martineau, mptcp

Florian Westphal <fw@strlen.de> wrote:
> Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
> > On Mon, 6 Sep 2021, Florian Westphal wrote:
> > 
> > > First patch is preparation work: tx_pending_data counter is unreliable.
> > > Second patch fixes premature stop of the retransmit timer.
> > > 
> > > Florian Westphal (2):
> > >  mptcp: remove tx_pending_data
> > >  mptcp: re-set push-pending bit on retransmit failure
> > > 
> > > net/mptcp/protocol.c | 32 +++++++++++++++++++++++++-------
> > > net/mptcp/protocol.h |  1 -
> > > 2 files changed, 25 insertions(+), 8 deletions(-)
> > > 
> > > -- 
> > > 2.32.0
> > 
> > The code changes look ok, and I don't see any copyfd_io_poll errors. But I
> > do get consistent failures in the same group of self tests related to stale
> > links and recovery:
> > 
> > 15 multiple flows, signal, link failure syn[ ok ] - synack[ ok ] - ack[ ok ]
> >                                         add[ ok ] - echo  [ ok ]
> >                                         stale             [fail] got 0
> > stale[s] 0 recover[s],   expected stale in range [1..5],  stale-recover
> > delta 1
> 
> I'm on export, 9c7d1b9a14eba479466423d64f99c8a4e29b66f4, with these two
> patches and I don't see this error.
> 
> Running mptcp_join -l in a loop now, no luck so far.

Gave up, its not triggering for me.

Any hints on reproducing this?

Does it pass for you without my changes?

I don't see how they would cause this; if anything this patch makes
the stale detection reliable because the retrans timer is not stopped
too early anymore and it makes sure that mptcp_subflow_get_retrans() is
called once for each retrans timer call.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [mptcp-next 0/2] Fix mptcp connection hangs after link failover
  2021-09-08 15:42     ` Florian Westphal
@ 2021-09-08 18:35       ` Mat Martineau
  0 siblings, 0 replies; 9+ messages in thread
From: Mat Martineau @ 2021-09-08 18:35 UTC (permalink / raw)
  To: Florian Westphal; +Cc: mptcp

On Wed, 8 Sep 2021, Florian Westphal wrote:

> Florian Westphal <fw@strlen.de> wrote:
>> Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
>>> On Mon, 6 Sep 2021, Florian Westphal wrote:
>>>
>>>> First patch is preparation work: tx_pending_data counter is unreliable.
>>>> Second patch fixes premature stop of the retransmit timer.
>>>>
>>>> Florian Westphal (2):
>>>>  mptcp: remove tx_pending_data
>>>>  mptcp: re-set push-pending bit on retransmit failure
>>>>
>>>> net/mptcp/protocol.c | 32 +++++++++++++++++++++++++-------
>>>> net/mptcp/protocol.h |  1 -
>>>> 2 files changed, 25 insertions(+), 8 deletions(-)
>>>>
>>>> --
>>>> 2.32.0
>>>
>>> The code changes look ok, and I don't see any copyfd_io_poll errors. But I
>>> do get consistent failures in the same group of self tests related to stale
>>> links and recovery:
>>>
>>> 15 multiple flows, signal, link failure syn[ ok ] - synack[ ok ] - ack[ ok ]
>>>                                         add[ ok ] - echo  [ ok ]
>>>                                         stale             [fail] got 0
>>> stale[s] 0 recover[s],   expected stale in range [1..5],  stale-recover
>>> delta 1
>>
>> I'm on export, 9c7d1b9a14eba479466423d64f99c8a4e29b66f4, with these two
>> patches and I don't see this error.
>>
>> Running mptcp_join -l in a loop now, no luck so far.
>
> Gave up, its not triggering for me.
>
> Any hints on reproducing this?
>
> Does it pass for you without my changes?
>
> I don't see how they would cause this; if anything this patch makes
> the stale detection reliable because the retrans timer is not stopped
> too early anymore and it makes sure that mptcp_subflow_get_retrans() is
> called once for each retrans timer call.
>

Hi Florian -

My apologies for wasting your time on this, it was 100% user error on my 
side. With everything set up right on my end, the tests are passing.

So, the changes look good:

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [mptcp-next 0/2] Fix mptcp connection hangs after link failover
  2021-09-06 13:10 [mptcp-next 0/2] Fix mptcp connection hangs after link failover Florian Westphal
                   ` (2 preceding siblings ...)
  2021-09-08  1:06 ` [mptcp-next 0/2] Fix mptcp connection hangs after link failover Mat Martineau
@ 2021-09-11  5:56 ` Matthieu Baerts
  3 siblings, 0 replies; 9+ messages in thread
From: Matthieu Baerts @ 2021-09-11  5:56 UTC (permalink / raw)
  To: Florian Westphal, Mat Martineau, Paolo Abeni; +Cc: mptcp

Hi Florian, Paolo, Mat,

On 06/09/2021 15:10, Florian Westphal wrote:
> First patch is preparation work: tx_pending_data counter is unreliable.
> Second patch fixes premature stop of the retransmit timer.
> 
> Florian Westphal (2):
>   mptcp: remove tx_pending_data
>   mptcp: re-set push-pending bit on retransmit failure

Thank you for the patches, reviews and testing!

It is now in our tree (net-next features) with Paolo's Ack and Mat's RvB
tag:

- 4c18199dfdd0: mptcp: remove tx_pending_data
- 7fdafa1b84f9: mptcp: re-arm retransmit timer if data is pending
- Results: 84da6403b775..fecd1f9bf19f

Builds and tests are now in progress:

https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20210911T055615
https://github.com/multipath-tcp/mptcp_net-next/actions/workflows/build-validation.yml?query=branch:export/20210911T055615

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-09-11  5:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-06 13:10 [mptcp-next 0/2] Fix mptcp connection hangs after link failover Florian Westphal
2021-09-06 13:10 ` [PATCH mptcp-next v2 1/2] mptcp: remove tx_pending_data Florian Westphal
2021-09-06 13:10 ` [PATCH mptcp-next v2 2/2] mptcp: re-arm retransmit timer if data is pending Florian Westphal
2021-09-06 13:42   ` Paolo Abeni
2021-09-08  1:06 ` [mptcp-next 0/2] Fix mptcp connection hangs after link failover Mat Martineau
2021-09-08  8:20   ` Florian Westphal
2021-09-08 15:42     ` Florian Westphal
2021-09-08 18:35       ` Mat Martineau
2021-09-11  5:56 ` Matthieu Baerts

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.