All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
@ 2015-01-08 12:20 Sébastien Barré
  2015-01-08 15:07 ` Eric Dumazet
  2015-01-08 15:19 ` Eric Dumazet
  0 siblings, 2 replies; 19+ messages in thread
From: Sébastien Barré @ 2015-01-08 12:20 UTC (permalink / raw)
  To: David Miller
  Cc: Sébastien Barré,
	netdev, Gregory Detal, Nandita Dukkipati, Yuchung Cheng,
	Neal Cardwell

When the peer has delayed ack enabled, it may reply to a probe with an
ACK+D-SACK, with ack value set to tlp_high_seq. In the current code,
such ACK+DSACK will be missed and only at next, higher ack will the TLP
episode be considered done. Since the DSACK is not present anymore,
this will cost a cwnd reduction.

This patch ensures that this scenario does not cause a cwnd reduction, since
receiving an ACK+DSACK indicates that both the initial segment and the probe
have been received by the peer.

Cc: Gregory Detal <gregory.detal@uclouvain.be>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>

---

Changes:
Applied Neal's comments:
-adapted commit first line
-moved logic to if condition, and removed is_tlp_dupack

Thanks Neal for those comments !

 net/ipv4/tcp_input.c | 30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 075ab4d..cf63a29 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3363,29 +3363,25 @@ static void tcp_replace_ts_recent(struct tcp_sock *tp, u32 seq)
 static void tcp_process_tlp_ack(struct sock *sk, u32 ack, int flag)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	bool is_tlp_dupack = (ack == tp->tlp_high_seq) &&
-			     !(flag & (FLAG_SND_UNA_ADVANCED |
-				       FLAG_NOT_DUP | FLAG_DATA_SACKED));
 
 	/* Mark the end of TLP episode on receiving TLP dupack or when
 	 * ack is after tlp_high_seq.
+	 * With delayed acks, we may also get a regular ACK+DSACK, in which
+	 * case we don't want to reduce the cwnd either.
 	 */
-	if (is_tlp_dupack) {
+	if (((ack == tp->tlp_high_seq) &&
+	     !(flag & (FLAG_SND_UNA_ADVANCED |
+		       FLAG_NOT_DUP | FLAG_DATA_SACKED))) ||
+	    (!before(ack, tp->tlp_high_seq) && (flag & FLAG_DSACKING_ACK))) {
 		tp->tlp_high_seq = 0;
-		return;
-	}
-
-	if (after(ack, tp->tlp_high_seq)) {
+	} else if (after(ack, tp->tlp_high_seq)) {
 		tp->tlp_high_seq = 0;
-		/* Don't reduce cwnd if DSACK arrives for TLP retrans. */
-		if (!(flag & FLAG_DSACKING_ACK)) {
-			tcp_init_cwnd_reduction(sk);
-			tcp_set_ca_state(sk, TCP_CA_CWR);
-			tcp_end_cwnd_reduction(sk);
-			tcp_try_keep_open(sk);
-			NET_INC_STATS_BH(sock_net(sk),
-					 LINUX_MIB_TCPLOSSPROBERECOVERY);
-		}
+		tcp_init_cwnd_reduction(sk);
+		tcp_set_ca_state(sk, TCP_CA_CWR);
+		tcp_end_cwnd_reduction(sk);
+		tcp_try_keep_open(sk);
+		NET_INC_STATS_BH(sock_net(sk),
+				 LINUX_MIB_TCPLOSSPROBERECOVERY);
 	}
 }
 
-- 
tg: (44d84d7..) net-next/tlp-dsack-handling (depends on: net-next/master)

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-08 12:20 [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received Sébastien Barré
@ 2015-01-08 15:07 ` Eric Dumazet
  2015-01-08 15:24   ` Sébastien Barré
  2015-01-08 15:19 ` Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2015-01-08 15:07 UTC (permalink / raw)
  To: Sébastien Barré
  Cc: David Miller, netdev, Gregory Detal, Nandita Dukkipati,
	Yuchung Cheng, Neal Cardwell

On Thu, 2015-01-08 at 13:20 +0100, Sébastien Barré wrote:
> When the peer has delayed ack enabled, it may reply to a probe with an
> ACK+D-SACK, with ack value set to tlp_high_seq.


The most probable cause would not be a delayed ack (rto should be much
bigger), but a lost ACK.


>  In the current code,
> such ACK+DSACK will be missed and only at next, higher ack will the TLP
> episode be considered done. Since the DSACK is not present anymore,
> this will cost a cwnd reduction.
> 
> This patch ensures that this scenario does not cause a cwnd reduction, since
> receiving an ACK+DSACK indicates that both the initial segment and the probe
> have been received by the peer.

Do you have at hand a packetdrill test to demonstrate that the patch
works ?

Thanks !

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-08 12:20 [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received Sébastien Barré
  2015-01-08 15:07 ` Eric Dumazet
@ 2015-01-08 15:19 ` Eric Dumazet
  2015-01-08 15:39   ` Sébastien Barré
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2015-01-08 15:19 UTC (permalink / raw)
  To: Sébastien Barré
  Cc: David Miller, netdev, Gregory Detal, Nandita Dukkipati,
	Yuchung Cheng, Neal Cardwell

On Thu, 2015-01-08 at 13:20 +0100, Sébastien Barré wrote:
> When the peer has delayed ack enabled, it may reply to a probe with an
> ACK+D-SACK, with ack value set to tlp_high_seq. In the current code,
> such ACK+DSACK will be missed and only at next, higher ack will the TLP
> episode be considered done. Since the DSACK is not present anymore,
> this will cost a cwnd reduction.
> 
...
>  net/ipv4/tcp_input.c | 30 +++++++++++++-----------------
>  1 file changed, 13 insertions(+), 17 deletions(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 075ab4d..cf63a29 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3363,29 +3363,25 @@ static void tcp_replace_ts_recent(struct tcp_sock *tp, u32 seq)
>  static void tcp_process_tlp_ack(struct sock *sk, u32 ack, int flag)
>  {
>  	struct tcp_sock *tp = tcp_sk(sk);
> -	bool is_tlp_dupack = (ack == tp->tlp_high_seq) &&
> -			     !(flag & (FLAG_SND_UNA_ADVANCED |
> -				       FLAG_NOT_DUP | FLAG_DATA_SACKED));
>  

No idea why you removed this bool, it really helped to understand the
code. This makes your patch looks more complex than needed.

>  	/* Mark the end of TLP episode on receiving TLP dupack or when
>  	 * ack is after tlp_high_seq.
> +	 * With delayed acks, we may also get a regular ACK+DSACK, in which
> +	 * case we don't want to reduce the cwnd either.
>  	 */
> -	if (is_tlp_dupack) {
> +	if (((ack == tp->tlp_high_seq) &&
> +	     !(flag & (FLAG_SND_UNA_ADVANCED |
> +		       FLAG_NOT_DUP | FLAG_DATA_SACKED))) ||
> +	    (!before(ack, tp->tlp_high_seq) && (flag & FLAG_DSACKING_ACK))) {
>  		tp->tlp_high_seq = 0;
> -		return;
> -	}

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-08 15:07 ` Eric Dumazet
@ 2015-01-08 15:24   ` Sébastien Barré
  2015-01-08 15:43     ` Neal Cardwell
  2015-01-08 15:59     ` Eric Dumazet
  0 siblings, 2 replies; 19+ messages in thread
From: Sébastien Barré @ 2015-01-08 15:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Gregory Detal, Nandita Dukkipati,
	Yuchung Cheng, Neal Cardwell

Hi Eric,

Le 08/01/2015 16:07, Eric Dumazet a écrit :
>
>>   In the current code,
>> such ACK+DSACK will be missed and only at next, higher ack will the TLP
>> episode be considered done. Since the DSACK is not present anymore,
>> this will cost a cwnd reduction.
>>
>> This patch ensures that this scenario does not cause a cwnd reduction, since
>> receiving an ACK+DSACK indicates that both the initial segment and the probe
>> have been received by the peer.
> Do you have at hand a packetdrill test to demonstrate that the patch
> works ?
Not currently, but that would be very good indeed.
I will prepare one and send it.

regards,

Sébastien.
>
> Thanks !
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-08 15:19 ` Eric Dumazet
@ 2015-01-08 15:39   ` Sébastien Barré
  2015-01-08 15:49     ` Neal Cardwell
  0 siblings, 1 reply; 19+ messages in thread
From: Sébastien Barré @ 2015-01-08 15:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Gregory Detal, Nandita Dukkipati,
	Yuchung Cheng, Neal Cardwell


Le 08/01/2015 16:19, Eric Dumazet a écrit :
> On Thu, 2015-01-08 at 13:20 +0100, Sébastien Barré wrote:
>> When the peer has delayed ack enabled, it may reply to a probe with an
>> ACK+D-SACK, with ack value set to tlp_high_seq. In the current code,
>> such ACK+DSACK will be missed and only at next, higher ack will the TLP
>> episode be considered done. Since the DSACK is not present anymore,
>> this will cost a cwnd reduction.
>>
> ...
>>   net/ipv4/tcp_input.c | 30 +++++++++++++-----------------
>>   1 file changed, 13 insertions(+), 17 deletions(-)
>>
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index 075ab4d..cf63a29 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -3363,29 +3363,25 @@ static void tcp_replace_ts_recent(struct tcp_sock *tp, u32 seq)
>>   static void tcp_process_tlp_ack(struct sock *sk, u32 ack, int flag)
>>   {
>>   	struct tcp_sock *tp = tcp_sk(sk);
>> -	bool is_tlp_dupack = (ack == tp->tlp_high_seq) &&
>> -			     !(flag & (FLAG_SND_UNA_ADVANCED |
>> -				       FLAG_NOT_DUP | FLAG_DATA_SACKED));
>>   
> No idea why you removed this bool, it really helped to understand the
> code. This makes your patch looks more complex than needed.
I did not remove it in v1 
(http://www.spinics.net/lists/netdev/msg308653.html)
The removal was a request from Neal 
(http://www.spinics.net/lists/netdev/msg308758.html)

I think he found it special to have part of the logic apart, in a bool, 
and part of it in the if condition.
One possible option is to restore is_tlp_dupack and include the DSACK 
check in it, although this will
not do much more than moving complexity in the bool definition. But 
indeed, that might make
the patch more readable.

What do you and Neal think ?

regards,

Sébastien.
>
>>   	/* Mark the end of TLP episode on receiving TLP dupack or when
>>   	 * ack is after tlp_high_seq.
>> +	 * With delayed acks, we may also get a regular ACK+DSACK, in which
>> +	 * case we don't want to reduce the cwnd either.
>>   	 */
>> -	if (is_tlp_dupack) {
>> +	if (((ack == tp->tlp_high_seq) &&
>> +	     !(flag & (FLAG_SND_UNA_ADVANCED |
>> +		       FLAG_NOT_DUP | FLAG_DATA_SACKED))) ||
>> +	    (!before(ack, tp->tlp_high_seq) && (flag & FLAG_DSACKING_ACK))) {
>>   		tp->tlp_high_seq = 0;
>> -		return;
>> -	}
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-08 15:24   ` Sébastien Barré
@ 2015-01-08 15:43     ` Neal Cardwell
  2015-01-08 16:00       ` Sébastien Barré
  2015-01-08 15:59     ` Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Neal Cardwell @ 2015-01-08 15:43 UTC (permalink / raw)
  To: Sébastien Barré
  Cc: Eric Dumazet, David Miller, Netdev, Gregory Detal,
	Nandita Dukkipati, Yuchung Cheng

> Le 08/01/2015 16:07, Eric Dumazet a écrit :
>> Do you have at hand a packetdrill test to demonstrate that the patch
>> works ?

I cooked up the packetdrill test below when Sebastien sent out his v1
a few weeks ago. It fails on a kernel without his patch, and passes on
a kernel with his patch.

The code change looks fine to me, but if Eric prefers that the
expression be assigned to a bool before the check, that also sounds
fine to me.

neal


------------
// Establish a connection.
0     socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0     setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0    bind(3, ..., ...) = 0
+0    listen(3, 1) = 0

+0    < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+.020 < . 1:1(0) ack 1 win 257
+0    accept(3, ..., ...) = 4

// Send 1 packet.
+0    write(4, ..., 1000) = 1000
+0    > P. 1:1001(1000) ack 1

// Loss probe retransmission.
// packets_out == 1 => schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
// In this case, this means: 1.5*RTT + 200ms = 230ms
+.230 > P. 1:1001(1000) ack 1
+0    %{ assert tcpi_snd_cwnd == 10 }%

// Receiver ACKs at tlp_high_seq with a DSACK,
// indicating they received the original packet and probe.
+.020 < . 1:1(0) ack 1001 win 257 <sack 1:1001,nop,nop>
+0    %{ assert tcpi_snd_cwnd == 10 }%

// Send another packet.
+0    write(4, ..., 1000) = 1000
+0    > P. 1001:2001(1000) ack 1

// Receiver ACKs above tlp_high_seq, which should end the TLP episode
// if we haven't already. We should not reduce cwnd.
+.020 < . 1:1(0) ack 2001 win 257
+0    %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-08 15:39   ` Sébastien Barré
@ 2015-01-08 15:49     ` Neal Cardwell
  2015-01-08 15:52       ` Neal Cardwell
  2015-01-08 16:25       ` Eric Dumazet
  0 siblings, 2 replies; 19+ messages in thread
From: Neal Cardwell @ 2015-01-08 15:49 UTC (permalink / raw)
  To: Sébastien Barré
  Cc: Eric Dumazet, David Miller, Netdev, Gregory Detal,
	Nandita Dukkipati, Yuchung Cheng

On Thu, Jan 8, 2015 at 10:39 AM, Sébastien Barré
<sebastien.barre@uclouvain.be> wrote:
> What do you and Neal think ?

My preference is to have the whole expression detecting the case where
the receiver got the probe packet encoded in a single expression. I
don't have a strong feeling about whether it should be stored in a
bool (to reduce the size of the diff) or written directly into the if
() expression (to reduce the size of the code). I'll defer to Eric on
which he thinks is better. :-)

neal

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-08 15:49     ` Neal Cardwell
@ 2015-01-08 15:52       ` Neal Cardwell
  2015-01-08 16:25       ` Eric Dumazet
  1 sibling, 0 replies; 19+ messages in thread
From: Neal Cardwell @ 2015-01-08 15:52 UTC (permalink / raw)
  To: Sébastien Barré
  Cc: Eric Dumazet, David Miller, Netdev, Gregory Detal,
	Nandita Dukkipati, Yuchung Cheng

On Thu, Jan 8, 2015 at 10:49 AM, Neal Cardwell <ncardwell@google.com> wrote:
> On Thu, Jan 8, 2015 at 10:39 AM, Sébastien Barré
> <sebastien.barre@uclouvain.be> wrote:
>> What do you and Neal think ?
>
> My preference is to have the whole expression detecting the case where
> the receiver got the probe packet encoded in a single expression.

If we do store it in a bool, then "is_tlp_dupack" is no longer quite
accurate (we're expanding the check to include ACKs that are not
dupacks), so I'd suggest a name like "is_probe_rcvd".

neal

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-08 15:24   ` Sébastien Barré
  2015-01-08 15:43     ` Neal Cardwell
@ 2015-01-08 15:59     ` Eric Dumazet
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2015-01-08 15:59 UTC (permalink / raw)
  To: Sébastien Barré
  Cc: David Miller, netdev, Gregory Detal, Nandita Dukkipati,
	Yuchung Cheng, Neal Cardwell

On Thu, 2015-01-08 at 16:24 +0100, Sébastien Barré wrote:
> Hi Eric,
> 
> Le 08/01/2015 16:07, Eric Dumazet a écrit :
> >
> >>   In the current code,
> >> such ACK+DSACK will be missed and only at next, higher ack will the TLP
> >> episode be considered done. Since the DSACK is not present anymore,
> >> this will cost a cwnd reduction.
> >>
> >> This patch ensures that this scenario does not cause a cwnd reduction, since
> >> receiving an ACK+DSACK indicates that both the initial segment and the probe
> >> have been received by the peer.
> > Do you have at hand a packetdrill test to demonstrate that the patch
> > works ?
> Not currently, but that would be very good indeed.
> I will prepare one and send it.

Here is a skeleton you might adapt :

// TLP test when a sender does not have any new segments for transmission.
// In the absence of ACKs, the sender retransmits an old segment as a probe
// after 2 RTTs.

// Tolerate RTO variations
--tolerance_usecs=15000

// Set up production config.
`../common/defaults.sh
sysctl -q net.ipv4.tcp_early_retrans=3
`

// Establish a connection.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100  < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0     > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+0.100 < . 1:1(0) ack 1 win 257
+0     accept(3, ..., ...) = 4

// Send 10 MSS.
+0.100 write(4, ..., 10000) = 10000
+0 > . 1:5001(5000) ack 1
+0 > P. 5001:10001(5000) ack 1

// TLP retransmission in 2 RTTs.
+0.200 > P. 9001:10001(1000) ack 1
+0 %{
assert tcpi_snd_cwnd == 10, 'tcpi_snd_cwnd=%d' % tcpi_snd_cwnd
assert tcpi_unacked == 10
}%

+0.100 < . 1:1(0) ack 10001 win 257 <sack 9001:1001,nop,nop>

+0 %{
assert tcpi_snd_cwnd == 20, 'tcpi_snd_cwnd=%d' % tcpi_snd_cwnd
assert tcpi_unacked == 0, 'tcpi_unacked=%d' % tcpi_unacked
}%

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-08 15:43     ` Neal Cardwell
@ 2015-01-08 16:00       ` Sébastien Barré
  0 siblings, 0 replies; 19+ messages in thread
From: Sébastien Barré @ 2015-01-08 16:00 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Eric Dumazet, David Miller, Netdev, Gregory Detal,
	Nandita Dukkipati, Yuchung Cheng


Le 08/01/2015 16:43, Neal Cardwell a écrit :
>> Le 08/01/2015 16:07, Eric Dumazet a écrit :
>>> Do you have at hand a packetdrill test to demonstrate that the patch
>>> works ?
> I cooked up the packetdrill test below when Sebastien sent out his v1
> a few weeks ago. It fails on a kernel without his patch, and passes on
> a kernel with his patch.
Thanks a lot for this ! I will include it in our test suite then.
I understand that there is convergence on having a
bool called is_probe_rcvd with the whole logic.
Eric, is this ok for you ?

Thanks,

Sébastien.
>
> The code change looks fine to me, but if Eric prefers that the
> expression be assigned to a bool before the check, that also sounds
> fine to me.
>
> neal
>
>
> ------------
> // Establish a connection.
> 0     socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +0     setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +0    bind(3, ..., ...) = 0
> +0    listen(3, 1) = 0
>
> +0    < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
> +0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
> +.020 < . 1:1(0) ack 1 win 257
> +0    accept(3, ..., ...) = 4
>
> // Send 1 packet.
> +0    write(4, ..., 1000) = 1000
> +0    > P. 1:1001(1000) ack 1
>
> // Loss probe retransmission.
> // packets_out == 1 => schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
> // In this case, this means: 1.5*RTT + 200ms = 230ms
> +.230 > P. 1:1001(1000) ack 1
> +0    %{ assert tcpi_snd_cwnd == 10 }%
>
> // Receiver ACKs at tlp_high_seq with a DSACK,
> // indicating they received the original packet and probe.
> +.020 < . 1:1(0) ack 1001 win 257 <sack 1:1001,nop,nop>
> +0    %{ assert tcpi_snd_cwnd == 10 }%
>
> // Send another packet.
> +0    write(4, ..., 1000) = 1000
> +0    > P. 1001:2001(1000) ack 1
>
> // Receiver ACKs above tlp_high_seq, which should end the TLP episode
> // if we haven't already. We should not reduce cwnd.
> +.020 < . 1:1(0) ack 2001 win 257
> +0    %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-08 15:49     ` Neal Cardwell
  2015-01-08 15:52       ` Neal Cardwell
@ 2015-01-08 16:25       ` Eric Dumazet
  2015-01-08 17:17         ` Eric Dumazet
  2015-01-09 19:43         ` Yuchung Cheng
  1 sibling, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2015-01-08 16:25 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Sébastien Barré,
	David Miller, Netdev, Gregory Detal, Nandita Dukkipati,
	Yuchung Cheng

On Thu, 2015-01-08 at 10:49 -0500, Neal Cardwell wrote:
> On Thu, Jan 8, 2015 at 10:39 AM, Sébastien Barré
> <sebastien.barre@uclouvain.be> wrote:
> > What do you and Neal think ?
> 
> My preference is to have the whole expression detecting the case where
> the receiver got the probe packet encoded in a single expression. I
> don't have a strong feeling about whether it should be stored in a
> bool (to reduce the size of the diff) or written directly into the if
> () expression (to reduce the size of the code). I'll defer to Eric on
> which he thinks is better. :-)

There is no shame using helpers with nice names to help understand this
TCP stack. Even if the helper is used exactly once.

In this case, it seems we test 2 different conditions, so this could use
2 helpers with self describing names.

When I see : 

> +     if (((ack == tp->tlp_high_seq) &&
> +          !(flag & (FLAG_SND_UNA_ADVANCED |
> +                    FLAG_NOT_DUP | FLAG_DATA_SACKED))) ||
> +         (!before(ack, tp->tlp_high_seq) && (flag & FLAG_DSACKING_ACK))) {
>               tp->tlp_high_seq = 0;

My brain hurts.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-08 16:25       ` Eric Dumazet
@ 2015-01-08 17:17         ` Eric Dumazet
  2015-01-08 17:27           ` Eric Dumazet
  2015-01-09 19:43         ` Yuchung Cheng
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2015-01-08 17:17 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Sébastien Barré,
	David Miller, Netdev, Gregory Detal, Nandita Dukkipati,
	Yuchung Cheng

On Thu, 2015-01-08 at 08:25 -0800, Eric Dumazet wrote:
> On Thu, 2015-01-08 at 10:49 -0500, Neal Cardwell wrote:
> > On Thu, Jan 8, 2015 at 10:39 AM, Sébastien Barré
> > <sebastien.barre@uclouvain.be> wrote:
> > > What do you and Neal think ?
> > 
> > My preference is to have the whole expression detecting the case where
> > the receiver got the probe packet encoded in a single expression. I
> > don't have a strong feeling about whether it should be stored in a
> > bool (to reduce the size of the diff) or written directly into the if
> > () expression (to reduce the size of the code). I'll defer to Eric on
> > which he thinks is better. :-)
> 
> There is no shame using helpers with nice names to help understand this
> TCP stack. Even if the helper is used exactly once.
> 
> In this case, it seems we test 2 different conditions, so this could use
> 2 helpers with self describing names.
> 
> When I see : 
> 
> > +     if (((ack == tp->tlp_high_seq) &&
> > +          !(flag & (FLAG_SND_UNA_ADVANCED |
> > +                    FLAG_NOT_DUP | FLAG_DATA_SACKED))) ||
> > +         (!before(ack, tp->tlp_high_seq) && (flag & FLAG_DSACKING_ACK))) {
> >               tp->tlp_high_seq = 0;
> 
> My brain hurts.

BTW, tlp_high_seq is used no matter what (initial value is 0).

While the probability ACK seq being exactly 0 is small (1 out of 2^32),
probability that !before(ack, tp->tlp_high_seq) being a false positive
is very high. Initial sequence number are supposed to be random.

I wonder if setting tp->tlp_high_seq to 0 is the right thing to do.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-08 17:17         ` Eric Dumazet
@ 2015-01-08 17:27           ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2015-01-08 17:27 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Sébastien Barré,
	David Miller, Netdev, Gregory Detal, Nandita Dukkipati,
	Yuchung Cheng

On Thu, 2015-01-08 at 09:17 -0800, Eric Dumazet wrote:

> BTW, tlp_high_seq is used no matter what (initial value is 0).
> 
> While the probability ACK seq being exactly 0 is small (1 out of 2^32),
> probability that !before(ack, tp->tlp_high_seq) being a false positive
> is very high. Initial sequence number are supposed to be random.
> 

Oh well, calls to tcp_process_tlp_ack() are guarded by :

if (tp->tlp_high_seq)
    tcp_process_tlp_ack(sk, ack, flag);

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-08 16:25       ` Eric Dumazet
  2015-01-08 17:17         ` Eric Dumazet
@ 2015-01-09 19:43         ` Yuchung Cheng
  2015-01-09 20:36           ` Neal Cardwell
                             ` (2 more replies)
  1 sibling, 3 replies; 19+ messages in thread
From: Yuchung Cheng @ 2015-01-09 19:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Neal Cardwell, Sébastien Barré,
	David Miller, Netdev, Gregory Detal, Nandita Dukkipati

On Thu, Jan 8, 2015 at 8:25 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> On Thu, 2015-01-08 at 10:49 -0500, Neal Cardwell wrote:
> > On Thu, Jan 8, 2015 at 10:39 AM, Sébastien Barré
> > <sebastien.barre@uclouvain.be> wrote:
> > > What do you and Neal think ?
> >
> > My preference is to have the whole expression detecting the case where
> > the receiver got the probe packet encoded in a single expression. I
> > don't have a strong feeling about whether it should be stored in a
> > bool (to reduce the size of the diff) or written directly into the if
> > () expression (to reduce the size of the code). I'll defer to Eric on
> > which he thinks is better. :-)
>
> There is no shame using helpers with nice names to help understand this
> TCP stack. Even if the helper is used exactly once.
>
> In this case, it seems we test 2 different conditions, so this could use
> 2 helpers with self describing names.
>
> When I see :
>
> > +     if (((ack == tp->tlp_high_seq) &&
> > +          !(flag & (FLAG_SND_UNA_ADVANCED |
> > +                    FLAG_NOT_DUP | FLAG_DATA_SACKED))) ||
> > +         (!before(ack, tp->tlp_high_seq) && (flag & FLAG_DSACKING_ACK))) {
> >               tp->tlp_high_seq = 0;
>
> My brain hurts.
Sebastien: I suggest breaking down by ACK types for readability. e.g.,

/* This routine deals with acks during a TLP episode.
 * We mark the end of a TLP episode on receiving TLP dupack or when
 * ack is after tlp_high_seq.
 * Ref: loss detection algorithm in draft-dukkipati-tcpm-tcp-loss-probe.
 */
static void tcp_process_tlp_ack(struct sock *sk, u32 ack, int flag)
{
        struct tcp_sock *tp = tcp_sk(sk);

        if (before(ack, tp->tlp_high_seq))
                return;

        if (flag & FLAG_DSACKING_ACK) {
                /* This DSACK means original and TLP probe arrived; no loss */
                tp->tlp_high_seq = 0;
        } else if (after(ack, tp->tlp_high_seq)) {
                /* ACK advances: there was a loss, so reduce cwnd. Reset
                 * tlp_high_seq in tcp_init_cwnd_reduction()
                 */
                tcp_init_cwnd_reduction(sk);
                tcp_set_ca_state(sk, TCP_CA_CWR);
                tcp_end_cwnd_reduction(sk);
                tcp_try_keep_open(sk);
                NET_INC_STATS_BH(sock_net(sk),
                                 LINUX_MIB_TCPLOSSPROBERECOVERY);
        } else if (!(flag & (FLAG_SND_UNA_ADVANCED |
                             FLAG_NOT_DUP | FLAG_DATA_SACKED))) {
                /* Pure dupack: original and TLP probe arrived; no loss */
                tp->tlp_high_seq = 0;
        }
}

>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-09 19:43         ` Yuchung Cheng
@ 2015-01-09 20:36           ` Neal Cardwell
  2015-01-10 11:51           ` Sébastien Barré
  2015-01-12 11:52           ` David Laight
  2 siblings, 0 replies; 19+ messages in thread
From: Neal Cardwell @ 2015-01-09 20:36 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Eric Dumazet, Sébastien Barré,
	David Miller, Netdev, Gregory Detal, Nandita Dukkipati

On Fri, Jan 9, 2015 at 2:43 PM, Yuchung Cheng <ycheng@google.com> wrote:
> Sebastien: I suggest breaking down by ACK types for readability. e.g.,

I like this approach, as well.  It breaks up the logic into smaller
pieces that are easier to comment and understand, without having
helper bool flags to read and mentally track. And this version passes
all our internal TLP tests, FWIW.

neal

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-09 19:43         ` Yuchung Cheng
  2015-01-09 20:36           ` Neal Cardwell
@ 2015-01-10 11:51           ` Sébastien Barré
  2015-01-10 17:37             ` Eric Dumazet
  2015-01-12 11:52           ` David Laight
  2 siblings, 1 reply; 19+ messages in thread
From: Sébastien Barré @ 2015-01-10 11:51 UTC (permalink / raw)
  To: Yuchung Cheng, Eric Dumazet
  Cc: Neal Cardwell, David Miller, Netdev, Gregory Detal, Nandita Dukkipati

All,

Le 09/01/2015 20:43, Yuchung Cheng a écrit :
>
> Sebastien: I suggest breaking down by ACK types for readability. e.g.,
>
> /* This routine deals with acks during a TLP episode.
>   * We mark the end of a TLP episode on receiving TLP dupack or when
>   * ack is after tlp_high_seq.
>   * Ref: loss detection algorithm in draft-dukkipati-tcpm-tcp-loss-probe.
>   */
> static void tcp_process_tlp_ack(struct sock *sk, u32 ack, int flag)
> {
>          struct tcp_sock *tp = tcp_sk(sk);
>
>          if (before(ack, tp->tlp_high_seq))
>                  return;
>
>          if (flag & FLAG_DSACKING_ACK) {
>                  /* This DSACK means original and TLP probe arrived; no loss */
>                  tp->tlp_high_seq = 0;
>          } else if (after(ack, tp->tlp_high_seq)) {
>                  /* ACK advances: there was a loss, so reduce cwnd. Reset
>                   * tlp_high_seq in tcp_init_cwnd_reduction()
Indeed, hadn't seen that.
>                   */
>                  tcp_init_cwnd_reduction(sk);
>                  tcp_set_ca_state(sk, TCP_CA_CWR);
>                  tcp_end_cwnd_reduction(sk);
>                  tcp_try_keep_open(sk);
>                  NET_INC_STATS_BH(sock_net(sk),
>                                   LINUX_MIB_TCPLOSSPROBERECOVERY);
>          } else if (!(flag & (FLAG_SND_UNA_ADVANCED |
>                               FLAG_NOT_DUP | FLAG_DATA_SACKED))) {
>                  /* Pure dupack: original and TLP probe arrived; no loss */
>                  tp->tlp_high_seq = 0;
>          }
> }
That looks much more readable compared to my v2.
It is currently passing our tests (These are in fact MPTCP tests appart 
from Neal's packetdrill that I will add, but actually the MPTCP stack 
happens to reveal this situation quite easily, I think because in MPTCP, 
we store the send queue in the "meta-flow", which currently cannot be 
used for tail loss probes).

As probably everyone will be happy with this (Eric as well ?), I suggest 
I prepare a v3 once all our tests are passed as well, with Yuchung's 
structure and Neal's packetdrill test in the commit text. Will also add 
proper credit as there is now stuff from several people in those few 
lines now :-).

Looks good ?

Thanks again for your fast and helpful interactions !

Sébastien.

>
>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-10 11:51           ` Sébastien Barré
@ 2015-01-10 17:37             ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2015-01-10 17:37 UTC (permalink / raw)
  To: Sébastien Barré
  Cc: Yuchung Cheng, Neal Cardwell, David Miller, Netdev,
	Gregory Detal, Nandita Dukkipati

On Sat, 2015-01-10 at 12:51 +0100, Sébastien Barré wrote:

> That looks much more readable compared to my v2.
> It is currently passing our tests (These are in fact MPTCP tests appart 
> from Neal's packetdrill that I will add, but actually the MPTCP stack 
> happens to reveal this situation quite easily, I think because in MPTCP, 
> we store the send queue in the "meta-flow", which currently cannot be 
> used for tail loss probes).
> 
> As probably everyone will be happy with this (Eric as well ?), I suggest 
> I prepare a v3 once all our tests are passed as well, with Yuchung's 
> structure and Neal's packetdrill test in the commit text. Will also add 
> proper credit as there is now stuff from several people in those few 
> lines now :-).
> 
> Looks good ?

Definitely !

Thanks !

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-09 19:43         ` Yuchung Cheng
  2015-01-09 20:36           ` Neal Cardwell
  2015-01-10 11:51           ` Sébastien Barré
@ 2015-01-12 11:52           ` David Laight
  2015-01-12 15:02             ` Neal Cardwell
  2 siblings, 1 reply; 19+ messages in thread
From: David Laight @ 2015-01-12 11:52 UTC (permalink / raw)
  To: 'Yuchung Cheng', Eric Dumazet
  Cc: Neal Cardwell, Sébastien Barré,
	David Miller, Netdev, Gregory Detal, Nandita Dukkipati

From: Yuchung Cheng
> Sent: 09 January 2015 19:44
...
> Sebastien: I suggest breaking down by ACK types for readability. e.g.,
> 
> /* This routine deals with acks during a TLP episode.
>  * We mark the end of a TLP episode on receiving TLP dupack or when
>  * ack is after tlp_high_seq.
>  * Ref: loss detection algorithm in draft-dukkipati-tcpm-tcp-loss-probe.
>  */
> static void tcp_process_tlp_ack(struct sock *sk, u32 ack, int flag)
> {
>         struct tcp_sock *tp = tcp_sk(sk);
> 
>         if (before(ack, tp->tlp_high_seq))
>                 return;
> 
>         if (flag & FLAG_DSACKING_ACK) {
>                 /* This DSACK means original and TLP probe arrived; no loss */
>                 tp->tlp_high_seq = 0;

I think I'd add a 'return' here.

>         } else if (after(ack, tp->tlp_high_seq)) {
>                 /* ACK advances: there was a loss, so reduce cwnd. Reset
>                  * tlp_high_seq in tcp_init_cwnd_reduction()
>                  */
>                 tcp_init_cwnd_reduction(sk);
>                 tcp_set_ca_state(sk, TCP_CA_CWR);
>                 tcp_end_cwnd_reduction(sk);
>                 tcp_try_keep_open(sk);
>                 NET_INC_STATS_BH(sock_net(sk),
>                                  LINUX_MIB_TCPLOSSPROBERECOVERY);

and here

>         } else if (!(flag & (FLAG_SND_UNA_ADVANCED |
>                              FLAG_NOT_DUP | FLAG_DATA_SACKED))) {
>                 /* Pure dupack: original and TLP probe arrived; no loss */
>                 tp->tlp_high_seq = 0;
>         }
> }

	David

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
  2015-01-12 11:52           ` David Laight
@ 2015-01-12 15:02             ` Neal Cardwell
  0 siblings, 0 replies; 19+ messages in thread
From: Neal Cardwell @ 2015-01-12 15:02 UTC (permalink / raw)
  To: David Laight
  Cc: Yuchung Cheng, Eric Dumazet, Sébastien Barré,
	David Miller, Netdev, Gregory Detal, Nandita Dukkipati

On Mon, Jan 12, 2015 at 6:52 AM, David Laight <David.Laight@aculab.com> wrote:
>>         if (flag & FLAG_DSACKING_ACK) {
>>                 /* This DSACK means original and TLP probe arrived; no loss */
>>                 tp->tlp_high_seq = 0;
>
> I think I'd add a 'return' here.

What's the benefit of adding 'return' in those two spots? That adds
extra code to read, with no change in behavior, and no increase in
maintainability.

neal

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-01-12 15:02 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-08 12:20 [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received Sébastien Barré
2015-01-08 15:07 ` Eric Dumazet
2015-01-08 15:24   ` Sébastien Barré
2015-01-08 15:43     ` Neal Cardwell
2015-01-08 16:00       ` Sébastien Barré
2015-01-08 15:59     ` Eric Dumazet
2015-01-08 15:19 ` Eric Dumazet
2015-01-08 15:39   ` Sébastien Barré
2015-01-08 15:49     ` Neal Cardwell
2015-01-08 15:52       ` Neal Cardwell
2015-01-08 16:25       ` Eric Dumazet
2015-01-08 17:17         ` Eric Dumazet
2015-01-08 17:27           ` Eric Dumazet
2015-01-09 19:43         ` Yuchung Cheng
2015-01-09 20:36           ` Neal Cardwell
2015-01-10 11:51           ` Sébastien Barré
2015-01-10 17:37             ` Eric Dumazet
2015-01-12 11:52           ` David Laight
2015-01-12 15:02             ` Neal Cardwell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.