A buggy behavior for Linux TCP Reno and HTCP

* A buggy behavior for Linux TCP Reno and HTCP
@ 2017-07-18 21:36 Wei Sun
  2017-07-19 19:31 ` Yuchung Cheng
  0 siblings, 1 reply; 14+ messages in thread
From: Wei Sun @ 2017-07-18 21:36 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 928 bytes --]

Hi there,

We find a buggy behavior when using Linux TCP Reno and HTCP in low
bandwidth or highly congested network environments.

In a simple word, their undo functions may mistakenly double the cwnd,
leading to a more aggressive behavior in a highly congested scenario.

The detailed reason:

The current reno undo function assumes cwnd halving (and thus doubles
the cwnd), but it doesn't consider a corner case condition that
ssthresh is at least 2.

e.g.,
                         cwnd              ssth
An initial state:     2                    5
A spurious loss:   1                    2
Undo:                   4                    5

Here the cwnd after undo is two times as that before undo. Attached is
a simple script to reproduce it.

A similar reason for HTCP, so we recommend to store the cwnd on loss
in .ssthresh implementation and restore it again in .undo_cwnd for TCP
Reno and HTCP implementations.

Thanks

[-- Attachment #2: undo-2-1-4.pkt --]
[-- Type: application/octet-stream, Size: 2102 bytes --]

/***************
A simpe script to trigger the bug
usage:
1. Download packetdrill tool (https://github.com/google/packetdrill/tree/master/gtests/net/packetdrill) and then run this script
2. sudo ./packetdrill undo-2-1-4.pkt --tolerance_usecs=500000

output:
[Before undo] cwnd: 2 ssth: 5
[Loss Detecting] cwnd: 1 ssth: 2
[After undo] cwnd: 5 ssth: 5
*****************/

+0 `sysctl -q net.ipv4.tcp_congestion_control=reno`
+0 `sysctl -q net.ipv4.tcp_sack=0`

// Establish a connection.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < S 0:0(0) win 42340 <mss 1000,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <...>
0.110 < . 1:1(0) ack 1 win 257
0.120 accept(3, ..., ...) = 4

// Send 10 MSS.
0.13 write(4, ..., 10000) = 10000

0.13 > . 1:1001(1000) ack 1
0.13 > . 1001:2001(1000) ack 1
0.13 > . 2001:3001(1000) ack 1
0.13 > . 3001:4001(1000) ack 1
0.13 > . 4001:5001(1000) ack 1
0.13 > . 5001:6001(1000) ack 1
0.13 > . 6001:7001(1000) ack 1
0.13 > . 7001:8001(1000) ack 1
0.13 > . 8001:9001(1000) ack 1
0.13 > P. 9001:10001(1000) ack 1

0.4 > . 1:1001(1000) ack 1
0.7 > . 1:1001(1000) ack 1

0.9 < . 1:1(0) ack 1001 win 257
0.9 %{print "[Before undo] cwnd:", tcpi_snd_cwnd, "ssth:", tcpi_snd_ssthresh}%
1.0 > . 1001:2001(1000) ack 1
1.0 > . 2001:3001(1000) ack 1

// Get 3 dupacks.
1.300 < . 1:1(0) ack 1 win 257 <sack 2001:3001,nop,nop>
1.300 < . 1:1(0) ack 1 win 257 <sack 2001:4001,nop,nop>
1.300 < . 1:1(0) ack 1 win 257 <sack 2001:5001,nop,nop>

// We've received 3 duplicate ACKs, so we do a fast retransmit.
1.400 > . 1001:2001(1000) ack 1

1.4 %{print "[Loss Detecting] cwnd:", tcpi_snd_cwnd, "ssth:", tcpi_snd_ssthresh}%
// Apparently just reordering. Retransmit was spurious.
// Original ACKs for sequence ranges up to 10001 are all lost.
// Receiver sends DSACK for retransmitted packet.
1.4 < . 1:1(0) ack 5001 win 257 <sack 1001:2001,nop,nop>
1.401 %{print "[After undo] cwnd:", tcpi_snd_cwnd, "ssth:", tcpi_snd_ssthresh}%

+0 `sysctl -q net.ipv4.tcp_congestion_control=cubic`

^ permalink raw reply	[flat|nested] 14+ messages in thread