TCP rx window autotuning harmful at LAN context

* TCP rx window autotuning harmful at LAN context
@ 2009-03-09 11:25 Marian Ďurkovič
  2009-03-09 18:01 ` John Heffner
  2009-03-11  9:02 ` Rémi Denis-Courmont
  0 siblings, 2 replies; 30+ messages in thread
From: Marian Ďurkovič @ 2009-03-09 11:25 UTC (permalink / raw)
  To: netdev

Hi all,

  based on multiple user complaints about poor LAN performance with 
TCP window autotuning on receiver side we conducted several tests at
our university to verify whether these complaints are valid. Unfortunately,
our results confirmed, that the present implementation indeed behaves 
erratically in LAN context and causes serious harm to LAN operation.

  The behaviour could be descibed as "spiraling death" syndrome. While
TCP with constant and decently sized rx window natively reduces transmission
rate when RTT increases, autotuning performs exactly the opposite - as a
response to increased RTT it increases the rx window size (which in turn
again increases RTT...) As this happens again and again, the result is
complete waste of all available buffers at sending host or at the bottleneck
point, resulting in upto 267 msec (!) latency in LAN context (with 100 Mbps
ethernet connection, default txqueuelen=1000, MTU=1500 and sky2 driver).  
Needles to say that this means the LAN is almost unusable.

   With autotuning disabled, the same situation results in just 5 msec
latency and still full 100 Mpbs link utilization, since with 64 kB rx window
the TCP transmission is solely controlled by RTT without ever going into
congestion avoidance mode since there are no packet drops.

   As rx window autotuning is enabled in all recent kernels and with 1 GB
of RAM the maximum tcp_rmem becomes 4 MB, this problem is spreading rapidly
and we believe it needs urgent attention. As demontrated above, such huge
rx window (which is at least 100*BDP of the example above) does not deliver
any performance gain but instead it seriously harms other hosts and/or
applications. It should also be noted, that host with autotuning enabled
steals an unfair share of the total available bandwidth, which might look
like a "better" performing TCP stack at first sight - however such behaviour
is not appropriate (RFC2914, section 3.2).

   The possible solution to the above problem could be e.g. to limit RTT
measuring only to the initial phase of the TCP connection and computing
the BDP from this value for the whole lifetime of the connection. With
such modification, increases in RTT due to buffering at the bottleneck 
point will again reduce transmission rate and the "spiraling death" 
syndrome will no longer exist.

    Thanks kind regards,

--------------------------------------------------------------------------
----                                                                  ----
----   Marian Ďurkovič                       network  manager         ----
----                                                                  ----
----   Slovak Technical University           Tel: +421 2 571 041 81   ----
----   Computer Centre, Nám. Slobody 17      Fax: +421 2 524 94 351   ----
----   812 43 Bratislava, Slovak Republic    E-mail/sip: md@bts.sk    ----
----                                                                  ----
--------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 30+ messages in thread