netdev.vger.kernel.org archive mirror
From: Gil Pedersen <kanongil@gmail.com>
To: davem@davemloft.net, yoshfuji@linux-ipv6.org, dsahern@kernel.org
Cc: netdev@vger.kernel.org
Subject: TCP stall issue
Date: Tue, 23 Feb 2021 11:09:19 +0100
Message-ID: <35A4DDAA-7E8D-43CB-A1F5-D1E46A4ED42E@gmail.com>

Hi,

I am investigating a TCP stall that can occur when sending to an Android device (kernel 4.9.148) from an Ubuntu server running kernel 5.11.0.

The issue seems to be that RACK is not applied when a D-SACK (with SACK) is received on the server after an RTO re-transmission (CA_Loss state). In this case the re-transmitted segment is considered to be already delivered and the loss undo logic is applied. Nothing more is re-transmitted until the next RTO, when the next segment is sent and the same thing happens again. This causes the re-transmitted segments to be delivered at a rate of ~1 per second, so a burst loss of e.g. 20 segments causes a 20+ second stall. I would expect RACK to kick in long before this happens.
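
To make the arithmetic concrete, here is a rough sketch of the behaviour described above. It is only a toy model, not kernel code: it assumes each RTO recovers exactly one segment and that the RTO stays at a constant ~1 second (the observed rate). The function name and parameters are made up for illustration.

```python
def stall_duration(lost_segments: int, rto_seconds: float = 1.0) -> float:
    """Toy model: seconds until the last lost segment has been re-sent.

    Assumptions (not taken from the kernel source):
      - each RTO expiry re-transmits exactly one segment,
      - the D-SACK that follows triggers loss undo, so nothing else is
        re-transmitted until the next RTO,
      - the RTO interval is constant.
    """
    elapsed = 0.0
    remaining = lost_segments
    while remaining > 0:
        elapsed += rto_seconds  # wait for the RTO timer to fire
        remaining -= 1          # one segment re-sent, then undo stalls recovery
    return elapsed


if __name__ == "__main__":
    print(stall_duration(20))
```

With 20 lost segments and a 1 second RTO the model gives 20 seconds, which is in line with the 20+ second stall seen in practice.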

Note the D-SACK should not be considered spurious, as the TSecr value matches the re-transmission TSval.

Also, the Android receiver is definitely sending strange D-SACKs that do not properly advance the ACK number to include received segments. However, I can't control it and need to fix this on the server by quickly re-transmitting the segments. The connection itself is functional: if the client makes a request to the server in this state, the server can respond and the client will receive any segments sent in reply.

I can see from the counters that TcpExtTCPLossUndo and TcpExtTCPSackFailures are incremented on the server when this happens.
The issue appears with F-RTO both enabled and disabled, and with both BBR and Reno.
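
For anyone who wants to watch the same counters, a minimal sketch of reading them follows. It assumes the usual /proc/net/netstat layout (a header line of names followed by a line of values, both starting with the same "TcpExt:" style prefix) and joins prefix and name the way nstat prints them. The helper names are mine, not from any tool.

```python
def parse_netstat(text: str) -> dict:
    """Parse /proc/net/netstat-style text into {"TcpExtTCPLossUndo": 3, ...}."""
    counters = {}
    lines = text.strip().splitlines()
    # Lines come in pairs: "Prefix: name1 name2 ..." then "Prefix: v1 v2 ...".
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        _, nums = values.split(":", 1)
        for name, num in zip(names.split(), nums.split()):
            counters[prefix.strip() + name] = int(num)
    return counters


def read_counters(path: str = "/proc/net/netstat") -> dict:
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_netstat(f.read())


if __name__ == "__main__":
    stats = read_counters()
    for key in ("TcpExtTCPLossUndo", "TcpExtTCPSackFailures"):
        print(key, stats.get(key))
```

Sampling these before and after reproducing the stall and comparing the two values shows how many undo events occurred during the test.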

Any idea of why this happens, or suggestions on how to debug the issue further?

/Gil

Thread overview: 10+ messages
2021-02-23 10:09 Gil Pedersen [this message]
2021-02-23 15:41 ` TCP stall issue Neal Cardwell
2021-02-24 10:03   ` Gil Pedersen
2021-02-24 14:55     ` Neal Cardwell
2021-02-24 15:36       ` Gil Pedersen
2021-02-25 15:05         ` Neal Cardwell
2021-02-26 14:39           ` David Laight
2021-02-26 16:26             ` Gil Pedersen
2021-02-26 17:50               ` Neal Cardwell
2021-02-26 21:59                 ` Maciej Żenczykowski
