netdev.vger.kernel.org archive mirror
From: Neal Cardwell <ncardwell@google.com>
To: Gil Pedersen <kanongil@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	dsahern@kernel.org, Netdev <netdev@vger.kernel.org>,
	Yuchung Cheng <ycheng@google.com>,
	Eric Dumazet <edumazet@google.com>
Subject: Re: TCP stall issue
Date: Wed, 24 Feb 2021 09:55:27 -0500
Message-ID: <CADVnQynP40vvvTV3VY0fvYwEcSGQ=Y=F53FU8sEc-Bc=mzij5g@mail.gmail.com>
In-Reply-To: <C5332AE4-DFAF-4127-91D1-A9108877507A@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1469 bytes --]

On Wed, Feb 24, 2021 at 5:03 AM Gil Pedersen <kanongil@gmail.com> wrote:
> Sure, I attached a trace from the server that should illustrate the issue.
>
> The trace is cut from a longer flow with the server at 188.120.85.11 and a client window scaling factor of 256.
>
> Packet 78 is a TLP, followed by a delayed DUPACK with a SACK from the client.
> The SACK triggers a single segment fast re-transmit with an ignored?? D-SACK in packet 81.
> The first RTO happens at packet 82.

Thanks for the trace! That is very helpful. I have attached a plot and
my notes on the trace, for discussion.

AFAICT the client appears to be badly misbehaving, and misrepresenting
what has happened.  At each point where the client sends a DSACK,
there is an apparent contradiction. Either the client has received
that data before, or it hasn't. If the client *has* already received
that data, then it should have already cumulatively ACKed it. If the
client has *not* already received that data, then it shouldn't send a
DSACK for it.
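
To make the contradiction concrete, here is a rough sketch of the check, not kernel code. It assumes the RFC 2883 rule that only the first SACK block can be a DSACK, ignores sequence-number wraparound, and uses hypothetical helper names (is_dsack, dsack_at_old_head) that do not exist in the stack. The numbers in the usage note come from the attached trace.

```python
def is_dsack(cum_ack, sack_blocks):
    """Return True if the first SACK block looks like a DSACK (RFC 2883).

    sack_blocks is a list of (start, end) pairs in the order they appear
    in the TCP option. Sequence wraparound is ignored in this sketch.
    """
    if not sack_blocks:
        return False
    first_start, first_end = sack_blocks[0]
    # Case 1: the first block lies at or below the cumulative ACK.
    if first_end <= cum_ack:
        return True
    # Case 2: the first block is contained in the second block.
    if len(sack_blocks) > 1:
        second_start, second_end = sack_blocks[1]
        if second_start <= first_start and first_end <= second_end:
            return True
    return False


def dsack_at_old_head(prev_ack, cum_ack, sack_blocks):
    """Return True if an ACK reports as duplicate the data at the old head.

    Data starting exactly at the previous cumulative ACK cannot have been
    received earlier without that ACK having advanced, so a DSACK for it
    contradicts the receiver's own earlier ACKs.
    """
    if not is_dsack(cum_ack, sack_blocks):
        return False
    return sack_blocks[0][0] == prev_ack
```

For example, dsack_at_old_head(189161, 190609, [(189161, 190609), (404913, 406316)]) flags the ACK that follows the fast retransmit, while the earlier plain SACK of the TLP, is_dsack(189161, [(404913, 406316)]), is not a DSACK.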

Given that, from the server's perspective, the client is
misbehaving/lying, it's not clear what inferences the server can
safely make. Though I agree it's probably possible to do much better
than the current server behavior.

A few questions.

(a) is there a middlebox (firewall, NAT, etc) in the path?

(b) is it possible to capture a client-side trace, to help
disambiguate whether there is a client-side Linux bug or a middlebox
bug?

thanks,
neal

[-- Attachment #2: slow-recovery.txt --]
[-- Type: text/plain, Size: 3821 bytes --]

# server finishes sending a flight of data:
04:23:49.383419 IP 51.91.154.158.443 > 188.120.85.11.55038: . 358577:365817(7240) ack 395 win 182 <nop,nop,TS val 1514702387 ecr 72958509>
04:23:49.384558 IP 51.91.154.158.443 > 188.120.85.11.55038: . 365817:373057(7240) ack 395 win 182 <nop,nop,TS val 1514702388 ecr 72958509>
04:23:49.385693 IP 51.91.154.158.443 > 188.120.85.11.55038: . 373057:380297(7240) ack 395 win 182 <nop,nop,TS val 1514702389 ecr 72958509>
04:23:49.386798 IP 51.91.154.158.443 > 188.120.85.11.55038: . 380297:387537(7240) ack 395 win 182 <nop,nop,TS val 1514702390 ecr 72958509>
04:23:49.387903 IP 51.91.154.158.443 > 188.120.85.11.55038: . 387537:394777(7240) ack 395 win 182 <nop,nop,TS val 1514702391 ecr 72958509>
04:23:49.389012 IP 51.91.154.158.443 > 188.120.85.11.55038: . 394777:402017(7240) ack 395 win 182 <nop,nop,TS val 1514702392 ecr 72958509>
04:23:49.390117 IP 51.91.154.158.443 > 188.120.85.11.55038: P. 402017:406316(4299) ack 395 win 182 <nop,nop,TS val 1514702393 ecr 72958509>

# client ACKs the first portion of the flight:
04:23:49.492714 IP 188.120.85.11.55038 > 51.91.154.158.443: . 395:395(0) ack 3111 win 2087 <nop,nop,TS val 72958562 ecr 1514702331>
04:23:49.508450 IP 188.120.85.11.55038 > 51.91.154.158.443: . 395:395(0) ack 189161 win 1540 <nop,nop,TS val 72958563 ecr 1514702332>

# client sends a request:
04:23:49.543869 IP 188.120.85.11.55038 > 51.91.154.158.443: P. 395:498(103) ack 189161 win 1770 <nop,nop,TS val 72958567 ecr 1514702332>
# server ACKs request:
04:23:49.543928 IP 51.91.154.158.443 > 188.120.85.11.55038: . 406316:406316(0) ack 498 win 182 <nop,nop,TS val 1514702547 ecr 72958567>

# client sends another request:
04:23:49.565850 IP 188.120.85.11.55038 > 51.91.154.158.443: P. 498:705(207) ack 189161 win 2087 <nop,nop,TS val 72958569 ecr 1514702332>
# server ACKs request:
04:23:49.565888 IP 51.91.154.158.443 > 188.120.85.11.55038: . 406316:406316(0) ack 705 win 182 <nop,nop,TS val 1514702569 ecr 72958569>

# server sends TLP retransmit:
04:23:49.824734 IP 51.91.154.158.443 > 188.120.85.11.55038: P. 404913:406316(1403) ack 705 win 182 <nop,nop,TS val 1514702828 ecr 72958569>
# client SACKs TLP retransmit:
04:23:49.846128 IP 188.120.85.11.55038 > 51.91.154.158.443: . 705:705(0) ack 189161 win 2087 <nop,nop,TS val 72958597 ecr 1514702332,nop,nop,sack 1 {404913:406316}>

# server fast-retransmits head packet at snd_una:
04:23:49.846141 IP 51.91.154.158.443 > 188.120.85.11.55038: . 189161:190609(1448) ack 705 win 182 <nop,nop,TS val 1514702849 ecr 72958597>
# 21ms later, client cumulatively ACKs, DSACKs, and echoes timestamp from the fast retransmit:
04:23:49.867017 IP 188.120.85.11.55038 > 51.91.154.158.443: . 705:705(0) ack 190609 win 2082 <nop,nop,TS val 72958599 ecr 1514702849,nop,nop,sack 2 {189161:190609}{404913:406316}>

# server RTOs and retransmits head packet at snd_una:
04:23:50.240699 IP 51.91.154.158.443 > 188.120.85.11.55038: . 190609:192057(1448) ack 705 win 182 <nop,nop,TS val 1514703244 ecr 72958599>
# 478ms later, client cumulatively ACKs, DSACKs, and echoes timestamp from the RTO retransmit:
04:23:50.718823 IP 188.120.85.11.55038 > 51.91.154.158.443: . 705:705(0) ack 192057 win 2080 <nop,nop,TS val 72958684 ecr 1514703244,nop,nop,sack 2 {190609:192057}{404913:406316}>

# again, server RTOs and retransmits head packet at snd_una:
04:23:51.328726 IP 51.91.154.158.443 > 188.120.85.11.55038: . 192057:193505(1448) ack 705 win 182 <nop,nop,TS val 1514704332 ecr 72958684>
# 620ms later, client again sends cumulative-ACK/DSACK/ecr...
04:23:51.949296 IP 188.120.85.11.55038 > 51.91.154.158.443: . 705:705(0) ack 193505 win 2080 <nop,nop,TS val 72958807 ecr 1514704332,nop,nop,sack 2 {192057:193505}{404913:406316}>

# repeat RTO/DSACK process, with ACKs arriving mostly just over 200ms after data is sent...
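
The delays quoted in the notes above can be recomputed from the trace itself. The following is a minimal sketch that assumes the tcpdump text format shown here; the regular expression and the helper names (parse, echo_delays_us) are illustrative assumptions, and direction is inferred only from payload length (data segments versus pure ACKs). It matches each pure ACK to the data segment whose TS val it echoes in ecr and reports the gap in microseconds.

```python
import re

# Three retransmit/ACK pairs copied from the trace above.
TRACE = """\
04:23:49.846141 IP 51.91.154.158.443 > 188.120.85.11.55038: . 189161:190609(1448) ack 705 win 182 <nop,nop,TS val 1514702849 ecr 72958597>
04:23:49.867017 IP 188.120.85.11.55038 > 51.91.154.158.443: . 705:705(0) ack 190609 win 2082 <nop,nop,TS val 72958599 ecr 1514702849,nop,nop,sack 2 {189161:190609}{404913:406316}>
04:23:50.240699 IP 51.91.154.158.443 > 188.120.85.11.55038: . 190609:192057(1448) ack 705 win 182 <nop,nop,TS val 1514703244 ecr 72958599>
04:23:50.718823 IP 188.120.85.11.55038 > 51.91.154.158.443: . 705:705(0) ack 192057 win 2080 <nop,nop,TS val 72958684 ecr 1514703244,nop,nop,sack 2 {190609:192057}{404913:406316}>
04:23:51.328726 IP 51.91.154.158.443 > 188.120.85.11.55038: . 192057:193505(1448) ack 705 win 182 <nop,nop,TS val 1514704332 ecr 72958684>
04:23:51.949296 IP 188.120.85.11.55038 > 51.91.154.158.443: . 705:705(0) ack 193505 win 2080 <nop,nop,TS val 72958807 ecr 1514704332,nop,nop,sack 2 {192057:193505}{404913:406316}>
"""

LINE_RE = re.compile(
    r"^(\d+):(\d+):(\d+)\.(\d{6}) IP \S+ > \S+: \S+ "
    r"(\d+):(\d+)\((\d+)\) ack (\d+) win \d+ "
    r"<.*?TS val (\d+) ecr (\d+)"
)


def parse(line):
    """Parse one tcpdump line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    h, mi, s, us, start, end, length, ack, tsval, ecr = map(int, m.groups())
    # Keep time as integer microseconds to avoid float rounding.
    t_us = ((h * 60 + mi) * 60 + s) * 1_000_000 + us
    return {"t_us": t_us, "start": start, "end": end, "len": length,
            "ack": ack, "tsval": tsval, "ecr": ecr}


def echo_delays_us(trace):
    """Return the delay from each data segment to the pure ACK echoing its TS val."""
    sent = {}    # TS val of a data segment -> time it was sent
    delays = []
    for line in trace.splitlines():
        pkt = parse(line)
        if pkt is None:
            continue
        if pkt["len"] > 0:
            sent[pkt["tsval"]] = pkt["t_us"]
        elif pkt["ecr"] in sent:
            delays.append(pkt["t_us"] - sent[pkt["ecr"]])
    return delays


print(echo_delays_us(TRACE))
```

On the six lines embedded above this yields gaps of 20876, 478124 and 620570 microseconds, consistent with the 21ms, 478ms and 620ms figures in the notes.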

[-- Attachment #3: slow-recovery-time-seq-plot.png --]
[-- Type: image/png, Size: 38360 bytes --]

