All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yuchung Cheng <ycheng@google.com>
To: Michal Kubecek <mkubecek@suse.cz>
Cc: netdev <netdev@vger.kernel.org>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Neal Cardwell <ncardwell@google.com>,
	Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Subject: Re: TCP one-by-one acking - RFC interpretation question
Date: Thu, 12 Apr 2018 09:20:31 -0700	[thread overview]
Message-ID: <CAK6E8=cWfjLh_sUSi2tVm8v-iWd8KtPASkC6kgPnKTN+4Mdywg@mail.gmail.com> (raw)
In-Reply-To: <20180411120655.aib34ye5xnch4zw3@unicorn.suse.cz>

On Wed, Apr 11, 2018 at 5:06 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
> On Wed, Apr 11, 2018 at 12:58:37PM +0200, Michal Kubecek wrote:
>> There is something else I don't understand, though. In the case of
>> acking previously sacked and never retransmitted segment,
>> tcp_clean_rtx_queue() calculates the parameters for tcp_ack_update_rtt()
>> using
>>
>>         if (sack->first_sackt.v64) {
>>                 sack_rtt_us = skb_mstamp_us_delta(&now,
>> &sack->first_sackt);
>>                 ca_rtt_us = skb_mstamp_us_delta(&now,
>> &sack->last_sackt);
>>         }
>>
>> (in 4.4; mainline code replaces &now with tp->tcp_mstamp). If I read the
>> code correctly, both sack->first_sackt and sack->last_sackt contain
>> timestamps of initial segment transmission. This would mean we use the
>> time difference between the initial transmission and now, i.e. including
>> the RTO of the lost packet).
>>
>> IMHO we should take the actual round trip time instead, i.e. the
>> difference between the original transmission and the time the packet
>> sacked (first time). It seems we have been doing this before commit
>> 31231a8a8730 ("tcp: improve RTT from SACK for CC").
>
> Sorry for the noise, this was my misunderstanding, the first_sackt and
> last_sackt values are only taken from segments newly sacked by ack
> received right now, not those which were already sacked before.
>
> The actual problem and unrealistic RTT measurements come from another
> RFC violation I didn't mention before: the NAS doesn't follow RFC 2018
> section 4 rule for ordering of SACK blocks. Rather than sending SACK
> blocks three most recently received out-of-order blocks, it simply sends
> first three ordered by sequence numbers. In the earlier example (odd
> packets were received, even lost)
>
>        ACK             SAK             SAK             SAK
>     +-------+-------+-------+-------+-------+-------+-------+-------+-------+
>     |   1   |   2   |   3   |   4   |   5   |   6   |   7   |   8   |   9   |
>     +-------+-------+-------+-------+-------+-------+-------+-------+-------+
>   34273   35701   37129   38557   39985   41413   42841   44269   45697   47125
>
> it responds to retransmitted segment 2 by
>
>   1. ACK 37129, SACK 37129-38557 39985-41413 42841-44269
>   2. ACK 38557, SACK 39985-41413 42841-44269 45697-47125
>
> This new SACK block 45697-47125 has not been retransmitted and as it
> wasn't sacked before, it is considered newly sacked. Therefore it gets
> processed and its deemed RTT (time since its original transmit time)
> "poisons" the RTT calculation, leading to RTO spiraling up.
>
> Thus if we want to work around the NAS behaviour, we would need to
> recognize such new SACK block as "not really new" and ignore it for
> first_sackt/last_sackt. I'm not sure if it's possible without
> misinterpreting actually delayed out of order packets. Of course, it is
> not clear if it's worth the effort to work around so severely broken TCP
> implementations (two obvious RFC violations, even if we don't count the
> one-by-one acking).
Right. Not much we (sender) can do if the receiver is not reporting
the delivery status correctly. This also negatively impacts TCP
congestion control (Cubic, Reno, BBR, CDG etc) because we've changed
it to increase/decrease cwnd based on both inorder and out-of-order
delivery.

We're close to publish our internal packetdrill tests. Hopefully they
can be used to test these poor implementations.

>
> Michal Kubecek

      reply	other threads:[~2018-04-12 16:21 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-06 10:05 TCP one-by-one acking - RFC interpretation question Michal Kubecek
2018-04-06 12:01 ` Eric Dumazet
2018-04-06 15:03   ` Michal Kubecek
2018-04-06 16:49     ` Eric Dumazet
2018-04-11 10:58       ` Michal Kubecek
2018-04-11 12:06         ` Michal Kubecek
2018-04-12 16:20           ` Yuchung Cheng [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAK6E8=cWfjLh_sUSi2tVm8v-iWd8KtPASkC6kgPnKTN+4Mdywg@mail.gmail.com' \
    --to=ycheng@google.com \
    --cc=eric.dumazet@gmail.com \
    --cc=kennetkl@ifi.uio.no \
    --cc=mkubecek@suse.cz \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.