linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Kubecek <mkubecek@suse.cz>
To: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Cc: Yuchung Cheng <ycheng@google.com>,
	netdev <netdev@vger.kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH RESEND] tcp: avoid F-RTO if SACK and timestamps are disabled
Date: Fri, 15 Jun 2018 11:27:53 +0200	[thread overview]
Message-ID: <20180615092753.lmxqh65moc33rzbq@unicorn.suse.cz> (raw)
In-Reply-To: <alpine.DEB.2.20.1806151048000.29120@whs-18.cs.helsinki.fi>

On Fri, Jun 15, 2018 at 11:05:03AM +0300, Ilpo Järvinen wrote:
> On Thu, 14 Jun 2018, Michal Kubecek wrote:
> > The trace wouldn't look so nice but it can be reproduced even with more
> > data to send. I've copied an example below. I couldn't find a really
> > nice one quickly so that first few retransmits (17:22:13.865105 through
> > 17:23:05.841105) are without new data but starting at 17:23:58.189150,
> > you can see that sending new (previously unsent) data may not suffice to
> > break the loop.
> 
> My point was that the new data segment bursts that occur if the sender 
> isn't application limited indicate that there's something going wrong
> with FRTO. And that wrong is also what is causing that RTO loop because
> the sender doesn't see the previous FRTO recovery on second RTO. With 
> my FRTO undo fix, (new_recovery || icsk->icsk_retransmits) will be false
> and that will prevent the RTO loop.

Yes, it would prevent the loop in this case (except it would be a bit
later, after second RTO rather than after first). But I'm not convinced
the logic of the patch is correct. If I understand it correctly, it
essentially changes "presumption of innocence" (if we get an ack past
what we retransmitted, we assume it was received earlier - i.e. would
have been sacked before if SACK was in use) to "presumption of guilt"
(whenever a retransmitted segment is acked, we assume nothing else acked
with it was received earlier). Or that you trade false negatives for
false positives.

Maybe I understand it wrong but it seems that you de facto prevent
Step (3b) from ever happening in non-SACK case.

> > > No! The window should not update window on ACKs the receiver intends to 
> > > designate as "duplicate ACKs". That is not without some potential cost 
> > > though as it requires delaying window updates up to the next cumulative 
> > > ACK. In the non-SACK series one of the changes is fixing this for
> > > non-SACK Linux TCP flows.
> > 
> > That sounds like a reasonable change (at least at the first glance,
> > I didn't think about it too deeply) but even if we fix Linux stack to
> > behave like this, we cannot force everyone else to do the same.
> 
> Unfortunately I don't know what the other stacks besides Linux do. But 
> for Linux, the cause for the changing receiver window is the receiver 
> window auto-tuning and I'm not sure if other stacks have a similar 
> feature (or if that affects (almost) all ACKs like in Linux).

The capture from my previous e-mail and some others I have seen indicate
that at least some implementations do not take care to never change
window size when responding to an out-of-order segment.  That means that
even if we change linux TCP this way (we might still need to send
a separate window update in some cases), we still cannot rely on others
doing the same.

I checked the capture attached to my previous e-mail again and there is
one thing where our F-RTO implementation (in 4.4, at least) is wrong,
IMHO. While the first ACK after "new data" (sent in (2b)) was a window
update (and therefore not dupack by definition) so that we could take
neither (3a) nor (3b), in some iterations there were further acks which
did not change window size. The text just before Step 1 says

  The F-RTO algorithm does not specify actions for receiving
  a segment that neither acknowledges new data nor is a duplicate
  acknowledgment.  The TCP sender SHOULD ignore such segments and
  wait for a segment that either acknowledges new data or is
  a duplicate acknowledgment.

My understanding is that this means that while the first ack after new
data is correctly ignored, the following ack which preserves window size
should be recognized as a dupack and (3a) should be taken.

Michal Kubecek

  reply	other threads:[~2018-06-15  9:28 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20180613164802.99B89A09E2@unicorn.suse.cz>
2018-06-13 16:55 ` [RFC PATCH RESEND] tcp: avoid F-RTO if SACK and timestamps are disabled Michal Kubecek
2018-06-13 16:57   ` Michal Kubecek
2018-06-14 10:18     ` Ilpo Järvinen
2018-06-13 17:32   ` Yuchung Cheng
2018-06-13 17:48     ` Eric Dumazet
2018-06-14  8:42     ` Ilpo Järvinen
2018-06-14  9:34       ` Michal Kubecek
2018-06-14 11:51         ` Ilpo Järvinen
2018-06-14 13:18           ` Michal Kubecek
2018-06-15  8:05             ` Ilpo Järvinen
2018-06-15  9:27               ` Michal Kubecek [this message]
2018-06-15 10:35                 ` Ilpo Järvinen
2018-06-27 23:56                   ` Yuchung Cheng
2018-06-29 10:17                     ` Ilpo Järvinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180615092753.lmxqh65moc33rzbq@unicorn.suse.cz \
    --to=mkubecek@suse.cz \
    --cc=edumazet@google.com \
    --cc=ilpo.jarvinen@helsinki.fi \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).