From: Neal Cardwell
Subject: Re: Linux ECN Handling
Date: Tue, 21 Nov 2017 10:01:40 -0500
To: Steve Ibanez
Cc: Daniel Borkmann, Netdev, Florian Westphal, Mohammad Alizadeh, Lawrence Brakmo, Yuchung Cheng, Eric Dumazet

On Tue, Nov 21, 2017 at 12:58 AM, Steve Ibanez wrote:
> Hi Neal,
>
> I tried your suggestion to disable tcp_tso_should_defer() and it does
> indeed look like it is preventing the host from entering timeouts.
> I'll have to do a bit more digging to try and find where the packets
> are being dropped. I've verified that the bottleneck link queue
> occupancy is at about the configured marking threshold when the
> timeout occurs, so the drops may be happening at the NIC interfaces
> or perhaps somewhere unexpected in the switch.

Great! Thanks for running that test.

> I wonder if you can explain why the TLP doesn't fire when in the CWR
> state? It seems like that might be worth having for cases like this.

The original motivation for only allowing TLP in the CA_Open state was
to be conservative and avoid having the TLP impose extra load on the
bottleneck when it may be congested. Plus, if there are any SACKed
packets in the SACK scoreboard, then there are other existing
mechanisms to do speedy loss recovery.

But at various times we have talked about expanding the set of
scenarios where TLP is used. And I think this example demonstrates
that there is a class of real-world cases where it probably makes
sense to allow TLP in the CWR state.

If you have time, would you be able to check whether leaving
tcp_tso_should_defer() as-is but enabling TLP probes in the CWR state
also fixes your performance issue? Perhaps something like
(uncompiled/untested):

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 4ea79b2ad82e..deccf8070f84 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2536,11 +2536,11 @@ bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto)
 
 	early_retrans = sock_net(sk)->ipv4.sysctl_tcp_early_retrans;
 	/* Schedule a loss probe in 2*RTT for SACK capable connections
-	 * in Open state, that are either limited by cwnd or application.
+	 * not in loss recovery, that are either limited by cwnd or application.
 	 */
 	if ((early_retrans != 3 && early_retrans != 4) ||
 	    !tp->packets_out || !tcp_is_sack(tp) ||
-	    icsk->icsk_ca_state != TCP_CA_Open)
+	    icsk->icsk_ca_state >= TCP_CA_Recovery)
 		return false;
 
 	if ((tp->snd_cwnd > tcp_packets_in_flight(tp)) &&

> Btw, thank you very much for all the help! It is greatly appreciated :)

You are very welcome! :-)

cheers,
neal
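
P.S. For anyone trying to reproduce the first experiment: "disabling
tcp_tso_should_defer()" is not a sysctl, so it takes a local kernel
tweak. A minimal sketch of one way to do it (uncompiled/untested, not
necessarily the exact change used in Steve's test, and with the hunk
context elided since the function's signature and surroundings vary a
bit across kernel versions) is to make the function bail out before any
of its deferral heuristics run:

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ ... @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
 {
+	/* Test-only hack: never hold back a small send in the hope of
+	 * accumulating a larger TSO burst; transmit whatever cwnd
+	 * currently allows right away.
+	 */
+	return false;
+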