From: Neal Cardwell
Subject: Re: Linux ECN Handling
Date: Tue, 21 Nov 2017 10:01:40 -0500
To: Steve Ibanez
Cc: Daniel Borkmann, Netdev, Florian Westphal, Mohammad Alizadeh, Lawrence Brakmo, Yuchung Cheng, Eric Dumazet

On Tue, Nov 21, 2017 at 12:58 AM, Steve Ibanez wrote:
> Hi Neal,
>
> I tried your suggestion to disable tcp_tso_should_defer() and it does
> indeed look like it is preventing the host from entering timeouts.
> I'll have to do a bit more digging to try and find where the packets
> are being dropped. I've verified that the bottleneck link queue
> occupancy is at about the configured marking threshold when the
> timeout occurs, so the drops may be happening at the NIC interfaces
> or perhaps somewhere unexpected in the switch.

Great! Thanks for running that test.

> I wonder if you can explain why the TLP doesn't fire when in the CWR
> state? It seems like that might be worth having for cases like this.

The original motivation for only allowing TLP in the CA_Open state was
to be conservative and avoid having the TLP impose extra load on the
bottleneck when it may be congested. Plus, if there are any SACKed
packets in the SACK scoreboard, then there are other existing
mechanisms to do speedy loss recovery.

But at various times we have talked about expanding the set of
scenarios where TLP is used. And I think this example demonstrates
that there is a class of real-world cases where it probably makes
sense to allow TLP in the CWR state.

If you have time, would you be able to check whether leaving
tcp_tso_should_defer() as-is but enabling TLP probes in the CWR state
also fixes your performance issue? Perhaps something like
(uncompiled/untested):

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 4ea79b2ad82e..deccf8070f84 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2536,11 +2536,11 @@ bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto)
 
 	early_retrans = sock_net(sk)->ipv4.sysctl_tcp_early_retrans;
 	/* Schedule a loss probe in 2*RTT for SACK capable connections
-	 * in Open state, that are either limited by cwnd or application.
+	 * not in loss recovery, that are either limited by cwnd or application.
 	 */
 	if ((early_retrans != 3 && early_retrans != 4) ||
 	    !tp->packets_out || !tcp_is_sack(tp) ||
-	    icsk->icsk_ca_state != TCP_CA_Open)
+	    icsk->icsk_ca_state >= TCP_CA_Recovery)
 		return false;
 
 	if ((tp->snd_cwnd > tcp_packets_in_flight(tp)) &&

> Btw, thank you very much for all the help! It is greatly appreciated :)

You are very welcome! :-)

cheers,
neal
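
P.S. For anyone trying to reproduce the first experiment: "disabling
tcp_tso_should_defer()" is not a sysctl, so it takes a local kernel
tweak. A minimal sketch of one way to do it (uncompiled/untested, not
necessarily the exact change used in Steve's test, and with the hunk
context elided since the function's signature and surroundings vary a
bit across kernel versions) is to make the function bail out before any
of its deferral heuristics run:

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ ... @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
 {
+	/* Test-only hack: never hold back a small send in the hope of
+	 * accumulating a larger TSO burst; transmit whatever cwnd
+	 * currently allows right away.
+	 */
+	return false;
+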