From mboxrd@z Thu Jan 1 00:00:00 1970
From: Neal Cardwell
Subject: Re: Linux ECN Handling
Date: Mon, 23 Oct 2017 21:11:38 -0400
Message-ID:
References: <20171019124312.GE16796@breakpoint.cc>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Cc: Netdev, Florian Westphal, Mohammad Alizadeh
To: Steve Ibanez
Return-path:
Received: from mail-wr0-f179.google.com ([209.85.128.179]:51743 "EHLO mail-wr0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751123AbdJXBMA (ORCPT ); Mon, 23 Oct 2017 21:12:00 -0400
Received: by mail-wr0-f179.google.com with SMTP id j15so6943737wre.8 for ; Mon, 23 Oct 2017 18:12:00 -0700 (PDT)
In-Reply-To:
Sender: netdev-owner@vger.kernel.org
List-ID:

On Mon, Oct 23, 2017 at 6:15 PM, Steve Ibanez wrote:
> Hi All,
>
> I upgraded the kernel on all of our machines to Linux
> 4.13.8-041308-lowlatency. However, I'm still observing the same
> behavior where the source enters a timeout when the CWND=1MSS and it
> receives ECN marks.
>
> Here are the measured flow rates:
>
>
>
> Here are snapshots of the packet traces at the sources when they both
> enter a timeout at t=1.6sec:
>
> 10.0.0.1 timeout event:
>
>
>
> 10.0.0.3 timeout event:
>
>
>
> Both still essentially follow the same sequence of events that I
> mentioned earlier:
> (1) receives an ACK for byte XYZ with the ECN flag set
> (2) stops sending for RTO_min=300ms
> (3) sends a retransmission for byte XYZ
>
> The cwnd samples reported by tcp_probe still indicate that the sources
> are reacting to the ECN marks more than once per window. Here are the
> cwnd samples at the same timeout event mentioned above:
>
>
>
> Let me know if there is anything else you think I should try.

Sounds like perhaps cwnd is being set to 0 somewhere in this DCTCP
scenario. Would you be able to add printk statements in
tcp_init_cwnd_reduction(), tcp_cwnd_reduction(), and
tcp_end_cwnd_reduction(), printing the IP:port, tp->snd_cwnd, and
tp->snd_ssthresh?
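
For illustration only, here is a userspace sketch of the kind of log line
such a printk could produce. It is not kernel code: struct flow_sample,
format_cwnd_log() and cwnd_is_zero() are hypothetical names made up for
this example, and the struct merely stands in for the socket fields named
above. In the kernel, the same information would be read from the socket
and printed with printk (which has a %pI4 specifier for IPv4 addresses).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the per-socket state to be logged.
 * Addresses and ports are kept in network byte order, as the
 * kernel stores them. */
struct flow_sample {
	uint32_t saddr;        /* local IPv4 address */
	uint16_t sport;        /* local port */
	uint32_t daddr;        /* remote IPv4 address */
	uint16_t dport;        /* remote port */
	uint32_t snd_cwnd;     /* congestion window, in packets */
	uint32_t snd_ssthresh; /* slow-start threshold, in packets */
};

/* Format one log line tagged with the calling function's name
 * (for example "tcp_cwnd_reduction"). Returns the snprintf()
 * result, or -1 if an address cannot be converted to text. */
int format_cwnd_log(char *buf, size_t len, const char *where,
		    const struct flow_sample *s)
{
	char src[INET_ADDRSTRLEN], dst[INET_ADDRSTRLEN];

	assert(buf != NULL && where != NULL && s != NULL);

	if (!inet_ntop(AF_INET, &s->saddr, src, sizeof(src)) ||
	    !inet_ntop(AF_INET, &s->daddr, dst, sizeof(dst)))
		return -1;

	return snprintf(buf, len, "%s: %s:%u -> %s:%u cwnd=%u ssthresh=%u",
			where, src, (unsigned)ntohs(s->sport),
			dst, (unsigned)ntohs(s->dport),
			(unsigned)s->snd_cwnd, (unsigned)s->snd_ssthresh);
}

/* The condition being hunted for: a congestion window of zero. */
int cwnd_is_zero(const struct flow_sample *s)
{
	return s->snd_cwnd == 0;
}
```

Tagging each line with the function name makes it possible to see which of
the three functions was the last to run before cwnd reached zero, and the
IP:port pair lets the lines be matched against the tcpdump trace.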
Based on the output you may be able to figure out where cwnd is being
set to zero. If not, could you please post the printk output and
tcpdump traces (.pcap, headers-only is fine) from your tests?

thanks,
neal