From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Dumazet Subject: Re: Linux ECN Handling Date: Mon, 20 Nov 2017 07:40:52 -0800 Message-ID: References: <20171019124312.GE16796@breakpoint.cc> <5A006CF6.1020608@iogearbox.net> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Cc: Steve Ibanez , Daniel Borkmann , Netdev , Florian Westphal , Mohammad Alizadeh , Lawrence Brakmo , Yuchung Cheng To: Neal Cardwell Return-path: Received: from mail-wm0-f46.google.com ([74.125.82.46]:47068 "EHLO mail-wm0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751312AbdKTPkz (ORCPT ); Mon, 20 Nov 2017 10:40:55 -0500 Received: by mail-wm0-f46.google.com with SMTP id u83so11240214wmb.5 for ; Mon, 20 Nov 2017 07:40:54 -0800 (PST) In-Reply-To: Sender: netdev-owner@vger.kernel.org List-ID: On Mon, Nov 20, 2017 at 7:01 AM, Neal Cardwell wrote: > Going back to one of your Oct 19 trace snapshots (attached), AFAICT at the > time of the timeout there is actually almost 64KBytes (352553398 + 1448 - > 352489686 = 65160) of unacknowledged data. So there really does seem to be a > significant chunk of packets that were in-flight that were then declared > lost. > > So here is a possibility: perhaps the combination of CWR+PRR plus > tcp_tso_should_defer() means that PRR can make cwnd so gentle that > tcp_tso_should_defer() thinks we should wait for another ACK to send, and > that ACK doesn't come. Breaking it, down, the potential sequence would be: > > (1) tcp_write_xmit() does not send, because the CWR behavior, using PRR, > does not leave enough cwnd for tcp_tso_should_defer() to think we should > send (PRR was originally designed for recovery, which did not have TSO > deferral) > > (2) TLP does not fire, because we are in state CWR, not Open > > (3) The only remaining option is an RTO, which fires. > > In other words, the possibility is that, at the time of the stall, the cwnd > is reasonably high, but tcp_packets_in_flight() is also quite high, so > either there is (a) literally no unused cwnd left ( tcp_packets_in_flight() > == cwnd), or (b) some mechanism like tcp_tso_should_defer() is deciding that > there is not enough available cwnd for it to make sense to chop off a > fraction of a TSO skb to send now. > > One way to test that conjecture would be to disable tcp_tso_should_defer() > by adding a: > > goto send_now; > > at the top of tcp_tso_should_defer(). > > If that doesn't prevent the freezes then I would recommend adding printks or > other instrumentation to tcp_write_xmit() to log: > > - time > - ca_state > - cwnd > - ssthresh > - tcp_packets_in_flight() > - the reason for breaking out of the tcp_write_xmit() loop (tso deferral, no > packets left, tcp_snd_wnd_test, tcp_nagle_test, etc) > > cheers, > neal > > > > On Mon, Nov 20, 2017 at 2:31 AM, Steve Ibanez wrote: >> >> Hi Folks, >> >> I wanted to check back in on this for another update and to solicit >> some more suggestions. I did a bit more digging to try an isolate the >> problem. >> >> As I explained earlier, the log generated by tcp_probe indicates that >> the snd_cwnd is set to 1 just before the end host receives an ECN >> marked ACK and unexpectedly enters a timeout ( >> https://drive.google.com/open?id=1iyt8PvBxQga2jpRpBJ8KdQw3Q_mPTzZF ). >> I was trying to track down where this is happening, but the only place >> I could find that might be setting the snd_cwnd to 1 is in the >> tcp_enter_loss() function. I inserted a printk() call in this function >> to see when it is being invoked and it looks like it is only called by >> the tcp_retransmit_timer() function after the timer expires. >> >> I decided to try recording the snd_cwnd, ss-thresh, and icsk_ca_state >> inside the tcp_fastretrans_alert() function whenever it processes an >> ECN marked ACK ( >> https://drive.google.com/open?id=17GD77lb9lkCSu0_s9p40GZ5r4EU8B4VB ) >> This plot also shows when the tcp_retransmit_timer() and >> tcp_enter_loss() functions are invoked (red and purple dots >> respectively). And I see that the ACK state machine is always either >> in the TCP_CA_Open or TCP_CA_CWR state whenever the >> tcp_fastretrans_alert() function processes ECN marked ACKs ( >> https://drive.google.com/open?id=1xwuPxjgwriT9DSblFx2uILfQ95Fy-Eqq ). >> So I'm not sure where the snd_cwnd is being set to 1 (or possibly 0 as >> Neal suggested) just before entering a timeout. Any suggestions here? >> >> In order to do a bit of profiling of the tcp_dctcp code I added >> support into tcp_probe for recording the dctcp alpha parameter. I see >> that alpha oscillates around about 0.1 when the flow rates have >> converged, it goes to zero when the other host enters a timeout, and I >> don't see any unexpected behavior just before the timeout ( >> https://drive.google.com/open?id=1zPdyS57TrUYZIekbid9p1UNyraLYrdw7 ). >> >> So I haven't had much luck yet trying to track down where the problem >> is. If you have any suggestions that would help me to focus my search >> efforts, I would appreciate the comments. >> >> Thanks! >> -Steve Steve, what HZ value your kernel is compiled with ?