From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vitaly Davidovich
Subject: Re: TCP connection closed without FIN or RST
Date: Fri, 3 Nov 2017 11:13:57 -0400
References: <1509568471.3828.50.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509569515.3828.53.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509573771.3828.58.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509577617.3828.62.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509714010.2849.41.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509714167.2849.43.camel@edumazet-glaptop3.roam.corp.google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Cc: netdev
To: Eric Dumazet

Ok, an interesting finding. The client was originally running with an
SO_RCVBUF of 75K (someone apparently set that at some point, for reasons
unknown). I tried the test with a 1MB receive buffer and everything
works perfectly! The client advertises a zero window, the server simply
enters the persist state and sends window probes, and the client keeps
answering with a zero window until it wakes up and starts draining its
receive buffer. At that point the window opens up and the server sends
more data. Basically, things look exactly as one would expect in this
situation :).

/proc/sys/net/ipv4/tcp_rmem is 131072 1048576 20971520. The conversation
flows normally, as described above, when I change the client's receive
buffer size to 1048576. I also tried 131072, but that doesn't work -
same retransmits-with-no-ACKs situation.

I think this eliminates (right?) any middlebox from the equation.
Instead, perhaps it's some bad interaction between a small receive
buffer and either some other TCP setting or offload mechanics (LRO
specifically). Still investigating further.
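For completeness, this is roughly what the buffer pinning looks like on
the client side - a minimal sketch, not the actual client code; the
helper name and hard-coded sizes are mine. The relevant wrinkle is that
an explicit setsockopt(SO_RCVBUF) disables the kernel's receive-buffer
autotuning for that socket, so the 20971520 max in tcp_rmem never comes
into play for it:

#include <stdio.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: pin the receive buffer the way the client
 * apparently did.  rcvbuf_bytes stands in for whatever value the
 * original code used (75K in the failing runs). */
static int make_client_socket(int rcvbuf_bytes)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }

    /* An explicit SO_RCVBUF turns off receive-buffer autotuning for
     * this socket; the kernel also doubles the value internally to
     * account for bookkeeping overhead. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &rcvbuf_bytes, sizeof(rcvbuf_bytes)) < 0) {
        perror("setsockopt(SO_RCVBUF)");
        close(fd);
        return -1;
    }

    /* getsockopt() reads back the doubled value. */
    int actual = 0;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == 0)
        fprintf(stderr, "effective SO_RCVBUF: %d bytes\n", actual);

    return fd;
}

Without the setsockopt() call, the socket would instead start from
tcp_rmem's default (1048576 here) and autotune from there, which lines
up with the runs that behave correctly.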
On Fri, Nov 3, 2017 at 10:02 AM, Vitaly Davidovich wrote:
> On Fri, Nov 3, 2017 at 9:39 AM, Vitaly Davidovich wrote:
>> On Fri, Nov 3, 2017 at 9:02 AM, Eric Dumazet wrote:
>>> On Fri, 2017-11-03 at 06:00 -0700, Eric Dumazet wrote:
>>>> On Fri, 2017-11-03 at 08:41 -0400, Vitaly Davidovich wrote:
>>>> > Hi Eric,
>>>> >
>>>> > Ran a few more tests yesterday with packet captures, including a
>>>> > capture on the client. It turns out that the client stops ACKing
>>>> > entirely at some point in the conversation - the last advertised
>>>> > client window is not even close to zero (it's actually ~348K). So
>>>> > there's complete radio silence from the client for some reason,
>>>> > even though it does send back ACKs early on in the conversation.
>>>> > So yes, as far as the server is concerned, the client is
>>>> > completely gone, and tcp_retries2 is rightfully breached
>>>> > eventually once the server's retransmissions go unanswered for
>>>> > long enough (and enough times).
>>>> >
>>>> > What's odd, though, is that the packet capture on the client
>>>> > shows the server's retransmitted packets arriving, so it's not as
>>>> > if the segments don't reach the client. I'll keep investigating,
>>>> > but if you (or anyone else reading this) know of circumstances
>>>> > that might cause this, I'd appreciate any tips on where/what to
>>>> > look at.
>>>>
>>>> Might be a middlebox issue? Like a firewall's connection tracking
>>>> having some kind of timeout if nothing is sent in one direction?
>>>>
>>>> What output do you have from the client side with:
>>>>
>>>> ss -temoi dst
>>>
>>> It could also be a wrapping issue on TCP timestamps.
>>>
>>> You could try disabling TCP timestamps, and restart the TCP flow.
>>>
>>> echo 0 >/proc/sys/net/ipv4/tcp_timestamps
>> Ok, I will try to do that. Thanks for the tip.
> Tried with tcp_timestamps disabled on the client (didn't touch the
> server), but that didn't change the outcome - same issue at the end.
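Also, in case it's useful alongside ss -temoi: much of the same
per-connection detail can be sampled from inside the client process via
TCP_INFO. A minimal sketch, assuming fd is the connected socket (the
field selection is just what seems relevant to this thread, not the
full struct):

#include <stdio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>   /* struct tcp_info, TCP_INFO */
#include <sys/socket.h>

/* Dump retransmit activity, the receive-side window accounting, and
 * how long ago this socket last received data. */
static void dump_tcp_info(int fd)
{
    struct tcp_info ti;
    socklen_t len = sizeof(ti);

    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0) {
        perror("getsockopt(TCP_INFO)");
        return;
    }

    printf("state=%u retransmits=%u total_retrans=%u\n",
           ti.tcpi_state, ti.tcpi_retransmits, ti.tcpi_total_retrans);
    printf("rcv_space=%u last_data_recv=%ums rtt=%uus\n",
           ti.tcpi_rcv_space, ti.tcpi_last_data_recv, ti.tcpi_rtt);
}

ss is the easier tool on a live box, of course; this is mainly handy
when the client wants to log its own view of the connection at the
moment things go quiet.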