From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vitaly Davidovich
Subject: Re: TCP connection closed without FIN or RST
Date: Fri, 3 Nov 2017 09:38:19 -0400
Message-ID:
References: <1509568471.3828.50.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509569515.3828.53.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509573771.3828.58.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509577617.3828.62.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509714010.2849.41.camel@edumazet-glaptop3.roam.corp.google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Cc: netdev
To: Eric Dumazet
Return-path:
Received: from mail-lf0-f44.google.com ([209.85.215.44]:44890 "EHLO
 mail-lf0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S932201AbdKCNiW (ORCPT ); Fri, 3 Nov 2017 09:38:22 -0400
Received: by mail-lf0-f44.google.com with SMTP id 75so3225206lfx.1 for ;
 Fri, 03 Nov 2017 06:38:22 -0700 (PDT)
In-Reply-To: <1509714010.2849.41.camel@edumazet-glaptop3.roam.corp.google.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Fri, Nov 3, 2017 at 9:00 AM, Eric Dumazet wrote:
> On Fri, 2017-11-03 at 08:41 -0400, Vitaly Davidovich wrote:
>> Hi Eric,
>>
>> Ran a few more tests yesterday with packet captures, including a
>> capture on the client. It turns out that the client stops ack'ing
>> entirely at some point in the conversation - the last advertised
>> client window is not even close to zero (it's actually ~348K). So
>> there's complete radio silence from the client for some reason, even
>> though it does send back ACKs early on in the conversation. So yes,
>> as far as the server is concerned, the client is completely gone and
>> tcp_retries2 rightfully breaches eventually once the server retrans go
>> unanswered long (and for sufficient times) enough.
>>
>> What's odd though is the packet capture on the client shows the server
>> retrans packets arriving, so it's not like the segments don't reach
>> the client.
>> I'll keep investigating, but if you (or anyone else
>> reading this) knows of circumstances that might cause this, I'd
>> appreciate any tips on where/what to look at.
>
>
> Might be a middle box issue ? Like a firewall connection tracking
> having some kind of timeout if nothing is sent on one direction ?

Yeah, that's certainly possible although I've not found evidence of
that yet, including asking sysadmins. But it's definitely an avenue
I'm going to walk a bit further down.

>
> What output do you have from client side with :
>
> ss -temoi dst

I snipped some irrelevant info, like IP addresses, uid, inode number, etc.

Client before it wakes up - the recvq has been at 125976 for the
entire time it's been sleeping (15 minutes):

State       Recv-Q Send-Q
ESTAB       125976 0
     skmem:(r151040,rb150000,t0,tb150000,f512,w0,o0,bl0) ts sack
     scalable wscale:0,11 rto:208 rtt:4.664/8.781 ato:40 mss:1448 cwnd:10
     send 24.8Mbps rcv_rtt:321786 rcv_space:524140

While the server is on its last retrans timer, the client wakes up and
slurps up its recv buffer:

State       Recv-Q Send-Q
ESTAB       0      0
     skmem:(r0,rb150000,t0,tb150000,f151552,w0,o0,bl0) ts sack
     scalable wscale:0,11 rto:208 rtt:4.664/8.781 ato:40 mss:1448 cwnd:10
     send 24.8Mbps rcv_rtt:321786 rcv_space:524140

Here's the cmd output from the server right before the last retrans
timer expires and the socket is aborted. Note that this output is
after the client has drained its recv queue (the output right above):

State       Recv-Q Send-Q
ESTAB       0      925272 timer:(on,14sec,15)
     skmem:(r0,rb100000,t0,tb1050000,f2440,w947832,o0,bl0) ts sack
     scalable wscale:11,0 rto:120000 rtt:9.69/16.482 ato:40 mss:1448
     cwnd:1 ssthresh:89 send 1.2Mbps unacked:99 retrans:1/15 lost:99
     rcv_rtt:4 rcv_space:28960

Also worth noting the server's sendq has been at 925272 the entire
time as well.

Does anything stand out here?
I guess one thing that stands out to me (but that
could be due to my lack of in-depth knowledge of this) is that the
client rcv_space is significantly larger than the recvq.

Thanks Eric!

>
>
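
A related comparison that the client snapshots above allow: a minimal
Python sketch that pulls the skmem counters out of the ss output and
compares the memory charged to the receive queue (r) against the
receive buffer limit (rb). The field meanings follow the ss(8) man
page; the helper names parse_skmem and receive_headroom are made up
for illustration. Whether a full receive buffer is what silences the
client's ACKs is only a hypothesis here, not something established in
this thread.

```python
import re

# skmem field names as documented in the ss(8) man page.
SKMEM_FIELDS = {
    "r": "rmem_alloc",   # memory allocated for receiving packets
    "rb": "rcv_buf",     # total memory that may be allocated for receiving
    "t": "wmem_alloc",   # memory used for sending (handed to layer 3)
    "tb": "snd_buf",     # total memory that may be allocated for sending
    "f": "fwd_alloc",    # memory reserved as cache, not yet used
    "w": "wmem_queued",  # memory allocated for sending (not yet sent)
    "o": "opt_mem",      # memory used for socket options
    "bl": "back_log",    # memory used for the backlog queue
}


def parse_skmem(line):
    """Return the skmem:(...) counters found in one line of ss output."""
    match = re.search(r"skmem:\(([^)]*)\)", line)
    if match is None:
        raise ValueError("no skmem:(...) group found")
    counters = {}
    for item in match.group(1).split(","):
        key, value = re.fullmatch(r"([a-z]+)(\d+)", item.strip()).groups()
        counters[SKMEM_FIELDS.get(key, key)] = int(value)
    return counters


def receive_headroom(line):
    """Bytes left before rmem_alloc reaches rcv_buf (negative: over limit)."""
    mem = parse_skmem(line)
    return mem["rcv_buf"] - mem["rmem_alloc"]


# The two client snapshots quoted in this mail.
client_asleep = "skmem:(r151040,rb150000,t0,tb150000,f512,w0,o0,bl0) ts sack"
client_awake = "skmem:(r0,rb150000,t0,tb150000,f151552,w0,o0,bl0) ts sack"

for label, line in (("asleep", client_asleep), ("awake", client_awake)):
    print(label, "receive headroom:", receive_headroom(line), "bytes")
```

With the snapshots above this gives a headroom of -1040 bytes while
the client sleeps and 150000 bytes once it has drained the queue.
Recv-Q (125976) counts unread payload bytes only, while r counts the
memory charged to the socket including per-packet overhead, which is
why r is larger than Recv-Q.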