From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: TCP connection closed without FIN or RST
Date: Fri, 03 Nov 2017 10:58:30 -0700
Message-ID: <1509731910.2849.64.camel@edumazet-glaptop3.roam.corp.google.com>
References: <1509568471.3828.50.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509569515.3828.53.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509573771.3828.58.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509577617.3828.62.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509714010.2849.41.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509714167.2849.43.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509725144.2849.57.camel@edumazet-glaptop3.roam.corp.google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: netdev
To: Vitaly Davidovich
Return-path:
Received: from mail-io0-f194.google.com ([209.85.223.194]:43600 "EHLO
 mail-io0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S933233AbdKCR6c (ORCPT ); Fri, 3 Nov 2017 13:58:32 -0400
Received: by mail-io0-f194.google.com with SMTP id 134so8069141ioo.0
 for ; Fri, 03 Nov 2017 10:58:32 -0700 (PDT)
In-Reply-To:
Sender: netdev-owner@vger.kernel.org
List-ID:

On Fri, 2017-11-03 at 13:23 -0400, Vitaly Davidovich wrote:
> On Fri, Nov 3, 2017 at 12:05 PM, Eric Dumazet wrote:
> > On Fri, 2017-11-03 at 11:13 -0400, Vitaly Davidovich wrote:
> >> Ok, an interesting finding. The client was originally running with
> >> SO_RCVBUF of 75K (apparently someone decided to set that for some
> >> unknown reason). I tried the test with a 1MB recv buffer and
> >> everything works perfectly! The client responds with 0 window alerts,
> >> the server just hits the persist condition and sends keep-alive
> >> probes; the client continues answering with a 0 window up until it
> >> wakes up and starts processing data in its receive buffer. At that
> >> point, the window opens up and the server sends more data. Basically,
> >> things look as one would expect in this situation :).
> >>
> >> /proc/sys/net/ipv4/tcp_rmem is 131072 1048576 20971520. The
> >> conversation flows normally, as described above, when I change the
> >> client's recv buf size to 1048576. I also tried 131072, but that
> >> doesn't work - same retrans/no ACKs situation.
> >>
> >> I think this eliminates (right?) any middleware from the equation.
> >> Instead, perhaps it's some bad interaction between a low recv buf size
> >> and either some other TCP setting or TSO mechanics (LRO specifically).
> >> Still investigating further.
> >
> > Just in case, have you tried a more recent linux kernel ?
> I haven't but will look into that. I was mostly hoping to see if
> anyone perhaps has seen similar symptoms/behavior and figured out what
> the root cause is - just a stab in the dark with the well-informed
> folks on this list :). As of right now, based on the fact that a 1MB
> recv buffer works, I would surmise the issue is perhaps some poor
> interaction between a lower recv buffer size and some other tcp
> settings. But I'm just speculating - will continue investigating, and
> I'll update this thread if I get to the bottom of it.
> >
> > I would rather not spend time on some problem that might already be
> > fixed.
> Completely understandable - I really appreciate the tips and pointers
> thus far Eric, they've been helpful in their own right.

I am interested to see if the issue with small sk_rcvbuf is still there.

We have an upcoming change to rcvbuf autotuning to not blindly give
tcp_rmem[2] to all sockets, but use a function based on RTT.

Meaning that local flows could use small sk_rcvbuf instead of inflated
ones.

And meaning that we could increase tcp_rmem[2] to better match modern
capabilities (more memory on hosts, larger BDP)
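
To make the RTT-based sizing idea concrete, here is a rough sketch of how
a receive buffer could be derived from the bandwidth-delay product (BDP)
and clamped to the tcp_rmem limits quoted above. This is not the kernel's
autotuning code: the helper names, the 10 Gbit/s link rate, the two RTT
values, and the 2x headroom factor are all assumptions chosen only for
illustration.

```python
def parse_tcp_rmem(text):
    """Split a tcp_rmem line into (min, default, max) byte counts."""
    low, default, high = (int(field) for field in text.split())
    return low, default, high


def bdp_bytes(rate_bits_per_sec, rtt_usec):
    """Bandwidth-delay product in bytes, using integer arithmetic."""
    return rate_bits_per_sec * rtt_usec // 8_000_000


def rtt_based_rcvbuf(rate_bits_per_sec, rtt_usec, tcp_rmem):
    """Hypothetical rule: 2x BDP, clamped to [tcp_rmem[0], tcp_rmem[2]].

    The 2x headroom is an arbitrary assumption, not a kernel constant.
    """
    low, _default, high = tcp_rmem
    wanted = 2 * bdp_bytes(rate_bits_per_sec, rtt_usec)
    return max(low, min(high, wanted))


# Values reported in this thread for /proc/sys/net/ipv4/tcp_rmem.
rmem = parse_tcp_rmem("131072 1048576 20971520")

# Assumed example flows: same 10 Gbit/s rate, very different RTTs.
local = rtt_based_rcvbuf(10_000_000_000, 100, rmem)      # 100 us RTT
wan = rtt_based_rcvbuf(10_000_000_000, 100_000, rmem)    # 100 ms RTT

print("local flow rcvbuf:", local)
print("wan flow rcvbuf:  ", wan)
```

Under these assumptions the local flow needs only about 250 KB, while the
long-RTT flow is capped by tcp_rmem[2], which is the motivation for both
shrinking local buffers and raising the maximum.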