From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vitaly Davidovich
Subject: Re: TCP connection closed without FIN or RST
Date: Wed, 8 Nov 2017 11:04:14 -0500
Message-ID:
References: <1509568471.3828.50.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509569515.3828.53.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509573771.3828.58.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509577617.3828.62.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509714010.2849.41.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509714167.2849.43.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509725144.2849.57.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509731910.2849.64.camel@edumazet-glaptop3.roam.corp.google.com>
 <1509744817.2849.68.camel@edumazet-glaptop3.roam.corp.google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Cc: netdev
To: Eric Dumazet
Return-path:
Received: from mail-lf0-f47.google.com ([209.85.215.47]:50844 "EHLO
 mail-lf0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1752095AbdKHQEQ (ORCPT );
 Wed, 8 Nov 2017 11:04:16 -0500
Received: by mail-lf0-f47.google.com with SMTP id a132so3748959lfa.7
 for ; Wed, 08 Nov 2017 08:04:16 -0800 (PST)
In-Reply-To: <1509744817.2849.68.camel@edumazet-glaptop3.roam.corp.google.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

So this issue is somehow related to setting SO_RCVBUF *after*
connecting the socket (from the client). The system is configured
such that the default rcvbuf size is 1MB, but the code was shrinking
this down to 75Kb right after connect(). I think that explains why
the window size advertised by the client was much larger than
expected.

I see that the kernel does not want to shrink the previously
advertised window without advancement in the sequence space. So my
guess is that the client runs out of buffer and starts dropping
packets.

Not sure how to further debug this from userspace (systemtap? bpf?) -
any tips on that front would be appreciated.

Thanks again for the help.
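
For illustration, here is a minimal sketch of the ordering that avoids the situation described above: set SO_RCVBUF before connect(), so the first window the client advertises already matches the smaller buffer. This is not the original application code. The helper name make_client_socket is hypothetical, and the 75 KB value is taken from the description above only as an example. On Linux, getsockopt() reports roughly double the requested size (the kernel adds bookkeeping overhead), and the request is capped by net.core.rmem_max.

```python
import socket


def make_client_socket(rcvbuf_bytes):
    """Create a TCP socket and size its receive buffer before connect().

    Hypothetical helper for illustration only.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set SO_RCVBUF before connect() so the window scale negotiated in the
    # SYN and the first advertised window already reflect this buffer size.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    return s


if __name__ == "__main__":
    s = make_client_socket(75 * 1024)
    # Read the value back; the kernel may adjust what was requested.
    effective = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    print("effective SO_RCVBUF:", effective)
    # s.connect((host, port)) would follow here; host and port are
    # placeholders and are not taken from this thread.
    s.close()
```

Reading the value back with getsockopt() before connecting is a cheap way to confirm which buffer size the kernel actually applied.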

On Fri, Nov 3, 2017 at 5:33 PM, Eric Dumazet wrote:
> On Fri, 2017-11-03 at 14:28 -0400, Vitaly Davidovich wrote:
>
>> So Eric, while I still have your interest here (although I know it's
>> waning :)), any code pointers to where I might look to see if a
>> specific small-ish rcv buf size may interact poorly with the rest of
>> the stack? Is it possible some buffer was starved in the client stack
>> which prevented it from sending any segments to the server? Maybe the
>> incoming retrans were actually dropped somewhere in the ingress pkt
>> processing and so the stack doesn't know it needs to react to
>> something? Pulling at straws here but clearly the recv buf size, and a
>> somewhat small one at that, has some play.
>>
>> I checked dmesg (just in case something would pop up there) but didn't
>> observe any warnings or anything interesting.
>
> I believe you could reproduce the issue with packetdrill.
>
> If you can provide a packetdrill file demonstrating the issue, that
> would be awesome ;)