From: Eric Dumazet
Subject: Re: TCP connection closed without FIN or RST
Date: Wed, 01 Nov 2017 13:34:31 -0700
Message-ID: <1509568471.3828.50.camel@edumazet-glaptop3.roam.corp.google.com>
Cc: netdev@vger.kernel.org
To: Vitaly Davidovich

On Wed, 2017-11-01 at 16:25 -0400, Vitaly Davidovich wrote:
> Hi all,
>
> I'm seeing some puzzling TCP behavior that I'm hoping someone on this
> list can shed some light on. Apologies if this isn't the right forum
> for this type of question. But here goes anyway :)
>
> I have client and server x86-64 linux machines with the 4.1.35 kernel.
> I set up the following test/scenario:
>
> 1) Client connects to the server and requests a stream of data. The
> server (written in Java) starts to send data.
> 2) Client then goes to sleep for 15 minutes (I'll explain why below).
> 3) Naturally, the server's sendq fills up and it blocks on a write() syscall.
> 4) Similarly, the client's recvq fills up.
> 5) After 15 minutes the client wakes up and reads the data off the
> socket fairly quickly - the recvq is fully drained.
> 6) At about the same time, the server's write() fails with ETIMEDOUT.
> The server then proceeds to close() the socket.
> 7) The client, however, remains forever stuck in its read() call.
>
> When the client is stuck in read(), netstat on the server does not
> show the tcp connection - it's gone. On the client, netstat shows the
> connection with 0 recv (and send) queue size and in ESTABLISHED state.
>
> I have done a packet capture (using tcpdump) on the server, and
> expected to see either a FIN or RST packet to be sent to the client -
> neither of these are present. What is present, however, is a bunch of
> retrans from the server to the client, with what appears to be
> exponential backoff. However, the conversation just stops around the
> time when the ETIMEDOUT error occurred. I do not see any attempt to
> abort or gracefully shut down the TCP stream.
>
> When I strace the server thread that was blocked on write(), I do see
> the ETIMEDOUT error from write(), followed by a close() on the socket
> fd.
>
> Would anyone possibly know what could cause this? Or suggestions on
> how to troubleshoot further? In particular, are there any known cases
> where a FIN or RST wouldn't be sent after a write() times out due to
> too many retrans? I believe this might be related to the tcp_retries2
> behavior (the system is configured with the default value of 15),
> where too many retrans attempts will cause write() to error with a
> timeout. My understanding is that this shouldn't do anything to the
> state of the socket on its own - it should stay in the ESTABLISHED
> state. But then presumably a close() should start the shutdown state
> machine by sending a FIN packet to the client and entering FIN WAIT1
> on the server.
>
> Ok, as to why I'm doing a test where the client sleeps for 15 minutes
> - this is an attempt at reproducing a problem that I saw with a client
> that wasn't sleeping intentionally, but otherwise the situation
> appeared to be the same - the server write() blocked, eventually timed
> out, server tcp session was gone, but client was stuck in a read()
> syscall with the tcp session still in ESTABLISHED state.
>
> Thanks a lot ahead of time for any insights/help!

We might have an issue with window 0 probes (Probe0) hitting the maximum
number of retransmits/probes. I can check this.
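
For reference, the roughly 15 minutes in the report lines up with the
default tcp_retries2 of 15. A rough sketch of that arithmetic follows. It
assumes the timer starts at 200 ms, doubles on every attempt, and is capped
at 120 s (the usual Linux TCP_RTO_MIN and TCP_RTO_MAX values). Whether the
window 0 probe timer on 4.1.35 follows exactly this schedule is an
assumption here, and a real connection starts from a measured RTO, so the
result is only a lower bound. The function name is made up for illustration.

```python
def backoff_timeout_ms(retries=15, rto_min_ms=200, rto_max_ms=120_000):
    """Total wait before giving up: the initial timeout plus one
    doubled (and capped) timeout per retransmission.

    Hypothetical helper; the defaults are assumptions, not values
    read from a running kernel.
    """
    total_ms = 0
    rto_ms = rto_min_ms
    # One wait for the original send, then one per retransmission.
    for _ in range(retries + 1):
        total_ms += rto_ms
        # Exponential backoff, clamped at the maximum RTO.
        rto_ms = min(rto_ms * 2, rto_max_ms)
    return total_ms


if __name__ == "__main__":
    total = backoff_timeout_ms()
    print(f"{total / 1000:.1f} s (~{total / 60000:.1f} min)")
```

With the defaults this comes to 924.6 s, a bit over 15 minutes, which is
about when the server's write() returned ETIMEDOUT.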