* TCP connection closed without FIN or RST
@ 2017-11-01 20:25 Vitaly Davidovich
  2017-11-01 20:34 ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Vitaly Davidovich @ 2017-11-01 20:25 UTC (permalink / raw)
  To: netdev

Hi all,

I'm seeing some puzzling TCP behavior that I'm hoping someone on this
list can shed some light on.  Apologies if this isn't the right forum
for this type of question.  But here goes anyway :)

I have client and server x86-64 linux machines with the 4.1.35 kernel.
I set up the following test/scenario:

1) Client connects to the server and requests a stream of data.  The
server (written in Java) starts to send data.
2) Client then goes to sleep for 15 minutes (I'll explain why below).
3) Naturally, the server's sendq fills up and it blocks on a write() syscall.
4) Similarly, the client's recvq fills up.
5) After 15 minutes the client wakes up and reads the data off the
socket fairly quickly - the recvq is fully drained.
6) At about the same time, the server's write() fails with ETIMEDOUT.
The server then proceeds to close() the socket.
7) The client, however, remains forever stuck in its read() call.

When the client is stuck in read(), netstat on the server does not
show the tcp connection - it's gone.  On the client, netstat shows the
connection with 0 recv (and send) queue size and in ESTABLISHED state.

I have done a packet capture (using tcpdump) on the server, and
expected to see either a FIN or RST packet to be sent to the client -
neither of these are present.  What is present, however, is a bunch of
retrans from the server to the client, with what appears to be
exponential backoff.  However, the conversation just stops around the
time when the ETIMEDOUT error occurred.  I do not see any attempt to
abort or gracefully shut down the TCP stream.

When I strace the server thread that was blocked on write(), I do see
the ETIMEDOUT error from write(), followed by a close() on the socket
fd.

Would anyone possibly know what could cause this? Or suggestions on
how to troubleshoot further? In particular, are there any known cases
where a FIN or RST wouldn't be sent after a write() times out due to
too many retrans? I believe this might be related to the tcp_retries2
behavior (the system is configured with the default value of 15),
where too many retrans attempts will cause write() to error with a
timeout.  My understanding is that this shouldn't do anything to the
state of the socket on its own - it should stay in the ESTABLISHED
state.  But then presumably a close() should start the shutdown state
machine by sending a FIN packet to the client and entering FIN WAIT1
on the server.

Ok, as to why I'm doing a test where the client sleeps for 15 minutes
- this is an attempt at reproducing a problem that I saw with a client
that wasn't sleeping intentionally, but otherwise the situation
appeared to be the same - the server write() blocked, eventually timed
out, server tcp session was gone, but client was stuck in a read()
syscall with the tcp session still in ESTABLISHED state.
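
In case it helps picture the test, the client side is roughly the
following (a minimal, untested C sketch of what the real client - which
is written in Java - does; the server address/port and the "send me
data" request are just placeholders):

 /* client: connect, request data, sleep 15 minutes, then drain the socket */
 #include <arpa/inet.h>
 #include <netinet/in.h>
 #include <stdio.h>
 #include <sys/socket.h>
 #include <unistd.h>

 int main(void)
 {
     int fd = socket(AF_INET, SOCK_STREAM, 0);
     struct sockaddr_in srv = { .sin_family = AF_INET, .sin_port = htons(9999) };

     inet_pton(AF_INET, "192.0.2.1", &srv.sin_addr);     /* placeholder server address */
     connect(fd, (struct sockaddr *)&srv, sizeof(srv));  /* error handling omitted */

     write(fd, "SEND\n", 5);   /* step 1: request a stream of data */
     sleep(15 * 60);           /* step 2: sleep while the recvq fills up */

     char buf[64 * 1024];
     ssize_t n;
     while ((n = read(fd, buf, sizeof(buf))) > 0)
         ;                     /* step 5: drain quickly; step 7: block here forever */

     if (n < 0)
         perror("read");
     close(fd);
     return 0;
 }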

Thanks a lot ahead of time for any insights/help!


* Re: TCP connection closed without FIN or RST
  2017-11-01 20:25 TCP connection closed without FIN or RST Vitaly Davidovich
@ 2017-11-01 20:34 ` Eric Dumazet
  2017-11-01 20:51   ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2017-11-01 20:34 UTC (permalink / raw)
  To: Vitaly Davidovich; +Cc: netdev

On Wed, 2017-11-01 at 16:25 -0400, Vitaly Davidovich wrote:
> Hi all,
> 
> I'm seeing some puzzling TCP behavior that I'm hoping someone on this
> list can shed some light on.  Apologies if this isn't the right forum
> for this type of question.  But here goes anyway :)
> 
> I have client and server x86-64 linux machines with the 4.1.35 kernel.
> I set up the following test/scenario:
> 
> 1) Client connects to the server and requests a stream of data.  The
> server (written in Java) starts to send data.
> 2) Client then goes to sleep for 15 minutes (I'll explain why below).
> 3) Naturally, the server's sendq fills up and it blocks on a write() syscall.
> 4) Similarly, the client's recvq fills up.
> 5) After 15 minutes the client wakes up and reads the data off the
> socket fairly quickly - the recvq is fully drained.
> 6) At about the same time, the server's write() fails with ETIMEDOUT.
> The server then proceeds to close() the socket.
> 7) The client, however, remains forever stuck in its read() call.
> 
> When the client is stuck in read(), netstat on the server does not
> show the tcp connection - it's gone.  On the client, netstat shows the
> connection with 0 recv (and send) queue size and in ESTABLISHED state.
> 
> I have done a packet capture (using tcpdump) on the server, and
> expected to see either a FIN or RST packet to be sent to the client -
> neither of these are present.  What is present, however, is a bunch of
> retrans from the server to the client, with what appears to be
> exponential backoff.  However, the conversation just stops around the
> time when the ETIMEDOUT error occurred.  I do not see any attempt to
> abort or gracefully shut down the TCP stream.
> 
> When I strace the server thread that was blocked on write(), I do see
> the ETIMEDOUT error from write(), followed by a close() on the socket
> fd.
> 
> Would anyone possibly know what could cause this? Or suggestions on
> how to troubleshoot further? In particular, are there any known cases
> where a FIN or RST wouldn't be sent after a write() times out due to
> too many retrans? I believe this might be related to the tcp_retries2
> behavior (the system is configured with the default value of 15),
> where too many retrans attempts will cause write() to error with a
> timeout.  My understanding is that this shouldn't do anything to the
> state of the socket on its own - it should stay in the ESTABLISHED
> state.  But then presumably a close() should start the shutdown state
> machine by sending a FIN packet to the client and entering FIN WAIT1
> on the server.
> 
> Ok, as to why I'm doing a test where the client sleeps for 15 minutes
> - this is an attempt at reproducing a problem that I saw with a client
> that wasn't sleeping intentionally, but otherwise the situation
> appeared to be the same - the server write() blocked, eventually timed
> out, server tcp session was gone, but client was stuck in a read()
> syscall with the tcp session still in ESTABLISHED state.
> 
> Thanks a lot ahead of time for any insights/help!

We might have an issue with win 0 probes (Probe0), hitting a max number
of retransmits/probes.

I can check this.


* Re: TCP connection closed without FIN or RST
  2017-11-01 20:34 ` Eric Dumazet
@ 2017-11-01 20:51   ` Eric Dumazet
       [not found]     ` <CAHjP37GkjJyY_6GsVpiZugp+DOKbA8bV2a77iKrAnxrn80Q9Rw@mail.gmail.com>
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2017-11-01 20:51 UTC (permalink / raw)
  To: Vitaly Davidovich; +Cc: netdev

On Wed, 2017-11-01 at 13:34 -0700, Eric Dumazet wrote:
> On Wed, 2017-11-01 at 16:25 -0400, Vitaly Davidovich wrote:
> > Hi all,
> > 
> > I'm seeing some puzzling TCP behavior that I'm hoping someone on this
> > list can shed some light on.  Apologies if this isn't the right forum
> > for this type of question.  But here goes anyway :)
> > 
> > I have client and server x86-64 linux machines with the 4.1.35 kernel.
> > I set up the following test/scenario:
> > 
> > 1) Client connects to the server and requests a stream of data.  The
> > server (written in Java) starts to send data.
> > 2) Client then goes to sleep for 15 minutes (I'll explain why below).
> > 3) Naturally, the server's sendq fills up and it blocks on a write() syscall.
> > 4) Similarly, the client's recvq fills up.
> > 5) After 15 minutes the client wakes up and reads the data off the
> > socket fairly quickly - the recvq is fully drained.
> > 6) At about the same time, the server's write() fails with ETIMEDOUT.
> > The server then proceeds to close() the socket.
> > 7) The client, however, remains forever stuck in its read() call.
> > 
> > When the client is stuck in read(), netstat on the server does not
> > show the tcp connection - it's gone.  On the client, netstat shows the
> > connection with 0 recv (and send) queue size and in ESTABLISHED state.
> > 
> > I have done a packet capture (using tcpdump) on the server, and
> > expected to see either a FIN or RST packet to be sent to the client -
> > neither of these are present.  What is present, however, is a bunch of
> > retrans from the server to the client, with what appears to be
> > exponential backoff.  However, the conversation just stops around the
> > time when the ETIMEDOUT error occurred.  I do not see any attempt to
> > abort or gracefully shut down the TCP stream.
> > 
> > When I strace the server thread that was blocked on write(), I do see
> > the ETIMEDOUT error from write(), followed by a close() on the socket
> > fd.
> > 
> > Would anyone possibly know what could cause this? Or suggestions on
> > how to troubleshoot further? In particular, are there any known cases
> > where a FIN or RST wouldn't be sent after a write() times out due to
> > too many retrans? I believe this might be related to the tcp_retries2
> > behavior (the system is configured with the default value of 15),
> > where too many retrans attempts will cause write() to error with a
> > timeout.  My understanding is that this shouldn't do anything to the
> > state of the socket on its own - it should stay in the ESTABLISHED
> > state.  But then presumably a close() should start the shutdown state
> > machine by sending a FIN packet to the client and entering FIN WAIT1
> > on the server.
> > 
> > Ok, as to why I'm doing a test where the client sleeps for 15 minutes
> > - this is an attempt at reproducing a problem that I saw with a client
> > that wasn't sleeping intentionally, but otherwise the situation
> > appeared to be the same - the server write() blocked, eventually timed
> > out, server tcp session was gone, but client was stuck in a read()
> > syscall with the tcp session still in ESTABLISHED state.
> > 
> > Thanks a lot ahead of time for any insights/help!
> 
> We might have an issue with win 0 probes (Probe0), hitting a max number
> of retransmits/probes.
> 
> I can check this

If the receiver does not reply to window probes, the sender considers
the flow dead after 10 attempts (/proc/sys/net/ipv4/tcp_retries2).

Not sure why sending a FIN or RST in this state would be okay, since
there is obviously something wrong with the receiver's TCP implementation.

If, after sending 10 unanswered probes, we also had to send 10 more FIN
packets just in case there is still something at the other end, it would
add a lot of overhead to the network.


* Re: TCP connection closed without FIN or RST
       [not found]     ` <CAHjP37GkjJyY_6GsVpiZugp+DOKbA8bV2a77iKrAnxrn80Q9Rw@mail.gmail.com>
@ 2017-11-01 22:02       ` Eric Dumazet
       [not found]         ` <CAHjP37FyGBmrEi7peAsHBfU=-BzrAafb42RMyjcOoieDsN0vrg@mail.gmail.com>
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2017-11-01 22:02 UTC (permalink / raw)
  To: Vitaly Davidovich; +Cc: netdev

On Wed, 2017-11-01 at 21:45 +0000, Vitaly Davidovich wrote:
> Hi Eric,
> 
> 
> First, thanks for replying.  A couple of comments inline.
> 
> On Wed, Nov 1, 2017 at 4:51 PM Eric Dumazet <eric.dumazet@gmail.com>
> wrote:
> 
>         On Wed, 2017-11-01 at 13:34 -0700, Eric Dumazet wrote:
>         > On Wed, 2017-11-01 at 16:25 -0400, Vitaly Davidovich wrote:
>         > > [original report snipped]
>         >
>         > We might have an issue with win 0 probes (Probe0), hitting a
>         > max number of retransmits/probes.
>         >
>         > I can check this
>
>         If the receiver does not reply to window probes, the sender
>         considers the flow dead after 10 attempts
>         (/proc/sys/net/ipv4/tcp_retries2).
> Right, except I have it at 15 (which is also the default).
>
>         Not sure why sending a FIN or RST in this state would be okay,
>         since there is obviously something wrong with the receiver's
>         TCP implementation.
>
>         If, after sending 10 unanswered probes, we also had to send 10
>         more FIN packets just in case there is still something at the
>         other end, it would add a lot of overhead to the network.
> Yes, I was thinking about this as well - if the peer is causing
> retrans and there’re too many unack’d segments as-is, the likelihood
> of a FIN handshake or even an RST reaching there is pretty low.
> 
> 
> I need to look at the tcpdump again - I feel like I didn’t see a 0
> window advertised by the client but maybe I missed it.  I did see the
> exponential looking retrans from the server, as mentioned, so there
> were unacked bytes in the server stack for a long time.

If the client sends nothing, there is a bug in it.
> 
> 
> So I guess there's a codepath in the kernel where a tcp socket is torn
> down "quietly" (i.e. with no segments sent out)?
> 
Yes, after /proc/sys/net/ipv4/tcp_retries2 probes, we give up.

What would be the point of sending another packet if the prior 15 got
no answer?

And if that 'another packet' is itself dropped by the network, should we
attempt to send the FIN/RST 15 times as well? :)

So really, it looks like it works as intended.


* Re: TCP connection closed without FIN or RST
       [not found]         ` <CAHjP37FyGBmrEi7peAsHBfU=-BzrAafb42RMyjcOoieDsN0vrg@mail.gmail.com>
@ 2017-11-01 23:06           ` Eric Dumazet
  2017-11-03 12:41             ` Vitaly Davidovich
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2017-11-01 23:06 UTC (permalink / raw)
  To: Vitaly Davidovich; +Cc: netdev

On Wed, 2017-11-01 at 22:22 +0000, Vitaly Davidovich wrote:
> Eric,
> 

> Yes I agree.  However the thing I’m still puzzled about is the client
> application is not reading/draining the recvq - ok, the client tcp
> stack should start advertising a 0 window size.  Does a 0 window size
> count against the tcp_retries2? Is that what you were alluding to in
> your first reply?
> 

Every time we receive a (valid) ACK, with a win 0 or not, the counter
of attempts is cleared, giving the sender the opportunity to send 15
more probes.
> 
> If it *does* count towards the retries limit then a RST doesn’t seem
> like a bad idea.  The client is responding with segments but the user
> app there just isn’t draining the data.  Presumably that RST has a
> good chance of reaching the client and then unblocking the read()
> there with a peer reset error.  Or am I missing something?
> 
> 
> If it doesn’t count towards the limit then I need to figure out why
> the 0 window size segments weren’t being sent by the client.

Yes please :)
> 
> 
> I will try to double check that the client was indeed advertising 0
> window size.  There’s nothing special about that machine - it’s a
> 4.1.35 kernel as well.  I wouldn’t expect the tcp stack there to be
> unresponsive just because the user app is sleeping.
> 


* Re: TCP connection closed without FIN or RST
  2017-11-01 23:06           ` Eric Dumazet
@ 2017-11-03 12:41             ` Vitaly Davidovich
  2017-11-03 13:00               ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Vitaly Davidovich @ 2017-11-03 12:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Hi Eric,

Ran a few more tests yesterday with packet captures, including a
capture on the client.  It turns out that the client stops ack'ing
entirely at some point in the conversation - the last advertised
client window is not even close to zero (it's actually ~348K).  So
there's complete radio silence from the client for some reason, even
though it does send back ACKs early on in the conversation.  So yes,
as far as the server is concerned, the client is completely gone and
the tcp_retries2 limit is rightfully breached eventually once the
server's retransmissions go unanswered for long enough (and enough times).

What's odd, though, is that the packet capture on the client shows the
server's retransmitted packets arriving, so it's not like the segments
don't reach the client.  I'll keep investigating, but if you (or anyone
else reading this) knows of circumstances that might cause this, I'd
appreciate any tips on where/what to look at.

Thanks

On Wed, Nov 1, 2017 at 7:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2017-11-01 at 22:22 +0000, Vitaly Davidovich wrote:
>> Eric,
>>
>
>> Yes I agree.  However the thing I’m still puzzled about is the client
>> application is not reading/draining the recvq - ok, the client tcp
>> stack should start advertising a 0 window size.  Does a 0 window size
>> count against the tcp_retries2? Is that what you were alluding to in
>> your first reply?
>>
>
> Every time we receive an (valid) ACK, with a win 0 or not, the counter
> of attempts is cleared, given the opportunity for the sender to send 15
> more probes.
>>
>> If it *does* count towards the retries limit then a RST doesn’t seem
>> like a bad idea.  The client is responding with segments but the user
>> app there just isn’t draining the data.  Presumably that RST has a
>> good chance of reaching the client and then unblocking the read()
>> there with a peer reset error.  Or am I missing something?
>>
>>
>> If it doesn’t count towards the limit then I need to figure out why
>> the 0 window size segments weren’t being sent by the client.
>
> Yes please :)
>>
>>
>> I will try to double check that the client was indeed advertising 0
>> window size.  There’s nothing special about that machine - it’s a
>> 4.1.35 kernel as well.  I wouldn’t expect the tcp stack there to be
>> unresponsive just because the user app is sleeping.
>>
>
>
>


* Re: TCP connection closed without FIN or RST
  2017-11-03 12:41             ` Vitaly Davidovich
@ 2017-11-03 13:00               ` Eric Dumazet
  2017-11-03 13:02                 ` Eric Dumazet
  2017-11-03 13:38                 ` Vitaly Davidovich
  0 siblings, 2 replies; 20+ messages in thread
From: Eric Dumazet @ 2017-11-03 13:00 UTC (permalink / raw)
  To: Vitaly Davidovich; +Cc: netdev

On Fri, 2017-11-03 at 08:41 -0400, Vitaly Davidovich wrote:
> Hi Eric,
> 
> Ran a few more tests yesterday with packet captures, including a
> capture on the client.  It turns out that the client stops ack'ing
> entirely at some point in the conversation - the last advertised
> client window is not even close to zero (it's actually ~348K).  So
> there's complete radio silence from the client for some reason, even
> though it does send back ACKs early on in the conversation.  So yes,
> as far as the server is concerned, the client is completely gone and
> tcp_retries2 rightfully breaches eventually once the server retrans go
> unanswered long (and for sufficient times) enough.
> 
> What's odd though is the packet capture on the client shows the server
> retrans packets arriving, so it's not like the segments don't reach
> the client.  I'll keep investigating, but if you (or anyone else
> reading this) knows of circumstances that might cause this, I'd
> appreciate any tips on where/what to look at.


Might be a middlebox issue?  Like firewall connection tracking
having some kind of timeout if nothing is sent in one direction?

What output do you have from the client side with:

ss -temoi dst <server_ip>


* Re: TCP connection closed without FIN or RST
  2017-11-03 13:00               ` Eric Dumazet
@ 2017-11-03 13:02                 ` Eric Dumazet
  2017-11-03 13:39                   ` Vitaly Davidovich
  2017-11-03 13:38                 ` Vitaly Davidovich
  1 sibling, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2017-11-03 13:02 UTC (permalink / raw)
  To: Vitaly Davidovich; +Cc: netdev

On Fri, 2017-11-03 at 06:00 -0700, Eric Dumazet wrote:
> On Fri, 2017-11-03 at 08:41 -0400, Vitaly Davidovich wrote:
> > Hi Eric,
> > 
> > Ran a few more tests yesterday with packet captures, including a
> > capture on the client.  It turns out that the client stops ack'ing
> > entirely at some point in the conversation - the last advertised
> > client window is not even close to zero (it's actually ~348K).  So
> > there's complete radio silence from the client for some reason, even
> > though it does send back ACKs early on in the conversation.  So yes,
> > as far as the server is concerned, the client is completely gone and
> > tcp_retries2 rightfully breaches eventually once the server retrans go
> > unanswered long (and for sufficient times) enough.
> > 
> > What's odd though is the packet capture on the client shows the server
> > retrans packets arriving, so it's not like the segments don't reach
> > the client.  I'll keep investigating, but if you (or anyone else
> > reading this) knows of circumstances that might cause this, I'd
> > appreciate any tips on where/what to look at.
> 
> 
> Might be a middle box issue ?  Like a firewall connection tracking
> having some kind of timeout if nothing is sent on one direction ?
> 
> What output do you have from client side with :
> 
> ss -temoi dst <server_ip>

It could also be a TCP timestamp wrapping issue.

You could try disabling TCP timestamps and restarting the TCP flow:

echo 0 >/proc/sys/net/ipv4/tcp_timestamps


* Re: TCP connection closed without FIN or RST
  2017-11-03 13:00               ` Eric Dumazet
  2017-11-03 13:02                 ` Eric Dumazet
@ 2017-11-03 13:38                 ` Vitaly Davidovich
  1 sibling, 0 replies; 20+ messages in thread
From: Vitaly Davidovich @ 2017-11-03 13:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Fri, Nov 3, 2017 at 9:00 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2017-11-03 at 08:41 -0400, Vitaly Davidovich wrote:
>> Hi Eric,
>>
>> Ran a few more tests yesterday with packet captures, including a
>> capture on the client.  It turns out that the client stops ack'ing
>> entirely at some point in the conversation - the last advertised
>> client window is not even close to zero (it's actually ~348K).  So
>> there's complete radio silence from the client for some reason, even
>> though it does send back ACKs early on in the conversation.  So yes,
>> as far as the server is concerned, the client is completely gone and
>> tcp_retries2 rightfully breaches eventually once the server retrans go
>> unanswered long (and for sufficient times) enough.
>>
>> What's odd though is the packet capture on the client shows the server
>> retrans packets arriving, so it's not like the segments don't reach
>> the client.  I'll keep investigating, but if you (or anyone else
>> reading this) knows of circumstances that might cause this, I'd
>> appreciate any tips on where/what to look at.
>
>
> Might be a middle box issue ?  Like a firewall connection tracking
> having some kind of timeout if nothing is sent on one direction ?
Yeah, that's certainly possible, although I've not found evidence of
that yet (including after asking the sysadmins).  But it's definitely an
avenue I'm going to walk a bit further down.
>
> What output do you have from client side with :
>
> ss -temoi dst <server_ip>
I snipped some irrelevant info, like IP addresses, uid, inode number, etc.

Client before it wakes up - the recvq has been at 125976 for the
entire time it's been sleeping (15 minutes):

State       Recv-Q Send-Q
ESTAB       125976 0
        skmem:(r151040,rb150000,t0,tb150000,f512,w0,o0,bl0) ts sack
        scalable wscale:0,11 rto:208 rtt:4.664/8.781 ato:40 mss:1448
        cwnd:10 send 24.8Mbps rcv_rtt:321786 rcv_space:524140


While the server is on its last retrans timer, the client wakes up and
slurps up its recv buffer:

State       Recv-Q Send-Q
ESTAB       0      0
        skmem:(r0,rb150000,t0,tb150000,f151552,w0,o0,bl0) ts sack
        scalable wscale:0,11 rto:208 rtt:4.664/8.781 ato:40 mss:1448
        cwnd:10 send 24.8Mbps rcv_rtt:321786 rcv_space:524140



Here's the same command's output from the server right before the last retrans
timer expires and the socket is aborted.  Note that this output is
after the client has drained its recv queue (the output right above):


State       Recv-Q Send-Q
ESTAB       0      925272
        timer:(on,14sec,15)
        skmem:(r0,rb100000,t0,tb1050000,f2440,w947832,o0,bl0) ts sack
        scalable wscale:11,0 rto:120000 rtt:9.69/16.482 ato:40 mss:1448
        cwnd:1 ssthresh:89 send 1.2Mbps unacked:99 retrans:1/15 lost:99
        rcv_rtt:4 rcv_space:28960

Also worth noting: the server's sendq has been at 925272 the entire time as well.


Does anything stand out here? I guess one thing that stands out to me
(but that could be due to my lack of in-depth knowledge of this) is
that the client rcv_space is significantly larger than the recvq.

Thanks Eric!

>
>


* Re: TCP connection closed without FIN or RST
  2017-11-03 13:02                 ` Eric Dumazet
@ 2017-11-03 13:39                   ` Vitaly Davidovich
  2017-11-03 14:02                     ` Vitaly Davidovich
  0 siblings, 1 reply; 20+ messages in thread
From: Vitaly Davidovich @ 2017-11-03 13:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Fri, Nov 3, 2017 at 9:02 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2017-11-03 at 06:00 -0700, Eric Dumazet wrote:
>> On Fri, 2017-11-03 at 08:41 -0400, Vitaly Davidovich wrote:
>> > Hi Eric,
>> >
>> > Ran a few more tests yesterday with packet captures, including a
>> > capture on the client.  It turns out that the client stops ack'ing
>> > entirely at some point in the conversation - the last advertised
>> > client window is not even close to zero (it's actually ~348K).  So
>> > there's complete radio silence from the client for some reason, even
>> > though it does send back ACKs early on in the conversation.  So yes,
>> > as far as the server is concerned, the client is completely gone and
>> > tcp_retries2 rightfully breaches eventually once the server retrans go
>> > unanswered long (and for sufficient times) enough.
>> >
>> > What's odd though is the packet capture on the client shows the server
>> > retrans packets arriving, so it's not like the segments don't reach
>> > the client.  I'll keep investigating, but if you (or anyone else
>> > reading this) knows of circumstances that might cause this, I'd
>> > appreciate any tips on where/what to look at.
>>
>>
>> Might be a middle box issue ?  Like a firewall connection tracking
>> having some kind of timeout if nothing is sent on one direction ?
>>
>> What output do you have from client side with :
>>
>> ss -temoi dst <server_ip>
>
> It also could be a wrapping issue on TCP timestamps.
>
> You could try disabling tcp timestamps, and restart the TCP flow.
>
> echo 0 >/proc/sys/net/ipv4/tcp_timestamps
Ok, I will try to do that.  Thanks for the tip.
>
>
>
>
>


* Re: TCP connection closed without FIN or RST
  2017-11-03 13:39                   ` Vitaly Davidovich
@ 2017-11-03 14:02                     ` Vitaly Davidovich
  2017-11-03 15:13                       ` Vitaly Davidovich
  0 siblings, 1 reply; 20+ messages in thread
From: Vitaly Davidovich @ 2017-11-03 14:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Fri, Nov 3, 2017 at 9:39 AM, Vitaly Davidovich <vitalyd@gmail.com> wrote:
> On Fri, Nov 3, 2017 at 9:02 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Fri, 2017-11-03 at 06:00 -0700, Eric Dumazet wrote:
>>> On Fri, 2017-11-03 at 08:41 -0400, Vitaly Davidovich wrote:
>>> > Hi Eric,
>>> >
>>> > Ran a few more tests yesterday with packet captures, including a
>>> > capture on the client.  It turns out that the client stops ack'ing
>>> > entirely at some point in the conversation - the last advertised
>>> > client window is not even close to zero (it's actually ~348K).  So
>>> > there's complete radio silence from the client for some reason, even
>>> > though it does send back ACKs early on in the conversation.  So yes,
>>> > as far as the server is concerned, the client is completely gone and
>>> > tcp_retries2 rightfully breaches eventually once the server retrans go
>>> > unanswered long (and for sufficient times) enough.
>>> >
>>> > What's odd though is the packet capture on the client shows the server
>>> > retrans packets arriving, so it's not like the segments don't reach
>>> > the client.  I'll keep investigating, but if you (or anyone else
>>> > reading this) knows of circumstances that might cause this, I'd
>>> > appreciate any tips on where/what to look at.
>>>
>>>
>>> Might be a middle box issue ?  Like a firewall connection tracking
>>> having some kind of timeout if nothing is sent on one direction ?
>>>
>>> What output do you have from client side with :
>>>
>>> ss -temoi dst <server_ip>
>>
>> It also could be a wrapping issue on TCP timestamps.
>>
>> You could try disabling tcp timestamps, and restart the TCP flow.
>>
>> echo 0 >/proc/sys/net/ipv4/tcp_timestamps
> Ok, I will try to do that.  Thanks for the tip.
Tried with tcp_timestamps disabled on the client (didn't touch the
server), but that didn't change the outcome - same issue at the end.
>>
>>
>>
>>
>>


* Re: TCP connection closed without FIN or RST
  2017-11-03 14:02                     ` Vitaly Davidovich
@ 2017-11-03 15:13                       ` Vitaly Davidovich
  2017-11-03 16:05                         ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Vitaly Davidovich @ 2017-11-03 15:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Ok, an interesting finding.  The client was originally running with
SO_RCVBUF of 75K (apparently someone decided to set that for some
unknown reason).  I tried the test with a 1MB recv buffer and
everything works perfectly! The client responds with zero-window
advertisements, the server just hits the persist condition and sends
zero-window probes; the client continues answering with a 0 window until it
wakes up and starts processing data in its receive buffer.  At that
point, the window opens up and the server sends more data.  Basically,
things look as one would expect in this situation :).

/proc/sys/net/ipv4/tcp_rmem is 131072  1048576   20971520.  The
conversation flows normally, as described above, when I change the
client's recv buf size to 1048576.  I also tried 131072, but that
doesn't work - same retrans/no ACKs situation.

I think this eliminates (right?) any middlebox from the equation.
Instead, perhaps it's some bad interaction between a low recv buf size
and either some other TCP setting or TSO mechanics (LRO specifically).
Still investigating further.
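
(For completeness, the only client-side difference between the failing
and the working runs is the receive buffer size, set with the usual
knob - roughly this fragment, where fd is the client socket:

 int sz = 1048576;  /* the failing runs use the original ~75K value here */
 if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &sz, sizeof(sz)) < 0)
     perror("setsockopt(SO_RCVBUF)");
 /* note: the kernel doubles this value internally for bookkeeping overhead */

Everything else in the test is unchanged.)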

On Fri, Nov 3, 2017 at 10:02 AM, Vitaly Davidovich <vitalyd@gmail.com> wrote:
> On Fri, Nov 3, 2017 at 9:39 AM, Vitaly Davidovich <vitalyd@gmail.com> wrote:
>> On Fri, Nov 3, 2017 at 9:02 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> On Fri, 2017-11-03 at 06:00 -0700, Eric Dumazet wrote:
>>>> On Fri, 2017-11-03 at 08:41 -0400, Vitaly Davidovich wrote:
>>>> > Hi Eric,
>>>> >
>>>> > Ran a few more tests yesterday with packet captures, including a
>>>> > capture on the client.  It turns out that the client stops ack'ing
>>>> > entirely at some point in the conversation - the last advertised
>>>> > client window is not even close to zero (it's actually ~348K).  So
>>>> > there's complete radio silence from the client for some reason, even
>>>> > though it does send back ACKs early on in the conversation.  So yes,
>>>> > as far as the server is concerned, the client is completely gone and
>>>> > tcp_retries2 rightfully breaches eventually once the server retrans go
>>>> > unanswered long (and for sufficient times) enough.
>>>> >
>>>> > What's odd though is the packet capture on the client shows the server
>>>> > retrans packets arriving, so it's not like the segments don't reach
>>>> > the client.  I'll keep investigating, but if you (or anyone else
>>>> > reading this) knows of circumstances that might cause this, I'd
>>>> > appreciate any tips on where/what to look at.
>>>>
>>>>
>>>> Might be a middle box issue ?  Like a firewall connection tracking
>>>> having some kind of timeout if nothing is sent on one direction ?
>>>>
>>>> What output do you have from client side with :
>>>>
>>>> ss -temoi dst <server_ip>
>>>
>>> It also could be a wrapping issue on TCP timestamps.
>>>
>>> You could try disabling tcp timestamps, and restart the TCP flow.
>>>
>>> echo 0 >/proc/sys/net/ipv4/tcp_timestamps
>> Ok, I will try to do that.  Thanks for the tip.
> Tried with tcp_timestamps disabled on the client (didn't touch the
> server), but that didn't change the outcome - same issue at the end.
>>>
>>>
>>>
>>>
>>>


* Re: TCP connection closed without FIN or RST
  2017-11-03 15:13                       ` Vitaly Davidovich
@ 2017-11-03 16:05                         ` Eric Dumazet
  2017-11-03 17:23                           ` Vitaly Davidovich
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2017-11-03 16:05 UTC (permalink / raw)
  To: Vitaly Davidovich; +Cc: netdev

On Fri, 2017-11-03 at 11:13 -0400, Vitaly Davidovich wrote:
> Ok, an interesting finding.  The client was originally running with
> SO_RCVBUF of 75K (apparently someone decided to set that for some
> unknown reason).  I tried the test with a 1MB recv buffer and
> everything works perfectly! The client responds with 0 window alerts,
> the server just hits the persist condition and sends keep-alive
> probes; the client continues answering with a 0 window up until it
> wakes up and starts processing data in its receive buffer.  At that
> point, the window opens up and the server sends more data.  Basically,
> things look as one would expect in this situation :).
> 
> /proc/sys/net/ipv4/tcp_rmem is 131072  1048576   20971520.  The
> conversation flows normally, as described above, when I change the
> client's recv buf size to 1048576.  I also tried 131072, but that
> doesn't work - same retrans/no ACKs situation.
> 
> I think this eliminates (right?) any middleware from the equation.
> Instead, perhaps it's some bad interaction between a low recv buf size
> and either some other TCP setting or TSO mechanics (LRO specifically).
> Still investigating further.

Just in case, have you tried a more recent Linux kernel?

I would rather not spend time on some problem that might already be
fixed.


* Re: TCP connection closed without FIN or RST
  2017-11-03 16:05                         ` Eric Dumazet
@ 2017-11-03 17:23                           ` Vitaly Davidovich
  2017-11-03 17:58                             ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Vitaly Davidovich @ 2017-11-03 17:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Fri, Nov 3, 2017 at 12:05 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2017-11-03 at 11:13 -0400, Vitaly Davidovich wrote:
>> Ok, an interesting finding.  The client was originally running with
>> SO_RCVBUF of 75K (apparently someone decided to set that for some
>> unknown reason).  I tried the test with a 1MB recv buffer and
>> everything works perfectly! The client responds with 0 window alerts,
>> the server just hits the persist condition and sends keep-alive
>> probes; the client continues answering with a 0 window up until it
>> wakes up and starts processing data in its receive buffer.  At that
>> point, the window opens up and the server sends more data.  Basically,
>> things look as one would expect in this situation :).
>>
>> /proc/sys/net/ipv4/tcp_rmem is 131072  1048576   20971520.  The
>> conversation flows normally, as described above, when I change the
>> client's recv buf size to 1048576.  I also tried 131072, but that
>> doesn't work - same retrans/no ACKs situation.
>>
>> I think this eliminates (right?) any middleware from the equation.
>> Instead, perhaps it's some bad interaction between a low recv buf size
>> and either some other TCP setting or TSO mechanics (LRO specifically).
>> Still investigating further.
>
> Just in case, have you tried a more recent linux kernel ?
I haven't, but I will look into that.  I was mostly hoping to see if
anyone has perhaps seen similar symptoms/behavior and figured out what
the root cause is - just a stab in the dark with the well-informed
folks on this list :).  As of right now, based on the fact that a 1MB
recv buffer works, I would surmise the issue is perhaps some poor
interaction between a lower recv buffer size and some other tcp
settings.  But I'm just speculating - will continue investigating, and
I'll update this thread if I get to the bottom of it.
>
> I would rather not spend time on some problem that might already be
> fixed.
Completely understandable - I really appreciate the tips and pointers
thus far Eric, they've been helpful in their own right.
>
>
>


* Re: TCP connection closed without FIN or RST
  2017-11-03 17:23                           ` Vitaly Davidovich
@ 2017-11-03 17:58                             ` Eric Dumazet
  2017-11-03 18:28                               ` Vitaly Davidovich
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2017-11-03 17:58 UTC (permalink / raw)
  To: Vitaly Davidovich; +Cc: netdev

On Fri, 2017-11-03 at 13:23 -0400, Vitaly Davidovich wrote:
> On Fri, Nov 3, 2017 at 12:05 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Fri, 2017-11-03 at 11:13 -0400, Vitaly Davidovich wrote:
> >> Ok, an interesting finding.  The client was originally running with
> >> SO_RCVBUF of 75K (apparently someone decided to set that for some
> >> unknown reason).  I tried the test with a 1MB recv buffer and
> >> everything works perfectly! The client responds with 0 window alerts,
> >> the server just hits the persist condition and sends keep-alive
> >> probes; the client continues answering with a 0 window up until it
> >> wakes up and starts processing data in its receive buffer.  At that
> >> point, the window opens up and the server sends more data.  Basically,
> >> things look as one would expect in this situation :).
> >>
> >> /proc/sys/net/ipv4/tcp_rmem is 131072  1048576   20971520.  The
> >> conversation flows normally, as described above, when I change the
> >> client's recv buf size to 1048576.  I also tried 131072, but that
> >> doesn't work - same retrans/no ACKs situation.
> >>
> >> I think this eliminates (right?) any middleware from the equation.
> >> Instead, perhaps it's some bad interaction between a low recv buf size
> >> and either some other TCP setting or TSO mechanics (LRO specifically).
> >> Still investigating further.
> >
> > Just in case, have you tried a more recent linux kernel ?
> I haven't but will look into that.  I was mostly hoping to see if
> anyone perhaps has seen similar symptoms/behavior and figured out what
> the root cause is - just a stab in the dark with the well-informed
> folks on this list :).  As of right now, based on the fact that a 1MB
> recv buffer works, I would surmise the issue is perhaps some poor
> interaction between a lower recv buffer size and some other tcp
> settings.  But I'm just speculating - will continue investigating, and
> I'll update this thread if I get to the bottom of it.
> >
> > I would rather not spend time on some problem that might already be
> > fixed.
> Completely understandable - I really appreciate the tips and pointers
> thus far Eric, they've been helpful in their own right.

I am interested to see whether the issue with a small sk_rcvbuf is still there.

We have an upcoming change to rcvbuf autotuning to not blindly give
tcp_rmem[2] to all sockets, but to use a function of the RTT instead.

That means local flows could use a small sk_rcvbuf instead of inflated
ones.

It also means we could increase tcp_rmem[2] to better match modern
capabilities (more memory on hosts, larger BDP).


* Re: TCP connection closed without FIN or RST
  2017-11-03 17:58                             ` Eric Dumazet
@ 2017-11-03 18:28                               ` Vitaly Davidovich
  2017-11-03 21:33                                 ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Vitaly Davidovich @ 2017-11-03 18:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Fri, Nov 3, 2017 at 1:58 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2017-11-03 at 13:23 -0400, Vitaly Davidovich wrote:
>> On Fri, Nov 3, 2017 at 12:05 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Fri, 2017-11-03 at 11:13 -0400, Vitaly Davidovich wrote:
>> >> Ok, an interesting finding.  The client was originally running with
>> >> SO_RCVBUF of 75K (apparently someone decided to set that for some
>> >> unknown reason).  I tried the test with a 1MB recv buffer and
>> >> everything works perfectly! The client responds with 0 window alerts,
>> >> the server just hits the persist condition and sends keep-alive
>> >> probes; the client continues answering with a 0 window up until it
>> >> wakes up and starts processing data in its receive buffer.  At that
>> >> point, the window opens up and the server sends more data.  Basically,
>> >> things look as one would expect in this situation :).
>> >>
>> >> /proc/sys/net/ipv4/tcp_rmem is 131072  1048576   20971520.  The
>> >> conversation flows normally, as described above, when I change the
>> >> client's recv buf size to 1048576.  I also tried 131072, but that
>> >> doesn't work - same retrans/no ACKs situation.
>> >>
>> >> I think this eliminates (right?) any middleware from the equation.
>> >> Instead, perhaps it's some bad interaction between a low recv buf size
>> >> and either some other TCP setting or TSO mechanics (LRO specifically).
>> >> Still investigating further.
>> >
>> > Just in case, have you tried a more recent linux kernel ?
>> I haven't but will look into that.  I was mostly hoping to see if
>> anyone perhaps has seen similar symptoms/behavior and figured out what
>> the root cause is - just a stab in the dark with the well-informed
>> folks on this list :).  As of right now, based on the fact that a 1MB
>> recv buffer works, I would surmise the issue is perhaps some poor
>> interaction between a lower recv buffer size and some other tcp
>> settings.  But I'm just speculating - will continue investigating, and
>> I'll update this thread if I get to the bottom of it.
>> >
>> > I would rather not spend time on some problem that might already be
>> > fixed.
>> Completely understandable - I really appreciate the tips and pointers
>> thus far Eric, they've been helpful in their own right.
>
> I am interested to see if the issue with small sk_rcvbuf is still there.
>
> We have an upcoming change to rcvbuf autotuning to not blindly give
> tcp_rmem[2] to all sockets, but use a function based on RTT.
>
> Meaning that local flows could use small sk_rcvbuf instead of inflated
> ones.
>
> And meaning that we could increase tcp_rmem[2] to better match modern
> capabilities (more memory on hosts, larger BDP)

So Eric, while I still have your interest here (although I know it's
waning :)), any code pointers to where I might look to see if a
specific small-ish rcv buf size may interact poorly with the rest of
the stack? Is it possible some buffer was starved in the client stack
which prevented it from sending any segments to the server? Maybe the
incoming retrans were actually dropped somewhere in the ingress pkt
processing and so the stack doesn't know it needs to react to
something? Grasping at straws here, but clearly the recv buf size, and a
somewhat small one at that, has some part to play.

I checked dmesg (just in case something would pop up there) but didn't
observe any warnings or anything interesting.

>
>
>


* Re: TCP connection closed without FIN or RST
  2017-11-03 18:28                               ` Vitaly Davidovich
@ 2017-11-03 21:33                                 ` Eric Dumazet
  2017-11-08 16:04                                   ` Vitaly Davidovich
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2017-11-03 21:33 UTC (permalink / raw)
  To: Vitaly Davidovich; +Cc: netdev

On Fri, 2017-11-03 at 14:28 -0400, Vitaly Davidovich wrote:

> So Eric, while I still have your interest here (although I know it's
> waning :)), any code pointers to where I might look to see if a
> specific small-ish rcv buf size may interact poorly with the rest of
> the stack? Is it possible some buffer was starved in the client stack
> which prevented it from sending any segments to the server? Maybe the
> incoming retrans were actually dropped somewhere in the ingress pkt
> processing and so the stack doesn't know it needs to react to
> something? Pulling at straws here but clearly the recv buf size, and a
> somewhat small one at that, has some play.
> 
> I checked dmesg (just in case something would pop up there) but didn't
> observe any warnings or anything interesting.

I believe you could reproduce the issue with packetdrill.

If you can provide a packetdrill file demonstrating the issue, that
would be awesome ;)


* Re: TCP connection closed without FIN or RST
  2017-11-03 21:33                                 ` Eric Dumazet
@ 2017-11-08 16:04                                   ` Vitaly Davidovich
  2017-11-08 17:29                                     ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Vitaly Davidovich @ 2017-11-08 16:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

So this issue is somehow related to setting SO_RCVBUF *after*
connecting the socket (from the client).  The system is configured
such that the default rcvbuf size is 1MB, but the code was shrinking
this down to 75KB right after connect().  I think that explains why
the window size advertised by the client was much larger than
expected.  I see that the kernel does not want to shrink the
previously advertised window without advancement in the sequence
space.  So my guess is that the client runs out of buffer and starts
dropping packets.  Not sure how to further debug this from userspace
(systemtap? bpf?) - any tips on that front would be appreciated.
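
(One thing I can at least confirm from userspace is the effective buffer
size around the connect()/setsockopt() calls, with something like this
sketch:

 int sz;
 socklen_t len = sizeof(sz);
 if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &sz, &len) == 0)
     fprintf(stderr, "effective SO_RCVBUF: %d\n", sz);  /* reports the kernel-doubled value */

but that only shows that the shrink took effect - it doesn't show where
the incoming segments end up being dropped.)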

Thanks again for the help.

On Fri, Nov 3, 2017 at 5:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2017-11-03 at 14:28 -0400, Vitaly Davidovich wrote:
>
>> So Eric, while I still have your interest here (although I know it's
>> waning :)), any code pointers to where I might look to see if a
>> specific small-ish rcv buf size may interact poorly with the rest of
>> the stack? Is it possible some buffer was starved in the client stack
>> which prevented it from sending any segments to the server? Maybe the
>> incoming retrans were actually dropped somewhere in the ingress pkt
>> processing and so the stack doesn't know it needs to react to
>> something? Pulling at straws here but clearly the recv buf size, and a
>> somewhat small one at that, has some play.
>>
>> I checked dmesg (just in case something would pop up there) but didn't
>> observe any warnings or anything interesting.
>
> I believe you could reproduce the issue with packetdrill.
>
> If you can provide a packetdrill file demonstrating the issue, that
> would be awesome ;)
>
>
>


* Re: TCP connection closed without FIN or RST
  2017-11-08 16:04                                   ` Vitaly Davidovich
@ 2017-11-08 17:29                                     ` Eric Dumazet
  2017-11-08 17:58                                       ` Vitaly Davidovich
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2017-11-08 17:29 UTC (permalink / raw)
  To: Vitaly Davidovich; +Cc: netdev

Please do not top-post on netdev.

On Wed, 2017-11-08 at 11:04 -0500, Vitaly Davidovich wrote:
> So this issue is somehow related to setting SO_RCVBUF *after*
> connecting the socket (from the client).  The system is configured
> such that the default rcvbuf size is 1MB, but the code was shrinking
> this down to 75Kb right after connect(). 

What are you calling the default rcvbuf size, exactly?

Is the application doing 

 s = socket(...);
 ...
 setsockopt(s, SOL_SOCKET, SO_RCVBUF, [1000000], 4)
 ...
 connect(s, ...)
 setsockopt(s, SOL_SOCKET, SO_RCVBUF, [75000], 4)
 

>  I think that explains why
> the window size advertised by the client was much larger than
> expected.  I see that the kernel does not want to shrink the
> previously advertised window without advancement in the sequence
> space.  So my guess is that the client runs out of buffer and starts
> dropping packets.  Not sure how to further debug this from userspace
> (systemtap? bpf?) - any tips on that front would be appreciated.


You could provide a packet capture (tcpdump) for a start ;)


* Re: TCP connection closed without FIN or RST
  2017-11-08 17:29                                     ` Eric Dumazet
@ 2017-11-08 17:58                                       ` Vitaly Davidovich
  0 siblings, 0 replies; 20+ messages in thread
From: Vitaly Davidovich @ 2017-11-08 17:58 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Wed, Nov 8, 2017 at 12:29 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Please do not top post on netdev.
Right - apologies for that.
>
> On Wed, 2017-11-08 at 11:04 -0500, Vitaly Davidovich wrote:
>> So this issue is somehow related to setting SO_RCVBUF *after*
>> connecting the socket (from the client).  The system is configured
>> such that the default rcvbuf size is 1MB, but the code was shrinking
>> this down to 75Kb right after connect().
>
> What are you calling default rcvbuf size exactly ?
>
> Is the application doing
>
>  s = socket(...);
>  ...
>  setsockopt(s, SOL_SOCKET, SO_RCVBUF, [1000000], 4)
>  ...
>  connect(s, ...)
>  setsockopt(s, SOL_SOCKET, SO_RCVBUF, [75000], 4)
>
Yes, sort of.  The application (Java, but nothing fancy here) does
essentially the following:
s = socket(...);
// no explicit setting of SO_RCVBUF size, but the system default
// should be picked up (1MB, as tcp_rmem shows)
connect(s, ...);
// now it goes and sets it
setsockopt(s, SOL_SOCKET, SO_RCVBUF, 75000, ...);
// then it goes to sleep for 15 mins
sleep(...)

The client machine has /proc/sys/net/ipv4/tcp_rmem: 131072 1048576 20971520

>
>>  I think that explains why
>> the window size advertised by the client was much larger than
>> expected.  I see that the kernel does not want to shrink the
>> previously advertised window without advancement in the sequence
>> space.  So my guess is that the client runs out of buffer and starts
>> dropping packets.  Not sure how to further debug this from userspace
>> (systemtap? bpf?) - any tips on that front would be appreciated.
>
>
> You could provide a packet capture (tcpdump) for a start ;)
I might be able to share that (this is from a private network).  In
the meantime, if there's something specific I should look at there,
I'd be happy to do that and report back.  I understand that's not
ideal, but it would be faster/easier.  My own observation is that the
client's last ACK has a window size of >300KB, which I'm pretty sure
it doesn't have room for if the rcvbuf was shrunk after the
setsockopt() set it to 75000 (I understand the kernel actually
reserves more than that, but even if it's doubled, that's still far
less than 300KB).

Needless to say, if I move the setsockopt(s, SOL_SOCKET, SO_RCVBUF,
75000, ...) prior to connect(s, ...), then everything works fine - we
hit a "persist" state, and there's zero window alert and probing by
the server.  I've tried a few other buffer sizes, including smallish
ones like 4KB and 8KB, and they all work (no real surprise there, but
was more of sanity checking).

The fact that SO_RCVBUF is set after connect() is a bug in the code -
no doubt about it.  However, I'm surprised it wedges the stack like
this.  Another interesting bit is that if the client isn't put to
sleep but allowed to read the bytes as they come in, then everything
works fine as well.  So it's not like the stack is broken outright - I
need to put the client to sleep to hit this (but it reproduces 100% of
the time thus far).
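
(In other words, the workaround on our side is simply to swap the order
in your earlier sketch:

 s = socket(...);
 setsockopt(s, SOL_SOCKET, SO_RCVBUF, [75000], 4)
 connect(s, ...)

rather than setting SO_RCVBUF after the connect.)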

Thanks Eric
>
>
>


end of thread (newest: 2017-11-08 17:58 UTC)

Thread overview: 20+ messages
2017-11-01 20:25 TCP connection closed without FIN or RST Vitaly Davidovich
2017-11-01 20:34 ` Eric Dumazet
2017-11-01 20:51   ` Eric Dumazet
     [not found]     ` <CAHjP37GkjJyY_6GsVpiZugp+DOKbA8bV2a77iKrAnxrn80Q9Rw@mail.gmail.com>
2017-11-01 22:02       ` Eric Dumazet
     [not found]         ` <CAHjP37FyGBmrEi7peAsHBfU=-BzrAafb42RMyjcOoieDsN0vrg@mail.gmail.com>
2017-11-01 23:06           ` Eric Dumazet
2017-11-03 12:41             ` Vitaly Davidovich
2017-11-03 13:00               ` Eric Dumazet
2017-11-03 13:02                 ` Eric Dumazet
2017-11-03 13:39                   ` Vitaly Davidovich
2017-11-03 14:02                     ` Vitaly Davidovich
2017-11-03 15:13                       ` Vitaly Davidovich
2017-11-03 16:05                         ` Eric Dumazet
2017-11-03 17:23                           ` Vitaly Davidovich
2017-11-03 17:58                             ` Eric Dumazet
2017-11-03 18:28                               ` Vitaly Davidovich
2017-11-03 21:33                                 ` Eric Dumazet
2017-11-08 16:04                                   ` Vitaly Davidovich
2017-11-08 17:29                                     ` Eric Dumazet
2017-11-08 17:58                                       ` Vitaly Davidovich
2017-11-03 13:38                 ` Vitaly Davidovich
