From: Vitaly Davidovich <vitalyd@gmail.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev <netdev@vger.kernel.org>
Subject: Re: TCP connection closed without FIN or RST
Date: Wed, 8 Nov 2017 12:58:52 -0500
Message-ID: <CAHjP37EMm0Ov8J-e2dEnK_hC6w-=BVouLNnjtMQR1XYm3xYntw@mail.gmail.com>
In-Reply-To: <1510162142.2849.106.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, Nov 8, 2017 at 12:29 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Please do not top post on netdev.
Right - apologies for that.
>
> On Wed, 2017-11-08 at 11:04 -0500, Vitaly Davidovich wrote:
>> So this issue is somehow related to setting SO_RCVBUF *after*
>> connecting the socket (from the client).  The system is configured
>> such that the default rcvbuf size is 1MB, but the code was shrinking
>> this down to 75Kb right after connect().
>
> What are you calling default rcvbuf size exactly ?
>
> Is the application doing
>
>  s = socket(...);
>  ...
>  setsockopt(s, SOL_SOCKET, SO_RCVBUF, [1000000], 4)
>  ...
>  connect(s, ...)
>  setsockopt(s, SOL_SOCKET, SO_RCVBUF, [75000], 4)
>
Yes, sort of.  The application (Java, but nothing fancy here) does
essentially the following:
s = socket(...);
// no explicit setting of SO_RCVBUF size, but the system default
// should be picked up (1MB as tcp_rmem shows)
connect(s, ...);
// now it goes and sets it
setsockopt(s, SOL_SOCKET, SO_RCVBUF, 75000, ...);
// then it goes to sleep for 15 mins
sleep(...)

The client machine has /proc/sys/net/ipv4/tcp_rmem: 131072 1048576 20971520
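
For reference, the three tcp_rmem fields are the minimum, default, and maximum receive buffer sizes in bytes, per tcp(7).  A minimal sketch of reading them from userspace follows; the helper names parse_tcp_rmem and read_tcp_rmem are made up for illustration and are not part of any existing tool:

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse a "min default max" line in the format used by
 * /proc/sys/net/ipv4/tcp_rmem. Returns 0 on success, -1 on malformed input.
 */
int parse_tcp_rmem(const char *line, long *min, long *def, long *max)
{
    return sscanf(line, "%ld %ld %ld", min, def, max) == 3 ? 0 : -1;
}

/* Hypothetical helper: read the live values from procfs (Linux only). */
int read_tcp_rmem(long *min, long *def, long *max)
{
    char buf[128];
    FILE *f = fopen("/proc/sys/net/ipv4/tcp_rmem", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_tcp_rmem(buf, min, def, max);
}
```

With the values above, the middle field (1048576) is the 1MB default that a TCP socket starts with when SO_RCVBUF is never set explicitly.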

>
>>  I think that explains why
>> the window size advertised by the client was much larger than
>> expected.  I see that the kernel does not want to shrink the
>> previously advertised window without advancement in the sequence
>> space.  So my guess is that the client runs out of buffer and starts
>> dropping packets.  Not sure how to further debug this from userspace
>> (systemtap? bpf?) - any tips on that front would be appreciated.
>
>
> You could provide a packet capture (tcpdump) for a start ;)
I might be able to share that (this is from a private network).  In
the meantime, if there's something specific I should look at there,
I'd be happy to do that and report back.  I understand that's not
ideal, but it would be faster/easier.  My own observation is that the
client's last ACK has a window size of >300KB, which I'm pretty sure
it doesn't have room for once setsockopt() has shrunk the rcvbuf to
75000 (I understand the kernel actually reserves more than that, but
even if it's double, that's still far less than the room needed for
300KB).
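
The arithmetic behind that claim can be sketched as below.  It assumes the doubling described in socket(7), where Linux doubles the value passed to SO_RCVBUF to leave room for bookkeeping overhead, and treats the doubled figure as a generous upper bound on payload space.  The function names are made up for illustration:

```c
#include <stdbool.h>

/*
 * Upper bound on receive buffer space after setsockopt(SO_RCVBUF, requested),
 * assuming the kernel doubles the request as documented in socket(7) and that
 * the request is not clamped by rmem_max.
 */
long rcvbuf_upper_bound(long requested)
{
    return 2 * requested;
}

/* True when an advertised window cannot fit even in the doubled buffer. */
bool window_exceeds_rcvbuf(long window_bytes, long requested)
{
    return window_bytes > rcvbuf_upper_bound(requested);
}
```

For a 75000-byte request the bound is 150000 bytes, so a 300KB (307200-byte) window is more than twice what the socket could hold even in the best case.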

Needless to say, if I move the setsockopt(s, SOL_SOCKET, SO_RCVBUF,
75000, ...) prior to connect(s, ...), then everything works fine - we
hit a "persist" state, with zero-window alerts and window probing by
the server.  I've tried a few other buffer sizes, including smallish
ones like 4KB and 8KB, and they all work (no real surprise there; it
was more of a sanity check).
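
A minimal sketch of the working ordering, sizing the buffer before connect() and reading back what the kernel actually applied.  The helper name tcp_socket_with_rcvbuf is hypothetical, and the caller is assumed to call connect() on the returned descriptor afterwards:

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a TCP socket and set SO_RCVBUF before any
 * connect(), as tcp(7) recommends for buffer sizes to take full effect.
 * *effective receives the size reported by getsockopt(), which on Linux is
 * normally double the request. Returns the descriptor, or -1 on error.
 */
int tcp_socket_with_rcvbuf(int requested, int *effective)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    socklen_t len = sizeof(*effective);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) != 0 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, effective, &len) != 0) {
        close(fd);
        return -1;
    }
    return fd; /* caller connect()s afterwards */
}
```

Checking the value from getsockopt() is a cheap way to confirm that the request was not clamped by net.core.rmem_max before any data flows.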

The fact that SO_RCVBUF is set after connect() is a bug in the code -
no doubt about it.  However, I'm surprised it wedges the stack like
this.  Another interesting bit is that if the client isn't put to
sleep but allowed to read the bytes as they come in, then everything
works fine as well.  So it's not like the stack is broken outright - I
need to put the client to sleep to hit this (but it reproduces 100% of
the time thus far).
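
For anyone wanting to reproduce this, the buggy ordering can be sketched as follows.  This is an illustrative C approximation, not the actual Java application; connect_then_shrink and get_rcvbuf are made-up names, and the server address is whatever the caller supplies:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Report the receive buffer size the kernel currently accounts for. */
int get_rcvbuf(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) != 0)
        return -1;
    return val;
}

/*
 * Hypothetical reproducer step: connect() first, then shrink SO_RCVBUF,
 * mirroring the ordering described above. *before and *after receive the
 * sizes reported around the late setsockopt(). Returns the connected
 * descriptor, or -1 on error.
 */
int connect_then_shrink(const struct sockaddr_in *srv, int rcvbuf,
                        int *before, int *after)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (connect(fd, (const struct sockaddr *)srv, sizeof(*srv)) != 0) {
        close(fd);
        return -1;
    }
    *before = get_rcvbuf(fd);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) != 0) {
        close(fd);
        return -1;
    }
    *after = get_rcvbuf(fd);
    return fd; /* to reproduce, the caller sleeps here instead of reading */
}
```

Comparing *before and *after shows how far the buffer was shrunk underneath a window that had already been advertised during the handshake.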

Thanks Eric
