linux-kernel.vger.kernel.org archive mirror
* Doubts about listen backlog and tcp_max_syn_backlog
@ 2013-01-22 16:10 Leandro Lucarella
  2013-01-22 16:45 ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Leandro Lucarella @ 2013-01-22 16:10 UTC (permalink / raw)
  To: netdev, linux-kernel

Hi, I'm having some problems with missing SYNs in a server with a high
rate of incoming connections and, even though I'm far from understanding
the kernel, I ended up looking at its source to try to understand
better what's going on, because some stuff doesn't make a lot of sense
to me.

The path I followed is this (line numbers for Linux 3.7):
net/socket.c[3]
    SYSCALL_DEFINE2(listen, int, fd, int, backlog)
        backlog is truncated to sysctl_somaxconn and
        sock->ops->listen(sock, backlog) is called, which I guess
        calls inet_listen().

net/ipv4/af_inet.c[4]
    int inet_listen(struct socket *sock, int backlog)
        the backlog is assigned to sk->sk_max_ack_backlog and
        inet_csk_listen_start(sk, backlog) is called (if the socket
        wasn't already in TCP_LISTEN state)

net/ipv4/inet_connection_sock.c[5]
    int inet_csk_listen_start(struct sock *sk, const int nr_table_entries)
        reqsk_queue_alloc(&icsk->icsk_accept_queue, nr_table_entries) is
        called, which I guess creates the actual queue

net/core/request_sock.c[6]
    int reqsk_queue_alloc(struct request_sock_queue *queue,
                          unsigned int nr_table_entries)
        nr_table_entries is first adjusted to satisfy:
        8 <= nr_table_entries <= sysctl_max_syn_backlog
        and then incremented by one and rounded up to the next power of
        2.
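
The sizing steps listed above can be sketched as a small model. This is
only an illustration of the arithmetic as described in this message, not
kernel code; the function name syn_table_entries and its parameter names
are hypothetical.

```python
def syn_table_entries(backlog, somaxconn, max_syn_backlog):
    """Model how a listen() backlog becomes a request-table size."""
    # listen(2): the backlog is truncated to sysctl_somaxconn.
    backlog = min(backlog, somaxconn)
    # reqsk_queue_alloc(): clamp to 8 <= n <= sysctl_max_syn_backlog ...
    n = max(min(backlog, max_syn_backlog), 8)
    # ... then add one and round up to the next power of two.
    # For n >= 1, 1 << n.bit_length() is the smallest power of two >= n + 1.
    return 1 << n.bit_length()


if __name__ == "__main__":
    for backlog in (5, 127, 128, 511, 512):
        entries = syn_table_entries(backlog, somaxconn=1024, max_syn_backlog=1024)
        print(backlog, "->", entries)
```

Under this model a backlog of 128 yields 256 entries while 127 yields 128
and 511 yields 512, which is the effect the questions in this message ask
about.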

So here are a few questions:

1. What's the relation between the socket backlog and the queue created
   by reqsk_queue_alloc()? Because the backlog is only adjusted not to
   be greater than sysctl_somaxconn, but the queue size can be quite
   different.
2. The comment just above the definition of reqsk_queue_alloc() about
   sysctl_max_syn_backlog says "Maximum number of SYN_RECV sockets in
   queue per LISTEN socket.". But then nr_table_entries is not only
   rounded up to the next power of 2, it is also incremented by one
   before that, so a backlog of, for example, 128, would end up with
   256 table entries even if sysctl_max_syn_backlog is 128.
3. Why is there a nr_table_entries + 1 at all in there? Looking at the
   commit that introduced this[1] I can't find any explanation and I've
   read some big projects are using backlogs of 511 because of this[2].
   (which BTW, if the queue is really a hash table, looks like an awful
   idea).
4. I found some places sk->sk_ack_backlog is checked against
   sk->sk_max_ack_backlog to see if new requests should be dropped, but
   I also saw checks like inet_csk_reqsk_queue_young(sk) > 1 or
   inet_csk_reqsk_queue_is_full(sk), so I guess the queue is used too.


Thanks a lot.

[1] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=72a3effaf633bcae9034b7e176bdbd78d64a71db
[2] http://blog.dubbelboer.com/2012/04/09/syn-cookies.html#a_reasonably_backlog_size
[3] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/socket.c;h=2ca51c719ef984cdadef749008456cf7bd5e1ae4;hb=HEAD#l1544
[4] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/ipv4/af_inet.c;h=24b384b7903ea7a59a11e7a4cbf06db996498924;hb=HEAD#l192
[5] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/ipv4/inet_connection_sock.c;h=d0670f00d5243f95bec4536f60edf32fa2ded850;hb=HEAD#l729
[6] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/core/request_sock.c;h=c31d9e8668c30346894adbf3be55eed4beeb1258;hb=HEAD#l23

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com


* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-22 16:10 Doubts about listen backlog and tcp_max_syn_backlog Leandro Lucarella
@ 2013-01-22 16:45 ` Eric Dumazet
  2013-01-22 16:59   ` Leandro Lucarella
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2013-01-22 16:45 UTC (permalink / raw)
  To: Leandro Lucarella; +Cc: netdev, linux-kernel

On Tue, 2013-01-22 at 17:10 +0100, Leandro Lucarella wrote:
[snip]


What particular problem do you have?

A serious rewrite of the LISTEN code is needed, because the current
implementation doesn't scale:

The SYNACK retransmits are done by a single timer wheel, holding the
socket lock for too long. So increasing the backlog to 2^16 or 2^17 is
not really an option.

Hash tables are nice, but if we have to scan them, holding a single lock,
they are not so nice.





* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-22 16:45 ` Eric Dumazet
@ 2013-01-22 16:59   ` Leandro Lucarella
  2013-01-22 17:13     ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Leandro Lucarella @ 2013-01-22 16:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-kernel

On Tue, Jan 22, 2013 at 08:45:42AM -0800, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 17:10 +0100, Leandro Lucarella wrote:
> > Hi, I'm having some problems with missing SYNs in a server with a high
> > rate of incoming connections and, even when far from understanding the
> > kernel,  I ended up looking at the kernel's source to try to understand
> > better what's going on, because some stuff doesn't make a lot of sense
> > to me.
[snip]
> > 1. What's the relation between the socket backlog and the queue created
> >    by reqsk_queue_alloc()? Because the backlog is only adjusted not to
> >    be greater than sysctl_somaxconn, but the queue size can be quite
> >    different.
> > 2. The comment just above the definition of reqsk_queue_alloc() about
> >    sysctl_max_syn_backlog says "Maximum number of SYN_RECV sockets in
> >    queue per LISTEN socket.". But then nr_table_entries is not only
> >    rounded up to the next power of 2, it is also incremented by one
> >    before that, so a backlog of, for example, 128, would end up with
> >    256 table entries even if sysctl_max_syn_backlog is 128.
> > 3. Why is there a nr_table_entries + 1 at all in there? Looking at the
> >    commit that introduced this[1] I can't find any explanation and I've
> >    read some big projects are using backlogs of 511 because of this[2].
> >    (which BTW, if the queue is really a hash table, looks like an awful
> >    idea).
> > 4. I found some places sk->sk_ack_backlog is checked against
> >    sk->sk_max_ack_backlog to see if new requests should be dropped, but
> >    I also saw checks like inet_csk_reqsk_queue_young(sk) > 1 or
> >    inet_csk_reqsk_queue_is_full(sk), so I guess the queue is used too.
[snip]
> 
> What particular problem do you have ?

What I'm seeing are clients taking either microseconds to connect, or 3
seconds, which suggests SYNs are getting lost, but the network doesn't
seem to be the problem. I'm still investigating this, so unfortunately
I'm not really sure.

> A serious rewrite of the LISTEN code is needed, because the current
> implementation doesn't scale:
> 
> The SYNACK retransmits are done by a single timer wheel, holding the
> socket lock for too long. So increasing the backlog to 2^16 or 2^17 is
> not really an option.
> 
> Hash tables are nice, but if we have to scan them, holding a single lock,
> they are not so nice.

So, the queue is really a hash table, then? So using any (2^n)-1 would
be a bad idea because when the backlog is nearly full, the hash table
will be really slow? Is that why the + 1 is there? Is it assuming everyone
will use a power of 2 and thus have a load factor of 0.5 at most?

-- 
Leandro Lucarella
Senior R&D Developer
sociomantic labs GmbH
http://www.sociomantic.com


* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-22 16:59   ` Leandro Lucarella
@ 2013-01-22 17:13     ` Eric Dumazet
  2013-01-22 18:17       ` Rick Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2013-01-22 17:13 UTC (permalink / raw)
  To: Leandro Lucarella; +Cc: netdev, linux-kernel

On Tue, 2013-01-22 at 17:59 +0100, Leandro Lucarella wrote:

> What I'm seeing are clients taking either microseconds to connect, or 3
> seconds, which suggests SYNs are getting lost, but the network doesn't
> seem to be the problem. I'm still investigating this, so unfortunately
> I'm not really sure.
> 

A SYN packet or a SYN-ACK packet can be lost in the network.

> > A serious rewrite of the LISTEN code is needed, because the current
> > implementation doesn't scale:
> > 
> > The SYNACK retransmits are done by a single timer wheel, holding the
> > socket lock for too long. So increasing the backlog to 2^16 or 2^17 is
> > not really an option.
> > 
> > Hash tables are nice, but if we have to scan them, holding a single lock,
> > they are not so nice.
> 
> So, the queue is really a hash table, then? So using any (2^n)-1 would
> be a bad idea because when the backlog is nearly full, the hash table
> will be really slow? Is that why the + 1 is there? Is it assuming everyone
> will use a power of 2 and thus have a load factor of 0.5 at most?
> 

The hash tables we use have power-of-two sizes.

The size of the hash table has little effect; it's set automatically, to
try to get an average of one item per hash slot, or less. Even if we had
10 items per slot, it would not be a big deal.

What is important is the backlog, and I guess you didn't increase it
properly. The somaxconn default is quite low (128)

# sysctl -w net/ipv4/tcp_max_syn_backlog=4096
net.ipv4.tcp_max_syn_backlog = 4096
# sysctl -w net.core.somaxconn=4096
net.core.somaxconn = 4096

Then make sure your server uses a big enough listen(..., backlog)
parameter.
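
A minimal sketch of the server side of this advice, assuming a Linux
host where sysctls are exposed under /proc/sys. The helper names
(read_sysctl, effective_backlog, open_listener) are hypothetical; only
the standard socket calls are real. It shows that the value passed to
listen() is capped by somaxconn, so both must be raised.

```python
import socket
from pathlib import Path


def read_sysctl(name, default):
    """Read an integer sysctl such as 'net.core.somaxconn' via /proc/sys."""
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return default  # not Linux, or the entry is missing or unreadable


def effective_backlog(requested, somaxconn):
    """listen() truncates the requested backlog to somaxconn."""
    return min(requested, somaxconn)


def open_listener(host="127.0.0.1", port=0, requested=4096):
    """Create a listening TCP socket that asks for a large backlog."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(requested)
    return sock


if __name__ == "__main__":
    somaxconn = read_sysctl("net.core.somaxconn", default=128)
    print("somaxconn:", somaxconn)
    print("backlog used for listen(4096):", effective_backlog(4096, somaxconn))
```

If the printed backlog is smaller than what the application requested,
the sysctl is the limiting factor rather than the application.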





* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-22 17:13     ` Eric Dumazet
@ 2013-01-22 18:17       ` Rick Jones
  2013-01-22 18:42         ` Leandro Lucarella
  0 siblings, 1 reply; 20+ messages in thread
From: Rick Jones @ 2013-01-22 18:17 UTC (permalink / raw)
  To: Leandro Lucarella; +Cc: Eric Dumazet, netdev, linux-kernel


> What is important is the backlog, and I guess you didn't increase it
> properly. The somaxconn default is quite low (128)

Leandro -

If that is being overflowed, I believe you should be seeing something like:

     14 SYNs to LISTEN sockets dropped

in the output of netstat -s on the system on which the server 
application is running.

rick


* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-22 18:17       ` Rick Jones
@ 2013-01-22 18:42         ` Leandro Lucarella
  2013-01-22 22:01           ` Rick Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Leandro Lucarella @ 2013-01-22 18:42 UTC (permalink / raw)
  To: Rick Jones; +Cc: Eric Dumazet, netdev, linux-kernel

On Tue, Jan 22, 2013 at 10:17:50AM -0800, Rick Jones wrote:
> >What is important is the backlog, and I guess you didn't increase it
> >properly. The somaxconn default is quite low (128)
> 
> Leandro -
> 
> If that is being overflowed, I believe you should be seeing something like:
> 
>     14 SYNs to LISTEN sockets dropped
> 
> in the output of netstat -s on the system on which the server
> application is running.

What is that value reporting exactly? Because we are using syncookies,
and AFAIK with that enabled, all SYNs are being replied to, and what the
listen backlog is really limiting is the "completely established
sockets waiting to be accepted", according to listen(2). What I don't
really know, to be honest, is what a "completely established socket" is:
does it mean that the SYN,ACK was sent, or that the ACK was received back?

Also, from the client side, when is the connect(2) call done? When the
SYN,ACK is received?

Thanks!

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com


* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-22 18:42         ` Leandro Lucarella
@ 2013-01-22 22:01           ` Rick Jones
  2013-01-23 10:47             ` Leandro Lucarella
  0 siblings, 1 reply; 20+ messages in thread
From: Rick Jones @ 2013-01-22 22:01 UTC (permalink / raw)
  To: Leandro Lucarella; +Cc: Eric Dumazet, netdev, linux-kernel

On 01/22/2013 10:42 AM, Leandro Lucarella wrote:
> On Tue, Jan 22, 2013 at 10:17:50AM -0800, Rick Jones wrote:
>>> What is important is the backlog, and I guess you didn't increase it
>>> properly. The somaxconn default is quite low (128)
>>
>> Leandro -
>>
>> If that is being overflowed, I believe you should be seeing something like:
>>
>>      14 SYNs to LISTEN sockets dropped
>>
>> in the output of netstat -s on the system on which the server
>> application is running.
>
> What is that value reporting exactly?

Netstat is reporting ListenDrops and/or ListenOverflows, which map
to LINUX_MIB_LISTENDROPS and LINUX_MIB_LISTENOVERFLOWS. Those get
incremented in tcp_v4_syn_recv_sock() (and its v6 version etc.):

        if (sk_acceptq_is_full(sk))
                goto exit_overflow;

That path will increment both overflows and drops, and drops will
increment on its own in some additional cases.

> Because we are using syncookies, and AFAIK with that enabled, all
> SYNs are being replied to, and what the listen backlog is really
> limiting is the "completely established sockets waiting to be
> accepted", according to listen(2). What I don't really know, to be
> honest, is what a "completely established socket" is: does it mean
> that the SYN,ACK was sent, or that the ACK was received back?

I have always thought it meant that the ACK of the SYN|ACK has been 
received.

SyncookiesSent SyncookiesRecv SyncookiesFailed also appear in 
/proc/net/netstat and presumably in netstat -s output.
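
The counters named above can be read straight from /proc/net/netstat
rather than through netstat -s. A rough sketch follows, assuming the
file keeps its usual layout of line pairs (a line of counter names,
then a line of values, both starting with the same prefix such as
"TcpExt:"). The function names are hypothetical.

```python
WATCHED = (
    "ListenOverflows",
    "ListenDrops",
    "SyncookiesSent",
    "SyncookiesRecv",
    "SyncookiesFailed",
)


def parse_proc_net_netstat(text):
    """Return {section: {counter: value}} from /proc/net/netstat-style text."""
    stats = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Counters come in pairs: "TcpExt: <names...>" then "TcpExt: <values...>".
    for header, values in zip(lines[0::2], lines[1::2]):
        section, _, names = header.partition(":")
        value_section, _, numbers = values.partition(":")
        if section != value_section:
            continue  # malformed pair; skip rather than guess
        stats[section] = dict(zip(names.split(), map(int, numbers.split())))
    return stats


def listen_counters(text):
    """Pick out the listen-queue and syncookie counters (None if absent)."""
    tcpext = parse_proc_net_netstat(text).get("TcpExt", {})
    return {name: tcpext.get(name) for name in WATCHED}


if __name__ == "__main__":
    try:
        with open("/proc/net/netstat") as f:
            print(listen_counters(f.read()))
    except OSError as exc:
        print("cannot read /proc/net/netstat:", exc)
```

Reading the file directly also shows counters whose value is zero, so
sampling it twice a few seconds apart gives a drop rate.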

> Also, from the client side, when is the connect(2) call done? When the
> SYN,ACK is received?

That would be my assumption.


In a previous message:

> What I'm seeing are clients taking either microseconds to connect, or 3
> seconds, which suggests SYNs are getting lost, but the network doesn't
> seem to be the problem. I'm still investigating this, so unfortunately
> I'm not really sure.

I recently ran into something like that, which turned out to be an issue
with nf_conntrack and its table filling up.

rick


* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-22 22:01           ` Rick Jones
@ 2013-01-23 10:47             ` Leandro Lucarella
  2013-01-23 19:28               ` Rick Jones
  2013-01-23 20:48               ` Vijay Subramanian
  0 siblings, 2 replies; 20+ messages in thread
From: Leandro Lucarella @ 2013-01-23 10:47 UTC (permalink / raw)
  To: Rick Jones; +Cc: Eric Dumazet, netdev, linux-kernel

On Tue, Jan 22, 2013 at 02:01:09PM -0800, Rick Jones wrote:
> >>If that is being overflowed, I believe you should be seeing something like:
> >>
> >>     14 SYNs to LISTEN sockets dropped
> >>
> >>in the output of netstat -s on the system on which the server
> >>application is running.
> >
> >What is that value reporting exactly?
> 
> Netstat is reporting the ListenDrops and/or ListenOverflows  which
> map to LINUX_MIB_LISTENDROPS and LINUX_MIB_LISTENOVERFLOWS.  Those
> get incremented in tcp_v4_syn_recv_sock() (and its v6 version etc)
> 
>        if (sk_acceptq_is_full(sk))
>                 goto exit_overflow;
> 
> Will increment both overflows and drops, and drops will increment on
> its own in some additional cases.
> 
> >Because we are using syncookies, and AFAIK with that enabled, all
> >SYNs are being replied to, and what the listen backlog is really
> >limiting is the "completely established sockets waiting to be
> >accepted", according to listen(2). What I don't really know, to be
> >honest, is what a "completely established socket" is: does it mean
> >that the SYN,ACK was sent, or that the ACK was received back?
> 
> I have always thought it meant that the ACK of the SYN|ACK has been
> received.
> 
> SyncookiesSent SyncookiesRecv SyncookiesFailed also appear in
> /proc/net/netstat and presumably in netstat -s output.

Thanks for the info. I'm definitely dropping SYNs and sending cookies,
around 50/s. Is there any way to tell how many connections are queued in
a particular socket?

> >Also, from the client side, when is the connect(2) call done? When the
> >SYN,ACK is received?
> 
> That would be my assumption.

Then if syncookies are enabled, the time spent in connect() shouldn't be
bigger than 3 seconds even if SYNs are being "dropped" by listen, right?
(and I'm saying "dropped" because I assume if syncookies are enabled,
SYN,ACK replies are sent anyway, with a cookie, but they are not stored
in the queue/hash table).

> In a previous message:
> 
> >What I'm seeing are clients taking either microseconds to connect, or 3
> >seconds, which suggests SYNs are getting lost, but the network doesn't
> >seem to be the problem. I'm still investigating this, so unfortunately
> >I'm not really sure.
> 
> I recently ran into something like that, which turned-out to be an
> issue with nf_conntrack and its table filling.

After some quick research, I found that when that happens I should
get a message about it in dmesg (like "kernel: nf_conntrack: table full,
dropping packet.") but I'm not getting any, so I guess that's not the
problem.

Thanks!

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com


* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-23 10:47             ` Leandro Lucarella
@ 2013-01-23 19:28               ` Rick Jones
  2013-01-24 12:22                 ` Leandro Lucarella
  2013-01-23 20:48               ` Vijay Subramanian
  1 sibling, 1 reply; 20+ messages in thread
From: Rick Jones @ 2013-01-23 19:28 UTC (permalink / raw)
  To: Leandro Lucarella; +Cc: Eric Dumazet, netdev, linux-kernel

On 01/23/2013 02:47 AM, Leandro Lucarella wrote:
> Thanks for the info. I'm definitely dropping SYNs and sending cookies,
> around 50/s. Is there any way to tell how many connections are queued in
> a particular socket?

I am not familiar with one.  Doesn't mean there isn't one, only that I 
am not able to think of it.

> Then if syncookies are enabled, the time spent in connect() shouldn't be
> bigger than 3 seconds even if SYNs are being "dropped" by listen, right?

Do you mean if "ESTABLISHED" connections are dropped because the listen 
queue is full?  I don't think I would put that as "SYNs being dropped by 
listen" - too easy to confuse that with an actual dropping of a SYN segment.

But yes, I would not expect a connect() call to remain incomplete for
any longer than it took to receive an SYN|ACK from the other end. That
would be 3 (or 9, 21, etc.) seconds on a kernel with 3 seconds as the
initial retransmission timeout.
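
The 3, 9, 21 sequence follows from doubling the timeout after each lost
SYN. A small worked example, assuming plain exponential backoff with no
upper cap (the function name and defaults are illustrative):

```python
def syn_retransmit_schedule(initial_rto=3.0, retries=4):
    """Seconds after the first SYN at which each retransmission is sent."""
    elapsed = 0.0
    rto = initial_rto
    times = []
    for _ in range(retries):
        elapsed += rto       # wait one full timeout, then retransmit
        times.append(elapsed)
        rto *= 2             # back off: 3, 6, 12, 24, ...
    return times


if __name__ == "__main__":
    print(syn_retransmit_schedule())
```

So a connect() that completes after about 3 seconds is consistent with
exactly one SYN (or its SYN|ACK) having been lost.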

rick


* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-23 10:47             ` Leandro Lucarella
  2013-01-23 19:28               ` Rick Jones
@ 2013-01-23 20:48               ` Vijay Subramanian
  1 sibling, 0 replies; 20+ messages in thread
From: Vijay Subramanian @ 2013-01-23 20:48 UTC (permalink / raw)
  To: Leandro Lucarella; +Cc: Rick Jones, Eric Dumazet, netdev, linux-kernel

On 23 January 2013 02:47, Leandro Lucarella
<leandro.lucarella@sociomantic.com> wrote:
> On Tue, Jan 22, 2013 at 02:01:09PM -0800, Rick Jones wrote:
>> >>If that is being overflowed, I believe you should be seeing something like:
>> >>
>> >>     14 SYNs to LISTEN sockets dropped
>> >>
>> >>in the output of netstat -s on the system on which the server
>> >>application is running.
>> >
>> >What is that value reporting exactly?
>>
>> Netstat is reporting the ListenDrops and/or ListenOverflows  which
>> map to LINUX_MIB_LISTENDROPS and LINUX_MIB_LISTENOVERFLOWS.  Those
>> get incremented in tcp_v4_syn_recv_sock() (and its v6 version etc)
>>
>>        if (sk_acceptq_is_full(sk))
>>                 goto exit_overflow;
>>
>> Will increment both overflows and drops, and drops will increment on
>> its own in some additional cases.

Note that tcp_v4_conn_request() can also drop SYNs directly (at the start
of the three-way handshake) if the accept queue is full and more than one
young request is queued up:
vim +1504 tcp_ipv4.c

       if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
                goto drop;


These drops do not seem to be tracked by any MIB variable and so will
not show up in netstat.
(Also, the newer nstat is preferred to netstat.)
Maybe we need to track these drops too?
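
The drop condition quoted above can be modelled as a simple predicate.
This is a simplified sketch, not the kernel implementation: it assumes
that sk_acceptq_is_full() compares sk_ack_backlog against
sk_max_ack_backlog with a strict "greater than", and the Python names
are hypothetical.

```python
def acceptq_is_full(ack_backlog, max_ack_backlog):
    # Assumption: "full" means the accept queue already exceeds the limit.
    return ack_backlog > max_ack_backlog


def syn_dropped_at_conn_request(ack_backlog, max_ack_backlog, young_requests):
    """True when an incoming SYN would be dropped without any MIB counter."""
    return acceptq_is_full(ack_backlog, max_ack_backlog) and young_requests > 1


if __name__ == "__main__":
    # Accept queue over its limit and two young requests pending: dropped.
    print(syn_dropped_at_conn_request(129, 128, 2))
    # Same queue, but only one young request: the SYN is still processed.
    print(syn_dropped_at_conn_request(129, 128, 1))
```

Under this model the silent drop needs both conditions at once, which is
why a slow accept() loop combined with a burst of new SYNs triggers it.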

Vijay


* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-23 19:28               ` Rick Jones
@ 2013-01-24 12:22                 ` Leandro Lucarella
  2013-01-24 18:44                   ` Rick Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Leandro Lucarella @ 2013-01-24 12:22 UTC (permalink / raw)
  To: Rick Jones; +Cc: Eric Dumazet, netdev, linux-kernel

On Wed, Jan 23, 2013 at 11:28:08AM -0800, Rick Jones wrote:
> >Then if syncookies are enabled, the time spent in connect() shouldn't be
> >bigger than 3 seconds even if SYNs are being "dropped" by listen, right?
> 
> Do you mean if "ESTABLISHED" connections are dropped because the
> listen queue is full?  I don't think I would put that as "SYNs being
> dropped by listen" - too easy to confuse that with an actual
> dropping of a SYN segment.

I was just kind of quoting the name given by netstat: "SYNs to LISTEN
sockets dropped" (for kernel 3.0, I noticed newer kernels don't have
this stat anymore, or the name was changed). I still don't know if we
are talking about the same thing.

> But yes, I would not expect a connect() call to remain incomplete
> for any longer than it took to receive an SYN|ACK from the other
> end.

So the only reason to experience these high times spent in connect()
should be because a SYN or SYN|ACK was actually lost in a lower layer,
like an error in the network device or a transmission error?

> That would be 3 (or 9, 21, etc.) seconds on a kernel with 3
> seconds as the initial retransmission timeout.

Which can't be changed without recompiling, right?

Thanks!

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com


* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-24 12:22                 ` Leandro Lucarella
@ 2013-01-24 18:44                   ` Rick Jones
  2013-01-24 19:21                     ` Leandro Lucarella
  0 siblings, 1 reply; 20+ messages in thread
From: Rick Jones @ 2013-01-24 18:44 UTC (permalink / raw)
  To: Leandro Lucarella; +Cc: Eric Dumazet, netdev, linux-kernel

On 01/24/2013 04:22 AM, Leandro Lucarella wrote:
> On Wed, Jan 23, 2013 at 11:28:08AM -0800, Rick Jones wrote:
>>> Then if syncookies are enabled, the time spent in connect() shouldn't be
>>> bigger than 3 seconds even if SYNs are being "dropped" by listen, right?
>>
>> Do you mean if "ESTABLISHED" connections are dropped because the
>> listen queue is full?  I don't think I would put that as "SYNs being
>> dropped by listen" - too easy to confuse that with an actual
>> dropping of a SYN segment.
>
> I was just kind of quoting the name given by netstat: "SYNs to LISTEN
> sockets dropped" (for kernel 3.0, I noticed newer kernels don't have
> this stat anymore, or the name was changed). I still don't know if we
> are talking about the same thing.

Are you sure those stats are not present in 3.X kernels?  I just looked 
at /proc/net/netstat on a 3.7 system and noticed both the ListenMumble 
stats and the three cookie stats.  And I see the code for them in the tree:

raj@tardy:~/net-next/net/ipv4$ grep MIB_LISTEN *.c
proc.c:	SNMP_MIB_ITEM("ListenOverflows", LINUX_MIB_LISTENOVERFLOWS),
proc.c:	SNMP_MIB_ITEM("ListenDrops", LINUX_MIB_LISTENDROPS),
tcp_ipv4.c:	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
tcp_ipv4.c:	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);

raj@tardy:~/net-next/net/ipv4$ grep MIB_SYN *.c
proc.c:	SNMP_MIB_ITEM("SyncookiesSent", LINUX_MIB_SYNCOOKIESSENT),
proc.c:	SNMP_MIB_ITEM("SyncookiesRecv", LINUX_MIB_SYNCOOKIESRECV),
proc.c:	SNMP_MIB_ITEM("SyncookiesFailed", LINUX_MIB_SYNCOOKIESFAILED),
syncookies.c:	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT);
syncookies.c:		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESFAILED);
syncookies.c:	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESRECV);


I am sometimes tripped up by netstat not showing a statistic with
a zero value...

>> But yes, I would not expect a connect() call to remain incomplete
>> for any longer than it took to receive an SYN|ACK from the other
>> end.
>
> So the only reason to experience these high times spent in connect()
> should be because a SYN or SYN|ACK was actually lost in a lower layer,
> like an error in the network device or a transmission error?

Yes, modulo other drop-without-stat points such as the one Vijay
mentioned yesterday.

You might consider taking some packet traces. If you can, I would start
with a trace taken on the system(s) on which the long connect() calls
are happening. I think the tcpdump manpage has an example of a tcpdump
command with a filter expression that catches just SYNchronize and
FINished segments, which I suppose you could extend to include ReSeT
segments. Such a filter expression would miss the client's ACK of
the SYN|ACK, but unless you see incrementing stats relating to, say,
checksum failures or other drops on the "client" side, I suppose you
could assume that the client ACKed the server's SYN|ACK.
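
A possible form of that filter, extended to RST segments, is sketched
below. It only builds the tcpdump command line and does not run it; the
helper names, the interface name and the port are placeholders, and the
filter is modelled on the SYN/FIN example in the tcpdump manpage.

```python
import shlex


def syn_fin_rst_filter(host=None, port=None):
    """pcap filter matching TCP segments with SYN, FIN or RST set."""
    expr = "tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) != 0"
    if host:
        expr += f" and host {host}"
    if port is not None:
        expr += f" and port {port}"
    return expr


def tcpdump_command(interface, host=None, port=None):
    """argv for a capture on one interface, with name resolution disabled."""
    return ["tcpdump", "-n", "-i", interface, syn_fin_rst_filter(host, port)]


if __name__ == "__main__":
    argv = tcpdump_command("eth0", port=80)  # placeholder interface and port
    print(" ".join(shlex.quote(arg) for arg in argv))
```

In such a trace, two SYNs from the same client port about 3 seconds
apart would indicate that the first SYN or its SYN|ACK was lost.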

>> That would be 3 (or 9, 21, etc.) seconds on a kernel with 3
>> seconds as the initial retransmission timeout.
>
> Which can't be changed without recompiling, right?

To the best of my knowledge.

rick jones


* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-24 18:44                   ` Rick Jones
@ 2013-01-24 19:21                     ` Leandro Lucarella
  2013-01-25  6:12                       ` Nivedita SInghvi
  0 siblings, 1 reply; 20+ messages in thread
From: Leandro Lucarella @ 2013-01-24 19:21 UTC (permalink / raw)
  To: Rick Jones; +Cc: Eric Dumazet, netdev, linux-kernel

On Thu, Jan 24, 2013 at 10:44:32AM -0800, Rick Jones wrote:
> On 01/24/2013 04:22 AM, Leandro Lucarella wrote:
> >On Wed, Jan 23, 2013 at 11:28:08AM -0800, Rick Jones wrote:
> >>>Then if syncookies are enabled, the time spent in connect() shouldn't be
> >>>bigger than 3 seconds even if SYNs are being "dropped" by listen, right?
> >>
> >>Do you mean if "ESTABLISHED" connections are dropped because the
> >>listen queue is full?  I don't think I would put that as "SYNs being
> >>dropped by listen" - too easy to confuse that with an actual
> >>dropping of a SYN segment.
> >
> >I was just kind of quoting the name given by netstat: "SYNs to LISTEN
> >sockets dropped" (for kernel 3.0, I noticed newer kernels don't have
> >this stat anymore, or the name was changed). I still don't know if we
> >are talking about the same thing.
> 
> Are you sure those stats are not present in 3.X kernels?  I just
> looked at /proc/net/netstat on a 3.7 system and noticed both the
> ListenMumble stats and the three cookie stats.  And I see the code
> for them in the tree:
> 
> raj@tardy:~/net-next/net/ipv4$ grep MIB_LISTEN *.c
> proc.c:	SNMP_MIB_ITEM("ListenOverflows", LINUX_MIB_LISTENOVERFLOWS),
> proc.c:	SNMP_MIB_ITEM("ListenDrops", LINUX_MIB_LISTENDROPS),
> tcp_ipv4.c:	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
> tcp_ipv4.c:	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
> 
> raj@tardy:~/net-next/net/ipv4$ grep MIB_SYN *.c
> proc.c:	SNMP_MIB_ITEM("SyncookiesSent", LINUX_MIB_SYNCOOKIESSENT),
> proc.c:	SNMP_MIB_ITEM("SyncookiesRecv", LINUX_MIB_SYNCOOKIESRECV),
> proc.c:	SNMP_MIB_ITEM("SyncookiesFailed", LINUX_MIB_SYNCOOKIESFAILED),
> syncookies.c:	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT);
> syncookies.c:		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESFAILED);
> syncookies.c:	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESRECV);
> 
> I will sometimes be tripped-up by netstat's not showing a statistic
> with a zero value...

This is what I'm talking about:

pc1 $ uname -a
Linux labs09 3.5.0-18-generic #29~precise1-Ubuntu SMP Mon Oct 22 16:31:46 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
pc1 $ netstat --version | head -n2
net-tools 1.60
netstat 1.42 (2001-04-15)
pc1 $ netstat -s | grep -i syn
    4 invalid SYN cookies received

pc2 $ uname -a
Linux eu-21 3.0.0-19-server #33-Ubuntu SMP Thu Apr 19 20:32:48 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
pc2 $ netstat --version | head -n2
net-tools 1.60
netstat 1.42 (2001-04-15)
pc2 $ netstat -s | grep -i syn
    1996450 SYN cookies sent
    2899079 SYN cookies received
    410573 invalid SYN cookies received
    10012473 resets received for embryonic SYN_RECV sockets
    5659740 SYNs to LISTEN sockets dropped
    1 connections reset due to unexpected SYN

I didn't take a look at the kernel or netstat sources about this, so I
don't know exactly how they are connected.

> >>But yes, I would not expect a connect() call to remain incomplete
> >>for any longer than it took to receive an SYN|ACK from the other
> >>end.
> >
> >So the only reason to experience these high times spent in connect()
> >should be because a SYN or SYN|ACK was actually loss in a lower layer,
> >like an error in the network device or a transmission error?
> 
> Modulo the/some other drop-without-stat point such as Vijay
> mentioned yesterday.

So, in these cases a syncookie is not sent back? I had the impression
they were always sent...

> You might consider taking some packet traces.  If you can I would
> start with a trace taken on the system(s) on which the long
> connect() calls are happening.   I think the tcpdump manpage has an
> example of a tcpdump command with a filter expression that catches
> just SYNchronize and FINished segments which I suppose you could
> extend to include ReSeT segments.  Such a filter expression would be
> missing the client's ACK of the SYN|ACK but unless you see
> incrementing stats relating to say checksum failures or other drops
> on the "client" side I suppose you could assume that the client
> ACKed the server's SYN|ACK.

Yes, I already did captures and we are definitely losing packets
(including SYNs), but it looks like the number of SYNs I'm losing is
lower than the number of long connect() times I observe. This is not
confirmed yet, I'm still investigating.

Thanks!

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-24 19:21                     ` Leandro Lucarella
@ 2013-01-25  6:12                       ` Nivedita SInghvi
  2013-01-25 10:05                         ` Leandro Lucarella
  0 siblings, 1 reply; 20+ messages in thread
From: Nivedita SInghvi @ 2013-01-25  6:12 UTC (permalink / raw)
  To: Leandro Lucarella; +Cc: Rick Jones, Eric Dumazet, netdev, linux-kernel

On 01/24/2013 11:21 AM, Leandro Lucarella wrote:
> On Thu, Jan 24, 2013 at 10:44:32AM -0800, Rick Jones wrote:
>> On 01/24/2013 04:22 AM, Leandro Lucarella wrote:
>>> On Wed, Jan 23, 2013 at 11:28:08AM -0800, Rick Jones wrote:
>>>>> Then if syncookies are enabled, the time spent in connect() shouldn't be
>>>>> bigger than 3 seconds even if SYNs are being "dropped" by listen, right?
>>>>
>>>> Do you mean if "ESTABLISHED" connections are dropped because the
>>>> listen queue is full?  I don't think I would put that as "SYNs being
>>>> dropped by listen" - too easy to confuse that with an actual
>>>> dropping of a SYN segment.
>>>
>>> I was just kind of quoting the name given by netstat: "SYNs to LISTEN
>>> sockets dropped" (for kernel 3.0, I noticed newer kernels don't have
>>> this stat anymore, or the name was changed). I still don't know if we
>>> are talking about the same thing.
>>
[snip]
>> I will sometimes be tripped-up by netstat's not showing a statistic
>> with a zero value...

Leandro, you should be able to do an nstat -z, it will print all counters even if zero. You should see something like so:

ipv4]> nstat -z
#kernel
IpInReceives                    2135               0.0
IpInHdrErrors                   0                  0.0
IpInAddrErrors                  202                0.0
...

You might want to take a look at those (your pkts may not even be making it to tcp) and these in particular:

TcpExtSyncookiesSent            0                  0.0
TcpExtSyncookiesRecv            0                  0.0
TcpExtSyncookiesFailed          0                  0.0
TcpExtListenOverflows           0                  0.0
TcpExtListenDrops               0                  0.0
TcpExtTCPBacklogDrop            0                  0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0

If you don't have nstat on that version for some reason, download the latest iproute pkg. Looking at the counter names is a lot more helpful and precise than netstat's conversion for human consumption.


> Yes, I already did captures and we are definitely loosing packets
> (including SYNs), but it looks like the amount of SYNs I'm loosing is
> lower than the amount of long connect() times I observe. This is not
> confirmed yet, I'm still investigating.

Where did you narrow down the drop to? There are quite a few places in the networking stack we silently drop packets (such as the one pointed out earlier in this thread), although they should almost all be extremely low probability/NEVER type events. Do you want a patch to gap the most likely scenario? (I'll post that to netdev separately). 

thanks,
Nivedita


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-25  6:12                       ` Nivedita SInghvi
@ 2013-01-25 10:05                         ` Leandro Lucarella
  2013-01-28  2:48                           ` Nivedita Singhvi
  2013-01-28  2:49                           ` Nivedita Singhvi
  0 siblings, 2 replies; 20+ messages in thread
From: Leandro Lucarella @ 2013-01-25 10:05 UTC (permalink / raw)
  To: Nivedita SInghvi; +Cc: Rick Jones, Eric Dumazet, netdev, linux-kernel

On Thu, Jan 24, 2013 at 10:12:46PM -0800, Nivedita SInghvi wrote:
> >>> I was just kind of quoting the name given by netstat: "SYNs to LISTEN
> >>> sockets dropped" (for kernel 3.0, I noticed newer kernels don't have
> >>> this stat anymore, or the name was changed). I still don't know if we
> >>> are talking about the same thing.
> >>
> [snip]
> >> I will sometimes be tripped-up by netstat's not showing a statistic
> >> with a zero value...
> 
> Leandro, you should be able to do an nstat -z, it will print all
> counters even if zero. You should see something like so:
> 
> ipv4]> nstat -z
> #kernel
> IpInReceives                    2135               0.0
> IpInHdrErrors                   0                  0.0
> IpInAddrErrors                  202                0.0
> ...
> 
> You might want to take a look at those (your pkts may not even be
> making it to tcp) and these in particular:
> 
> TcpExtSyncookiesSent            0                  0.0
> TcpExtSyncookiesRecv            0                  0.0
> TcpExtSyncookiesFailed          0                  0.0
> TcpExtListenOverflows           0                  0.0
> TcpExtListenDrops               0                  0.0
> TcpExtTCPBacklogDrop            0                  0.0
> TcpExtTCPMinTTLDrop             0                  0.0
> TcpExtTCPDeferAcceptDrop        0                  0.0
> 
> If you don't have nstat on that version for some reason, download the
> latest iproute pkg. Looking at the counter names is a lot more helpful
> and precise than the netstat converstion to human consumption. 

Thanks, but what about this?

pc2 $ nstat -z | grep -i drop
TcpExtLockDroppedIcmps          0                  0.0
TcpExtListenDrops               0                  0.0
TcpExtTCPPrequeueDropped        0                  0.0
TcpExtTCPBacklogDrop            0                  0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0
pc2 $ netstat -s | grep -i drop
    470 outgoing packets dropped
    5659740 SYNs to LISTEN sockets dropped

Is this normal?

> > Yes, I already did captures and we are definitely loosing packets
> > (including SYNs), but it looks like the amount of SYNs I'm loosing is
> > lower than the amount of long connect() times I observe. This is not
> > confirmed yet, I'm still investigating.
> 
> Where did you narrow down the drop to? There are quite a few places in
> the networking stack we silently drop packets (such as the one pointed
> out earlier in this thread), although they should almost all be
> extremely low probability/NEVER type events. Do you want a patch to
> gap the most likely scenario? (I'll post that to netdev separately). 

Even though that would be awesome, unfortunately there is no way I could
get permission to run a patched kernel (or even restart the servers for
that matter).

And I don't know how I could narrow down the drops in any way. What I
know is that, capturing traffic with tcpdump, I see some packets leaving
one server but never arriving at the other one.

Also, the hardware is not great either, so I'm not sure it isn't
responsible for the loss. There are some errors reported by ethtool, but
I don't know exactly what they mean:

# ethtool -S eth0
NIC statistics:
     tx_packets: 336978308273
     rx_packets: 384108075585
     tx_errors: 0
     rx_errors: 194
     rx_missed: 1119
     align_errors: 31731
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 384108023754
     broadcast: 51825
     multicast: 6
     tx_aborted: 0
     tx_underrun: 0

Thanks!

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-25 10:05                         ` Leandro Lucarella
@ 2013-01-28  2:48                           ` Nivedita Singhvi
  2013-01-28  5:21                             ` Vijay Subramanian
  2013-01-28 13:08                             ` Leandro Lucarella
  2013-01-28  2:49                           ` Nivedita Singhvi
  1 sibling, 2 replies; 20+ messages in thread
From: Nivedita Singhvi @ 2013-01-28  2:48 UTC (permalink / raw)
  To: Leandro Lucarella; +Cc: Rick Jones, Eric Dumazet, netdev, linux-kernel

On 01/25/2013 02:05 AM, Leandro Lucarella wrote:
> On Thu, Jan 24, 2013 at 10:12:46PM -0800, Nivedita SInghvi wrote:
>>>>> I was just kind of quoting the name given by netstat: "SYNs to LISTEN
>>>>> sockets dropped" (for kernel 3.0, I noticed newer kernels don't have
>>>>> this stat anymore, or the name was changed). I still don't know if we
>>>>> are talking about the same thing.
>>>>
>> [snip]
>>>> I will sometimes be tripped-up by netstat's not showing a statistic
>>>> with a zero value...
>>
>> Leandro, you should be able to do an nstat -z, it will print all
>> counters even if zero. You should see something like so:
>>
>> ipv4]> nstat -z
>> #kernel
>> IpInReceives                    2135               0.0
>> IpInHdrErrors                   0                  0.0
>> IpInAddrErrors                  202                0.0
>> ...
>>
>> You might want to take a look at those (your pkts may not even be
>> making it to tcp) and these in particular:
>>
>> TcpExtSyncookiesSent            0                  0.0
>> TcpExtSyncookiesRecv            0                  0.0
>> TcpExtSyncookiesFailed          0                  0.0
>> TcpExtListenOverflows           0                  0.0
>> TcpExtListenDrops               0                  0.0
>> TcpExtTCPBacklogDrop            0                  0.0
>> TcpExtTCPMinTTLDrop             0                  0.0
>> TcpExtTCPDeferAcceptDrop        0                  0.0
>>
>> If you don't have nstat on that version for some reason, download the
>> latest iproute pkg. Looking at the counter names is a lot more helpful
>> and precise than the netstat converstion to human consumption. 
> 
> Thanks, but what about this?
> 
> pc2 $ nstat -z | grep -i drop
> TcpExtLockDroppedIcmps          0                  0.0
> TcpExtListenDrops               0                  0.0
> TcpExtTCPPrequeueDropped        0                  0.0
> TcpExtTCPBacklogDrop            0                  0.0
> TcpExtTCPMinTTLDrop             0                  0.0
> TcpExtTCPDeferAcceptDrop        0                  0.0

That seems bogus. 


> pc2 $ netstat -s | grep -i drop
>     470 outgoing packets dropped
>     5659740 SYNs to LISTEN sockets dropped
> 
> Is this normal?

That's a lot of connect requests dropped, but it depends on how
long you've been up and how much traffic you've seen.

Hmm...you were on an older Ubuntu, right? The netstat source 
was patched to translate it as follows:

+    { "ListenDrops", N_("%u SYNs to LISTEN sockets dropped"), opt_number },

(see the file debian/patches/CVS-20081003-statistics.c_sync.patch 
 in the net-tools src)

i.e., the netstat pkg is printing the value of the TCPEXT MIB counter
that's counting TCPExtListenDrops. 

Theoretically, that number should be the same as that printed by nstat,
as they are getting it from the same kernel stats counter. I have not
looked at nstat code (I actually almost always dump the counters from
/proc/net/{netstat + snmp} via a simple prettyprint script; will send
you that offline).
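
To give an idea of how little is involved, here is a minimal sketch of
such a dump in Python. It is not the actual script, just an illustration
that assumes the usual layout of those two files: a line of counter names
followed by a line of values, both starting with the same "Prefix:" tag.
The function names are invented for the example.

```python
def parse_proc_net_stats(text):
    """Parse name-line/value-line pairs into {"TcpExtListenDrops": 123, ...}."""
    counters = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Lines come in pairs: "Prefix: name1 name2 ..." then "Prefix: v1 v2 ...".
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        _, nums = values.split(":", 1)
        for name, num in zip(names.split(), nums.split()):
            counters[prefix.strip() + name] = int(num)
    return counters


def read_counters(paths=("/proc/net/netstat", "/proc/net/snmp")):
    """Read and merge the counters from the given /proc files (Linux only)."""
    counters = {}
    for path in paths:
        with open(path) as f:
            counters.update(parse_proc_net_stats(f.read()))
    return counters
```

Printing sorted(read_counters().items()) on the server should then show
the absolute value of every counter, including TcpExtListenDrops, without
any translation layer in between.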

If the nstat and netstat counters don't match, something is fishy.
That nstat output is broken.  

>>> Yes, I already did captures and we are definitely loosing packets
>>> (including SYNs), but it looks like the amount of SYNs I'm loosing is
>>> lower than the amount of long connect() times I observe. This is not
>>> confirmed yet, I'm still investigating.
>>
>> Where did you narrow down the drop to? There are quite a few places in
>> the networking stack we silently drop packets (such as the one pointed
>> out earlier in this thread), although they should almost all be
>> extremely low probability/NEVER type events. Do you want a patch to
>> gap the most likely scenario? (I'll post that to netdev separately). 
> 
> Even when that would be awesome, unfortunately there is no way I could
> get permission to run a patched kernel (or even restart the servers for
> that matter).
> 
> And I don't know how could I narrow down the drops in any way. What I
> know is capturing traffic with tcpdump, I see some packets leaving one
> server but never arriving to the new one.

Hmm..do you have a switch between your two end points dropping pkts? 
Could be.. Basically, by looking at the statistics kept by each layer, you 
should be able to narrow it down a little bit at least. 

It does still sound like some drops are occurring in TCP due to accept
backlog being full and you're overrunning TCP incoming processing (or
at least this is contributing), going by that ListenDrops count.

> Also, the hardware is not great either, I'm not sure is not responsible
> for the loss. There are some errors reported by ethtool, but I don't
> know exactly what they mean:
> 
> # ethtool -S eth0
> NIC statistics:
>      tx_packets: 336978308273
>      rx_packets: 384108075585
>      tx_errors: 0
>      rx_errors: 194
>      rx_missed: 1119
>      align_errors: 31731
>      tx_single_collisions: 0
>      tx_multi_collisions: 0
>      unicast: 384108023754
>      broadcast: 51825
>      multicast: 6
>      tx_aborted: 0
>      tx_underrun: 0
> 
> Thanks!
> 

You aren't suffering a lot of packet loss at the NIC.  
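
To put a number on that, here is a quick back-of-the-envelope check in
Python using the values from your ethtool output. It assumes the three
error counters do not overlap, which is driver-specific and may not hold;
summing them is therefore a worst case. The helper name is made up.

```python
def rx_error_ratio(stats, error_keys=("rx_errors", "rx_missed", "align_errors")):
    """Fraction of received packets accounted for by the given error counters."""
    errors = sum(stats[key] for key in error_keys)
    return errors / stats["rx_packets"]


# Values copied from the ethtool -S eth0 output quoted above.
nic = {
    "rx_packets": 384108075585,
    "rx_errors": 194,
    "rx_missed": 1119,
    "align_errors": 31731,
}
print("%.1e" % rx_error_ratio(nic))
```

That is 33044 errors against roughly 384 billion received packets, i.e.
well under one packet in ten million.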

Sorry, I'm on the road, travelling, and likely not online much this week. 


thanks,
Nivedita


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-25 10:05                         ` Leandro Lucarella
  2013-01-28  2:48                           ` Nivedita Singhvi
@ 2013-01-28  2:49                           ` Nivedita Singhvi
  1 sibling, 0 replies; 20+ messages in thread
From: Nivedita Singhvi @ 2013-01-28  2:49 UTC (permalink / raw)
  To: niv; +Cc: linux-kernel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-28  2:48                           ` Nivedita Singhvi
@ 2013-01-28  5:21                             ` Vijay Subramanian
  2013-01-28 14:40                               ` Leandro Lucarella
  2013-01-28 13:08                             ` Leandro Lucarella
  1 sibling, 1 reply; 20+ messages in thread
From: Vijay Subramanian @ 2013-01-28  5:21 UTC (permalink / raw)
  To: Nivedita Singhvi
  Cc: Leandro Lucarella, Rick Jones, Eric Dumazet, netdev, linux-kernel

> +    { "ListenDrops", N_("%u SYNs to LISTEN sockets dropped"), opt_number },
>
> (see the file debian/patches/CVS-20081003-statistics.c_sync.patch
>  in the net-tools src)
>
> i.e., the netstat pkg is printing the value of the TCPEXT MIB counter
> that's counting TCPExtListenDrops.
>
> Theoretically, that number should be the same as that printed by nstat,
> as they are getting it from the same kernel stats counter. I have not
> looked at nstat code (I actually almost always dump the counters from
> /proc/net/{netstat + snmp} via a simple prettyprint script (will send
> you that offline).
>

nstat pretty much does what you describe, which is to parse the
/proc/net file(s) and print the contents. This is one advantage of
nstat over netstat. When you add a new MIB, you do not need to update
nstat.

> If the nstat and netstat counters don't match, something is fishy.
> That nstat output is broken.
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-28  2:48                           ` Nivedita Singhvi
  2013-01-28  5:21                             ` Vijay Subramanian
@ 2013-01-28 13:08                             ` Leandro Lucarella
  1 sibling, 0 replies; 20+ messages in thread
From: Leandro Lucarella @ 2013-01-28 13:08 UTC (permalink / raw)
  To: Nivedita Singhvi; +Cc: Rick Jones, Eric Dumazet, netdev, linux-kernel

On Sun, Jan 27, 2013 at 06:48:18PM -0800, Nivedita Singhvi wrote:
[snip]
> > Thanks, but what about this?
> > 
> > pc2 $ nstat -z | grep -i drop
> > TcpExtLockDroppedIcmps          0                  0.0
> > TcpExtListenDrops               0                  0.0
> > TcpExtTCPPrequeueDropped        0                  0.0
> > TcpExtTCPBacklogDrop            0                  0.0
> > TcpExtTCPMinTTLDrop             0                  0.0
> > TcpExtTCPDeferAcceptDrop        0                  0.0
> 
> That seems bogus. 
> 
> 
> > pc2 $ netstat -s | grep -i drop
> >     470 outgoing packets dropped
> >     5659740 SYNs to LISTEN sockets dropped
> > 
> > Is this normal?
> 
> That's a lot ofconnect requests dropped, but it depends on how 
> long you've been up and how much traffic you've seen. 
> 
> Hmm...you were on an older Ubuntu, right? The netstat source 
> was patched to translate it as follows:
> 
> +    { "ListenDrops", N_("%u SYNs to LISTEN sockets dropped"), opt_number },
> 
> (see the file debian/patches/CVS-20081003-statistics.c_sync.patch 
>  in the net-tools src)

I have Ubuntu 11.10 on all the servers I'm checking except for one with
12.04. It's weird: for the same Ubuntu version (same kernel, same netstat,
same nstat) I get different outputs. Some have counters that coincide,
some have more counters than others, some have different counters than
others...

pc121 # nstat -z | grep -i drop
TcpExtLockDroppedIcmps          0                  0.0
TcpExtListenDrops               0                  0.0
TcpExtTCPPrequeueDropped        0                  0.0
TcpExtTCPBacklogDrop            0                  0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0
pc121 # netstat -s | grep -i drop
    470 outgoing packets dropped
    5659762 SYNs to LISTEN sockets dropped

Others are like this:
pc126 # nstat -z | grep -i drop
TcpExtLockDroppedIcmps          0                  0.0
TcpExtListenDrops               2968982            0.0
TcpExtTCPPrequeueDropped        0                  0.0
TcpExtTCPBacklogDrop            0                  0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0
pc126 # netstat -s | grep -i drop
    2968982 SYNs to LISTEN sockets dropped

Others like this:
pc127 # nstat -z | grep -i drop
TcpExtLockDroppedIcmps          0                  0.0
TcpExtListenDrops               0                  0.0
TcpExtTCPPrequeueDropped        0                  0.0
TcpExtTCPBacklogDrop            0                  0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0
pc127 # netstat -s | grep -i drop
    1321958 SYNs to LISTEN sockets dropped


pc128 # nstat -z | grep -i drop
TcpExtLockDroppedIcmps          0                  0.0
TcpExtListenDrops               6455507            0.0
TcpExtTCPPrequeueDropped        0                  0.0
TcpExtTCPBacklogDrop            0                  0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0
pc128 # netstat -s | grep -i drop
    6 ICMP packets dropped because they were out-of-window
    6455507 SYNs to LISTEN sockets dropped

pc130 # nstat -z | grep -i drop
TcpExtLockDroppedIcmps          0                  0.0
TcpExtListenDrops               0                  0.0
TcpExtTCPPrequeueDropped        0                  0.0
TcpExtTCPBacklogDrop            0                  0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0
pc130 # netstat -s | grep -i drop
    3 ICMP packets dropped because they were out-of-window
    6728909 SYNs to LISTEN sockets dropped

And this is for the one with Ubuntu 12.04:
pc106 # nstat -z | grep -i drop
TcpExtLockDroppedIcmps          0                  0.0
TcpExtListenDrops               2598140            0.0
TcpExtTCPPrequeueDropped        0                  0.0
TcpExtTCPBacklogDrop            1711               0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0
TcpExtTCPReqQFullDrop           0                  0.0
pc106 # netstat -s | grep -i drop
    2598140 SYNs to LISTEN sockets dropped
    TCPBacklogDrop: 1711

Are these counters hardware-dependent? Or is there any other reason why
they might be different on different servers?

> i.e., the netstat pkg is printing the value of the TCPEXT MIB counter
> that's counting TCPExtListenDrops. 

Then why does nstat show that counter as 0 while netstat shows what I
assume is the right value?

> Theoretically, that number should be the same as that printed by nstat,
> as they are getting it from the same kernel stats counter. I have not
> looked at nstat code (I actually almost always dump the counters from
> /proc/net/{netstat + snmp} via a simple prettyprint script (will send
> you that offline).  

Mmm, ok, thanks!

> If the nstat and netstat counters don't match, something is fishy.
> That nstat output is broken.  

I'm using the one from iproute package 20110315-1build1 (except for the
one with Ubuntu 12.04, which has 20111117-1ubuntu2). Any ideas on what
could be wrong?

> > And I don't know how could I narrow down the drops in any way. What I
> > know is capturing traffic with tcpdump, I see some packets leaving one
> > server but never arriving to the new one.

About this, tcpdump should get all the packets received by the NIC,
before the kernel has any chance to drop anything, right?

> Hmm..do you have a switch between your two end points dropping pkts? 

I have no idea. I assume there is one because the servers have only one
NIC each and they are interconnected with several other servers, so there
should be something in the middle, but the servers are offsite and I
can't do any sniffing myself between the endpoints.

> Could be.. Basically, by looking at the statistics kept by each layer, you 
> should be able to narrow it down a little bit at least. 

You mean statistics provided by the switch?

> It does still sound like some drops are occurring in TCP due to accept 
> backlog being full and you're overrunning TCP incoming processing (or 
> at least this contributing), going by that ListenDrops count. 

If that's so, then I guess you're implying tcpdump doesn't get the
packets before the kernel can drop them.

> Sorry, I'm on the road, travelling, and likely not online much this week. 

No worries! Thanks for the help, it is very much appreciated.

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-28  5:21                             ` Vijay Subramanian
@ 2013-01-28 14:40                               ` Leandro Lucarella
  0 siblings, 0 replies; 20+ messages in thread
From: Leandro Lucarella @ 2013-01-28 14:40 UTC (permalink / raw)
  To: Vijay Subramanian
  Cc: Nivedita Singhvi, Rick Jones, Eric Dumazet, netdev, linux-kernel

On Sun, Jan 27, 2013 at 09:21:32PM -0800, Vijay Subramanian wrote:
> > +    { "ListenDrops", N_("%u SYNs to LISTEN sockets dropped"), opt_number },
> >
> > (see the file debian/patches/CVS-20081003-statistics.c_sync.patch
> >  in the net-tools src)
> >
> > i.e., the netstat pkg is printing the value of the TCPEXT MIB counter
> > that's counting TCPExtListenDrops.
> >
> > Theoretically, that number should be the same as that printed by nstat,
> > as they are getting it from the same kernel stats counter. I have not
> > looked at nstat code (I actually almost always dump the counters from
> > /proc/net/{netstat + snmp} via a simple prettyprint script (will send
> > you that offline).
> 
> nstat pretty much does what you describe which is to parse the
> /proc/net files(s) and print the contents. This is one advantage of
> nstat over netstat. When you add a new MIB, you do not need to update
> nstat.

Well, something seems to be broken in the nstat I have, because using
the script to parse the stats instead, I get the same values as with
netstat.

[2 minutes later, and after observing the values of nstat changed in the
same server]

OK, it looks like nstat is showing some transient values; using nstat
-a I get the "absolute values of counters" (as stated in the man page).
By default it seems to print the values for the last 60 seconds, so
mystery solved.
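
For reference, a minimal sketch of the kind of script discussed above,
which reads the counters straight from /proc/net/netstat and
/proc/net/snmp. It is not the script that was sent offline; the function
names are made up for illustration. It assumes the usual layout of those
files: a "Prefix: name1 name2 ..." line followed by a matching
"Prefix: value1 value2 ..." line. The second helper shows the difference
between absolute counters and the increments between two samples.

```python
def parse_proc_net_stats(text):
    """Parse header/value line pairs as found in /proc/net/netstat and
    /proc/net/snmp into a flat dict, e.g. {"TcpExtListenDrops": 7}."""
    counters = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Lines come in pairs: the first names the counters, the second
    # carries their values, both starting with the same "Prefix:".
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, vals = values.split(":", 1)
        if prefix != value_prefix:
            raise ValueError("mismatched lines: %r / %r" % (prefix, value_prefix))
        for name, val in zip(names.split(), vals.split()):
            counters[prefix + name] = int(val)
    return counters


def read_counters(paths=("/proc/net/netstat", "/proc/net/snmp")):
    """Read and merge the absolute counters from the given /proc files."""
    merged = {}
    for path in paths:
        with open(path) as f:
            merged.update(parse_proc_net_stats(f.read()))
    return merged


def counter_deltas(old, new):
    """Return only the counters that changed between two samples,
    mapped to their increment (new value minus old value)."""
    return {
        name: value - old.get(name, 0)
        for name, value in new.items()
        if value != old.get(name, 0)
    }
```

Calling read_counters() and looking up "TcpExtListenDrops" or
"TcpExtListenOverflows" gives the absolute values since boot, which is
what should be compared against netstat -s. Taking two samples some time
apart and passing them to counter_deltas() gives increments instead.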

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com

end of thread, other threads:[~2013-01-28 14:40 UTC | newest]

Thread overview: 20+ messages
2013-01-22 16:10 Doubts about listen backlog and tcp_max_syn_backlog Leandro Lucarella
2013-01-22 16:45 ` Eric Dumazet
2013-01-22 16:59   ` Leandro Lucarella
2013-01-22 17:13     ` Eric Dumazet
2013-01-22 18:17       ` Rick Jones
2013-01-22 18:42         ` Leandro Lucarella
2013-01-22 22:01           ` Rick Jones
2013-01-23 10:47             ` Leandro Lucarella
2013-01-23 19:28               ` Rick Jones
2013-01-24 12:22                 ` Leandro Lucarella
2013-01-24 18:44                   ` Rick Jones
2013-01-24 19:21                     ` Leandro Lucarella
2013-01-25  6:12                       ` Nivedita SInghvi
2013-01-25 10:05                         ` Leandro Lucarella
2013-01-28  2:48                           ` Nivedita Singhvi
2013-01-28  5:21                             ` Vijay Subramanian
2013-01-28 14:40                               ` Leandro Lucarella
2013-01-28 13:08                             ` Leandro Lucarella
2013-01-28  2:49                           ` Nivedita Singhvi
2013-01-23 20:48               ` Vijay Subramanian
