* Doubts about listen backlog and tcp_max_syn_backlog
@ 2013-01-22 16:10 Leandro Lucarella
  2013-01-22 16:45 ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Leandro Lucarella @ 2013-01-22 16:10 UTC (permalink / raw)
  To: netdev, linux-kernel

Hi, I'm having some problems with missing SYNs in a server with a high
rate of incoming connections and, even though I'm far from understanding
the kernel, I ended up looking at the kernel's source to try to understand
better what's going on, because some stuff doesn't make a lot of sense
to me.

The path I followed is this (line numbers for Linux 3.7):

net/socket.c[3]
SYSCALL_DEFINE2(listen, int, fd, int, backlog)
	backlog is truncated to sysctl_somaxconn and
	sock->ops->listen(sock, backlog) is called, which I guess
	calls inet_listen().

net/ipv4/af_inet.c[4]
int inet_listen(struct socket *sock, int backlog)
	the backlog is assigned to sk->sk_max_ack_backlog and
	inet_csk_listen_start(sk, backlog) is called (if the socket
	wasn't already in TCP_LISTEN state)

net/ipv4/inet_connection_sock.c[5]
int inet_csk_listen_start(struct sock *sk, const int nr_table_entries)
	reqsk_queue_alloc(&icsk->icsk_accept_queue, nr_table_entries) is
	called, which I guess creates the actual queue

net/core/request_sock.c[6]
int reqsk_queue_alloc(struct request_sock_queue *queue,
		      unsigned int nr_table_entries)
	nr_table_entries is first adjusted to satisfy:
	8 <= nr_table_entries <= sysctl_max_syn_backlog
	and then incremented by one and rounded up to the next power of
	2.

So here are a couple of questions:

1. What's the relation between the socket backlog and the queue created
   by reqsk_queue_alloc()? Because the backlog is only adjusted not to
   be greater than sysctl_somaxconn, but the queue size can be quite
   different.
2. The comment just above the definition of reqsk_queue_alloc() about
   sysctl_max_syn_backlog says "Maximum number of SYN_RECV sockets in
   queue per LISTEN socket.". But then nr_table_entries is not only
   rounded up to the next power of 2, it is also incremented by one
   before that, so a backlog of, for example, 128, would end up with
   256 table entries even if sysctl_max_syn_backlog is 128.
3. Why is there a nr_table_entries + 1 at all in there? Looking at the
   commit that introduced this[1] I can't find any explanation and I've
   read some big projects are using backlogs of 511 because of this[2].
   (which BTW, if the queue is really a hash table, looks like an awful
   idea).
4. I found some places where sk->sk_ack_backlog is checked against
   sk->sk_max_ack_backlog to see if new requests should be dropped, but
   I also saw checks like inet_csk_reqsk_queue_young(sk) > 1 or
   inet_csk_reqsk_queue_is_full(sk), so I guess the queue is used too.

Thanks a lot.

[1] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=72a3effaf633bcae9034b7e176bdbd78d64a71db
[2] http://blog.dubbelboer.com/2012/04/09/syn-cookies.html#a_reasonably_backlog_size
[3] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/socket.c;h=2ca51c719ef984cdadef749008456cf7bd5e1ae4;hb=HEAD#l1544
[4] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/ipv4/af_inet.c;h=24b384b7903ea7a59a11e7a4cbf06db996498924;hb=HEAD#l192
[5] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/ipv4/inet_connection_sock.c;h=d0670f00d5243f95bec4536f60edf32fa2ded850;hb=HEAD#l729
[6] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/core/request_sock.c;h=c31d9e8668c30346894adbf3be55eed4beeb1258;hb=HEAD#l23

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com

^ permalink raw reply	[flat|nested] 20+ messages in thread
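
As a rough illustration of the sizing steps described in the message
above, the following user-space sketch mirrors the arithmetic only. It is
not kernel code: the Python rendering and the name syn_table_entries are
assumptions made for illustration, and roundup_pow_of_two is merely named
after the kernel helper it imitates.

```python
def roundup_pow_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def syn_table_entries(backlog: int, max_syn_backlog: int) -> int:
    """Table size implied by the steps described for reqsk_queue_alloc().

    Illustrative only: clamp to [8, max_syn_backlog], add one,
    then round up to the next power of two.
    """
    entries = min(backlog, max_syn_backlog)  # upper clamp
    entries = max(entries, 8)                # lower clamp
    return roundup_pow_of_two(entries + 1)   # the "+ 1" asked about in question 3


# Print the resulting table size for a few backlog values.
for backlog in (1, 128, 511, 512):
    print(backlog, syn_table_entries(backlog, max_syn_backlog=2048))
```

Under these assumptions a backlog of 128 with a limit of 128 yields 256
entries, the case raised in question 2, while a backlog of 511 yields
exactly 512, which would explain the 511 convention mentioned in
question 3.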
* Re: Doubts about listen backlog and tcp_max_syn_backlog 2013-01-22 16:10 Doubts about listen backlog and tcp_max_syn_backlog Leandro Lucarella @ 2013-01-22 16:45 ` Eric Dumazet 2013-01-22 16:59 ` Leandro Lucarella 0 siblings, 1 reply; 20+ messages in thread From: Eric Dumazet @ 2013-01-22 16:45 UTC (permalink / raw) To: Leandro Lucarella; +Cc: netdev, linux-kernel On Tue, 2013-01-22 at 17:10 +0100, Leandro Lucarella wrote: > Hi, I'm having some problems with missing SYNs in a server with a high > rate of incoming connections and, even when far from understanding the > kernel, I ended up looking at the kernel's source to try to understand > better what's going on, because some stuff doesn't make a lot of sense > to me. > > The path I followed is this (line numbers for Linux 3.7): > net/socket.c[3] > SYSCALL_DEFINE2(listen, int, fd, int, backlog) > backlog is truncated to sysctl_somaxconn and > sock->ops->listen(sock, backlog) is called, which I guess it > calls to inet_listen(). > > net/ipv4/af_inet.c[4] > int inet_listen(struct socket *sock, int backlog) > the backlog is assigned to sk->sk_max_ack_backlog and > inet_csk_listen_start(sk, backlog) is called (if the socket > wans't already in TCP_LISTEN state) > > net/ipv4/inet_connection_sock.c[5] > int inet_csk_listen_start(struct sock *sk, const int nr_table_entries) > reqsk_queue_alloc(&icsk->icsk_accept_queue, nr_table_entries) is > called, which I guess it creates the actual queue > > net/core/request_sock.c[6] > int reqsk_queue_alloc(struct request_sock_queue *queue, > unsigned int nr_table_entries) > nr_table_entries is first adjusted to satisfy: > 8 <= nr_table_entries <= sysctl_max_syn_backlog > and then incremented by one and rounded up to the next power of > 2. > > So here are a couple of questions: > > 1. What's the relation between the socket backlog and the queue created > by reqsk_queue_alloc()? 
Because the backlog is only adjusted not to > be grater than sysctl_somaxconn, but the queue size can be quite > different. > 2. The comment just above the definition of reqsk_queue_alloc() about > sysctl_max_syn_backlog says "Maximum number of SYN_RECV sockets in > queue per LISTEN socket.". But then nr_table_entries is not only > rounded up to the next power of 2, is incremented by one before that, > so a backlog of, for example, 128, would end up with 256 table > entries even if sysctl_max_syn_backlog is 128. > 3. Why is there a nr_table_entries + 1 at all in there? Looking at the > commit that introduced this[1] I can't find any explanation and I've > read some big projects are using backlogs of 511 because of this[2]. > (which BTW, ff the queue is really a hash table, looks like an awful > idea). > 4. I found some places sk->sk_ack_backlog is checked against > sk->sk_max_ack_backlog to see if new requests should be dropped, but > I also saw checks like inet_csk_reqsk_queue_young(sk) > 1 or > inet_csk_reqsk_queue_is_full(sk), so I guess the queue is used too. > > > Thanks a lot. > > [1] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=72a3effaf633bcae9034b7e176bdbd78d64a71db > [2] http://blog.dubbelboer.com/2012/04/09/syn-cookies.html#a_reasonably_backlog_size > [3] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/socket.c;h=2ca51c719ef984cdadef749008456cf7bd5e1ae4;hb=HEAD#l1544 > [4] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/ipv4/af_inet.c;h=24b384b7903ea7a59a11e7a4cbf06db996498924;hb=HEAD#l192 > [5] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/ipv4/inet_connection_sock.c;h=d0670f00d5243f95bec4536f60edf32fa2ded850;hb=HEAD#l729 > [6] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/core/request_sock.c;h=c31d9e8668c30346894adbf3be55eed4beeb1258;hb=HEAD#l23 > What particular problem do you have ? 

A serious rewrite of the LISTEN code is needed, because the current
implementation doesn't scale:

The SYNACK retransmits are done by a single timer wheel, holding the
socket lock for too long. So increasing the backlog to 2^16 or 2^17 is
not really an option.

Hash tables are nice, but if we have to scan them while holding a single
lock, they are not so nice.
* Re: Doubts about listen backlog and tcp_max_syn_backlog 2013-01-22 16:45 ` Eric Dumazet @ 2013-01-22 16:59 ` Leandro Lucarella 2013-01-22 17:13 ` Eric Dumazet 0 siblings, 1 reply; 20+ messages in thread From: Leandro Lucarella @ 2013-01-22 16:59 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, linux-kernel On Tue, Jan 22, 2013 at 08:45:42AM -0800, Eric Dumazet wrote: > On Tue, 2013-01-22 at 17:10 +0100, Leandro Lucarella wrote: > > Hi, I'm having some problems with missing SYNs in a server with a high > > rate of incoming connections and, even when far from understanding the > > kernel, I ended up looking at the kernel's source to try to understand > > better what's going on, because some stuff doesn't make a lot of sense > > to me. [snip] > > 1. What's the relation between the socket backlog and the queue created > > by reqsk_queue_alloc()? Because the backlog is only adjusted not to > > be grater than sysctl_somaxconn, but the queue size can be quite > > different. > > 2. The comment just above the definition of reqsk_queue_alloc() about > > sysctl_max_syn_backlog says "Maximum number of SYN_RECV sockets in > > queue per LISTEN socket.". But then nr_table_entries is not only > > rounded up to the next power of 2, is incremented by one before that, > > so a backlog of, for example, 128, would end up with 256 table > > entries even if sysctl_max_syn_backlog is 128. > > 3. Why is there a nr_table_entries + 1 at all in there? Looking at the > > commit that introduced this[1] I can't find any explanation and I've > > read some big projects are using backlogs of 511 because of this[2]. > > (which BTW, ff the queue is really a hash table, looks like an awful > > idea). > > 4. I found some places sk->sk_ack_backlog is checked against > > sk->sk_max_ack_backlog to see if new requests should be dropped, but > > I also saw checks like inet_csk_reqsk_queue_young(sk) > 1 or > > inet_csk_reqsk_queue_is_full(sk), so I guess the queue is used too. 
[snip]
>
> What particular problem do you have ?

What I'm seeing are clients taking either useconds to connect, or 3
seconds, which suggests SYNs are getting lost, but the network doesn't
seem to be the problem. I'm still investigating this, so unfortunately
I'm not really sure.

> A serious rewrite of the LISTEN code is needed, because the current
> implementation doesn't scale:
>
> The SYNACK retransmits are done by a single timer wheel, holding the
> socket lock for too long. So increasing the backlog to 2^16 or 2^17 is
> not really an option.
>
> Hash tables are nice, but if we have to scan them while holding a single
> lock, they are not so nice.

So, the queue is really a hash table, then? So using any (2^n)-1 would
be a bad idea because when the backlog is nearly full, the hash table
will be really slow? Is that why the + 1 is there? Is it assuming everyone
will use a power of 2 and thus have a load factor of 0.5 at most?

-- 
Leandro Lucarella
Senior R&D Developer
-----------------------------------------------------------
sociomantic labs GmbH
Paul-Lincke-Ufer 39/40
10999 Berlin
DEUTSCHLAND
-----------------------------------------------------------
http://www.sociomantic.com
-----------------------------------------------------------
Fon:    +49 (0) 30 3087 4615
Fax:    +49 (0) 30 3087 4619
Mobile: +49 (0)157 3636 7373
Skype:  llucarella
Twitter: http://www.twitter.com/sociomantic
Facebook: http://bit.ly/labsfacebook
-----------------------------------------------------------
sociomantic labs GmbH, Location: Berlin
Commercial Register - AG Charlottenburg: HRB 121302 B
VAT No. - USt-ID: DE 266262100
Managing Directors: Thomas Nicolai, Thomas Brandhoff
* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-22 16:59 ` Leandro Lucarella
@ 2013-01-22 17:13 ` Eric Dumazet
  2013-01-22 18:17 ` Rick Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2013-01-22 17:13 UTC (permalink / raw)
  To: Leandro Lucarella; +Cc: netdev, linux-kernel

On Tue, 2013-01-22 at 17:59 +0100, Leandro Lucarella wrote:

> What I'm seeing are clients taking either useconds to connect, or 3
> seconds, which suggests SYNs are getting lost, but the network doesn't
> seem to be the problem. I'm still investigating this, so unfortunately
> I'm not really sure.
>

A SYN packet or a SYN-ACK packet can be lost in the network.

> > A serious rewrite of the LISTEN code is needed, because the current
> > implementation doesn't scale:
> >
> > The SYNACK retransmits are done by a single timer wheel, holding the
> > socket lock for too long. So increasing the backlog to 2^16 or 2^17 is
> > not really an option.
> >
> > Hash tables are nice, but if we have to scan them while holding a single
> > lock, they are not so nice.
>
> So, the queue is really a hash table, then? So using any (2^n)-1 would
> be a bad idea because when the backlog is nearly full, the hash table
> will be really slow? Is that why the + 1 is there? Is it assuming everyone
> will use a power of 2 and thus have a load factor of 0.5 at most?
>

The hash tables we use have power-of-two sizes.

The size of the hash table has little effect; it is chosen automatically
to try to get an average of one item per hash slot, or less.

Even if we had 10 items per slot, it would not be a big deal.

What is important is the backlog, and I guess you didn't increase it
properly. The somaxconn default is quite low (128).

# sysctl -w net/ipv4/tcp_max_syn_backlog=4096
net.ipv4.tcp_max_syn_backlog = 4096
# sysctl -w net.core.somaxconn=4096
net.core.somaxconn = 4096

Then make sure your server uses a big enough listen(..., backlog)
parameter.
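
To make the interaction between the listen() argument and somaxconn
concrete, here is a minimal sketch, assuming only what the thread states:
the requested backlog is truncated to net.core.somaxconn. The function
names effective_backlog and read_sysctl_int are hypothetical helpers
introduced for illustration, not part of any library.

```python
def effective_backlog(requested: int, somaxconn: int) -> int:
    """Backlog that remains after listen() truncates it to somaxconn."""
    return min(requested, somaxconn)


def read_sysctl_int(path: str) -> int:
    """Read a single-integer sysctl through /proc/sys (Linux only)."""
    with open(path) as f:
        return int(f.read().split()[0])


# With the default somaxconn of 128 quoted above, a large request is cut down.
print(effective_backlog(4096, 128))
# After raising somaxconn to 4096, the same request is kept as-is.
print(effective_backlog(4096, 4096))
```

On a live system, read_sysctl_int("/proc/sys/net/core/somaxconn") would
supply the second argument, so a server can warn at startup if the backlog
it asks for is going to be silently truncated.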
* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-22 17:13 ` Eric Dumazet
@ 2013-01-22 18:17 ` Rick Jones
  2013-01-22 18:42 ` Leandro Lucarella
  0 siblings, 1 reply; 20+ messages in thread
From: Rick Jones @ 2013-01-22 18:17 UTC (permalink / raw)
  To: Leandro Lucarella; +Cc: Eric Dumazet, netdev, linux-kernel

> What is important is the backlog, and I guess you didn't increase it
> properly. The somaxconn default is quite low (128)

Leandro -

If that is being overflowed, I believe you should be seeing something like:

    14 SYNs to LISTEN sockets dropped

in the output of netstat -s on the system on which the server
application is running.

rick
* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-22 18:17 ` Rick Jones
@ 2013-01-22 18:42 ` Leandro Lucarella
  2013-01-22 22:01 ` Rick Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Leandro Lucarella @ 2013-01-22 18:42 UTC (permalink / raw)
  To: Rick Jones; +Cc: Eric Dumazet, netdev, linux-kernel

On Tue, Jan 22, 2013 at 10:17:50AM -0800, Rick Jones wrote:
> >What is important is the backlog, and I guess you didn't increase it
> >properly. The somaxconn default is quite low (128)
>
> Leandro -
>
> If that is being overflowed, I believe you should be seeing something like:
>
>     14 SYNs to LISTEN sockets dropped
>
> in the output of netstat -s on the system on which the server
> application is running.

What is that value reporting exactly? Because we are using syncookies,
and AFAIK with that enabled, all SYNs are being replied to, and what the
listen backlog is really limiting is the "completely established sockets
waiting to be accepted", according to listen(2). What I don't really
know, to be honest, is what a "completely established socket" is: does
it mean that the SYN,ACK was sent, or that the ACK was received back?

Also, from the client side, when is the connect(2) call done? When the
SYN,ACK is received?

Thanks!

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com
* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-22 18:42 ` Leandro Lucarella
@ 2013-01-22 22:01 ` Rick Jones
  2013-01-23 10:47 ` Leandro Lucarella
  0 siblings, 1 reply; 20+ messages in thread
From: Rick Jones @ 2013-01-22 22:01 UTC (permalink / raw)
  To: Leandro Lucarella; +Cc: Eric Dumazet, netdev, linux-kernel

On 01/22/2013 10:42 AM, Leandro Lucarella wrote:
> On Tue, Jan 22, 2013 at 10:17:50AM -0800, Rick Jones wrote:
>>> What is important is the backlog, and I guess you didn't increase it
>>> properly. The somaxconn default is quite low (128)
>>
>> Leandro -
>>
>> If that is being overflowed, I believe you should be seeing something like:
>>
>>      14 SYNs to LISTEN sockets dropped
>>
>> in the output of netstat -s on the system on which the server
>> application is running.
>
> What is that value reporting exactly?

Netstat is reporting the ListenDrops and/or ListenOverflows which map to
LINUX_MIB_LISTENDROPS and LINUX_MIB_LISTENOVERFLOWS. Those get
incremented in tcp_v4_syn_recv_sock() (and its v6 version etc)

	if (sk_acceptq_is_full(sk))
		goto exit_overflow;

That will increment both overflows and drops, and drops will increment
on its own in some additional cases.

> Because we are using syncookies, and AFAIK with that enabled, all
> SYNs are being replied to, and what the listen backlog is really
> limiting is the "completely established sockets waiting to be
> accepted", according to listen(2). What I don't really know, to be
> honest, is what a "completely established socket" is: does it mean
> that the SYN,ACK was sent, or that the ACK was received back?

I have always thought it meant that the ACK of the SYN|ACK has been
received.

SyncookiesSent SyncookiesRecv SyncookiesFailed also appear in
/proc/net/netstat and presumably in netstat -s output.

> Also, from the client side, when is the connect(2) call done? When the
> SYN,ACK is received?

That would be my assumption.

In a previous message:

> What I'm seeing are clients taking either useconds to connect, or 3
> seconds, which suggests SYNs are getting lost, but the network doesn't
> seem to be the problem. I'm still investigating this, so unfortunately
> I'm not really sure.

I recently ran into something like that, which turned out to be an issue
with nf_conntrack and its table filling.

rick
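
The counters named above can also be read straight from
/proc/net/netstat. The sketch below is only an illustration: it assumes
the usual layout of that file (pairs of lines sharing a "TcpExt:"-style
prefix, the first holding names and the second holding values), and
parse_proc_netstat is a hypothetical helper name, not an existing API.

```python
def parse_proc_netstat(text: str) -> dict:
    """Parse /proc/net/netstat-style text into {prefix + name: value}."""
    lines = [line for line in text.splitlines() if line.strip()]
    counters = {}
    # Lines come in pairs: "Prefix: name1 name2 ..." then "Prefix: v1 v2 ...".
    for names_line, values_line in zip(lines[0::2], lines[1::2]):
        prefix, names = names_line.split(":", 1)
        _, values = values_line.split(":", 1)
        for name, value in zip(names.split(), values.split()):
            counters[prefix.strip() + name] = int(value)
    return counters


# Made-up sample data in the assumed layout (not real measurements).
sample = (
    "TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed "
    "ListenOverflows ListenDrops\n"
    "TcpExt: 5 3 1 7 9\n"
)
stats = parse_proc_netstat(sample)
print(stats["TcpExtListenOverflows"], stats["TcpExtListenDrops"])
```

Sampling the real file twice, a few seconds apart, and comparing
ListenOverflows and ListenDrops shows whether the accept queue is
overflowing while the slow connects are happening.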
* Re: Doubts about listen backlog and tcp_max_syn_backlog 2013-01-22 22:01 ` Rick Jones @ 2013-01-23 10:47 ` Leandro Lucarella 2013-01-23 19:28 ` Rick Jones 2013-01-23 20:48 ` Vijay Subramanian 0 siblings, 2 replies; 20+ messages in thread From: Leandro Lucarella @ 2013-01-23 10:47 UTC (permalink / raw) To: Rick Jones; +Cc: Eric Dumazet, netdev, linux-kernel On Tue, Jan 22, 2013 at 02:01:09PM -0800, Rick Jones wrote: > >>If that is being overflowed, I believe you should be seeing something like: > >> > >> 14 SYNs to LISTEN sockets dropped > >> > >>in the output of netstat -s on the system on which the server > >>application is running. > > > >What is that value reporting exactly? > > Netstat is reporting the ListenDrops and/or ListenOverflows which > map to LINUX_MIB_LISTENDROPS and LINUX_MIB_LISTENOVERFLOWS. Those > get incremented in tcp_v4_syn_recv_sock() (and its v6 version etc) > > if (sk_acceptq_is_full(sk)) > goto exit_overflow; > > Will increment both overflows and drops, and drops will increment on > its own in some additional cases. > > >Because we are using syncookies, and AFAIK with that enabled, all > >SYNs are being replied, and what the listen backlog is really > >limitting is the "completely established sockets waiting to be > >accepted", according to listen(2). What I don't really know to be > >honest, is what a "completely established socket" is, does it mean > >that the SYN,ACK was sent, or the ACK was received back? > > I have always thought it meant that the ACK of the SYN|ACK has been > received. > > SyncookiesSent SyncookiesRecv SyncookiesFailed also appear in > /proc/net/netstat and presumably in netstat -s output. Thanks for the info. I'm definitely dropping SYNs and sending cookies, around 50/s. Is there any way to tell how many connections are queued in a particular socket? > >Also, from the client side, when is the connect(2) call done? When the > >SYN,ACK is received? > > That would be my assumption. 

Then if syncookies are enabled, the time spent in connect() shouldn't be
longer than 3 seconds even if SYNs are being "dropped" by listen, right?
(and I'm saying "dropped" because I assume that if syncookies are
enabled, SYN,ACK replies are sent anyway, with a cookie, but they are
not stored in the queue/hash table).

> In a previous message:
>
> >What I'm seeing are clients taking either useconds to connect, or 3
> >seconds, which suggests SYNs are getting lost, but the network doesn't
> >seem to be the problem. I'm still investigating this, so unfortunately
> >I'm not really sure.
>
> I recently ran into something like that, which turned out to be an
> issue with nf_conntrack and its table filling.

Doing some quick research about it, I found that when that happens I
should get a message about it in dmesg (like "kernel: nf_conntrack:
table full, dropping packet.") but I'm not getting any, so I guess
that's not the problem.

Thanks!

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com
* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-23 10:47 ` Leandro Lucarella
@ 2013-01-23 19:28 ` Rick Jones
  2013-01-24 12:22 ` Leandro Lucarella
  2013-01-23 20:48 ` Vijay Subramanian
  1 sibling, 1 reply; 20+ messages in thread
From: Rick Jones @ 2013-01-23 19:28 UTC (permalink / raw)
  To: Leandro Lucarella; +Cc: Eric Dumazet, netdev, linux-kernel

On 01/23/2013 02:47 AM, Leandro Lucarella wrote:
> Thanks for the info. I'm definitely dropping SYNs and sending cookies,
> around 50/s. Is there any way to tell how many connections are queued in
> a particular socket?

I am not familiar with one. Doesn't mean there isn't one, only that I
am not able to think of it.

> Then if syncookies are enabled, the time spent in connect() shouldn't be
> longer than 3 seconds even if SYNs are being "dropped" by listen, right?

Do you mean if "ESTABLISHED" connections are dropped because the listen
queue is full? I don't think I would put that as "SYNs being dropped by
listen" - too easy to confuse that with an actual dropping of a SYN
segment.

But yes, I would not expect a connect() call to remain incomplete for
any longer than it took to receive a SYN|ACK from the other end. That
would be 3 (9, 21, etc.) seconds on a kernel with 3 seconds as the
initial retransmission timeout.

rick
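
The 3, 9, 21 sequence follows from doubling the retransmission timeout
after every unanswered SYN. A small worked sketch, assuming plain doubling
with no upper cap (the function name syn_retransmit_times is made up for
this example):

```python
def syn_retransmit_times(initial_rto, retries):
    """Elapsed time at each SYN retransmission, doubling the timeout each time."""
    times = []
    elapsed = 0
    rto = initial_rto
    for _ in range(retries):
        elapsed += rto      # wait one full timeout before retransmitting
        times.append(elapsed)
        rto *= 2            # exponential backoff
    return times


print(syn_retransmit_times(3, 3))
```

With an initial timeout of 3 seconds the retransmissions happen at
3, 3 + 6 = 9 and 9 + 12 = 21 seconds, so a connect() that loses its first
SYN (or the matching SYN|ACK) completes a little after the 3-second mark
rather than within microseconds.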
* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-23 19:28 ` Rick Jones
@ 2013-01-24 12:22 ` Leandro Lucarella
  2013-01-24 18:44 ` Rick Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Leandro Lucarella @ 2013-01-24 12:22 UTC (permalink / raw)
  To: Rick Jones; +Cc: Eric Dumazet, netdev, linux-kernel

On Wed, Jan 23, 2013 at 11:28:08AM -0800, Rick Jones wrote:
> >Then if syncookies are enabled, the time spent in connect() shouldn't be
> >longer than 3 seconds even if SYNs are being "dropped" by listen, right?
>
> Do you mean if "ESTABLISHED" connections are dropped because the
> listen queue is full? I don't think I would put that as "SYNs being
> dropped by listen" - too easy to confuse that with an actual
> dropping of a SYN segment.

I was just kind of quoting the name given by netstat: "SYNs to LISTEN
sockets dropped" (for kernel 3.0, I noticed newer kernels don't have
this stat anymore, or the name was changed). I still don't know if we
are talking about the same thing.

> But yes, I would not expect a connect() call to remain incomplete
> for any longer than it took to receive a SYN|ACK from the other
> end.

So the only reason to experience these high times spent in connect()
should be because a SYN or SYN|ACK was actually lost in a lower layer,
like an error in the network device or a transmission error?

> That would be 3 (9, 21, etc.) seconds on a kernel with 3
> seconds as the initial retransmission timeout.

Which can't be changed without recompiling, right?

Thanks!

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com
* Re: Doubts about listen backlog and tcp_max_syn_backlog 2013-01-24 12:22 ` Leandro Lucarella @ 2013-01-24 18:44 ` Rick Jones 2013-01-24 19:21 ` Leandro Lucarella 0 siblings, 1 reply; 20+ messages in thread From: Rick Jones @ 2013-01-24 18:44 UTC (permalink / raw) To: Leandro Lucarella; +Cc: Eric Dumazet, netdev, linux-kernel On 01/24/2013 04:22 AM, Leandro Lucarella wrote: > On Wed, Jan 23, 2013 at 11:28:08AM -0800, Rick Jones wrote: >>> Then if syncookies are enabled, the time spent in connect() shouldn't be >>> bigger than 3 seconds even if SYNs are being "dropped" by listen, right? >> >> Do you mean if "ESTABLISHED" connections are dropped because the >> listen queue is full? I don't think I would put that as "SYNs being >> dropped by listen" - too easy to confuse that with an actual >> dropping of a SYN segment. > > I was just kind of quoting the name given by netstat: "SYNs to LISTEN > sockets dropped" (for kernel 3.0, I noticed newer kernels don't have > this stat anymore, or the name was changed). I still don't know if we > are talking about the same thing. Are you sure those stats are not present in 3.X kernels? I just looked at /proc/net/netstat on a 3.7 system and noticed both the ListenMumble stats and the three cookie stats. 
And I see the code for them in the tree:

raj@tardy:~/net-next/net/ipv4$ grep MIB_LISTEN *.c
proc.c:	SNMP_MIB_ITEM("ListenOverflows", LINUX_MIB_LISTENOVERFLOWS),
proc.c:	SNMP_MIB_ITEM("ListenDrops", LINUX_MIB_LISTENDROPS),
tcp_ipv4.c:	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
tcp_ipv4.c:	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);

raj@tardy:~/net-next/net/ipv4$ grep MIB_SYN *.c
proc.c:	SNMP_MIB_ITEM("SyncookiesSent", LINUX_MIB_SYNCOOKIESSENT),
proc.c:	SNMP_MIB_ITEM("SyncookiesRecv", LINUX_MIB_SYNCOOKIESRECV),
proc.c:	SNMP_MIB_ITEM("SyncookiesFailed", LINUX_MIB_SYNCOOKIESFAILED),
syncookies.c:	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT);
syncookies.c:	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESFAILED);
syncookies.c:	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESRECV);

I will sometimes be tripped up by netstat's not showing a statistic with
a zero value...

>> But yes, I would not expect a connect() call to remain incomplete
>> for any longer than it took to receive a SYN|ACK from the other
>> end.
>
> So the only reason to experience these high times spent in connect()
> should be because a SYN or SYN|ACK was actually lost in a lower layer,
> like an error in the network device or a transmission error?

Modulo the/some other drop-without-stat point such as Vijay mentioned
yesterday.

You might consider taking some packet traces. If you can, I would start
with a trace taken on the system(s) on which the long connect() calls
are happening. I think the tcpdump manpage has an example of a tcpdump
command with a filter expression that catches just SYNchronize and
FINished segments, which I suppose you could extend to include ReSeT
segments. Such a filter expression would miss the client's ACK of the
SYN|ACK, but unless you see incrementing stats relating to, say,
checksum failures or other drops on the "client" side, I suppose you
could assume that the client ACKed the server's SYN|ACK.

>> That would be 3 (9, 21, etc.) seconds on a kernel with 3
>> seconds as the initial retransmission timeout.
>
> Which can't be changed without recompiling, right?

To the best of my knowledge.

rick jones
* Re: Doubts about listen backlog and tcp_max_syn_backlog 2013-01-24 18:44 ` Rick Jones @ 2013-01-24 19:21 ` Leandro Lucarella 2013-01-25 6:12 ` Nivedita SInghvi 0 siblings, 1 reply; 20+ messages in thread From: Leandro Lucarella @ 2013-01-24 19:21 UTC (permalink / raw) To: Rick Jones; +Cc: Eric Dumazet, netdev, linux-kernel On Thu, Jan 24, 2013 at 10:44:32AM -0800, Rick Jones wrote: > On 01/24/2013 04:22 AM, Leandro Lucarella wrote: > >On Wed, Jan 23, 2013 at 11:28:08AM -0800, Rick Jones wrote: > >>>Then if syncookies are enabled, the time spent in connect() shouldn't be > >>>bigger than 3 seconds even if SYNs are being "dropped" by listen, right? > >> > >>Do you mean if "ESTABLISHED" connections are dropped because the > >>listen queue is full? I don't think I would put that as "SYNs being > >>dropped by listen" - too easy to confuse that with an actual > >>dropping of a SYN segment. > > > >I was just kind of quoting the name given by netstat: "SYNs to LISTEN > >sockets dropped" (for kernel 3.0, I noticed newer kernels don't have > >this stat anymore, or the name was changed). I still don't know if we > >are talking about the same thing. > > Are you sure those stats are not present in 3.X kernels? I just > looked at /proc/net/netstat on a 3.7 system and noticed both the > ListenMumble stats and the three cookie stats. 
And I see the code > for them in the tree: > > aj@tardy:~/net-next/net/ipv4$ grep MIB_LISTEN *.c > proc.c: SNMP_MIB_ITEM("ListenOverflows", LINUX_MIB_LISTENOVERFLOWS), > proc.c: SNMP_MIB_ITEM("ListenDrops", LINUX_MIB_LISTENDROPS), > tcp_ipv4.c: NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS); > tcp_ipv4.c: NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS); > > raj@tardy:~/net-next/net/ipv4$ grep MIB_SYN *.c > proc.c: SNMP_MIB_ITEM("SyncookiesSent", LINUX_MIB_SYNCOOKIESSENT), > proc.c: SNMP_MIB_ITEM("SyncookiesRecv", LINUX_MIB_SYNCOOKIESRECV), > proc.c: SNMP_MIB_ITEM("SyncookiesFailed", LINUX_MIB_SYNCOOKIESFAILED), > syncookies.c: NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT); > syncookies.c: NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESFAILED); > syncookies.c: NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESRECV); > > I will sometimes be tripped-up by netstat's not showing a statistic > with a zero value... This is what I'm talking about: pc1 $ uname -a Linux labs09 3.5.0-18-generic #29~precise1-Ubuntu SMP Mon Oct 22 16:31:46 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux pc1 $ netstat --version | head -n2 net-tools 1.60 netstat 1.42 (2001-04-15) pc1 $ netstat -s | grep -i syn 4 invalid SYN cookies received pc2 $ uname -a Linux eu-21 3.0.0-19-server #33-Ubuntu SMP Thu Apr 19 20:32:48 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux pc2 $ netstat --version | head -n2 net-tools 1.60 netstat 1.42 (2001-04-15) pc2 $ netstat -s | grep -i syn 1996450 SYN cookies sent 2899079 SYN cookies received 410573 invalid SYN cookies received 10012473 resets received for embryonic SYN_RECV sockets 5659740 SYNs to LISTEN sockets dropped 1 connections reset due to unexpected SYN I didn't take a look at the kernel or netstat sources about this, so I don't know exactly how are they connected. > >>But yes, I would not expect a connect() call to remain incomplete > >>for any longer than it took to receive an SYN|ACK from the other > >>end. 
> >
> >So the only reason to experience these high times spent in connect()
> >should be because a SYN or SYN|ACK was actually lost in a lower layer,
> >like an error in the network device or a transmission error?
>
> Modulo the/some other drop-without-stat point such as Vijay
> mentioned yesterday.

So, in these cases a syncookie is not sent back? I had the impression
they were always sent...

> You might consider taking some packet traces. If you can, I would
> start with a trace taken on the system(s) on which the long
> connect() calls are happening. I think the tcpdump manpage has an
> example of a tcpdump command with a filter expression that catches
> just SYNchronize and FINished segments, which I suppose you could
> extend to include ReSeT segments. Such a filter expression would
> miss the client's ACK of the SYN|ACK, but unless you see
> incrementing stats relating to, say, checksum failures or other drops
> on the "client" side, I suppose you could assume that the client
> ACKed the server's SYN|ACK.

Yes, I already did captures and we are definitely losing packets
(including SYNs), but it looks like the number of SYNs I'm losing is
lower than the number of long connect() times I observe. This is not
confirmed yet, I'm still investigating.

Thanks!

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com
* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-24 19:21 ` Leandro Lucarella
@ 2013-01-25  6:12 ` Nivedita SInghvi
  2013-01-25 10:05 ` Leandro Lucarella
  0 siblings, 1 reply; 20+ messages in thread
From: Nivedita SInghvi @ 2013-01-25  6:12 UTC
  To: Leandro Lucarella; +Cc: Rick Jones, Eric Dumazet, netdev, linux-kernel

On 01/24/2013 11:21 AM, Leandro Lucarella wrote:
> On Thu, Jan 24, 2013 at 10:44:32AM -0800, Rick Jones wrote:
>> On 01/24/2013 04:22 AM, Leandro Lucarella wrote:
>>> On Wed, Jan 23, 2013 at 11:28:08AM -0800, Rick Jones wrote:
>>>>> Then if syncookies are enabled, the time spent in connect() shouldn't be
>>>>> bigger than 3 seconds even if SYNs are being "dropped" by listen, right?
>>>>
>>>> Do you mean if "ESTABLISHED" connections are dropped because the
>>>> listen queue is full? I don't think I would put that as "SYNs being
>>>> dropped by listen" - too easy to confuse that with an actual
>>>> dropping of a SYN segment.
>>>
>>> I was just kind of quoting the name given by netstat: "SYNs to LISTEN
>>> sockets dropped" (for kernel 3.0, I noticed newer kernels don't have
>>> this stat anymore, or the name was changed). I still don't know if we
>>> are talking about the same thing.
>>
[snip]
>> I will sometimes be tripped-up by netstat's not showing a statistic
>> with a zero value...

Leandro, you should be able to do an nstat -z, it will print all
counters even if zero. You should see something like so:

ipv4]> nstat -z
#kernel
IpInReceives                    2135               0.0
IpInHdrErrors                   0                  0.0
IpInAddrErrors                  202                0.0
...

You might want to take a look at those (your pkts may not even be
making it to tcp) and these in particular:

TcpExtSyncookiesSent            0                  0.0
TcpExtSyncookiesRecv            0                  0.0
TcpExtSyncookiesFailed          0                  0.0
TcpExtListenOverflows           0                  0.0
TcpExtListenDrops               0                  0.0
TcpExtTCPBacklogDrop            0                  0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0

If you don't have nstat on that version for some reason, download the
latest iproute pkg. Looking at the counter names is a lot more helpful
and precise than the netstat conversion to human consumption.

> Yes, I already did captures and we are definitely losing packets
> (including SYNs), but it looks like the amount of SYNs I'm losing is
> lower than the amount of long connect() times I observe. This is not
> confirmed yet, I'm still investigating.

Where did you narrow down the drop to? There are quite a few places in
the networking stack we silently drop packets (such as the one pointed
out earlier in this thread), although they should almost all be
extremely low probability/NEVER type events. Do you want a patch to
gap the most likely scenario? (I'll post that to netdev separately).

thanks,
Nivedita
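
A minimal sketch of reading the same counters without nstat, assuming
the usual layout of /proc/net/netstat and /proc/net/snmp: pairs of
lines with a shared prefix, one line of names and one of values. The
function names and the SAMPLE text are illustrative, not taken from any
tool. Counters with a zero value are kept, which is the point made
above about netstat hiding them.

```python
def parse_proc_net(text):
    """Parse header/value line pairs into {prefix + name: value}."""
    counters = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Lines come in pairs: "TcpExt: Name1 Name2 ..." then "TcpExt: 1 2 ...".
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            raise ValueError("header and value lines do not match")
        for name, number in zip(names.split(), numbers.split()):
            # Key style mirrors nstat output, e.g. "TcpExtListenDrops".
            counters[prefix + name] = int(number)
    return counters


def read_counters(path="/proc/net/netstat"):
    """Read and parse a counter file from a live Linux system."""
    with open(path) as handle:
        return parse_proc_net(handle.read())


# Illustrative input only; real files carry many more columns.
SAMPLE = """\
TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed ListenOverflows ListenDrops
TcpExt: 0 0 4 0 0
"""

if __name__ == "__main__":
    for name, value in sorted(parse_proc_net(SAMPLE).items()):
        print(name, value)
```

On a real host, read_counters() could be called for both
/proc/net/netstat and /proc/net/snmp and the two dictionaries merged.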
* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-25  6:12 ` Nivedita SInghvi
@ 2013-01-25 10:05 ` Leandro Lucarella
  2013-01-28  2:48 ` Nivedita Singhvi
  2013-01-28  2:49 ` Nivedita Singhvi
  0 siblings, 2 replies; 20+ messages in thread
From: Leandro Lucarella @ 2013-01-25 10:05 UTC
  To: Nivedita SInghvi; +Cc: Rick Jones, Eric Dumazet, netdev, linux-kernel

On Thu, Jan 24, 2013 at 10:12:46PM -0800, Nivedita SInghvi wrote:
> >>> I was just kind of quoting the name given by netstat: "SYNs to LISTEN
> >>> sockets dropped" (for kernel 3.0, I noticed newer kernels don't have
> >>> this stat anymore, or the name was changed). I still don't know if we
> >>> are talking about the same thing.
> >>
> [snip]
> >> I will sometimes be tripped-up by netstat's not showing a statistic
> >> with a zero value...
>
> Leandro, you should be able to do an nstat -z, it will print all
> counters even if zero. You should see something like so:
>
> ipv4]> nstat -z
> #kernel
> IpInReceives                    2135               0.0
> IpInHdrErrors                   0                  0.0
> IpInAddrErrors                  202                0.0
> ...
>
> You might want to take a look at those (your pkts may not even be
> making it to tcp) and these in particular:
>
> TcpExtSyncookiesSent            0                  0.0
> TcpExtSyncookiesRecv            0                  0.0
> TcpExtSyncookiesFailed          0                  0.0
> TcpExtListenOverflows           0                  0.0
> TcpExtListenDrops               0                  0.0
> TcpExtTCPBacklogDrop            0                  0.0
> TcpExtTCPMinTTLDrop             0                  0.0
> TcpExtTCPDeferAcceptDrop        0                  0.0
>
> If you don't have nstat on that version for some reason, download the
> latest iproute pkg. Looking at the counter names is a lot more helpful
> and precise than the netstat conversion to human consumption.

Thanks, but what about this?

pc2 $ nstat -z | grep -i drop
TcpExtLockDroppedIcmps          0                  0.0
TcpExtListenDrops               0                  0.0
TcpExtTCPPrequeueDropped        0                  0.0
TcpExtTCPBacklogDrop            0                  0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0

pc2 $ netstat -s | grep -i drop
    470 outgoing packets dropped
    5659740 SYNs to LISTEN sockets dropped

Is this normal?

> > Yes, I already did captures and we are definitely losing packets
> > (including SYNs), but it looks like the amount of SYNs I'm losing is
> > lower than the amount of long connect() times I observe. This is not
> > confirmed yet, I'm still investigating.
>
> Where did you narrow down the drop to? There are quite a few places in
> the networking stack we silently drop packets (such as the one pointed
> out earlier in this thread), although they should almost all be
> extremely low probability/NEVER type events. Do you want a patch to
> gap the most likely scenario? (I'll post that to netdev separately).

Even when that would be awesome, unfortunately there is no way I could
get permission to run a patched kernel (or even restart the servers for
that matter).

And I don't know how could I narrow down the drops in any way. What I
know is capturing traffic with tcpdump, I see some packets leaving one
server but never arriving to the new one.

Also, the hardware is not great either, I'm not sure it is not responsible
for the loss. There are some errors reported by ethtool, but I don't
know exactly what they mean:

# ethtool -S eth0
NIC statistics:
     tx_packets: 336978308273
     rx_packets: 384108075585
     tx_errors: 0
     rx_errors: 194
     rx_missed: 1119
     align_errors: 31731
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 384108023754
     broadcast: 51825
     multicast: 6
     tx_aborted: 0
     tx_underrun: 0

Thanks!

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com
* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-25 10:05 ` Leandro Lucarella
@ 2013-01-28  2:48 ` Nivedita Singhvi
  2013-01-28  5:21 ` Vijay Subramanian
  2013-01-28 13:08 ` Leandro Lucarella
  2013-01-28  2:49 ` Nivedita Singhvi
  1 sibling, 2 replies; 20+ messages in thread
From: Nivedita Singhvi @ 2013-01-28  2:48 UTC
  To: Leandro Lucarella; +Cc: Rick Jones, Eric Dumazet, netdev, linux-kernel

On 01/25/2013 02:05 AM, Leandro Lucarella wrote:
> On Thu, Jan 24, 2013 at 10:12:46PM -0800, Nivedita SInghvi wrote:
>>>>> I was just kind of quoting the name given by netstat: "SYNs to LISTEN
>>>>> sockets dropped" (for kernel 3.0, I noticed newer kernels don't have
>>>>> this stat anymore, or the name was changed). I still don't know if we
>>>>> are talking about the same thing.
>>>>
>> [snip]
>>>> I will sometimes be tripped-up by netstat's not showing a statistic
>>>> with a zero value...
>>
>> Leandro, you should be able to do an nstat -z, it will print all
>> counters even if zero. You should see something like so:
>>
>> ipv4]> nstat -z
>> #kernel
>> IpInReceives                    2135               0.0
>> IpInHdrErrors                   0                  0.0
>> IpInAddrErrors                  202                0.0
>> ...
>>
>> You might want to take a look at those (your pkts may not even be
>> making it to tcp) and these in particular:
>>
>> TcpExtSyncookiesSent            0                  0.0
>> TcpExtSyncookiesRecv            0                  0.0
>> TcpExtSyncookiesFailed          0                  0.0
>> TcpExtListenOverflows           0                  0.0
>> TcpExtListenDrops               0                  0.0
>> TcpExtTCPBacklogDrop            0                  0.0
>> TcpExtTCPMinTTLDrop             0                  0.0
>> TcpExtTCPDeferAcceptDrop        0                  0.0
>>
>> If you don't have nstat on that version for some reason, download the
>> latest iproute pkg. Looking at the counter names is a lot more helpful
>> and precise than the netstat conversion to human consumption.
>
> Thanks, but what about this?
>
> pc2 $ nstat -z | grep -i drop
> TcpExtLockDroppedIcmps          0                  0.0
> TcpExtListenDrops               0                  0.0
> TcpExtTCPPrequeueDropped        0                  0.0
> TcpExtTCPBacklogDrop            0                  0.0
> TcpExtTCPMinTTLDrop             0                  0.0
> TcpExtTCPDeferAcceptDrop        0                  0.0

That seems bogus.

> pc2 $ netstat -s | grep -i drop
>     470 outgoing packets dropped
>     5659740 SYNs to LISTEN sockets dropped
>
> Is this normal?

That's a lot of connect requests dropped, but it depends on how
long you've been up and how much traffic you've seen.

Hmm...you were on an older Ubuntu, right? The netstat source
was patched to translate it as follows:

+ { "ListenDrops", N_("%u SYNs to LISTEN sockets dropped"), opt_number },

(see the file debian/patches/CVS-20081003-statistics.c_sync.patch
in the net-tools src)

i.e., the netstat pkg is printing the value of the TCPEXT MIB counter
that's counting TCPExtListenDrops.

Theoretically, that number should be the same as that printed by nstat,
as they are getting it from the same kernel stats counter. I have not
looked at nstat code (I actually almost always dump the counters from
/proc/net/{netstat + snmp} via a simple prettyprint script (will send
you that offline)).

If the nstat and netstat counters don't match, something is fishy.
That nstat output is broken.

>>> Yes, I already did captures and we are definitely losing packets
>>> (including SYNs), but it looks like the amount of SYNs I'm losing is
>>> lower than the amount of long connect() times I observe. This is not
>>> confirmed yet, I'm still investigating.
>>
>> Where did you narrow down the drop to? There are quite a few places in
>> the networking stack we silently drop packets (such as the one pointed
>> out earlier in this thread), although they should almost all be
>> extremely low probability/NEVER type events. Do you want a patch to
>> gap the most likely scenario? (I'll post that to netdev separately).
>
> Even when that would be awesome, unfortunately there is no way I could
> get permission to run a patched kernel (or even restart the servers for
> that matter).
>
> And I don't know how could I narrow down the drops in any way. What I
> know is capturing traffic with tcpdump, I see some packets leaving one
> server but never arriving to the new one.

Hmm..do you have a switch between your two end points dropping pkts?

Could be.. Basically, by looking at the statistics kept by each layer, you
should be able to narrow it down a little bit at least.

It does still sound like some drops are occurring in TCP due to accept
backlog being full and you're overrunning TCP incoming processing (or
at least this contributing), going by that ListenDrops count.

> Also, the hardware is not great either, I'm not sure it is not responsible
> for the loss. There are some errors reported by ethtool, but I don't
> know exactly what they mean:
>
> # ethtool -S eth0
> NIC statistics:
>      tx_packets: 336978308273
>      rx_packets: 384108075585
>      tx_errors: 0
>      rx_errors: 194
>      rx_missed: 1119
>      align_errors: 31731
>      tx_single_collisions: 0
>      tx_multi_collisions: 0
>      unicast: 384108023754
>      broadcast: 51825
>      multicast: 6
>      tx_aborted: 0
>      tx_underrun: 0
>
> Thanks!
>

You aren't suffering a lot of packet loss at the NIC.

Sorry, I'm on the road, travelling, and likely not online much this week.

thanks,
Nivedita
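
The remark about low packet loss at the NIC can be checked with a
little arithmetic on the ethtool figures quoted above. This is a rough
sketch only: it assumes rx_errors, rx_missed and align_errors do not
overlap, so the sum is a pessimistic upper bound, and the helper names
are made up for illustration.

```python
# Receive-side figures copied from the ethtool -S output quoted above.
ETHTOOL_OUTPUT = """\
NIC statistics:
     tx_packets: 336978308273
     rx_packets: 384108075585
     tx_errors: 0
     rx_errors: 194
     rx_missed: 1119
     align_errors: 31731
"""


def parse_ethtool_stats(text):
    """Turn 'name: value' lines into a dict, skipping non-numeric lines."""
    stats = {}
    for line in text.splitlines():
        name, separator, value = line.partition(":")
        value = value.strip()
        if separator and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def rx_loss_upper_bound(stats):
    """Fraction of received packets hit by any receive error.

    Assumption: the three error counters are disjoint, so adding them
    overstates the loss rather than understating it.
    """
    bad = stats["rx_errors"] + stats["rx_missed"] + stats["align_errors"]
    return bad / stats["rx_packets"]


if __name__ == "__main__":
    stats = parse_ethtool_stats(ETHTOOL_OUTPUT)
    print(f"{rx_loss_upper_bound(stats):.2e}")
```

Even under that pessimistic assumption the fraction comes out below one
packet in ten million, far too small to account for millions of
ListenDrops.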
* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-28  2:48 ` Nivedita Singhvi
@ 2013-01-28  5:21 ` Vijay Subramanian
  2013-01-28 14:40 ` Leandro Lucarella
  2013-01-28 13:08 ` Leandro Lucarella
  1 sibling, 1 reply; 20+ messages in thread
From: Vijay Subramanian @ 2013-01-28  5:21 UTC
  To: Nivedita Singhvi
  Cc: Leandro Lucarella, Rick Jones, Eric Dumazet, netdev, linux-kernel

> + { "ListenDrops", N_("%u SYNs to LISTEN sockets dropped"), opt_number },
>
> (see the file debian/patches/CVS-20081003-statistics.c_sync.patch
> in the net-tools src)
>
> i.e., the netstat pkg is printing the value of the TCPEXT MIB counter
> that's counting TCPExtListenDrops.
>
> Theoretically, that number should be the same as that printed by nstat,
> as they are getting it from the same kernel stats counter. I have not
> looked at nstat code (I actually almost always dump the counters from
> /proc/net/{netstat + snmp} via a simple prettyprint script (will send
> you that offline)).
>

nstat pretty much does what you describe which is to parse the
/proc/net file(s) and print the contents. This is one advantage of
nstat over netstat. When you add a new MIB, you do not need to update
nstat.

> If the nstat and netstat counters don't match, something is fishy.
> That nstat output is broken.
>
* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-28  5:21 ` Vijay Subramanian
@ 2013-01-28 14:40 ` Leandro Lucarella
  0 siblings, 0 replies; 20+ messages in thread
From: Leandro Lucarella @ 2013-01-28 14:40 UTC
  To: Vijay Subramanian
  Cc: Nivedita Singhvi, Rick Jones, Eric Dumazet, netdev, linux-kernel

On Sun, Jan 27, 2013 at 09:21:32PM -0800, Vijay Subramanian wrote:
> > + { "ListenDrops", N_("%u SYNs to LISTEN sockets dropped"), opt_number },
> >
> > (see the file debian/patches/CVS-20081003-statistics.c_sync.patch
> > in the net-tools src)
> >
> > i.e., the netstat pkg is printing the value of the TCPEXT MIB counter
> > that's counting TCPExtListenDrops.
> >
> > Theoretically, that number should be the same as that printed by nstat,
> > as they are getting it from the same kernel stats counter. I have not
> > looked at nstat code (I actually almost always dump the counters from
> > /proc/net/{netstat + snmp} via a simple prettyprint script (will send
> > you that offline)).
>
> nstat pretty much does what you describe which is to parse the
> /proc/net file(s) and print the contents. This is one advantage of
> nstat over netstat. When you add a new MIB, you do not need to update
> nstat.

Well, something seems to be broken in the nstat I have because using the
script to parse the stats instead I get the same values as with
netstat.

[2 minutes later, and after observing the values of nstat changed in the
same server]

OK, it looks like nstat is showing some transient values, using nstat -a
I get the "absolute values of counters" (as stated in the man page). By
default it seems to print the values for the last 60 seconds, so mystery
solved.

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com
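
A small sketch of why the two tools can disagree, assuming the default
nstat output is an increment relative to an earlier snapshot while
netstat -s and nstat -a print absolute totals. The function name and
the counter values below are illustrative only.

```python
def increments(previous, current):
    """Per-counter increase between two absolute snapshots."""
    return {
        name: value - previous.get(name, 0)
        for name, value in current.items()
    }


if __name__ == "__main__":
    # Hypothetical absolute totals taken at two moments on a host where
    # no new drops happened in between.
    earlier = {"TcpExtListenDrops": 5659740, "TcpExtTCPBacklogDrop": 0}
    later = {"TcpExtListenDrops": 5659740, "TcpExtTCPBacklogDrop": 0}

    # Increment view: everything reads 0 although the total is large.
    print(increments(earlier, later))
    # Absolute view: the total accumulated since boot.
    print(later)
```

Under that assumption a zero in the increment view only says that
nothing changed in the interval, not that the counter itself is zero.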
* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-28  2:48 ` Nivedita Singhvi
  2013-01-28  5:21 ` Vijay Subramanian
@ 2013-01-28 13:08 ` Leandro Lucarella
  1 sibling, 0 replies; 20+ messages in thread
From: Leandro Lucarella @ 2013-01-28 13:08 UTC
  To: Nivedita Singhvi; +Cc: Rick Jones, Eric Dumazet, netdev, linux-kernel

On Sun, Jan 27, 2013 at 06:48:18PM -0800, Nivedita Singhvi wrote:
[snip]
> > Thanks, but what about this?
> >
> > pc2 $ nstat -z | grep -i drop
> > TcpExtLockDroppedIcmps          0                  0.0
> > TcpExtListenDrops               0                  0.0
> > TcpExtTCPPrequeueDropped        0                  0.0
> > TcpExtTCPBacklogDrop            0                  0.0
> > TcpExtTCPMinTTLDrop             0                  0.0
> > TcpExtTCPDeferAcceptDrop        0                  0.0
>
> That seems bogus.
>
> > pc2 $ netstat -s | grep -i drop
> >     470 outgoing packets dropped
> >     5659740 SYNs to LISTEN sockets dropped
> >
> > Is this normal?
>
> That's a lot of connect requests dropped, but it depends on how
> long you've been up and how much traffic you've seen.
>
> Hmm...you were on an older Ubuntu, right? The netstat source
> was patched to translate it as follows:
>
> + { "ListenDrops", N_("%u SYNs to LISTEN sockets dropped"), opt_number },
>
> (see the file debian/patches/CVS-20081003-statistics.c_sync.patch
> in the net-tools src)

I have Ubuntu 11.10 on all the servers I'm checking except for one with
12.04. It's weird: for the same Ubuntu version (same kernel, same
netstat, same nstat) I get different outputs. Some have counters that
coincide, some have more counters than others, some have different
counters than others...

pc121 # nstat -z | grep -i drop
TcpExtLockDroppedIcmps          0                  0.0
TcpExtListenDrops               0                  0.0
TcpExtTCPPrequeueDropped        0                  0.0
TcpExtTCPBacklogDrop            0                  0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0
pc121 # netstat -s | grep -i drop
    470 outgoing packets dropped
    5659762 SYNs to LISTEN sockets dropped

Others are like this:

pc126 # nstat -z | grep -i drop
TcpExtLockDroppedIcmps          0                  0.0
TcpExtListenDrops               2968982            0.0
TcpExtTCPPrequeueDropped        0                  0.0
TcpExtTCPBacklogDrop            0                  0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0
pc126 # netstat -s | grep -i drop
    2968982 SYNs to LISTEN sockets dropped

Others like this:

pc127 # nstat -z | grep -i drop
TcpExtLockDroppedIcmps          0                  0.0
TcpExtListenDrops               0                  0.0
TcpExtTCPPrequeueDropped        0                  0.0
TcpExtTCPBacklogDrop            0                  0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0
pc127 # netstat -s | grep -i drop
    1321958 SYNs to LISTEN sockets dropped

pc128 # nstat -z | grep -i drop
TcpExtLockDroppedIcmps          0                  0.0
TcpExtListenDrops               6455507            0.0
TcpExtTCPPrequeueDropped        0                  0.0
TcpExtTCPBacklogDrop            0                  0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0
pc128 # netstat -s | grep -i drop
    6 ICMP packets dropped because they were out-of-window
    6455507 SYNs to LISTEN sockets dropped

pc130 # nstat -z | grep -i drop
TcpExtLockDroppedIcmps          0                  0.0
TcpExtListenDrops               0                  0.0
TcpExtTCPPrequeueDropped        0                  0.0
TcpExtTCPBacklogDrop            0                  0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0
pc130 # netstat -s | grep -i drop
    3 ICMP packets dropped because they were out-of-window
    6728909 SYNs to LISTEN sockets dropped

And this is for the one with Ubuntu 12.04:

pc106 # nstat -z | grep -i drop
TcpExtLockDroppedIcmps          0                  0.0
TcpExtListenDrops               2598140            0.0
TcpExtTCPPrequeueDropped        0                  0.0
TcpExtTCPBacklogDrop            1711               0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0
TcpExtTCPReqQFullDrop           0                  0.0
pc106 # netstat -s | grep -i drop
    2598140 SYNs to LISTEN sockets dropped
    TCPBacklogDrop: 1711

Are these counters hardware-dependent? Or is there any other reason why
they might be different on different servers?

> i.e., the netstat pkg is printing the value of the TCPEXT MIB counter
> that's counting TCPExtListenDrops.

Then why does nstat show that counter as 0 and netstat what I assume is
the right value?

> Theoretically, that number should be the same as that printed by nstat,
> as they are getting it from the same kernel stats counter. I have not
> looked at nstat code (I actually almost always dump the counters from
> /proc/net/{netstat + snmp} via a simple prettyprint script (will send
> you that offline)).

Mmm, ok, thanks!

> If the nstat and netstat counters don't match, something is fishy.
> That nstat output is broken.

I'm using the one from iproute package 20110315-1build1 (except for the
one with Ubuntu 12.04, which has 20111117-1ubuntu2). Any ideas on what
could be wrong?

> > And I don't know how could I narrow down the drops in any way. What I
> > know is capturing traffic with tcpdump, I see some packets leaving one
> > server but never arriving to the new one.

About this, tcpdump should get all the packets received by the NIC,
before the kernel has any chance to drop anything, right?

> Hmm..do you have a switch between your two end points dropping pkts?

I have no idea. I assume there is, because the server has only one NIC
and they are interconnected to several other servers, so there should
be something in the middle, but we have the servers offsite, I can't do
any sniffing myself in the middle of the endpoints.

> Could be.. Basically, by looking at the statistics kept by each layer, you
> should be able to narrow it down a little bit at least.

You mean statistics provided by the switch?

> It does still sound like some drops are occurring in TCP due to accept
> backlog being full and you're overrunning TCP incoming processing (or
> at least this contributing), going by that ListenDrops count.

If that's so, then I guess you're implying tcpdump doesn't get the
packets before the kernel can drop them.

> Sorry, I'm on the road, travelling, and likely not online much this week.

No worries! Thanks for the help, it is very much appreciated.

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com
* Re: Doubts about listen backlog and tcp_max_syn_backlog
  2013-01-23 10:47 ` Leandro Lucarella
  2013-01-23 19:28 ` Rick Jones
@ 2013-01-23 20:48 ` Vijay Subramanian
  1 sibling, 0 replies; 20+ messages in thread
From: Vijay Subramanian @ 2013-01-23 20:48 UTC
  To: Leandro Lucarella; +Cc: Rick Jones, Eric Dumazet, netdev, linux-kernel

On 23 January 2013 02:47, Leandro Lucarella
<leandro.lucarella@sociomantic.com> wrote:
> On Tue, Jan 22, 2013 at 02:01:09PM -0800, Rick Jones wrote:
>> >>If that is being overflowed, I believe you should be seeing something like:
>> >>
>> >>    14 SYNs to LISTEN sockets dropped
>> >>
>> >>in the output of netstat -s on the system on which the server
>> >>application is running.
>> >
>> >What is that value reporting exactly?
>>
>> Netstat is reporting the ListenDrops and/or ListenOverflows which
>> map to LINUX_MIB_LISTENDROPS and LINUX_MIB_LISTENOVERFLOWS. Those
>> get incremented in tcp_v4_syn_recv_sock() (and its v6 version etc)
>>
>>         if (sk_acceptq_is_full(sk))
>>                 goto exit_overflow;
>>
>> Will increment both overflows and drops, and drops will increment on
>> its own in some additional cases.

Note that tcp_v4_conn_request() can also drop SYNs directly (start of
TWHS) if acceptq is full and more than one young request is queued up

vim +1504 tcp_ipv4.c

        if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
                goto drop;

These drops do not seem to be tracked by any MIB variable and so will
not show up in netstat (Also, newer nstat is preferred to netstat).
Maybe we need to track these drops too?

Vijay
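
The two checks discussed above can be modelled in a few lines. This is
a toy model, not kernel code: the function and parameter names are
invented for illustration, and the strictly-greater comparison inside
acceptq_is_full() is an assumption about how sk_acceptq_is_full()
compares sk_ack_backlog with sk_max_ack_backlog in kernels of that era.

```python
def acceptq_is_full(ack_backlog, max_ack_backlog):
    """Assumed model of sk_acceptq_is_full(): backlog above the limit."""
    return ack_backlog > max_ack_backlog


def syn_dropped_without_counter(ack_backlog, max_ack_backlog, young_requests):
    """Model of the tcp_v4_conn_request() check quoted above.

    A SYN dropped here bumps neither ListenOverflows nor ListenDrops,
    according to the message above.
    """
    return acceptq_is_full(ack_backlog, max_ack_backlog) and young_requests > 1


def final_ack_counted_as_overflow(ack_backlog, max_ack_backlog):
    """Model of the tcp_v4_syn_recv_sock() check quoted above.

    This path increments both ListenOverflows and ListenDrops.
    """
    return acceptq_is_full(ack_backlog, max_ack_backlog)


if __name__ == "__main__":
    # Accept queue over its limit with two young requests pending:
    # a new SYN is dropped and no counter moves.
    print(syn_dropped_without_counter(129, 128, 2))
    # Same queue with only one young request: the SYN is let through,
    # but a completing handshake would still be counted as an overflow.
    print(syn_dropped_without_counter(129, 128, 1))
    print(final_ack_counted_as_overflow(129, 128))
```

Under this model a server can lose SYNs while ListenDrops stays flat,
which would fit a count of long connect() calls that does not line up
with the visible counters.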
end of thread, other threads:[~2013-01-28 14:40 UTC]

Thread overview: 20+ messages
2013-01-22 16:10 Doubts about listen backlog and tcp_max_syn_backlog Leandro Lucarella
2013-01-22 16:45 ` Eric Dumazet
2013-01-22 16:59 ` Leandro Lucarella
2013-01-22 17:13 ` Eric Dumazet
2013-01-22 18:17 ` Rick Jones
2013-01-22 18:42 ` Leandro Lucarella
2013-01-22 22:01 ` Rick Jones
2013-01-23 10:47 ` Leandro Lucarella
2013-01-23 19:28 ` Rick Jones
2013-01-24 12:22 ` Leandro Lucarella
2013-01-24 18:44 ` Rick Jones
2013-01-24 19:21 ` Leandro Lucarella
2013-01-25  6:12 ` Nivedita SInghvi
2013-01-25 10:05 ` Leandro Lucarella
2013-01-28  2:48 ` Nivedita Singhvi
2013-01-28  5:21 ` Vijay Subramanian
2013-01-28 14:40 ` Leandro Lucarella
2013-01-28 13:08 ` Leandro Lucarella
2013-01-28  2:49 ` Nivedita Singhvi
2013-01-23 20:48 ` Vijay Subramanian