From: Leandro Lucarella <leandro.lucarella@sociomantic.com>
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Doubts about listen backlog and tcp_max_syn_backlog
Date: Tue, 22 Jan 2013 17:10:38 +0100
Message-ID: <20130122161038.GG4608@sociomantic.com>

Hi, I'm having some problems with missing SYNs in a server with a high
rate of incoming connections and, even though I'm far from understanding
the kernel, I ended up looking at the kernel's source to try to better
understand what's going on, because some things don't make a lot of
sense to me.

The path I followed is this (line numbers for Linux 3.7):
net/socket.c[3]
    SYSCALL_DEFINE2(listen, int, fd, int, backlog)
        backlog is truncated to sysctl_somaxconn and
        sock->ops->listen(sock, backlog) is called, which I guess
        calls inet_listen().
net/ipv4/af_inet.c[4]
    int inet_listen(struct socket *sock, int backlog)
        the backlog is assigned to sk->sk_max_ack_backlog and
        inet_csk_listen_start(sk, backlog) is called (if the socket
        wasn't already in TCP_LISTEN state)
net/ipv4/inet_connection_sock.c[5]
    int inet_csk_listen_start(struct sock *sk, const int nr_table_entries)
        reqsk_queue_alloc(&icsk->icsk_accept_queue, nr_table_entries) is
        called, which I guess creates the actual queue
net/core/request_sock.c[6]
    int reqsk_queue_alloc(struct request_sock_queue *queue,
                          unsigned int nr_table_entries)
        nr_table_entries is first adjusted to satisfy:
        8 <= nr_table_entries <= sysctl_max_syn_backlog
        and then incremented by one and rounded up to the next power of
        2.
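
To make the arithmetic in that path concrete, here is a minimal userspace
sketch of the two adjustments as I understand them. It is not kernel code:
struct backlog_limits, round_up_pow2(), listen_backlog() and
syn_table_entries() are hypothetical names of mine, the limits are passed
in rather than read from the real sysctls, and applying the upper bound
before the lower bound of 8 is my assumption about the order.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two sysctls; values are supplied by the caller. */
struct backlog_limits {
    uint32_t somaxconn;        /* models sysctl_somaxconn */
    uint32_t max_syn_backlog;  /* models sysctl_max_syn_backlog */
};

/* Smallest power of two >= n. Assumes 1 <= n <= 2^31 so the shift cannot overflow. */
static uint32_t round_up_pow2(uint32_t n)
{
    uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* listen(): the requested backlog is truncated to somaxconn. */
static uint32_t listen_backlog(uint32_t backlog, const struct backlog_limits *lim)
{
    return backlog > lim->somaxconn ? lim->somaxconn : backlog;
}

/* reqsk_queue_alloc(): clamp to [8, max_syn_backlog], add one, round up to a power of 2. */
static uint32_t syn_table_entries(uint32_t nr_table_entries,
                                  const struct backlog_limits *lim)
{
    if (nr_table_entries > lim->max_syn_backlog)
        nr_table_entries = lim->max_syn_backlog;
    if (nr_table_entries < 8)
        nr_table_entries = 8;
    return round_up_pow2(nr_table_entries + 1);
}
```

Under these assumptions, with both limits set to 128, a backlog of 128
passes listen_backlog() unchanged and syn_table_entries() returns 256.
With larger limits, 511 gives 512 while 512 gives 1024.
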

So here are a couple of questions:

1. What's the relation between the socket backlog and the queue created
   by reqsk_queue_alloc()? The backlog is only adjusted so that it is
   not greater than sysctl_somaxconn, but the queue size can be quite
   different.

2. The comment just above the definition of reqsk_queue_alloc() about
   sysctl_max_syn_backlog says "Maximum number of SYN_RECV sockets in
   queue per LISTEN socket.". But nr_table_entries is not only rounded
   up to the next power of 2, it is also incremented by one before
   that, so a backlog of, for example, 128 would end up with 256 table
   entries even if sysctl_max_syn_backlog is 128.

3. Why is there a nr_table_entries + 1 at all in there? Looking at the
   commit that introduced this[1] I can't find any explanation, and
   I've read that some big projects are using backlogs of 511 because
   of this[2] (which BTW, if the queue is really a hash table, looks
   like an awful idea).

4. I found some places where sk->sk_ack_backlog is checked against
   sk->sk_max_ack_backlog to see if new requests should be dropped, but
   I also saw checks like inet_csk_reqsk_queue_young(sk) > 1 or
   inet_csk_reqsk_queue_is_full(sk), so I guess the queue is used too.
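
To illustrate what question 4 is getting at, here is a toy model of how
those checks might combine when a new SYN arrives. Everything in it is an
assumption based on my reading of the source, not a statement of what the
kernel does: struct listen_state and its fields are hypothetical, the
exact comparisons (> versus >=) are my guess, "young" is taken to mean
requests whose SYN-ACK has not been retransmitted, and SYN cookies are
ignored entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified counters for one listening socket. */
struct listen_state {
    uint32_t ack_backlog;        /* models sk->sk_ack_backlog */
    uint32_t max_ack_backlog;    /* models sk->sk_max_ack_backlog (the listen() backlog) */
    uint32_t syn_qlen;           /* pending SYN_RECV requests in the queue */
    uint32_t syn_qlen_young;     /* assumed: pending requests with no SYN-ACK retransmit yet */
    uint32_t syn_table_entries;  /* power-of-two size chosen by reqsk_queue_alloc() */
};

/* Models the sk_ack_backlog versus sk_max_ack_backlog check. */
static bool accept_queue_full(const struct listen_state *s)
{
    return s->ack_backlog > s->max_ack_backlog;
}

/* Models inet_csk_reqsk_queue_is_full(): request queue has reached the table size. */
static bool syn_queue_full(const struct listen_state *s)
{
    return s->syn_qlen >= s->syn_table_entries;
}

/* Assumed combination of the checks; true means the incoming SYN is dropped. */
static bool should_drop_syn(const struct listen_state *s)
{
    if (syn_queue_full(s))
        return true;
    if (accept_queue_full(s) && s->syn_qlen_young > 1)
        return true;
    return false;
}
```

In this model both limits matter: a SYN is dropped either when the
request queue reaches the table size derived from the backlog, or when
the accept backlog is exceeded while more than one young request is
still pending.
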
Thanks a lot.
[1] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=72a3effaf633bcae9034b7e176bdbd78d64a71db
[2] http://blog.dubbelboer.com/2012/04/09/syn-cookies.html#a_reasonably_backlog_size
[3] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/socket.c;h=2ca51c719ef984cdadef749008456cf7bd5e1ae4;hb=HEAD#l1544
[4] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/ipv4/af_inet.c;h=24b384b7903ea7a59a11e7a4cbf06db996498924;hb=HEAD#l192
[5] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/ipv4/inet_connection_sock.c;h=d0670f00d5243f95bec4536f60edf32fa2ded850;hb=HEAD#l729
[6] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/core/request_sock.c;h=c31d9e8668c30346894adbf3be55eed4beeb1258;hb=HEAD#l23
--
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com