From: Leandro Lucarella <leandro.lucarella@sociomantic.com>
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Doubts about listen backlog and tcp_max_syn_backlog
Date: Tue, 22 Jan 2013 17:10:38 +0100
Message-ID: <20130122161038.GG4608@sociomantic.com>

Hi, I'm having some problems with missing SYNs on a server with a high
rate of incoming connections. Even though I'm far from understanding the
kernel, I ended up reading the kernel's source to better understand
what's going on, because some things don't make a lot of sense to me.

The path I followed is this (line numbers for Linux 3.7):
net/socket.c[3]
    SYSCALL_DEFINE2(listen, int, fd, int, backlog)
        backlog is truncated to sysctl_somaxconn and
        sock->ops->listen(sock, backlog) is called, which I guess ends up
        calling inet_listen().
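If I read it right, the truncation amounts to something like this. It is a userspace model, not the kernel code; clamp_listen_backlog() is a made-up name and somaxconn stands in for net.core.somaxconn:

```c
/*
 * Userspace model of the backlog clamp in SYSCALL_DEFINE2(listen, ...)
 * (net/socket.c, Linux 3.7, as I read it). clamp_listen_backlog() is a
 * hypothetical helper; somaxconn stands in for net.core.somaxconn.
 */
int clamp_listen_backlog(int backlog, unsigned int somaxconn)
{
    /* The unsigned cast turns a negative backlog into a huge value,
       so it gets clamped to somaxconn too. */
    if ((unsigned int)backlog > somaxconn)
        backlog = (int)somaxconn;
    return backlog;
}
```

Because of the cast, even listen(fd, -1) seems to end up with a backlog of somaxconn.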

net/ipv4/af_inet.c[4]
    int inet_listen(struct socket *sock, int backlog)
        the backlog is assigned to sk->sk_max_ack_backlog and
        inet_csk_listen_start(sk, backlog) is called (if the socket
        wasn't already in TCP_LISTEN state)
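Looking at it more closely, the order seems to be the other way around: inet_csk_listen_start() runs first, and only when the socket is not listening yet, and sk->sk_max_ack_backlog is assigned afterwards. A rough model of that (struct model_sock and both functions are made up for illustration and are not the kernel's struct sock or API):

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the relevant fields of
   struct sock; not the kernel's layout. */
struct model_sock {
    bool listening;                 /* sk_state == TCP_LISTEN */
    int requested_table_entries;    /* what gets passed to reqsk_queue_alloc() */
    int max_ack_backlog;            /* sk->sk_max_ack_backlog */
    int ack_backlog;                /* sk->sk_ack_backlog */
};

/* Stand-in for inet_csk_listen_start(): records the requested size
   (the real sizing happens in reqsk_queue_alloc()) and resets both
   accept-queue counters. */
static void model_listen_start(struct model_sock *sk, int nr_table_entries)
{
    sk->requested_table_entries = nr_table_entries;
    sk->max_ack_backlog = 0;
    sk->ack_backlog = 0;
    sk->listening = true;
}

/* Ordering as I read inet_listen(): the SYN table is only set up on the
   first listen(), but sk_max_ack_backlog is updated on every call. */
void model_inet_listen(struct model_sock *sk, int backlog)
{
    if (!sk->listening)
        model_listen_start(sk, backlog);
    sk->max_ack_backlog = backlog;
}
```

If that's right, calling listen() again on a listening socket only changes sk_max_ack_backlog and leaves the SYN table size untouched.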

net/ipv4/inet_connection_sock.c[5]
    int inet_csk_listen_start(struct sock *sk, const int nr_table_entries)
        reqsk_queue_alloc(&icsk->icsk_accept_queue, nr_table_entries) is
        called, which I guess creates the actual queue

net/core/request_sock.c[6]
    int reqsk_queue_alloc(struct request_sock_queue *queue,
                          unsigned int nr_table_entries)
        nr_table_entries is first adjusted to satisfy:
        8 <= nr_table_entries <= sysctl_max_syn_backlog
        and then incremented by one and rounded up to the next power of
        2.
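To check that I'm reading these steps right, here they are as a small userspace C model. syn_table_entries() and round_up_pow2() are made-up names, max_syn_backlog stands in for net.ipv4.tcp_max_syn_backlog, and this is my reading of the 3.7 code, not the kernel implementation:

```c
#include <stdint.h>

/* Smallest power of two >= n; assumes 1 <= n <= 2^31. */
static uint32_t round_up_pow2(uint32_t n)
{
    uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Model of the sizing in reqsk_queue_alloc() as described above. */
uint32_t syn_table_entries(uint32_t nr_table_entries, uint32_t max_syn_backlog)
{
    if (nr_table_entries > max_syn_backlog)
        nr_table_entries = max_syn_backlog;  /* cap at tcp_max_syn_backlog */
    if (nr_table_entries < 8)
        nr_table_entries = 8;                /* minimum of 8 */
    /* The "+ 1" happens before rounding up to a power of two. */
    return round_up_pow2(nr_table_entries + 1);
}
```

With this model, a backlog of 128 with tcp_max_syn_backlog = 128 gives 256 entries, and (assuming tcp_max_syn_backlog is at least 512) a backlog of 511 gives 512 entries while 512 gives 1024, which seems to be where the 511 recommendation comes from.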

So here are a couple of questions:

1. What's the relation between the socket backlog and the queue created
   by reqsk_queue_alloc()? The backlog is only adjusted so that it is not
   greater than sysctl_somaxconn, but the queue size can be quite
   different.
2. The comment just above the definition of reqsk_queue_alloc() describes
   sysctl_max_syn_backlog as "Maximum number of SYN_RECV sockets in
   queue per LISTEN socket." But nr_table_entries is not only rounded up
   to the next power of 2, it is also incremented by one before that, so
   a backlog of, for example, 128 would end up with 256 table entries
   even if sysctl_max_syn_backlog is 128.
3. Why is there a nr_table_entries + 1 at all in there? Looking at the
   commit that introduced it[1] I can't find any explanation, and I've
   read that some big projects use backlogs of 511 because of this[2]
   (which, BTW, if the queue is really a hash table, looks like an awful
   idea).
4. I found some places where sk->sk_ack_backlog is checked against
   sk->sk_max_ack_backlog to decide whether new requests should be
   dropped, but I also saw checks like inet_csk_reqsk_queue_young(sk) > 1
   or inet_csk_reqsk_queue_is_full(sk), so I guess the queue is used
   too.
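For reference, this is how I currently read the two kinds of checks at the top of tcp_v4_conn_request() in 3.7, as a simplified userspace model. All names are made up, syncookies are collapsed into a single flag, and the TIME_WAIT special case is ignored. There also seems to be a second sk_acceptq_is_full() check when the final ACK arrives (in tcp_v4_syn_recv_sock()), which is left out here:

```c
#include <stdbool.h>

/* log2 of the SYN table size, computed the way I read reqsk_queue_alloc()
   (starting at 3, i.e. at least 8 entries). */
unsigned int model_max_qlen_log(unsigned int table_entries)
{
    unsigned int log = 3;
    while ((1u << log) < table_entries)
        log++;
    return log;
}

/* Like reqsk_queue_is_full(): qlen counts pending SYN_RECV requests, so
   the SYN queue is full once qlen reaches the table size. */
bool model_syn_queue_full(unsigned int qlen, unsigned int max_qlen_log)
{
    return (qlen >> max_qlen_log) != 0;
}

/* Like sk_acceptq_is_full(): note the strict '>', so up to
   sk_max_ack_backlog + 1 sockets fit in the accept queue. */
bool model_accept_queue_full(unsigned int ack_backlog,
                             unsigned int max_ack_backlog)
{
    return ack_backlog > max_ack_backlog;
}

/* Simplified drop decision for an incoming SYN. */
bool model_drop_syn(unsigned int qlen, unsigned int max_qlen_log,
                    unsigned int ack_backlog, unsigned int max_ack_backlog,
                    unsigned int young, bool syncookies)
{
    /* SYN queue full: drop unless a syncookie can be sent instead. */
    if (model_syn_queue_full(qlen, max_qlen_log) && !syncookies)
        return true;
    /* Accept queue full and more than one "young" request pending. */
    if (model_accept_queue_full(ack_backlog, max_ack_backlog) && young > 1)
        return true;
    return false;
}
```

If this is right, the SYN queue limit comes from the table size (qlen >> max_qlen_log), while the listen() backlog only matters through the accept queue check.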


Thanks a lot.

[1] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=72a3effaf633bcae9034b7e176bdbd78d64a71db
[2] http://blog.dubbelboer.com/2012/04/09/syn-cookies.html#a_reasonably_backlog_size
[3] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/socket.c;h=2ca51c719ef984cdadef749008456cf7bd5e1ae4;hb=HEAD#l1544
[4] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/ipv4/af_inet.c;h=24b384b7903ea7a59a11e7a4cbf06db996498924;hb=HEAD#l192
[5] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/ipv4/inet_connection_sock.c;h=d0670f00d5243f95bec4536f60edf32fa2ded850;hb=HEAD#l729
[6] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/core/request_sock.c;h=c31d9e8668c30346894adbf3be55eed4beeb1258;hb=HEAD#l23

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com

