From: Leandro Lucarella <leandro.lucarella@sociomantic.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Doubts about listen backlog and tcp_max_syn_backlog
Date: Tue, 22 Jan 2013 17:59:29 +0100
Message-ID: <20130122165929.GH4608@sociomantic.com>
In-Reply-To: <1358873142.3464.3964.camel@edumazet-glaptop>

On Tue, Jan 22, 2013 at 08:45:42AM -0800, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 17:10 +0100, Leandro Lucarella wrote:
> > Hi, I'm having some problems with missing SYNs in a server with a high
> > rate of incoming connections and, even though I'm far from understanding
> > the kernel, I ended up looking at the kernel's source to try to understand
> > better what's going on, because some stuff doesn't make a lot of sense
> > to me.
[snip]
> > 1. What's the relation between the socket backlog and the queue created
> >    by reqsk_queue_alloc()? Because the backlog is only adjusted not to
> >    be greater than sysctl_somaxconn, but the queue size can be quite
> >    different.
> > 2. The comment just above the definition of reqsk_queue_alloc() about
> >    sysctl_max_syn_backlog says "Maximum number of SYN_RECV sockets in
> >    queue per LISTEN socket.". But then nr_table_entries is not only
> >    rounded up to the next power of 2, it is also incremented by one
> >    before that,
> >    so a backlog of, for example, 128, would end up with 256 table
> >    entries even if sysctl_max_syn_backlog is 128.
> > 3. Why is there a nr_table_entries + 1 at all in there? Looking at the
> >    commit that introduced this[1] I can't find any explanation and I've
> >    read some big projects are using backlogs of 511 because of this[2].
> >    (which BTW, if the queue is really a hash table, looks like an awful
> >    idea).
> > 4. I found some places sk->sk_ack_backlog is checked against
> >    sk->sk_max_ack_backlog to see if new requests should be dropped, but
> >    I also saw checks like inet_csk_reqsk_queue_young(sk) > 1 or
> >    inet_csk_reqsk_queue_is_full(sk), so I guess the queue is used too.
[snip]
> 
> What particular problem do you have ?

What I'm seeing are clients taking either microseconds to connect, or 3
seconds, which suggests SYNs are getting lost, but the network doesn't
seem to be the problem. I'm still investigating this, so unfortunately
I'm not really sure.

> A serious rewrite of LISTEN code is needed, because the current
> implementation doesn't scale :
> 
> The SYNACK retransmits are done by a single timer wheel, holding the
> socket lock for too long. So increasing the backlog to 2^16 or 2^17 is
> not really an option.
> 
> Hash tables are nice, but if we have to scan them, holding a single lock,
> they are not so nice.

So the queue really is a hash table, then? Would using any (2^n)-1 be a
bad idea, because the hash table gets really slow when the backlog is
nearly full? Is that why the + 1 is there? Is it assuming everyone will
use a power of 2, and thus have a load factor of 0.5 at most?
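
To make the arithmetic in question 2 concrete, here is a small user-space
sketch of the sizing as I understand it. It is not kernel code: the
function names are made up for illustration, and the floor of 8 entries is
an assumption from my reading of reqsk_queue_alloc(), not something
confirmed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two.
 * Precondition: 1 <= v <= 2^31, so the shift below cannot overflow. */
static uint32_t roundup_pow2(uint32_t v)
{
    uint32_t p = 1;

    assert(v >= 1 && v <= (UINT32_C(1) << 31));
    while (p < v)
        p <<= 1;
    return p;
}

/* Hypothetical helper: number of SYN table entries for a given listen()
 * backlog, assuming the sequence clamp -> floor -> (+1) -> round up. */
static uint32_t syn_table_entries(uint32_t backlog, uint32_t max_syn_backlog)
{
    /* Clamp to the sysctl limit (tcp_max_syn_backlog). */
    uint32_t n = backlog < max_syn_backlog ? backlog : max_syn_backlog;

    /* Assumed lower bound of 8 entries. */
    if (n < 8)
        n = 8;

    /* The "+ 1" in question: a power-of-2 backlog doubles the table. */
    return roundup_pow2(n + 1);
}
```

Under these assumptions, syn_table_entries(128, 128) gives 256 (load
factor 0.5 with 128 pending requests), while syn_table_entries(511, 2048)
gives 512, so a backlog of 511 would run the table almost full.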

-- 
Leandro Lucarella
Senior R&D Developer
-----------------------------------------------------------
sociomantic labs GmbH
Paul-Lincke-Ufer 39/40
10999 Berlin
DEUTSCHLAND
-----------------------------------------------------------
http://www.sociomantic.com
-----------------------------------------------------------
Fon:       +49 (0) 30 3087 4615
Fax:       +49 (0) 30 3087 4619
Mobile:    +49 (0)157 3636 7373
Skype:     llucarella
Twitter:   http://www.twitter.com/sociomantic
Facebook:  http://bit.ly/labsfacebook
-----------------------------------------------------------
sociomantic labs GmbH, Location: Berlin
Commercial Register - AG Charlottenburg: HRB 121302 B
VAT No. - USt-ID: DE 266262100
Managing Directors: Thomas Nicolai, Thomas Brandhoff
