netdev.vger.kernel.org archive mirror
* [PATCH] socket: increase default maximum listen queue length
@ 2011-03-20  1:50 Hagen Paul Pfeifer
  2011-03-20  4:41 ` David Miller
  2011-03-20  8:30 ` Eric Dumazet
  0 siblings, 2 replies; 16+ messages in thread
From: Hagen Paul Pfeifer @ 2011-03-20  1:50 UTC (permalink / raw)
  To: netdev; +Cc: Hagen Paul Pfeifer, Eric Dumazet

sysctl_somaxconn (SOMAXCONN: 128) specifies the maximum number of
sockets in state SYN_RECV per listen socket queue. At listen(2) time the
backlog is clamped to this limit if it is larger.

Afterwards in reqsk_queue_alloc() the backlog value is checked again
(nr_table_entries == backlog):

    nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
    nr_table_entries = max_t(u32, nr_table_entries, 8);
    nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);

sysctl_max_syn_backlog on the other hand is dynamically adjusted,
depending on the memory characteristic of the system. Default is 256,
128 for small systems and up to 1024 for bigger systems.

For real server work the de facto sysctl_somaxconn limit seems inadequate:

    Experiments with real servers show, that it is absolutely not enough
    even at 100conn/sec. 256 cures most of problems.

Increase the default sysctl_somaxconn from 128 to 256 to meet today's
conditions. nr_table_entries remains limited by sysctl_max_syn_backlog, which
is based on the memory condition
(max(128, (tcp_hashinfo.ehash_mask + 1) / 256)).

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/socket.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index edbb1d0..bf35ce2 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -237,7 +237,7 @@ struct ucred {
 #define PF_MAX		AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
-#define SOMAXCONN	128
+#define SOMAXCONN	256
 
 /* Flags we can use with send/ and recv. 
    Added those for 1003.1g not all are supported yet
-- 
1.7.4.1.57.g0466.dirty
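
The three sizing lines quoted from reqsk_queue_alloc() in the message above
can be tried out in userspace. The following is a minimal sketch, not kernel
code: syn_table_entries() and roundup_pow_of_two_u32() are hypothetical names,
and the loop-based rounding helper only stands in for the kernel's
roundup_pow_of_two().

```c
#include <stdint.h>

/* Smallest power of two >= n. Userspace stand-in (assumption) for the
 * kernel's roundup_pow_of_two(); only valid for 1 <= n <= 2^31. */
static uint32_t roundup_pow_of_two_u32(uint32_t n)
{
    uint32_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Mirror the clamp sequence quoted in the patch description:
 * cap at max_syn_backlog, floor at 8, then round (n + 1) up to a
 * power of two. Assumes max_syn_backlog is far below 2^31. */
static uint32_t syn_table_entries(uint32_t backlog, uint32_t max_syn_backlog)
{
    uint32_t n = backlog;

    if (n > max_syn_backlog)
        n = max_syn_backlog;
    if (n < 8)
        n = 8;
    return roundup_pow_of_two_u32(n + 1);
}
```

With a backlog of 128 and a max_syn_backlog of 256 this yields 256 entries;
a backlog of 1 is raised to the floor of 8 and yields 16.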



* Re: [PATCH] socket: increase default maximum listen queue length
  2011-03-20  1:50 [PATCH] socket: increase default maximum listen queue length Hagen Paul Pfeifer
@ 2011-03-20  4:41 ` David Miller
  2011-03-20 11:59   ` Hagen Paul Pfeifer
  2011-03-20  8:30 ` Eric Dumazet
  1 sibling, 1 reply; 16+ messages in thread
From: David Miller @ 2011-03-20  4:41 UTC (permalink / raw)
  To: hagen; +Cc: netdev, eric.dumazet

From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Sun, 20 Mar 2011 02:50:11 +0100

> For real server work the de facto sysctl_somaxconn limit seems inadequate:
> 
>     Experiments with real servers show, that it is absolutely not enough
>     even at 100conn/sec. 256 cures most of problems.

What in the world is a server running on multi-GHz CPUs doing such
that it cannot accept() fast enough to keep up with 100 connections
per second?

We were handling that just fine 10+ years ago.

The math simply doesn't add up.

Either your numbers are wrong or the server design is brain-dead.


* Re: [PATCH] socket: increase default maximum listen queue length
  2011-03-20  1:50 [PATCH] socket: increase default maximum listen queue length Hagen Paul Pfeifer
  2011-03-20  4:41 ` David Miller
@ 2011-03-20  8:30 ` Eric Dumazet
  2011-03-20  9:04   ` Rémi Denis-Courmont
  2011-03-20 11:39   ` Hagen Paul Pfeifer
  1 sibling, 2 replies; 16+ messages in thread
From: Eric Dumazet @ 2011-03-20  8:30 UTC (permalink / raw)
  To: Hagen Paul Pfeifer; +Cc: netdev

On Sunday, 20 March 2011 at 02:50 +0100, Hagen Paul Pfeifer wrote:
> sysctl_somaxconn (SOMAXCONN: 128) specifies the maximum number of
> sockets in state SYN_RECV per listen socket queue. At listen(2) time the
> backlog is clamped to this limit if it is larger.
> 
> Afterwards in reqsk_queue_alloc() the backlog value is checked again
> (nr_table_entries == backlog):
> 
>     nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
>     nr_table_entries = max_t(u32, nr_table_entries, 8);
>     nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
> 
> sysctl_max_syn_backlog on the other hand is dynamically adjusted,
> depending on the memory characteristic of the system. Default is 256,
> 128 for small systems and up to 1024 for bigger systems.
> 
> For real server work the de facto sysctl_somaxconn limit seems inadequate:
> 
>     Experiments with real servers show, that it is absolutely not enough
>     even at 100conn/sec. 256 cures most of problems.
> 
> Increase the default sysctl_somaxconn from 128 to 256 to meet today's
> conditions. nr_table_entries remains limited by sysctl_max_syn_backlog, which
> is based on the memory condition
> (max(128, (tcp_hashinfo.ehash_mask + 1) / 256)).
> 
> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  include/linux/socket.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/socket.h b/include/linux/socket.h
> index edbb1d0..bf35ce2 100644
> --- a/include/linux/socket.h
> +++ b/include/linux/socket.h
> @@ -237,7 +237,7 @@ struct ucred {
>  #define PF_MAX		AF_MAX
>  
>  /* Maximum queue length specifiable by listen.  */
> -#define SOMAXCONN	128
> +#define SOMAXCONN	256
>  
>  /* Flags we can use with send/ and recv. 
>     Added those for 1003.1g not all are supported yet


Hmm, real problem is not the 'maximum queue value', but the minimum one.

If application says : listen(fd, 10), you are stuck.

128 or 256 is way too small on some servers, where admin can tune
in /etc/sysctl.conf :

net.core.somaxconn = 8192
net.ipv4.tcp_max_syn_backlog = 8192

But application also needs to use : listen(fd, 8192)





* Re: [PATCH] socket: increase default maximum listen queue length
  2011-03-20  8:30 ` Eric Dumazet
@ 2011-03-20  9:04   ` Rémi Denis-Courmont
  2011-03-20  9:36     ` Eric Dumazet
  2011-03-20 11:39   ` Hagen Paul Pfeifer
  1 sibling, 1 reply; 16+ messages in thread
From: Rémi Denis-Courmont @ 2011-03-20  9:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Hagen Paul Pfeifer, netdev

On Sunday, 20 March 2011 at 10:30:17, Eric Dumazet wrote:
> 
> But application also needs to use : listen(fd, 8192)

An application should pass INT_MAX anyway, unless it has a specific limit,
e.g. it can only handle one connection at a time.
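
For illustration, a minimal C sketch of that suggestion: the application asks
for the largest possible backlog and leaves the actual limit to the kernel,
which on Linux silently caps the value at net.core.somaxconn. open_listener()
is a hypothetical helper, and binding to the loopback address with an
ephemeral port is just an assumption to keep the example self-contained.

```c
#include <arpa/inet.h>
#include <limits.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP listener on 127.0.0.1 with a kernel-chosen port.
 * Returns the listening fd, or -1 on error. */
static int open_listener(int backlog)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    /* The kernel clamps the backlog; an oversized value is not an error. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would use open_listener(INT_MAX) and close() the descriptor when
done; the effective queue length then depends only on the system limits.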

-- 
Rémi Denis-Courmont
http://www.remlab.info/
http://fi.linkedin.com/in/remidenis


* Re: [PATCH] socket: increase default maximum listen queue length
  2011-03-20  9:04   ` Rémi Denis-Courmont
@ 2011-03-20  9:36     ` Eric Dumazet
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2011-03-20  9:36 UTC (permalink / raw)
  To: Rémi Denis-Courmont; +Cc: Hagen Paul Pfeifer, netdev

On Sunday, 20 March 2011 at 11:04 +0200, Rémi Denis-Courmont wrote:
> On Sunday, 20 March 2011 at 10:30:17, Eric Dumazet wrote:
> > 
> > But application also needs to use : listen(fd, 8192)
> 
> An application should pass INT_MAX anyway, unless it has a specific limit,
> e.g. it can only handle one connection at a time.
> 

You'll be surprised how few of them do that actually

ss -a | head
State      Recv-Q Send-Q      Local Address:Port          Peer Address:Port   
LISTEN     0      8                       *:imaps                    *:*       
LISTEN     0      8                       *:pop3s                    *:*       
LISTEN     0      50                      *:mysql                    *:*       
LISTEN     0      8                       *:pop3                     *:*       
LISTEN     0      8                       *:imap2                    *:*       
LISTEN     0      511                     *:www                      *:*       


Yes, Apache itself uses 511 as its default ListenBacklog.

You'll need to change it:

ListenBacklog 8192





* Re: [PATCH] socket: increase default maximum listen queue length
  2011-03-20  8:30 ` Eric Dumazet
  2011-03-20  9:04   ` Rémi Denis-Courmont
@ 2011-03-20 11:39   ` Hagen Paul Pfeifer
  2011-03-20 11:55     ` Eric Dumazet
  1 sibling, 1 reply; 16+ messages in thread
From: Hagen Paul Pfeifer @ 2011-03-20 11:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

* Eric Dumazet | 2011-03-20 09:30:17 [+0100]:

>Hmm, real problem is not the 'maximum queue value', but the minimum one.
>
>If application says : listen(fd, 10), you are stuck.
>
>128 or 256 is way too small on some servers, where admin can tune
>in /etc/sysctl.conf :
>
>net.core.somaxconn = 8192
>net.ipv4.tcp_max_syn_backlog = 8192
>
>But application also needs to use : listen(fd, 8192)

I know, Eric, I wrote the patch. ;-) The naming goes like this:

The system limits (somaxconn & tcp_max_syn_backlog) specify a _maximum_; the
user cannot exceed this limit with listen(2). The backlog argument for listen,
on the other hand, specifies a _minimum_. But the patch increased the default
maximum, so I named it this way.


* Re: [PATCH] socket: increase default maximum listen queue length
  2011-03-20 11:39   ` Hagen Paul Pfeifer
@ 2011-03-20 11:55     ` Eric Dumazet
  2011-03-20 12:14       ` Hagen Paul Pfeifer
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2011-03-20 11:55 UTC (permalink / raw)
  To: Hagen Paul Pfeifer; +Cc: netdev

On Sunday, 20 March 2011 at 12:39 +0100, Hagen Paul Pfeifer wrote:
> * Eric Dumazet | 2011-03-20 09:30:17 [+0100]:
> 
> >Hmm, real problem is not the 'maximum queue value', but the minimum one.
> >
> >If application says : listen(fd, 10), you are stuck.
> >
> >128 or 256 is way too small on some servers, where admin can tune
> >in /etc/sysctl.conf :
> >
> >net.core.somaxconn = 8192
> >net.ipv4.tcp_max_syn_backlog = 8192
> >
> >But application also needs to use : listen(fd, 8192)
> 
> I know, Eric, I wrote the patch. ;-) The naming goes like this:
> 
> The system limits (somaxconn & tcp_max_syn_backlog) specify a _maximum_; the
> user cannot exceed this limit with listen(2). The backlog argument for listen,
> on the other hand, specifies a _minimum_. But the patch increased the default
> maximum, so I named it this way.


I am not sure you understood what I said.

Even if you change kernel limits, many applications still use low
limits : listen(fd, 8)

I remember some other OS (was it HPUX or Solaris...) had a minimum
limit : Even if application said 8, an admin could impose a 256 value
for example.

Frankly, I believe somaxconn should be a default enforcement, that a
particular protocol could override.

TCP sockets for example should only enforce tcp_max_syn_backlog limit,
not the somaxconn & tcp_max_syn_backlog.





* Re: [PATCH] socket: increase default maximum listen queue length
  2011-03-20  4:41 ` David Miller
@ 2011-03-20 11:59   ` Hagen Paul Pfeifer
  0 siblings, 0 replies; 16+ messages in thread
From: Hagen Paul Pfeifer @ 2011-03-20 11:59 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, eric.dumazet

* David Miller | 2011-03-19 21:41:00 [-0700]:

>What in the world is a server running on multi-GHz CPUs doing such
>that it cannot accept() fast enough to keep up with 100 connections
>per second?
>
>We were handling that just fine 10+ years ago.
>
>The math simply doesn't add up.

Davem, what the hell - you don't get it! ;-) No Davem, the problem is not the
accept() performance; rather it is the time that elapses after the
request_sock is created and the SYN/ACK is sent: the RTT to the client
(including the client's SYN/ACK processing) and finally the accept()
processing time. During _this_ duration the backlog is of relevance. For
connections with a high RTT the backlog can be quite high - local accept()
processing delay does not really matter.

The current behavior is not really fine. max_syn_backlog and somaxconn are
somewhat[TM] unsynchronized. max_syn_backlog considers the system's memory
characteristics, whereas somaxconn is just a dumb show stopper. My patch is a
compromise: it increases the value to 256 because 256 has turned out to be
reasonable for many setups. Many system administrators will not notice the
problem; server authors may know about this limitation (and adjust the listen
argument) - so why shouldn't we make the life of the administrators a little
bit easier by adjusting the default? Low-memory systems do not suffer; the
limit is still reasonably small. The memory footprint for a server with 4
listening sockets is just insignificant. But connection failures due to
backlog limits will magically go away.

The math is a function of the incoming connection rate, the RTT, and the
processing delay on the client and server side.
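
As a rough illustration of that relationship, here is a minimal C sketch that
applies Little's law (pending requests = arrival rate x waiting time). This is
an assumption about how to model the argument, not something taken from the
kernel; estimate_pending() and its parameters are hypothetical names, and the
numbers used with it are made-up inputs.

```c
/* Rough estimate of how many connection requests are pending at once.
 * Little's law: pending = arrival rate x time each request waits.
 * Times are given in milliseconds to keep the arithmetic in integers. */
static unsigned long long estimate_pending(unsigned long long conn_per_sec,
                                           unsigned long long rtt_ms,
                                           unsigned long long accept_delay_ms)
{
    unsigned long long wait_ms = rtt_ms + accept_delay_ms;

    /* Round up so that a fractional entry still occupies one slot. */
    return (conn_per_sec * wait_ms + 999) / 1000;
}
```

For example, 1000 connections per second at an RTT of 200 ms keep about 200
requests pending, while 10 connections per second at 20 ms need a single slot.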

Have a nice Sunday, Hagen


* Re: [PATCH] socket: increase default maximum listen queue length
  2011-03-20 11:55     ` Eric Dumazet
@ 2011-03-20 12:14       ` Hagen Paul Pfeifer
  2011-03-20 23:04         ` [PATCH 1/2] " Hagen Paul Pfeifer
  0 siblings, 1 reply; 16+ messages in thread
From: Hagen Paul Pfeifer @ 2011-03-20 12:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

* Eric Dumazet | 2011-03-20 12:55:44 [+0100]:

>I am not sure you understood what I said.
>
>Even if you change kernel limits, many applications still use low
>limits : listen(fd, 8)

Right, but there is a discrepancy between system administrators and server
authors: the latter group will probably notice that listen(fd, 8) is not
adequate (e.g. someone sends a bug report). System administrators, on the
other hand, have no obvious indicator that something is going wrong in the
system. Most of them would not even notice that the backlog is overflowing.

>I remember some other OS (was it HPUX or Solaris...) had a minimum
>limit : Even if application said 8, an admin could impose a 256 value
>for example.

Not the worst idea! It is nice that a server author can adjust that value.
But between you and me: the system administrator may have more information
about the network behavior (how many incoming connections/minute, RTT, memory
characteristics, ...). The system administrator should be able to
increase the value; currently he is stuck if the server author missed
that. E.g.

http://www.dovecot.org/list/dovecot-cvs/2009-September/014567.html

I will spin a patch for that.

Hagen


* [PATCH 1/2] socket: increase default maximum listen queue length
  2011-03-20 12:14       ` Hagen Paul Pfeifer
@ 2011-03-20 23:04         ` Hagen Paul Pfeifer
  2011-03-20 23:04           ` [PATCH 2/2] socket: add minimum listen queue length sysctl Hagen Paul Pfeifer
  2011-03-20 23:09           ` [PATCH 1/2] socket: increase default maximum listen queue length David Miller
  0 siblings, 2 replies; 16+ messages in thread
From: Hagen Paul Pfeifer @ 2011-03-20 23:04 UTC (permalink / raw)
  To: netdev; +Cc: Hagen Paul Pfeifer, Eric Dumazet

sysctl_somaxconn (SOMAXCONN: 128) specifies the maximum number of
sockets in state SYN_RECV per listen socket queue. At listen(2) time the
backlog is clamped to this limit if it is larger.

Afterwards in reqsk_queue_alloc() the backlog value is checked again
(nr_table_entries == backlog):

    nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
    nr_table_entries = max_t(u32, nr_table_entries, 8);
    nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);

sysctl_max_syn_backlog on the other hand is dynamically adjusted,
depending on the memory characteristic of the system. Default is 256,
128 for small systems and up to 1024 for bigger systems.

For real server work the de facto sysctl_somaxconn limit seems inadequate:

    Experiments with real servers show, that it is absolutely not enough
    even at 100conn/sec. 256 cures most of problems.

Increase the default sysctl_somaxconn from 128 to 256 to meet today's
conditions. nr_table_entries remains limited by sysctl_max_syn_backlog, which
is based on the memory condition
(max(128, (tcp_hashinfo.ehash_mask + 1) / 256)).

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/socket.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index edbb1d0..bf35ce2 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -237,7 +237,7 @@ struct ucred {
 #define PF_MAX		AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
-#define SOMAXCONN	128
+#define SOMAXCONN	256
 
 /* Flags we can use with send/ and recv. 
    Added those for 1003.1g not all are supported yet
-- 
1.7.4.1.57.g0466.dirty



* [PATCH 2/2] socket: add minimum listen queue length sysctl
  2011-03-20 23:04         ` [PATCH 1/2] " Hagen Paul Pfeifer
@ 2011-03-20 23:04           ` Hagen Paul Pfeifer
  2011-03-21  7:36             ` Eric Dumazet
  2011-03-20 23:09           ` [PATCH 1/2] socket: increase default maximum listen queue length David Miller
  1 sibling, 1 reply; 16+ messages in thread
From: Hagen Paul Pfeifer @ 2011-03-20 23:04 UTC (permalink / raw)
  To: netdev; +Cc: Hagen Paul Pfeifer

If a server programmer misjudges the network characteristics, the
backlog parameter for listen(2) may not be adequate to utilize the host's
capabilities and may lead to unnecessary SYN retransmissions - a small backlog
value can form an artificial limitation. From Eric's server setup (a
listen queue length of 8 is often way too small):

ss -a | head
State      Recv-Q Send-Q      Local Address:Port          Peer Address:Port
LISTEN     0      8                       *:imaps                    *:*
LISTEN     0      8                       *:pop3s                    *:*
LISTEN     0      50                      *:mysql                    *:*
LISTEN     0      8                       *:pop3                     *:*
LISTEN     0      8                       *:imap2                    *:*
LISTEN     0      511                     *:www                      *:*

Until now it has not been possible for the system (network) administrator to
increase this value. A bug report must be filed, the backlog increased, and
a new version released; or even worse, if you are using closed source
software you cannot do anything.

sysctl_min_syn_backlog provides the ability to increase the minimum
queue length. The default is 8.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>

---
I will spin a second documentation patch if Davem accepts this patch.
---
 include/net/request_sock.h |    1 +
 net/core/request_sock.c    |    5 ++++-
 net/ipv4/sysctl_net_ipv4.c |    7 +++++++
 3 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 99e6e19..3e8865f 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -89,6 +89,7 @@ static inline void reqsk_free(struct request_sock *req)
 }
 
 extern int sysctl_max_syn_backlog;
+extern int sysctl_min_syn_backlog;
 
 /** struct listen_sock - listen state
  *
diff --git a/net/core/request_sock.c b/net/core/request_sock.c
index 182236b..e937e9c 100644
--- a/net/core/request_sock.c
+++ b/net/core/request_sock.c
@@ -35,6 +35,9 @@
 int sysctl_max_syn_backlog = 256;
 EXPORT_SYMBOL(sysctl_max_syn_backlog);
 
+int sysctl_min_syn_backlog = 8;
+EXPORT_SYMBOL(sysctl_min_syn_backlog);
+
 int reqsk_queue_alloc(struct request_sock_queue *queue,
 		      unsigned int nr_table_entries)
 {
@@ -42,7 +45,7 @@ int reqsk_queue_alloc(struct request_sock_queue *queue,
 	struct listen_sock *lopt;
 
 	nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
-	nr_table_entries = max_t(u32, nr_table_entries, 8);
+	nr_table_entries = max_t(u32, nr_table_entries, sysctl_min_syn_backlog);
 	nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
 	lopt_size += nr_table_entries * sizeof(struct request_sock *);
 	if (lopt_size > PAGE_SIZE)
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 1a45665..cc03c62 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -298,6 +298,13 @@ static struct ctl_table ipv4_table[] = {
 		.proc_handler	= proc_dointvec
 	},
 	{
+		.procname	= "tcp_min_syn_backlog",
+		.data		= &sysctl_min_syn_backlog,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
 		.procname	= "ip_local_port_range",
 		.data		= &sysctl_local_ports.range,
 		.maxlen		= sizeof(sysctl_local_ports.range),
-- 
1.7.4.1.57.g0466.dirty



* Re: [PATCH 1/2] socket: increase default maximum listen queue length
  2011-03-20 23:04         ` [PATCH 1/2] " Hagen Paul Pfeifer
  2011-03-20 23:04           ` [PATCH 2/2] socket: add minimum listen queue length sysctl Hagen Paul Pfeifer
@ 2011-03-20 23:09           ` David Miller
  2011-03-20 23:52             ` Hagen Paul Pfeifer
  2011-03-20 23:57             ` Hagen Paul Pfeifer
  1 sibling, 2 replies; 16+ messages in thread
From: David Miller @ 2011-03-20 23:09 UTC (permalink / raw)
  To: hagen; +Cc: netdev, eric.dumazet

From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Mon, 21 Mar 2011 00:04:41 +0100

> For real server work the de facto sysctl_somaxconn limit seems inadequate:
> 
>     Experiments with real servers show, that it is absolutely not enough
>     even at 100conn/sec. 256 cures most of problems.

Absolutely no context is provided for this number.

What's the RTT?  How fast are the cpus?  etc.

You must tell the whole story in order to justify these changes
properly.


* Re: [PATCH 1/2] socket: increase default maximum listen queue length
  2011-03-20 23:09           ` [PATCH 1/2] socket: increase default maximum listen queue length David Miller
@ 2011-03-20 23:52             ` Hagen Paul Pfeifer
  2011-03-21  0:18               ` David Miller
  2011-03-20 23:57             ` Hagen Paul Pfeifer
  1 sibling, 1 reply; 16+ messages in thread
From: Hagen Paul Pfeifer @ 2011-03-20 23:52 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, eric.dumazet

* David Miller | 2011-03-20 16:09:06 [-0700]:

>Absolutely no context is provided for this number.
>
>What's the RTT?  How fast are the cpus?  etc.
>
>You must tell the whole story in order to justify these changes
>properly.

(you can skip the first paragraphs and read the last one ;)

The number is somewhat magic - like many other values. I grepped
tglx/history.git, but the comment (at that time in tcp_ipv4.c) seems to be
from the pre-2002 era.

Providing context is a little bit artificial: I can construct a scenario with
an RTT of 200ms and 1000 connection requests per second and the table will
overflow. This can happen, sure. On the other hand there are scenarios with an
RTT of 20ms and 10 connection requests per second - no problem there.

Increasing the number _has_ one essential advantage: it is aligned on
sysctl_max_syn_backlog which in turn is determined by memory characteristics.


Without patch (sysctl not modified, BUT sysctl_max_syn_backlog depending on memory characteristics):

listen-queue-length = max(8, min(userspace_backlog, min(128, sysctl_max_syn_backlog)))

With patch (sysctl not modified, BUT sysctl_max_syn_backlog depending on memory characteristics):

listen-queue-length = max(8, min(userspace_backlog, min(256, sysctl_max_syn_backlog)))
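
The two formulas differ only in the somaxconn constant, so they can be checked
with one small function. This is a minimal C sketch of the formula as written
in this message, not of the kernel source; effective_backlog() and its
parameter names are hypothetical.

```c
/* listen-queue-length = max(8, min(user_backlog, min(somaxconn, max_syn_backlog)))
 * as stated in the message; all three inputs are plain queue lengths. */
static unsigned int effective_backlog(unsigned int user_backlog,
                                      unsigned int somaxconn,
                                      unsigned int max_syn_backlog)
{
    /* System-wide cap: the smaller of the two sysctl limits. */
    unsigned int cap = somaxconn < max_syn_backlog ? somaxconn : max_syn_backlog;
    /* The application cannot exceed the cap. */
    unsigned int len = user_backlog < cap ? user_backlog : cap;

    /* Floor of 8 entries. */
    return len > 8 ? len : 8;
}
```

With Apache's backlog of 511 and a max_syn_backlog of 256, the result is 128
when somaxconn is 128 and 256 when somaxconn is 256.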


The point is now: sysctl_max_syn_backlog is 256 by default, 128 for small
systems and up to 1024 for larger systems. But sysctl_somaxconn (128) will
_always_ restrict the queue length to 128 and therefore makes
sysctl_max_syn_backlog de facto ineffective. IMHO sysctl_somaxconn should be
removed; the overhead of the listen-queue size per listening socket is
insignificant, especially because sysctl_max_syn_backlog already considers
the memory characteristics. There are far more connected sockets than these
<10 listening sockets, but the performance loss caused by a listen queue that
is too small will always be noticeable:

netstat -s | grep overflowed
    2621 times the listen queue of a socket overflowed

Hagen


* Re: [PATCH 1/2] socket: increase default maximum listen queue length
  2011-03-20 23:09           ` [PATCH 1/2] socket: increase default maximum listen queue length David Miller
  2011-03-20 23:52             ` Hagen Paul Pfeifer
@ 2011-03-20 23:57             ` Hagen Paul Pfeifer
  1 sibling, 0 replies; 16+ messages in thread
From: Hagen Paul Pfeifer @ 2011-03-20 23:57 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, eric.dumazet

* David Miller | 2011-03-20 16:09:06 [-0700]:

>You must tell the whole story in order to justify these changes
>properly.

BTW: the second patch is independent and provides a new feature, mentioned by
Eric.

Hagen


* Re: [PATCH 1/2] socket: increase default maximum listen queue length
  2011-03-20 23:52             ` Hagen Paul Pfeifer
@ 2011-03-21  0:18               ` David Miller
  0 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2011-03-21  0:18 UTC (permalink / raw)
  To: hagen; +Cc: netdev, eric.dumazet

From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Mon, 21 Mar 2011 00:52:53 +0100

> The number is somewhat magic - like many other values. I grepped
> tglx/history.git, but the comment (at that time in tcp_ipv4.c) seems to be
> from the pre-2002 era.

Then don't use that number as part of the justification for the
change.

Describe what matters, and only what matters.  Providing magic and
arbitrary numbers doesn't help people reading your commit message.


* Re: [PATCH 2/2] socket: add minimum listen queue length sysctl
  2011-03-20 23:04           ` [PATCH 2/2] socket: add minimum listen queue length sysctl Hagen Paul Pfeifer
@ 2011-03-21  7:36             ` Eric Dumazet
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2011-03-21  7:36 UTC (permalink / raw)
  To: Hagen Paul Pfeifer; +Cc: netdev

On Monday, 21 March 2011 at 00:04 +0100, Hagen Paul Pfeifer wrote:
> If a server programmer misjudges the network characteristics, the
> backlog parameter for listen(2) may not be adequate to utilize the host's
> capabilities and may lead to unnecessary SYN retransmissions - a small backlog
> value can form an artificial limitation. From Eric's server setup (a
> listen queue length of 8 is often way too small):
> 
> ss -a | head
> State      Recv-Q Send-Q      Local Address:Port          Peer Address:Port
> LISTEN     0      8                       *:imaps                    *:*
> LISTEN     0      8                       *:pop3s                    *:*
> LISTEN     0      50                      *:mysql                    *:*
> LISTEN     0      8                       *:pop3                     *:*
> LISTEN     0      8                       *:imap2                    *:*
> LISTEN     0      511                     *:www                      *:*
> 
> Until now it has not been possible for the system (network) administrator to
> increase this value. A bug report must be filed, the backlog increased, and
> a new version released; or even worse, if you are using closed source
> software you cannot do anything.
> 
> sysctl_min_syn_backlog provides the ability to increase the minimum
> queue length. The default is 8.
> 
> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
> 
> ---
> I will spin a second documentation patch if Davem accepts this patch.
> ---
>  include/net/request_sock.h |    1 +
>  net/core/request_sock.c    |    5 ++++-
>  net/ipv4/sysctl_net_ipv4.c |    7 +++++++
>  3 files changed, 12 insertions(+), 1 deletions(-)
> 
> diff --git a/include/net/request_sock.h b/include/net/request_sock.h
> index 99e6e19..3e8865f 100644
> --- a/include/net/request_sock.h
> +++ b/include/net/request_sock.h
> @@ -89,6 +89,7 @@ static inline void reqsk_free(struct request_sock *req)
>  }
>  
>  extern int sysctl_max_syn_backlog;
> +extern int sysctl_min_syn_backlog;
>  
>  /** struct listen_sock - listen state
>   *
> diff --git a/net/core/request_sock.c b/net/core/request_sock.c
> index 182236b..e937e9c 100644
> --- a/net/core/request_sock.c
> +++ b/net/core/request_sock.c
> @@ -35,6 +35,9 @@
>  int sysctl_max_syn_backlog = 256;
>  EXPORT_SYMBOL(sysctl_max_syn_backlog);
>  
> +int sysctl_min_syn_backlog = 8;
> +EXPORT_SYMBOL(sysctl_min_syn_backlog);
> +
>  int reqsk_queue_alloc(struct request_sock_queue *queue,
>  		      unsigned int nr_table_entries)
>  {
> @@ -42,7 +45,7 @@ int reqsk_queue_alloc(struct request_sock_queue *queue,
>  	struct listen_sock *lopt;
>  
>  	nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
> -	nr_table_entries = max_t(u32, nr_table_entries, 8);
> +	nr_table_entries = max_t(u32, nr_table_entries, sysctl_min_syn_backlog);
>  	nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
>  	lopt_size += nr_table_entries * sizeof(struct request_sock *);
>  	if (lopt_size > PAGE_SIZE)

I believe you are mistaken.

The code you change is the code sizing the hash table, not
sk->sk_max_ack_backlog

This only matters if one application is able to change its listen
backlog during its lifetime.

Say, it begins with :

listen(fd, 1);

Then, a bit later :

listen(fd, 8192);

This certainly is very unlikely...

With the current kernel, it does change the maximum number of SYN_RECV
sockets in flight, but the hash table is not resized and stays with 8 slots,
so performance might be suboptimal, since chains are going to hold 1024
elements.
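
The figure of 1024 follows from spreading 8192 in-flight requests evenly over
8 hash slots. A tiny C sketch of that arithmetic, assuming a perfectly uniform
hash; avg_chain_length() is a hypothetical name:

```c
/* Average number of entries per hash chain, rounded up, assuming the
 * hash distributes in-flight requests evenly across all slots. */
static unsigned int avg_chain_length(unsigned int in_flight, unsigned int slots)
{
    return slots ? (in_flight + slots - 1) / slots : 0;
}
```

avg_chain_length(8192, 8) gives the 1024 elements per chain mentioned above.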




