netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] net-tcp: extend tcp_tw_reuse sysctl to enable loopback only optimization
@ 2018-06-03 17:41 Maciej Żenczykowski
  2018-06-04 17:18 ` Eric Dumazet
  2018-06-04 21:13 ` David Miller
  0 siblings, 2 replies; 3+ messages in thread
From: Maciej Żenczykowski @ 2018-06-03 17:41 UTC (permalink / raw)
  To: Maciej Żenczykowski, David S . Miller, Eric Dumazet
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Wei Wang

From: Maciej Żenczykowski <maze@google.com>

This changes the /proc/sys/net/ipv4/tcp_tw_reuse from a boolean
to an integer.

It now takes the values 0, 1 and 2, where 0 and 1 behave as before,
while 2 enables timewait socket reuse only for sockets that we can
prove are loopback connections:
  ie. bound to 'lo' interface or where one of source or destination
  IPs is 127.0.0.0/8, ::ffff:127.0.0.0/104 or ::1.

This enables quicker reuse of ephemeral ports for loopback connections
- where tcp_tw_reuse is 100% safe from a protocol perspective
(this assumes no artificially induced packet loss on 'lo').

This also makes estblishing many loopback connections *much* faster
(allocating ports out of the first half of the ephemeral port range
is significantly faster, then allocating from the second half)

Without this change in a 32K ephemeral port space my sample program
(it just establishes and closes [::1]:ephemeral -> [::1]:server_port
connections in a tight loop) fails after 32765 connections in 24 seconds.
With it enabled 50000 connections only take 4.7 seconds.

This is particularly problematic for IPv6 where we only have one local
address and cannot play tricks with varying source IP from 127.0.0.0/8
pool.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Wei Wang <weiwan@google.com>

Change-Id: I0377961749979d0301b7b62871a32a4b34b654e1
---
 Documentation/networking/ip-sysctl.txt | 10 +++++---
 net/ipv4/sysctl_net_ipv4.c             |  5 +++-
 net/ipv4/tcp_ipv4.c                    | 35 +++++++++++++++++++++++---
 3 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 924bd51327b7..6841c74eac00 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -667,11 +667,15 @@ tcp_tso_win_divisor - INTEGER
 	building larger TSO frames.
 	Default: 3
 
-tcp_tw_reuse - BOOLEAN
-	Allow to reuse TIME-WAIT sockets for new connections when it is
-	safe from protocol viewpoint. Default value is 0.
+tcp_tw_reuse - INTEGER
+	Enable reuse of TIME-WAIT sockets for new connections when it is
+	safe from protocol viewpoint.
+	0 - disable
+	1 - global enable
+	2 - enable for loopback traffic only
 	It should not be changed without advice/request of technical
 	experts.
+	Default: 2
 
 tcp_window_scaling - BOOLEAN
 	Enable window scaling as defined in RFC1323.
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index d2eed3ddcb0a..d06247ba08b2 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -30,6 +30,7 @@
 
 static int zero;
 static int one = 1;
+static int two = 2;
 static int four = 4;
 static int thousand = 1000;
 static int gso_max_segs = GSO_MAX_SEGS;
@@ -845,7 +846,9 @@ static struct ctl_table ipv4_net_table[] = {
 		.data		= &init_net.ipv4.sysctl_tcp_tw_reuse,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &two,
 	},
 	{
 		.procname	= "tcp_max_tw_buckets",
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index adbdb503db0c..29f922d5e55d 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -110,8 +110,38 @@ static u32 tcp_v4_init_ts_off(const struct net *net, const struct sk_buff *skb)
 
 int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
 {
+	const struct inet_timewait_sock *tw = inet_twsk(sktw);
 	const struct tcp_timewait_sock *tcptw = tcp_twsk(sktw);
 	struct tcp_sock *tp = tcp_sk(sk);
+	int reuse = sock_net(sk)->ipv4.sysctl_tcp_tw_reuse;
+
+	if (reuse == 2) {
+		/* Still does not detect *everything* that goes through
+		 * lo, since we require a loopback src or dst address
+		 * or direct binding to 'lo' interface.
+		 */
+		bool loopback = false;
+		if (tw->tw_bound_dev_if == LOOPBACK_IFINDEX)
+			loopback = true;
+#if IS_ENABLED(CONFIG_IPV6)
+		if (tw->tw_family == AF_INET6) {
+			if (ipv6_addr_loopback(&tw->tw_v6_daddr) ||
+			    (ipv6_addr_v4mapped(&tw->tw_v6_daddr) &&
+			     (tw->tw_v6_daddr.s6_addr[12] == 127)) ||
+			    ipv6_addr_loopback(&tw->tw_v6_rcv_saddr) ||
+			    (ipv6_addr_v4mapped(&tw->tw_v6_rcv_saddr) &&
+			     (tw->tw_v6_rcv_saddr.s6_addr[12] == 127)))
+				loopback = true;
+		} else
+#endif
+		{
+			if (ipv4_is_loopback(tw->tw_daddr) ||
+			    ipv4_is_loopback(tw->tw_rcv_saddr))
+				loopback = true;
+		}
+		if (!loopback)
+			reuse = 0;
+	}
 
 	/* With PAWS, it is safe from the viewpoint
 	   of data integrity. Even without PAWS it is safe provided sequence
@@ -125,8 +155,7 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
 	   and use initial timestamp retrieved from peer table.
 	 */
 	if (tcptw->tw_ts_recent_stamp &&
-	    (!twp || (sock_net(sk)->ipv4.sysctl_tcp_tw_reuse &&
-			     get_seconds() - tcptw->tw_ts_recent_stamp > 1))) {
+	    (!twp || (reuse && get_seconds() - tcptw->tw_ts_recent_stamp > 1))) {
 		tp->write_seq = tcptw->tw_snd_nxt + 65535 + 2;
 		if (tp->write_seq == 0)
 			tp->write_seq = 1;
@@ -2529,7 +2558,7 @@ static int __net_init tcp_sk_init(struct net *net)
 	net->ipv4.sysctl_tcp_orphan_retries = 0;
 	net->ipv4.sysctl_tcp_fin_timeout = TCP_FIN_TIMEOUT;
 	net->ipv4.sysctl_tcp_notsent_lowat = UINT_MAX;
-	net->ipv4.sysctl_tcp_tw_reuse = 0;
+	net->ipv4.sysctl_tcp_tw_reuse = 2;
 
 	cnt = tcp_hashinfo.ehash_mask + 1;
 	net->ipv4.tcp_death_row.sysctl_max_tw_buckets = (cnt + 1) / 2;
-- 
2.17.1.1185.g55be947832-goog

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] net-tcp: extend tcp_tw_reuse sysctl to enable loopback only optimization
  2018-06-03 17:41 [PATCH] net-tcp: extend tcp_tw_reuse sysctl to enable loopback only optimization Maciej Żenczykowski
@ 2018-06-04 17:18 ` Eric Dumazet
  2018-06-04 21:13 ` David Miller
  1 sibling, 0 replies; 3+ messages in thread
From: Eric Dumazet @ 2018-06-04 17:18 UTC (permalink / raw)
  To: Maciej Żenczykowski, Maciej Żenczykowski,
	David S . Miller, Eric Dumazet
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Wei Wang



On 06/03/2018 10:41 AM, Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski <maze@google.com>
> 
> This changes the /proc/sys/net/ipv4/tcp_tw_reuse from a boolean
> to an integer.
> 
> It now takes the values 0, 1 and 2, where 0 and 1 behave as before,
> while 2 enables timewait socket reuse only for sockets that we can
> prove are loopback connections:
>   ie. bound to 'lo' interface or where one of source or destination
>   IPs is 127.0.0.0/8, ::ffff:127.0.0.0/104 or ::1.
> 
> This enables quicker reuse of ephemeral ports for loopback connections
> - where tcp_tw_reuse is 100% safe from a protocol perspective
> (this assumes no artificially induced packet loss on 'lo').
> 
> This also makes estblishing many loopback connections *much* faster
> (allocating ports out of the first half of the ephemeral port range
> is significantly faster, then allocating from the second half)
> 
> Without this change in a 32K ephemeral port space my sample program
> (it just establishes and closes [::1]:ephemeral -> [::1]:server_port
> connections in a tight loop) fails after 32765 connections in 24 seconds.
> With it enabled 50000 connections only take 4.7 seconds.
> 
> This is particularly problematic for IPv6 where we only have one local
> address and cannot play tricks with varying source IP from 127.0.0.0/8
> pool.
> 
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Wei Wang <weiwan@google.com>

This seems fine, thanks Maciej

Signed-off-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] net-tcp: extend tcp_tw_reuse sysctl to enable loopback only optimization
  2018-06-03 17:41 [PATCH] net-tcp: extend tcp_tw_reuse sysctl to enable loopback only optimization Maciej Żenczykowski
  2018-06-04 17:18 ` Eric Dumazet
@ 2018-06-04 21:13 ` David Miller
  1 sibling, 0 replies; 3+ messages in thread
From: David Miller @ 2018-06-04 21:13 UTC (permalink / raw)
  To: zenczykowski; +Cc: maze, edumazet, netdev, ncardwell, ycheng, weiwan

From: "Maciej Żenczykowski" <zenczykowski@gmail.com>
Date: Sun,  3 Jun 2018 10:41:17 -0700

> From: Maciej Żenczykowski <maze@google.com>
> 
> This changes the /proc/sys/net/ipv4/tcp_tw_reuse from a boolean
> to an integer.
> 
> It now takes the values 0, 1 and 2, where 0 and 1 behave as before,
> while 2 enables timewait socket reuse only for sockets that we can
> prove are loopback connections:
>   ie. bound to 'lo' interface or where one of source or destination
>   IPs is 127.0.0.0/8, ::ffff:127.0.0.0/104 or ::1.
> 
> This enables quicker reuse of ephemeral ports for loopback connections
> - where tcp_tw_reuse is 100% safe from a protocol perspective
> (this assumes no artificially induced packet loss on 'lo').
> 
> This also makes estblishing many loopback connections *much* faster
> (allocating ports out of the first half of the ephemeral port range
> is significantly faster, then allocating from the second half)
> 
> Without this change in a 32K ephemeral port space my sample program
> (it just establishes and closes [::1]:ephemeral -> [::1]:server_port
> connections in a tight loop) fails after 32765 connections in 24 seconds.
> With it enabled 50000 connections only take 4.7 seconds.
> 
> This is particularly problematic for IPv6 where we only have one local
> address and cannot play tricks with varying source IP from 127.0.0.0/8
> pool.
> 
> Signed-off-by: Maciej Żenczykowski <maze@google.com>

Applied, thank you.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-06-04 21:13 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-03 17:41 [PATCH] net-tcp: extend tcp_tw_reuse sysctl to enable loopback only optimization Maciej Żenczykowski
2018-06-04 17:18 ` Eric Dumazet
2018-06-04 21:13 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).