netdev.vger.kernel.org archive mirror
* [PATCH v4 net-next 0/5] Improve bind(addr, 0) behaviour.
@ 2020-03-08 18:16 Kuniyuki Iwashima
  2020-03-08 18:16 ` [PATCH v4 net-next 1/5] tcp: Remove unnecessary conditions in inet_csk_bind_conflict() Kuniyuki Iwashima
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2020-03-08 18:16 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, edumazet
  Cc: kuniyu, kuni1840, netdev, osa-contribution-log

Currently we fail to bind sockets to ephemeral ports when all of the ports
are exhausted, even if all sockets have SO_REUSEADDR enabled. In this case,
we still have a chance to connect to different remote hosts.

These patches add the net.ipv4.ip_autobind_reuse option and fix the
behaviour so that the whole space of local (addr, port) tuples can be used.

---
Changes in v4:
  - Add net.ipv4.ip_autobind_reuse option to not change the current behaviour.
  - Modify .gitignore for test.

Changes in v3:
  - Change the title and write more specific description of the 3rd patch.
  - Add a test in tools/testing/selftests/net/ as the 4th patch.
  https://lore.kernel.org/netdev/20200229113554.78338-1-kuniyu@amazon.co.jp/

Changes in v2:
  - Change the description of the 2nd patch ('localhost' -> 'address').
  - Correct the description and the if statement of the 3rd patch.
  https://lore.kernel.org/netdev/20200226074631.67688-1-kuniyu@amazon.co.jp/

v1 with tests:
  https://lore.kernel.org/netdev/20200220152020.13056-1-kuniyu@amazon.co.jp/
---

Kuniyuki Iwashima (5):
  tcp: Remove unnecessary conditions in inet_csk_bind_conflict().
  tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports
    are exhausted.
  tcp: Forbid to bind more than one sockets haveing SO_REUSEADDR and
    SO_REUSEPORT per EUID.
  net: Add net.ipv4.ip_autobind_reuse option.
  selftests: net: Add SO_REUSEADDR test to check if 4-tuples are fully
    utilized.

 Documentation/networking/ip-sysctl.txt        |   7 +
 include/net/netns/ipv4.h                      |   1 +
 net/ipv4/inet_connection_sock.c               |  36 ++--
 net/ipv4/sysctl_net_ipv4.c                    |   7 +
 tools/testing/selftests/net/.gitignore        |   1 +
 tools/testing/selftests/net/Makefile          |   2 +
 .../selftests/net/reuseaddr_ports_exhausted.c | 162 ++++++++++++++++++
 .../net/reuseaddr_ports_exhausted.sh          |  35 ++++
 8 files changed, 239 insertions(+), 12 deletions(-)
 create mode 100644 tools/testing/selftests/net/reuseaddr_ports_exhausted.c
 create mode 100755 tools/testing/selftests/net/reuseaddr_ports_exhausted.sh

-- 
2.17.2 (Apple Git-113)



* [PATCH v4 net-next 1/5] tcp: Remove unnecessary conditions in inet_csk_bind_conflict().
  2020-03-08 18:16 [PATCH v4 net-next 0/5] Improve bind(addr, 0) behaviour Kuniyuki Iwashima
@ 2020-03-08 18:16 ` Kuniyuki Iwashima
  2020-03-08 18:16 ` [PATCH v4 net-next 2/5] tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted Kuniyuki Iwashima
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2020-03-08 18:16 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, edumazet
  Cc: kuniyu, kuni1840, netdev, osa-contribution-log

When we get an ephemeral port, relax is false, so the SO_REUSEADDR
conditions may be evaluated twice. We do not need to check the conditions
again.
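
As a sanity check on the restructuring, here is a minimal userspace sketch
(not kernel code) that compares the old and the new form of the condition for
a single (sk, sk2) pair over every input combination. The struct, the enum and
all function names are hypothetical stand-ins; the two predicates are
transcribed from the removed and added lines of the diff below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the socket states the condition looks at. */
enum model_state { MODEL_OTHER, MODEL_LISTEN, MODEL_TIME_WAIT };

/* One (sk, sk2) pair as seen by the conflict check; names are illustrative. */
struct model_pair {
	bool reuse, sk2_reuse;         /* SO_REUSEADDR on sk and sk2 */
	bool reuseport, sk2_reuseport; /* SO_REUSEPORT on sk and sk2 */
	bool has_cb;                   /* sk->sk_reuseport_cb is set */
	bool same_uid;                 /* uid_eq(uid, sock_i_uid(sk2)) */
	bool relax;
	bool saddr_equal;              /* inet_rcv_saddr_equal(sk, sk2, true) */
	enum model_state sk2_state;
};

/* The SO_REUSEPORT part of the condition, shared by both forms. */
static bool reuseport_blocks(const struct model_pair *p)
{
	return !p->reuseport || !p->sk2_reuseport || p->has_cb ||
	       (p->sk2_state != MODEL_TIME_WAIT && !p->same_uid);
}

/* Condition before the patch: two separate if statements. */
static bool conflict_old(const struct model_pair *p)
{
	if ((!p->reuse || !p->sk2_reuse || p->sk2_state == MODEL_LISTEN) &&
	    reuseport_blocks(p)) {
		if (p->saddr_equal)
			return true;
	}
	if (!p->relax && p->reuse && p->sk2_reuse &&
	    p->sk2_state != MODEL_LISTEN) {
		if (p->saddr_equal)
			return true;
	}
	return false;
}

/* Condition after the patch: one if / else if chain. */
static bool conflict_new(const struct model_pair *p)
{
	if (p->reuse && p->sk2_reuse && p->sk2_state != MODEL_LISTEN) {
		if (!p->relax && p->saddr_equal)
			return true;
	} else if (reuseport_blocks(p)) {
		if (p->saddr_equal)
			return true;
	}
	return false;
}

/* Enumerate all 8 flags x 3 states and count disagreements. */
int count_mismatches(void)
{
	int mismatches = 0;

	for (unsigned int bits = 0; bits < 256; bits++) {
		for (int st = MODEL_OTHER; st <= MODEL_TIME_WAIT; st++) {
			struct model_pair p = {
				.reuse         = (bits >> 0) & 1,
				.sk2_reuse     = (bits >> 1) & 1,
				.reuseport     = (bits >> 2) & 1,
				.sk2_reuseport = (bits >> 3) & 1,
				.has_cb        = (bits >> 4) & 1,
				.same_uid      = (bits >> 5) & 1,
				.relax         = (bits >> 6) & 1,
				.saddr_equal   = (bits >> 7) & 1,
				.sk2_state     = (enum model_state)st,
			};

			if (conflict_old(&p) != conflict_new(&p))
				mismatches++;
		}
	}
	return mismatches;
}
```

Calling assert(count_mismatches() == 0) from a test main() should pass if the
two forms agree in this model. The model covers one sk2 at a time and ignores
the bound-device checks that surround the condition in the kernel.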

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
---
 net/ipv4/inet_connection_sock.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index a4db79b1b643..2e9549f49a82 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -146,17 +146,15 @@ static int inet_csk_bind_conflict(const struct sock *sk,
 		    (!sk->sk_bound_dev_if ||
 		     !sk2->sk_bound_dev_if ||
 		     sk->sk_bound_dev_if == sk2->sk_bound_dev_if)) {
-			if ((!reuse || !sk2->sk_reuse ||
-			    sk2->sk_state == TCP_LISTEN) &&
-			    (!reuseport || !sk2->sk_reuseport ||
-			     rcu_access_pointer(sk->sk_reuseport_cb) ||
-			     (sk2->sk_state != TCP_TIME_WAIT &&
-			     !uid_eq(uid, sock_i_uid(sk2))))) {
-				if (inet_rcv_saddr_equal(sk, sk2, true))
-					break;
-			}
-			if (!relax && reuse && sk2->sk_reuse &&
+			if (reuse && sk2->sk_reuse &&
 			    sk2->sk_state != TCP_LISTEN) {
+				if (!relax &&
+				    inet_rcv_saddr_equal(sk, sk2, true))
+					break;
+			} else if (!reuseport || !sk2->sk_reuseport ||
+				   rcu_access_pointer(sk->sk_reuseport_cb) ||
+				   (sk2->sk_state != TCP_TIME_WAIT &&
+				    !uid_eq(uid, sock_i_uid(sk2)))) {
 				if (inet_rcv_saddr_equal(sk, sk2, true))
 					break;
 			}
-- 
2.17.2 (Apple Git-113)



* [PATCH v4 net-next 2/5] tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted.
  2020-03-08 18:16 [PATCH v4 net-next 0/5] Improve bind(addr, 0) behaviour Kuniyuki Iwashima
  2020-03-08 18:16 ` [PATCH v4 net-next 1/5] tcp: Remove unnecessary conditions in inet_csk_bind_conflict() Kuniyuki Iwashima
@ 2020-03-08 18:16 ` Kuniyuki Iwashima
  2020-03-10  4:04   ` Eric Dumazet
  2020-03-08 18:16 ` [PATCH v4 net-next 3/5] tcp: Forbid to bind more than one sockets haveing SO_REUSEADDR and SO_REUSEPORT per EUID Kuniyuki Iwashima
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Kuniyuki Iwashima @ 2020-03-08 18:16 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, edumazet
  Cc: kuniyu, kuni1840, netdev, osa-contribution-log

Commit aacd9289af8b82f5fb01bcdd53d0e3406d1333c7 ("tcp: bind() use stronger
condition for bind_conflict") introduced a restriction that forbids binding
SO_REUSEADDR-enabled sockets to the same (addr, port) tuple, so that ports
are assigned dispersedly and we can connect to the same remote host.

That change accelerates port depletion: we fail to bind sockets to the same
local port even if we want to connect to different remote hosts.

You can reproduce this issue by following the instructions below.
  1. # sysctl -w net.ipv4.ip_local_port_range="32768 32768"
  2. Set SO_REUSEADDR on two sockets.
  3. Bind both sockets to (localhost, 0); the second bind() fails.

Therefore, when ephemeral ports are exhausted, bind(0) should fall back to
the legacy behaviour to enable the SO_REUSEADDR option and make it possible
to connect to different remote (addr, port) tuples.

This patch allows us to bind SO_REUSEADDR enabled sockets to the same
(addr, port) only when all ephemeral ports are exhausted.
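
The two-pass behaviour can be sketched in userspace as follows. This is a
deliberately simplified model, not the kernel algorithm: the per-port summary
struct and both function names are hypothetical, and the model ignores
addresses, SO_REUSEPORT, the random starting offset and the attempt_half
logic. It only shows the control flow of a strict scan followed by a relaxed
rescan once every port is taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative per-port summary; the kernel keeps a bind bucket per port. */
struct model_port {
	bool in_use;        /* at least one socket is bound to this port */
	bool all_reuseaddr; /* every bound socket has SO_REUSEADDR */
	bool has_listener;  /* some bound socket is in TCP_LISTEN */
};

/* Would binding a new socket (with or without SO_REUSEADDR) conflict? */
static bool model_conflict(const struct model_port *p, bool reuse, bool relax)
{
	if (!p->in_use)
		return false;
	if (reuse && p->all_reuseaddr && !p->has_listener)
		return !relax; /* strict pass: conflict; relaxed pass: share */
	return true;
}

/* Returns the index of the chosen port, or -1 if no port can be used. */
int model_find_open_port(const struct model_port *ports, size_t n, bool reuse)
{
	bool relax = false;

	for (;;) {
		for (size_t i = 0; i < n; i++)
			if (!model_conflict(&ports[i], reuse, relax))
				return (int)i;
		if (relax)
			return -1;
		relax = true; /* ports exhausted: rescan with the relaxed rule */
	}
}
```

In this model a free port always wins over a shared one, because the relaxed
rule is only consulted after the strict scan has failed for every port. For
example, with a single port already held by a SO_REUSEADDR socket,
assert(model_find_open_port(ports, 1, true) == 0) holds, while the same call
with reuse set to false returns -1.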

The only notable thing is that if all sockets bound to the same port have
both SO_REUSEADDR and SO_REUSEPORT enabled, we can bind sockets to an
ephemeral port and also do listen().

Fixes: aacd9289af8b ("tcp: bind() use stronger condition for bind_conflict")

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
---
 net/ipv4/inet_connection_sock.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 2e9549f49a82..cddeab240ea6 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -174,12 +174,14 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
 	int port = 0;
 	struct inet_bind_hashbucket *head;
 	struct net *net = sock_net(sk);
+	bool relax = false;
 	int i, low, high, attempt_half;
 	struct inet_bind_bucket *tb;
 	u32 remaining, offset;
 	int l3mdev;
 
 	l3mdev = inet_sk_bound_l3mdev(sk);
+ports_exhausted:
 	attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 other_half_scan:
 	inet_get_local_port_range(net, &low, &high);
@@ -217,7 +219,7 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
 		inet_bind_bucket_for_each(tb, &head->chain)
 			if (net_eq(ib_net(tb), net) && tb->l3mdev == l3mdev &&
 			    tb->port == port) {
-				if (!inet_csk_bind_conflict(sk, tb, false, false))
+				if (!inet_csk_bind_conflict(sk, tb, relax, false))
 					goto success;
 				goto next_port;
 			}
@@ -237,6 +239,12 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
 		attempt_half = 2;
 		goto other_half_scan;
 	}
+
+	if (!relax) {
+		/* We still have a chance to connect to different destinations */
+		relax = true;
+		goto ports_exhausted;
+	}
 	return NULL;
 success:
 	*port_ret = port;
-- 
2.17.2 (Apple Git-113)



* [PATCH v4 net-next 3/5] tcp: Forbid to bind more than one sockets haveing SO_REUSEADDR and SO_REUSEPORT per EUID.
  2020-03-08 18:16 [PATCH v4 net-next 0/5] Improve bind(addr, 0) behaviour Kuniyuki Iwashima
  2020-03-08 18:16 ` [PATCH v4 net-next 1/5] tcp: Remove unnecessary conditions in inet_csk_bind_conflict() Kuniyuki Iwashima
  2020-03-08 18:16 ` [PATCH v4 net-next 2/5] tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted Kuniyuki Iwashima
@ 2020-03-08 18:16 ` Kuniyuki Iwashima
  2020-03-08 18:16 ` [PATCH v4 net-next 4/5] net: Add net.ipv4.ip_autobind_reuse option Kuniyuki Iwashima
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2020-03-08 18:16 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, edumazet
  Cc: kuniyu, kuni1840, netdev, osa-contribution-log

If there is no TCP_LISTEN socket on an ephemeral port, we can bind multiple
sockets having SO_REUSEADDR to the same port. Then, if all sockets bound to
the port also have SO_REUSEPORT enabled and have the same EUID, listen() can
succeed on all of them. This is not safe.

Suppose an application has root privilege and binds sockets to an ephemeral
port with both SO_REUSEADDR and SO_REUSEPORT. While none of the sockets is
listening yet, a malicious user can use sudo, exhaust the ephemeral ports,
and bind sockets to the same ephemeral port, then call listen() and steal
the port.

To prevent this issue, we must not bind more than one socket that has the
same EUID and both SO_REUSEADDR and SO_REUSEPORT.

On the other hand, if the sockets have different EUIDs, the issue above does
not occur. After sockets with different EUIDs are bound to the same port and
one of them starts listening, no other socket can listen. This is because
the condition below evaluates to true and listen() for the second socket
fails.

			} else if (!reuseport_ok ||
				   !reuseport || !sk2->sk_reuseport ||
				   rcu_access_pointer(sk->sk_reuseport_cb) ||
				   (sk2->sk_state != TCP_TIME_WAIT &&
				    !uid_eq(uid, sock_i_uid(sk2)))) {
				if (inet_rcv_saddr_equal(sk, sk2, true))
					break;
			}

Therefore, on the same port, listen() cannot succeed for multiple sockets
with different EUIDs: once one socket is listening, every other listen()
call fails, so the problem does not happen. In this case, we can still call
connect() on the other sockets that cannot listen, so bind() has to succeed
in order to fully utilize 4-tuples.

To summarize, we should be able to bind only one socket having both
SO_REUSEADDR and SO_REUSEPORT per EUID.
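
To make the rule concrete, here is a small userspace model of the patched
condition for one already-bound socket sk2. It is a sketch under stated
assumptions, not kernel code: the struct and function names are hypothetical,
the addresses are assumed to match (inet_rcv_saddr_equal() is true), and sk is
assumed to have no reuseport group yet (sk_reuseport_cb is NULL).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the states the check distinguishes. */
enum bound_state { BOUND_OTHER, BOUND_LISTEN, BOUND_TIME_WAIT };

struct bound_sock {
	bool reuseaddr;         /* SO_REUSEADDR */
	bool reuseport;         /* SO_REUSEPORT */
	unsigned int euid;      /* owner of the socket */
	enum bound_state state;
};

/*
 * Returns true if binding sk next to sk2 is refused.
 * relax and reuseport_ok play the same role as in inet_csk_bind_conflict();
 * the autobind rescan corresponds to relax = true, reuseport_ok = false.
 */
bool autobind_conflict(const struct bound_sock *sk,
		       const struct bound_sock *sk2,
		       bool relax, bool reuseport_ok)
{
	/* TIME_WAIT sockets are treated like sockets of the same owner. */
	bool same_owner = sk2->state == BOUND_TIME_WAIT ||
			  sk->euid == sk2->euid;

	if (sk->reuseaddr && sk2->reuseaddr && sk2->state != BOUND_LISTEN) {
		/* Refuse a second SO_REUSEADDR + SO_REUSEPORT socket per EUID. */
		return !relax ||
		       (!reuseport_ok && sk->reuseport && sk2->reuseport &&
			same_owner);
	}
	return !reuseport_ok || !sk->reuseport || !sk2->reuseport ||
	       !same_owner;
}
```

With relax = true and reuseport_ok = false, two sockets that both set
SO_REUSEADDR and SO_REUSEPORT conflict when their EUIDs match and do not
conflict when the EUIDs differ, which is the per-EUID rule stated above.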

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
---
 net/ipv4/inet_connection_sock.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index cddeab240ea6..d27ed5fe7147 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -131,7 +131,7 @@ static int inet_csk_bind_conflict(const struct sock *sk,
 {
 	struct sock *sk2;
 	bool reuse = sk->sk_reuse;
-	bool reuseport = !!sk->sk_reuseport && reuseport_ok;
+	bool reuseport = !!sk->sk_reuseport;
 	kuid_t uid = sock_i_uid((struct sock *)sk);
 
 	/*
@@ -148,10 +148,16 @@ static int inet_csk_bind_conflict(const struct sock *sk,
 		     sk->sk_bound_dev_if == sk2->sk_bound_dev_if)) {
 			if (reuse && sk2->sk_reuse &&
 			    sk2->sk_state != TCP_LISTEN) {
-				if (!relax &&
+				if ((!relax ||
+				     (!reuseport_ok &&
+				      reuseport && sk2->sk_reuseport &&
+				      !rcu_access_pointer(sk->sk_reuseport_cb) &&
+				      (sk2->sk_state == TCP_TIME_WAIT ||
+				       uid_eq(uid, sock_i_uid(sk2))))) &&
 				    inet_rcv_saddr_equal(sk, sk2, true))
 					break;
-			} else if (!reuseport || !sk2->sk_reuseport ||
+			} else if (!reuseport_ok ||
+				   !reuseport || !sk2->sk_reuseport ||
 				   rcu_access_pointer(sk->sk_reuseport_cb) ||
 				   (sk2->sk_state != TCP_TIME_WAIT &&
 				    !uid_eq(uid, sock_i_uid(sk2)))) {
-- 
2.17.2 (Apple Git-113)



* [PATCH v4 net-next 4/5] net: Add net.ipv4.ip_autobind_reuse option.
  2020-03-08 18:16 [PATCH v4 net-next 0/5] Improve bind(addr, 0) behaviour Kuniyuki Iwashima
                   ` (2 preceding siblings ...)
  2020-03-08 18:16 ` [PATCH v4 net-next 3/5] tcp: Forbid to bind more than one sockets haveing SO_REUSEADDR and SO_REUSEPORT per EUID Kuniyuki Iwashima
@ 2020-03-08 18:16 ` Kuniyuki Iwashima
  2020-03-10  4:05   ` Eric Dumazet
  2020-03-08 18:21 ` [PATCH v4 net-next 5/5] selftests: net: Add SO_REUSEADDR test to check if 4-tuples are fully utilized Kuniyuki Iwashima
  2020-03-10  3:14 ` [PATCH v4 net-next 0/5] Improve bind(addr, 0) behaviour David Miller
  5 siblings, 1 reply; 11+ messages in thread
From: Kuniyuki Iwashima @ 2020-03-08 18:16 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, edumazet
  Cc: kuniyu, kuni1840, netdev, osa-contribution-log

The two commits ("tcp: bind(addr, 0) remove the SO_REUSEADDR restriction
when ephemeral ports are exhausted" and "tcp: Forbid to automatically bind
more than one sockets haveing SO_REUSEADDR and SO_REUSEPORT per EUID")
introduced a new feature that reuses ports with SO_REUSEADDR when all
ephemeral ports are exhausted. They allow connect() and listen() to share
ports in the following way.

  1. setsockopt(sk1, SO_REUSEADDR)
  2. setsockopt(sk2, SO_REUSEADDR)
  3. bind(sk1, saddr, 0)
  4. bind(sk2, saddr, 0)
  5. connect(sk1, daddr)
  6. listen(sk2)

In this situation, a new socket cannot be bound to the port, but sharing a
port between connect() and listen() may break some applications. The
ip_autobind_reuse option is false (0) by default, which disables the feature.
If it is set to true (1), we can fully utilize the 4-tuples.
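
An application that wants to know whether the option is enabled could read it
back through procfs. The sketch below assumes the usual mapping of the sysctl
name to /proc/sys/net/ipv4/ip_autobind_reuse; the helper name is hypothetical,
and on kernels without this patch the file does not exist, so the helper
reports an error rather than a value.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read an integer sysctl from a /proc/sys path.
 * Returns 0 and stores the value on success, or -1 if the file is
 * missing or does not start with an integer (*value is left untouched).
 */
int read_sysctl_int(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int parsed, ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%d", &parsed) == 1;
	fclose(f);
	if (!ok)
		return -1;
	*value = parsed;
	return 0;
}
```

A caller would pass "/proc/sys/net/ipv4/ip_autobind_reuse" and treat a return
value of -1 the same as the option being off.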

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
---
 Documentation/networking/ip-sysctl.txt | 7 +++++++
 include/net/netns/ipv4.h               | 1 +
 net/ipv4/inet_connection_sock.c        | 2 +-
 net/ipv4/sysctl_net_ipv4.c             | 7 +++++++
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5f53faff4e25..9506a67a33c4 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -958,6 +958,13 @@ ip_nonlocal_bind - BOOLEAN
 	which can be quite useful - but may break some applications.
 	Default: 0
 
+ip_autobind_reuse - BOOLEAN
+	By default, bind() does not select the ports automatically even if
+	the new socket and all sockets bound to the port have SO_REUSEADDR.
+	ip_autobind_reuse allows bind() to reuse the port and this is useful
+	when you use bind()+connect(), but may break some applications.
+	Default: 0
+
 ip_dynaddr - BOOLEAN
 	If set non-zero, enables support for dynamic addresses.
 	If set to a non-zero value larger than 1, a kernel log
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 08b98414d94e..154b8f01499b 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -101,6 +101,7 @@ struct netns_ipv4 {
 	int sysctl_ip_fwd_use_pmtu;
 	int sysctl_ip_fwd_update_priority;
 	int sysctl_ip_nonlocal_bind;
+	int sysctl_ip_autobind_reuse;
 	/* Shall we try to damage output packets if routing dev changes? */
 	int sysctl_ip_dynaddr;
 	int sysctl_ip_early_demux;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index d27ed5fe7147..3b4f81790e3e 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -246,7 +246,7 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
 		goto other_half_scan;
 	}
 
-	if (!relax) {
+	if (net->ipv4.sysctl_ip_autobind_reuse && !relax) {
 		/* We still have a chance to connect to different destinations */
 		relax = true;
 		goto ports_exhausted;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 9684af02e0a5..3b191764718b 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -775,6 +775,13 @@ static struct ctl_table ipv4_net_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "ip_autobind_reuse",
+		.data		= &init_net.ipv4.sysctl_ip_autobind_reuse,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 	{
 		.procname	= "fwmark_reflect",
 		.data		= &init_net.ipv4.sysctl_fwmark_reflect,
-- 
2.17.2 (Apple Git-113)



* [PATCH v4 net-next 5/5] selftests: net: Add SO_REUSEADDR test to check if 4-tuples are fully utilized.
  2020-03-08 18:16 [PATCH v4 net-next 0/5] Improve bind(addr, 0) behaviour Kuniyuki Iwashima
                   ` (3 preceding siblings ...)
  2020-03-08 18:16 ` [PATCH v4 net-next 4/5] net: Add net.ipv4.ip_autobind_reuse option Kuniyuki Iwashima
@ 2020-03-08 18:21 ` Kuniyuki Iwashima
  2020-03-10  3:14 ` [PATCH v4 net-next 0/5] Improve bind(addr, 0) behaviour David Miller
  5 siblings, 0 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2020-03-08 18:21 UTC (permalink / raw)
  To: kuniyu
  Cc: davem, edumazet, kuni1840, kuznet, netdev, osa-contribution-log,
	yoshfuji

This commit adds a test to check if we can fully utilize 4-tuples for
connect() when all ephemeral ports are exhausted.

The test program changes the local port range to use only one port, binds
two sockets with or without SO_REUSEADDR and SO_REUSEPORT, with the same
EUID or with different EUIDs, and then calls listen().

We should be able to bind only one socket having both SO_REUSEADDR and
SO_REUSEPORT per EUID. This restriction prevents unintentional listen().
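
The expectations the test encodes for the second bind() can be condensed into
one predicate. This is only a reading aid derived from the test tables in this
patch; the function name is hypothetical and it is not part of the selftest.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Expected result of the second bind() when only one ephemeral port exists
 * and ip_autobind_reuse is 1. Index 0 is the first socket, index 1 the second.
 */
bool second_bind_expected(const int reuseaddr[2], const int reuseport[2],
			  bool same_euid)
{
	/* Both sockets need SO_REUSEADDR to share the port at all. */
	if (!reuseaddr[0] || !reuseaddr[1])
		return false;
	/* Only one SO_REUSEADDR + SO_REUSEPORT socket is allowed per EUID. */
	if (same_euid && reuseport[0] && reuseport[1])
		return false;
	return true;
}
```

For instance, with both options set on both sockets the predicate is false for
the same EUID and true for different EUIDs, matching the two "reusable" test
cases.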

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
---
 tools/testing/selftests/net/.gitignore        |   1 +
 tools/testing/selftests/net/Makefile          |   2 +
 .../selftests/net/reuseaddr_ports_exhausted.c | 162 ++++++++++++++++++
 .../net/reuseaddr_ports_exhausted.sh          |  35 ++++
 4 files changed, 200 insertions(+)
 create mode 100644 tools/testing/selftests/net/reuseaddr_ports_exhausted.c
 create mode 100755 tools/testing/selftests/net/reuseaddr_ports_exhausted.sh

diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index ecc52d4c034d..91f9aea853b1 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -23,3 +23,4 @@ so_txtime
 tcp_fastopen_backup_key
 nettest
 fin_ack_lat
+reuseaddr_ports_exhausted
\ No newline at end of file
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index b5694196430a..ded1aa394880 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -12,6 +12,7 @@ TEST_PROGS += udpgro_bench.sh udpgro.sh test_vxlan_under_vrf.sh reuseport_addr_a
 TEST_PROGS += test_vxlan_fdb_changelink.sh so_txtime.sh ipv6_flowlabel.sh
 TEST_PROGS += tcp_fastopen_backup_key.sh fcnal-test.sh l2tp.sh traceroute.sh
 TEST_PROGS += fin_ack_lat.sh
+TEST_PROGS += reuseaddr_ports_exhausted.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket nettest
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any
@@ -22,6 +23,7 @@ TEST_GEN_FILES += tcp_fastopen_backup_key
 TEST_GEN_FILES += fin_ack_lat
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
 TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls
+TEST_GEN_FILES += reuseaddr_ports_exhausted
 
 KSFT_KHDR_INSTALL := 1
 include ../lib.mk
diff --git a/tools/testing/selftests/net/reuseaddr_ports_exhausted.c b/tools/testing/selftests/net/reuseaddr_ports_exhausted.c
new file mode 100644
index 000000000000..7b01b7c2ec10
--- /dev/null
+++ b/tools/testing/selftests/net/reuseaddr_ports_exhausted.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Check if we can fully utilize 4-tuples for connect().
+ *
+ * Rules to bind sockets to the same port when all ephemeral ports are
+ * exhausted.
+ *
+ *   1. if there are TCP_LISTEN sockets on the port, fail to bind.
+ *   2. if there are sockets without SO_REUSEADDR, fail to bind.
+ *   3. if SO_REUSEADDR is disabled, fail to bind.
+ *   4. if SO_REUSEADDR is enabled and SO_REUSEPORT is disabled,
+ *        succeed to bind.
+ *   5. if SO_REUSEADDR and SO_REUSEPORT are enabled and
+ *        there is no socket having the both options and the same EUID,
+ *        succeed to bind.
+ *   6. fail to bind.
+ *
+ * Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
+ */
+#include <arpa/inet.h>
+#include <netinet/in.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include "../kselftest_harness.h"
+
+struct reuse_opts {
+	int reuseaddr[2];
+	int reuseport[2];
+};
+
+struct reuse_opts unreusable_opts[12] = {
+	{0, 0, 0, 0},
+	{0, 0, 0, 1},
+	{0, 0, 1, 0},
+	{0, 0, 1, 1},
+	{0, 1, 0, 0},
+	{0, 1, 0, 1},
+	{0, 1, 1, 0},
+	{0, 1, 1, 1},
+	{1, 0, 0, 0},
+	{1, 0, 0, 1},
+	{1, 0, 1, 0},
+	{1, 0, 1, 1},
+};
+
+struct reuse_opts reusable_opts[4] = {
+	{1, 1, 0, 0},
+	{1, 1, 0, 1},
+	{1, 1, 1, 0},
+	{1, 1, 1, 1},
+};
+
+int bind_port(struct __test_metadata *_metadata, int reuseaddr, int reuseport)
+{
+	struct sockaddr_in local_addr;
+	int len = sizeof(local_addr);
+	int fd, ret;
+
+	fd = socket(AF_INET, SOCK_STREAM, 0);
+	ASSERT_NE(-1, fd) TH_LOG("failed to open socket.");
+
+	ret = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuseaddr, sizeof(int));
+	ASSERT_EQ(0, ret) TH_LOG("failed to setsockopt: SO_REUSEADDR.");
+
+	ret = setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &reuseport, sizeof(int));
+	ASSERT_EQ(0, ret) TH_LOG("failed to setsockopt: SO_REUSEPORT.");
+
+	local_addr.sin_family = AF_INET;
+	local_addr.sin_addr.s_addr = inet_addr("127.0.0.1");
+	local_addr.sin_port = 0;
+
+	if (bind(fd, (struct sockaddr *)&local_addr, len) == -1) {
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+TEST(reuseaddr_ports_exhausted_unreusable)
+{
+	struct reuse_opts *opts;
+	int i, j, fd[2];
+
+	for (i = 0; i < 12; i++) {
+		opts = &unreusable_opts[i];
+
+		for (j = 0; j < 2; j++)
+			fd[j] = bind_port(_metadata, opts->reuseaddr[j], opts->reuseport[j]);
+
+		ASSERT_NE(-1, fd[0]) TH_LOG("failed to bind.");
+		EXPECT_EQ(-1, fd[1]) TH_LOG("should fail to bind.");
+
+		for (j = 0; j < 2; j++)
+			if (fd[j] != -1)
+				close(fd[j]);
+	}
+}
+
+TEST(reuseaddr_ports_exhausted_reusable_same_euid)
+{
+	struct reuse_opts *opts;
+	int i, j, fd[2];
+
+	for (i = 0; i < 4; i++) {
+		opts = &reusable_opts[i];
+
+		for (j = 0; j < 2; j++)
+			fd[j] = bind_port(_metadata, opts->reuseaddr[j], opts->reuseport[j]);
+
+		ASSERT_NE(-1, fd[0]) TH_LOG("failed to bind.");
+
+		if (opts->reuseport[0] && opts->reuseport[1]) {
+			EXPECT_EQ(-1, fd[1]) TH_LOG("should fail to bind because both sockets succeed to be listened.");
+		} else {
+			EXPECT_NE(-1, fd[1]) TH_LOG("should succeed to bind to connect to different destinations.");
+		}
+
+		for (j = 0; j < 2; j++)
+			if (fd[j] != -1)
+				close(fd[j]);
+	}
+}
+
+TEST(reuseaddr_ports_exhausted_reusable_different_euid)
+{
+	struct reuse_opts *opts;
+	int i, j, ret, fd[2];
+	uid_t euid[2] = {10, 20};
+
+	for (i = 0; i < 4; i++) {
+		opts = &reusable_opts[i];
+
+		for (j = 0; j < 2; j++) {
+			ret = seteuid(euid[j]);
+			ASSERT_EQ(0, ret) TH_LOG("failed to seteuid: %d.", euid[j]);
+
+			fd[j] = bind_port(_metadata, opts->reuseaddr[j], opts->reuseport[j]);
+
+			ret = seteuid(0);
+			ASSERT_EQ(0, ret) TH_LOG("failed to seteuid: 0.");
+		}
+
+		ASSERT_NE(-1, fd[0]) TH_LOG("failed to bind.");
+		EXPECT_NE(-1, fd[1]) TH_LOG("should succeed to bind because one socket can be bound in each euid.");
+
+		if (fd[1] != -1) {
+			ret = listen(fd[0], 5);
+			ASSERT_EQ(0, ret) TH_LOG("failed to listen.");
+
+			ret = listen(fd[1], 5);
+			EXPECT_EQ(-1, ret) TH_LOG("should fail to listen because only one uid reserves the port in TCP_LISTEN.");
+		}
+
+		for (j = 0; j < 2; j++)
+			if (fd[j] != -1)
+				close(fd[j]);
+	}
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/net/reuseaddr_ports_exhausted.sh b/tools/testing/selftests/net/reuseaddr_ports_exhausted.sh
new file mode 100755
index 000000000000..20e3a2913d06
--- /dev/null
+++ b/tools/testing/selftests/net/reuseaddr_ports_exhausted.sh
@@ -0,0 +1,35 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Run tests when all ephemeral ports are exhausted.
+#
+# Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
+
+set +x
+set -e
+
+readonly NETNS="ns-$(mktemp -u XXXXXX)"
+
+setup() {
+	ip netns add "${NETNS}"
+	ip -netns "${NETNS}" link set lo up
+	ip netns exec "${NETNS}" \
+		sysctl -w net.ipv4.ip_local_port_range="32768 32768" \
+		> /dev/null 2>&1
+	ip netns exec "${NETNS}" \
+		sysctl -w net.ipv4.ip_autobind_reuse=1 > /dev/null 2>&1
+}
+
+cleanup() {
+	ip netns del "${NETNS}"
+}
+
+trap cleanup EXIT
+setup
+
+do_test() {
+	ip netns exec "${NETNS}" ./reuseaddr_ports_exhausted
+}
+
+do_test
+echo "tests done"
-- 
2.17.2 (Apple Git-113)



* Re: [PATCH v4 net-next 0/5] Improve bind(addr, 0) behaviour.
  2020-03-08 18:16 [PATCH v4 net-next 0/5] Improve bind(addr, 0) behaviour Kuniyuki Iwashima
                   ` (4 preceding siblings ...)
  2020-03-08 18:21 ` [PATCH v4 net-next 5/5] selftests: net: Add SO_REUSEADDR test to check if 4-tuples are fully utilized Kuniyuki Iwashima
@ 2020-03-10  3:14 ` David Miller
  5 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2020-03-10  3:14 UTC (permalink / raw)
  To: kuniyu; +Cc: kuznet, yoshfuji, edumazet, kuni1840, netdev, osa-contribution-log

From: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Date: Mon, 9 Mar 2020 03:16:10 +0900

> Currently we fail to bind sockets to ephemeral ports when all of the ports
> are exhausted even if all sockets have SO_REUSEADDR enabled. In this case,
> we still have a chance to connect to the different remote hosts.
> 
> These patches add net.ipv4.ip_autobind_reuse option and fix the behaviour
> to fully utilize all space of the local (addr, port) tuples.
 ...

Eric, please review, thank you.


* Re: [PATCH v4 net-next 2/5] tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted.
  2020-03-08 18:16 ` [PATCH v4 net-next 2/5] tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted Kuniyuki Iwashima
@ 2020-03-10  4:04   ` Eric Dumazet
  2020-03-10  7:41     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2020-03-10  4:04 UTC (permalink / raw)
  To: Kuniyuki Iwashima, davem, kuznet, yoshfuji, edumazet
  Cc: kuni1840, netdev, osa-contribution-log



On 3/8/20 11:16 AM, Kuniyuki Iwashima wrote:
> Commit aacd9289af8b82f5fb01bcdd53d0e3406d1333c7 ("tcp: bind() use stronger
> condition for bind_conflict") introduced a restriction to forbid to bind
> SO_REUSEADDR enabled sockets to the same (addr, port) tuple in order to
> assign ports dispersedly so that we can connect to the same remote host.
> 
> The change results in accelerating port depletion so that we fail to bind
> sockets to the same local port even if we want to connect to the different
> remote hosts.
> 
> You can reproduce this issue by following instructions below.
>   1. # sysctl -w net.ipv4.ip_local_port_range="32768 32768"
>   2. set SO_REUSEADDR to two sockets.
>   3. bind two sockets to (localhost, 0) and the latter fails.
> 
> Therefore, when ephemeral ports are exhausted, bind(0) should fallback to
> the legacy behaviour to enable the SO_REUSEADDR option and make it possible
> to connect to different remote (addr, port) tuples.

Sadly, this commit tries hard to support obsolete SO_REUSEADDR for active
connections, which makes little sense now that we have the more powerful
IP_BIND_ADDRESS_NO_PORT.

SO_REUSEADDR only really makes sense for a listener, because you want a
server to be able to restart after a core dump, while prior sockets are still
kept in TIME_WAIT state.

Same for SO_REUSEPORT: it only made sense for sharded listeners in the Linux
kernel.

Trying to allocate a source port at bind() time, without knowing the
destination address/port, is really not something that can be fixed.

Your patches might allow a 2x increase, while IP_BIND_ADDRESS_NO_PORT
basically allows for 1000x increase of the possible combinations.
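
For readers unfamiliar with the option, a minimal userspace sketch of the
bind()+connect() pattern with IP_BIND_ADDRESS_NO_PORT follows. The helper
names are hypothetical, and the fallback #define assumes the value from
linux/in.h for older libc headers. With the option set, bind() to port 0 is
expected to fix only the source address, so the local port should still read
back as 0 until connect() picks one.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_BIND_ADDRESS_NO_PORT
#define IP_BIND_ADDRESS_NO_PORT 24 /* assumed from linux/in.h */
#endif

/*
 * Create a TCP socket bound to src_ip without reserving a port.
 * Returns the fd, or -1 on any failure.
 */
int bind_without_port(const char *src_ip)
{
	struct sockaddr_in addr;
	int one = 1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = 0; /* let connect() choose the port later */

	if (inet_pton(AF_INET, src_ip, &addr.sin_addr) != 1 ||
	    setsockopt(fd, IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT,
		       &one, sizeof(one)) < 0 ||
	    bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Local port in host byte order, or -1 if getsockname() fails. */
int local_port(int fd)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);

	if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
		return -1;
	return ntohs(addr.sin_port);
}
```

A caller would follow bind_without_port() with connect(); at that point the
kernel knows the destination and can choose a source port that is unique for
the full 4-tuple rather than for the local (addr, port) pair alone.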



> 
> This patch allows us to bind SO_REUSEADDR enabled sockets to the same
> (addr, port) only when all ephemeral ports are exhausted.
> 
> The only notable thing is that if all sockets bound to the same port have
> both SO_REUSEADDR and SO_REUSEPORT enabled, we can bind sockets to an
> ephemeral port and also do listen().
> 
> Fixes: aacd9289af8b ("tcp: bind() use stronger condition for bind_conflict")

I disagree with this Fixes: tag: I do not want this patch in stable kernels,
particularly if you put the sysctl patch as a follow-up without a Fixes: tag.

Please reorder your patches to first introduce the sysctl, then this one.

Or squash the two patches.

> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>



* Re: [PATCH v4 net-next 4/5] net: Add net.ipv4.ip_autobind_reuse option.
  2020-03-08 18:16 ` [PATCH v4 net-next 4/5] net: Add net.ipv4.ip_autobind_reuse option Kuniyuki Iwashima
@ 2020-03-10  4:05   ` Eric Dumazet
  2020-03-10  7:42     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2020-03-10  4:05 UTC (permalink / raw)
  To: Kuniyuki Iwashima, davem, kuznet, yoshfuji, edumazet
  Cc: kuni1840, netdev, osa-contribution-log



On 3/8/20 11:16 AM, Kuniyuki Iwashima wrote:
> The two commits("tcp: bind(addr, 0) remove the SO_REUSEADDR restriction
> when ephemeral ports are exhausted" and "tcp: Forbid to automatically bind
> more than one sockets haveing SO_REUSEADDR and SO_REUSEPORT per EUID")
> introduced the new feature to reuse ports with SO_REUSEADDR when all
> ephemeral pors are exhausted. They allow connect() and listen() to share
> ports in the following way.
> 
>   1. setsockopt(sk1, SO_REUSEADDR)
>   2. setsockopt(sk2, SO_REUSEADDR)
>   3. bind(sk1, saddr, 0)
>   4. bind(sk2, saddr, 0)
>   5. connect(sk1, daddr)
>   6. listen(sk2)

Honestly, IP_BIND_ADDRESS_NO_PORT makes all these problems go away.


> 
> In this situation, new socket cannot be bound to the port, but sharing
> port between connect() and listen() may break some applications. The
> ip_autobind_reuse option is false (0) by default and disables the feature.
> If it is set true, we can fully utilize the 4-tuples.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
> ---
>  Documentation/networking/ip-sysctl.txt | 7 +++++++
>  include/net/netns/ipv4.h               | 1 +
>  net/ipv4/inet_connection_sock.c        | 2 +-
>  net/ipv4/sysctl_net_ipv4.c             | 7 +++++++
>  4 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index 5f53faff4e25..9506a67a33c4 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -958,6 +958,13 @@ ip_nonlocal_bind - BOOLEAN
>  	which can be quite useful - but may break some applications.
>  	Default: 0
>  
> +ip_autobind_reuse - BOOLEAN
> +	By default, bind() does not select the ports automatically even if
> +	the new socket and all sockets bound to the port have SO_REUSEADDR.
> +	ip_autobind_reuse allows bind() to reuse the port and this is useful
> +	when you use bind()+connect(), but may break some applications.

I would mention that the preferred solution is to use IP_BIND_ADDRESS_NO_PORT,
which is fully supported, and that this sysctl should only be set by experts.

> +	Default: 0
> +
>  ip_dynaddr - BOOLEAN
>  	If set non-zero, enables support for dynamic addresses.
>  	If set to a non-zero value larger than 1, a kernel log
> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> index 08b98414d94e..154b8f01499b 100644
> --- a/include/net/netns/ipv4.h
> +++ b/include/net/netns/ipv4.h
> @@ -101,6 +101,7 @@ struct netns_ipv4 {
>  	int sysctl_ip_fwd_use_pmtu;
>  	int sysctl_ip_fwd_update_priority;
>  	int sysctl_ip_nonlocal_bind;
> +	int sysctl_ip_autobind_reuse;
>  	/* Shall we try to damage output packets if routing dev changes? */
>  	int sysctl_ip_dynaddr;
>  	int sysctl_ip_early_demux;
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index d27ed5fe7147..3b4f81790e3e 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -246,7 +246,7 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
>  		goto other_half_scan;
>  	}
>  
> -	if (!relax) {
> +	if (net->ipv4.sysctl_ip_autobind_reuse && !relax) {
>  		/* We still have a chance to connect to different destinations */
>  		relax = true;
>  		goto ports_exhausted;
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index 9684af02e0a5..3b191764718b 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -775,6 +775,13 @@ static struct ctl_table ipv4_net_table[] = {
>  		.mode		= 0644,
>  		.proc_handler	= proc_dointvec
>  	},
> +	{
> +		.procname	= "ip_autobind_reuse",
> +		.data		= &init_net.ipv4.sysctl_ip_autobind_reuse,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec

.proc_handler   = proc_dointvec_minmax,
.extra1         = SYSCTL_ZERO,
.extra2         = SYSCTL_ONE,
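
The point of the three lines above is that proc_dointvec accepts any
integer, while proc_dointvec_minmax with these bounds rejects writes outside
0..1 with -EINVAL. A tiny user-space model of that check follows; it is not
kernel code, and model_write_bool_sysctl() is a hypothetical name used only
for illustration.

```c
#include <errno.h>

/*
 * User-space model only: mimics the range check applied to a write when
 * extra1 = SYSCTL_ZERO and extra2 = SYSCTL_ONE. Out-of-range values are
 * rejected and the stored value is left unchanged.
 */
int model_write_bool_sysctl(int *var, long val)
{
	if (val < 0 || val > 1)
		return -EINVAL;
	*var = (int)val;
	return 0;
}
```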



> +	},
>  	{
>  		.procname	= "fwmark_reflect",
>  		.data		= &init_net.ipv4.sysctl_fwmark_reflect,
> 


* Re: [PATCH v4 net-next 2/5] tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted.
  2020-03-10  4:04   ` Eric Dumazet
@ 2020-03-10  7:41     ` Kuniyuki Iwashima
  0 siblings, 0 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2020-03-10  7:41 UTC (permalink / raw)
  To: eric.dumazet
  Cc: davem, edumazet, kuni1840, kuniyu, kuznet, netdev,
	osa-contribution-log, yoshfuji

From:   Eric Dumazet <eric.dumazet@gmail.com>
Date:   Mon, 9 Mar 2020 21:04:24 -0700
> On 3/8/20 11:16 AM, Kuniyuki Iwashima wrote:
> > Commit aacd9289af8b82f5fb01bcdd53d0e3406d1333c7 ("tcp: bind() use stronger
> > condition for bind_conflict") introduced a restriction to forbid to bind
> > SO_REUSEADDR enabled sockets to the same (addr, port) tuple in order to
> > assign ports dispersedly so that we can connect to the same remote host.
> > 
> > The change results in accelerating port depletion so that we fail to bind
> > sockets to the same local port even if we want to connect to the different
> > remote hosts.
> > 
> > You can reproduce this issue by following instructions below.
> >   1. # sysctl -w net.ipv4.ip_local_port_range="32768 32768"
> >   2. set SO_REUSEADDR to two sockets.
> >   3. bind two sockets to (localhost, 0) and the latter fails.
> > 
> > Therefore, when ephemeral ports are exhausted, bind(0) should fall back to
> > the legacy behaviour to enable the SO_REUSEADDR option and make it possible
> > to connect to different remote (addr, port) tuples.
> 
> Sadly this commit tries hard to support obsolete SO_REUSEADDR for active connections,
> which makes little sense now that we have the more powerful IP_BIND_ADDRESS_NO_PORT.
> 
> SO_REUSEADDR only really makes sense for a listener, because you want a
> server to be able to restart after core dump, while prior sockets are still
> kept in TIME_WAIT state.
> 
> Same for SO_REUSEPORT: it only made sense for sharded listeners in the Linux kernel.
> 
> Trying to allocate a sport at bind() time, without knowing the destination address/port
> is really not something that can be fixed.
> 
> Your patches might allow a 2x increase, while IP_BIND_ADDRESS_NO_PORT
> basically allows for 1000x increase of the possible combinations.
> 
> 
> 
> > 
> > This patch allows us to bind SO_REUSEADDR enabled sockets to the same
> > (addr, port) only when all ephemeral ports are exhausted.
> > 
> > The only notable thing is that if all sockets bound to the same port have
> > both SO_REUSEADDR and SO_REUSEPORT enabled, we can bind sockets to an
> > ephemeral port and also do listen().
> > 
> > Fixes: aacd9289af8b ("tcp: bind() use stronger condition for bind_conflict")
> 
> I disagree with this Fixes: tag  : I do not want this patch in stable kernels,
> particularly if you put the sysctl patch as a followup without a Fixes: tag.
> 
> Please reorder your patch to first introduce the sysctl, then this one.
> 
> Or squash the two patches.

I'm sorry, I will remove the tag and squash the patches.
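
For reference, steps 2 and 3 of the reproduction quoted above can be
sketched in user space as follows. The helper names
bind_reuseaddr_any_port() and bound_port() are hypothetical. With the
default port range both binds succeed on distinct ports; the failure only
appears once step 1 has narrowed net.ipv4.ip_local_port_range to a single
port (which needs root), in which case the second call is expected to fail
with EADDRINUSE.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a TCP socket with SO_REUSEADDR set and
 * bind it to (saddr, 0) so the kernel picks an ephemeral port.
 * Returns the fd, or -1 on failure.
 */
int bind_reuseaddr_any_port(const char *saddr)
{
	struct sockaddr_in sin;
	int one = 1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(0);

	if (inet_pton(AF_INET, saddr, &sin.sin_addr) != 1 ||
	    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
	    bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: bound port in host byte order, or -1 on error. */
int bound_port(int fd)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);

	if (getsockname(fd, (struct sockaddr *)&sin, &len) < 0)
		return -1;
	return ntohs(sin.sin_port);
}
```

Calling bind_reuseaddr_any_port("127.0.0.1") twice and comparing
bound_port() on the two sockets shows the dispersed assignment that the
quoted commit message describes.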


* Re: [PATCH v4 net-next 4/5] net: Add net.ipv4.ip_autobind_reuse option.
  2020-03-10  4:05   ` Eric Dumazet
@ 2020-03-10  7:42     ` Kuniyuki Iwashima
  0 siblings, 0 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2020-03-10  7:42 UTC (permalink / raw)
  To: eric.dumazet
  Cc: davem, edumazet, kuni1840, kuniyu, kuznet, netdev,
	osa-contribution-log, yoshfuji

From:   Eric Dumazet <eric.dumazet@gmail.com>
Date:   Mon, 9 Mar 2020 21:05:24 -0700
> On 3/8/20 11:16 AM, Kuniyuki Iwashima wrote:
> > The two commits("tcp: bind(addr, 0) remove the SO_REUSEADDR restriction
> > when ephemeral ports are exhausted" and "tcp: Forbid to automatically bind
> > more than one sockets haveing SO_REUSEADDR and SO_REUSEPORT per EUID")
> > introduced the new feature to reuse ports with SO_REUSEADDR when all
> > ephemeral ports are exhausted. They allow connect() and listen() to share
> > ports in the following way.
> > 
> >   1. setsockopt(sk1, SO_REUSEADDR)
> >   2. setsockopt(sk2, SO_REUSEADDR)
> >   3. bind(sk1, saddr, 0)
> >   4. bind(sk2, saddr, 0)
> >   5. connect(sk1, daddr)
> >   6. listen(sk2)
> 
> Honestly, IP_BIND_ADDRESS_NO_PORT makes all these problems go away.
> 
> 
> >
> > In this situation, a new socket cannot be bound to the port, but sharing
> > a port between connect() and listen() may break some applications. The
> > ip_autobind_reuse option is false (0) by default and disables the feature.
> > If it is set to true, we can fully utilize the 4-tuples.
> > 
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
> > ---
> >  Documentation/networking/ip-sysctl.txt | 7 +++++++
> >  include/net/netns/ipv4.h               | 1 +
> >  net/ipv4/inet_connection_sock.c        | 2 +-
> >  net/ipv4/sysctl_net_ipv4.c             | 7 +++++++
> >  4 files changed, 16 insertions(+), 1 deletion(-)
> > 
> > diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> > index 5f53faff4e25..9506a67a33c4 100644
> > --- a/Documentation/networking/ip-sysctl.txt
> > +++ b/Documentation/networking/ip-sysctl.txt
> > @@ -958,6 +958,13 @@ ip_nonlocal_bind - BOOLEAN
> >  	which can be quite useful - but may break some applications.
> >  	Default: 0
> >  
> > +ip_autobind_reuse - BOOLEAN
> > +	By default, bind() does not select the ports automatically even if
> > +	the new socket and all sockets bound to the port have SO_REUSEADDR.
> > +	ip_autobind_reuse allows bind() to reuse the port and this is useful
> > +	when you use bind()+connect(), but may break some applications.
> 
> I would mention that the preferred solution is to use IP_BIND_ADDRESS_NO_PORT,
> which is fully supported, and that this sysctl should only be set by experts.

I will add these to the description.


> > +	Default: 0
> > +
> >  ip_dynaddr - BOOLEAN
> >  	If set non-zero, enables support for dynamic addresses.
> >  	If set to a non-zero value larger than 1, a kernel log
> > diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> > index 08b98414d94e..154b8f01499b 100644
> > --- a/include/net/netns/ipv4.h
> > +++ b/include/net/netns/ipv4.h
> > @@ -101,6 +101,7 @@ struct netns_ipv4 {
> >  	int sysctl_ip_fwd_use_pmtu;
> >  	int sysctl_ip_fwd_update_priority;
> >  	int sysctl_ip_nonlocal_bind;
> > +	int sysctl_ip_autobind_reuse;
> >  	/* Shall we try to damage output packets if routing dev changes? */
> >  	int sysctl_ip_dynaddr;
> >  	int sysctl_ip_early_demux;
> > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > index d27ed5fe7147..3b4f81790e3e 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -246,7 +246,7 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
> >  		goto other_half_scan;
> >  	}
> >  
> > -	if (!relax) {
> > +	if (net->ipv4.sysctl_ip_autobind_reuse && !relax) {
> >  		/* We still have a chance to connect to different destinations */
> >  		relax = true;
> >  		goto ports_exhausted;
> > diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> > index 9684af02e0a5..3b191764718b 100644
> > --- a/net/ipv4/sysctl_net_ipv4.c
> > +++ b/net/ipv4/sysctl_net_ipv4.c
> > @@ -775,6 +775,13 @@ static struct ctl_table ipv4_net_table[] = {
> >  		.mode		= 0644,
> >  		.proc_handler	= proc_dointvec
> >  	},
> > +	{
> > +		.procname	= "ip_autobind_reuse",
> > +		.data		= &init_net.ipv4.sysctl_ip_autobind_reuse,
> > +		.maxlen		= sizeof(int),
> > +		.mode		= 0644,
> > +		.proc_handler	= proc_dointvec
> 
> .proc_handler   = proc_dointvec_minmax,
> .extra1         = SYSCTL_ZERO,
> .extra2         = SYSCTL_ONE,

I will fix this and respin the patches.

Thank you.


end of thread, other threads:[~2020-03-10  7:42 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-08 18:16 [PATCH v4 net-next 0/5] Improve bind(addr, 0) behaviour Kuniyuki Iwashima
2020-03-08 18:16 ` [PATCH v4 net-next 1/5] tcp: Remove unnecessary conditions in inet_csk_bind_conflict() Kuniyuki Iwashima
2020-03-08 18:16 ` [PATCH v4 net-next 2/5] tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted Kuniyuki Iwashima
2020-03-10  4:04   ` Eric Dumazet
2020-03-10  7:41     ` Kuniyuki Iwashima
2020-03-08 18:16 ` [PATCH v4 net-next 3/5] tcp: Forbid to bind more than one sockets haveing SO_REUSEADDR and SO_REUSEPORT per EUID Kuniyuki Iwashima
2020-03-08 18:16 ` [PATCH v4 net-next 4/5] net: Add net.ipv4.ip_autobind_reuse option Kuniyuki Iwashima
2020-03-10  4:05   ` Eric Dumazet
2020-03-10  7:42     ` Kuniyuki Iwashima
2020-03-08 18:21 ` [PATCH v4 net-next 5/5] selftests: net: Add SO_REUSEADDR test to check if 4-tuples are fully utilized Kuniyuki Iwashima
2020-03-10  3:14 ` [PATCH v4 net-next 0/5] Improve bind(addr, 0) behaviour David Miller
