netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 net-next 0/3] Improve bind(addr, 0) behaviour.
@ 2020-02-26  7:46 Kuniyuki Iwashima
  2020-02-26  7:46 ` [PATCH v2 net-next 1/3] tcp: Remove unnecessary conditions in inet_csk_bind_conflict() Kuniyuki Iwashima
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Kuniyuki Iwashima @ 2020-02-26  7:46 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, edumazet
  Cc: kuniyu, kuni1840, netdev, osa-contribution-log

Currently we fail to bind sockets to ephemeral ports when all of the ports
are exhausted even if all sockets have SO_REUSEADDR enabled. In this case,
we still have a chance to connect to the different remote hosts.

The second and third patches fix the behaviour to fully utilize all space
of the local (addr, port) tuples.

---
Changes in v2:
 - Change the description of the 2nd patch ('localhost' -> 'address').
 - Correct the description and the if statement of the 3rd patch.

v1 with tests:
 https://lore.kernel.org/netdev/20200220152020.13056-1-kuniyu@amazon.co.jp/
---

Kuniyuki Iwashima (3):
  tcp: Remove unnecessary conditions in inet_csk_bind_conflict().
  tcp: bind(addr, 0) remove the SO_REUSEADDR restriction when ephemeral
    ports are exhausted.
  tcp: Prevent port hijacking when ports are exhausted.

 net/ipv4/inet_connection_sock.c | 36 ++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 12 deletions(-)

-- 
2.17.2 (Apple Git-113)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 net-next 1/3] tcp: Remove unnecessary conditions in inet_csk_bind_conflict().
  2020-02-26  7:46 [PATCH v2 net-next 0/3] Improve bind(addr, 0) behaviour Kuniyuki Iwashima
@ 2020-02-26  7:46 ` Kuniyuki Iwashima
  2020-02-26  7:46 ` [PATCH v2 net-next 2/3] tcp: bind(addr, 0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted Kuniyuki Iwashima
  2020-02-26  7:46 ` [PATCH v2 net-next 3/3] tcp: Prevent port hijacking when " Kuniyuki Iwashima
  2 siblings, 0 replies; 6+ messages in thread
From: Kuniyuki Iwashima @ 2020-02-26  7:46 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, edumazet
  Cc: kuniyu, kuni1840, netdev, osa-contribution-log

When we get an ephemeral port, the relax is false, so the SO_REUSEADDR
conditions may be evaluated twice. We do not need to check the conditions
again.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
---
 net/ipv4/inet_connection_sock.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index a4db79b1b643..2e9549f49a82 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -146,17 +146,15 @@ static int inet_csk_bind_conflict(const struct sock *sk,
 		    (!sk->sk_bound_dev_if ||
 		     !sk2->sk_bound_dev_if ||
 		     sk->sk_bound_dev_if == sk2->sk_bound_dev_if)) {
-			if ((!reuse || !sk2->sk_reuse ||
-			    sk2->sk_state == TCP_LISTEN) &&
-			    (!reuseport || !sk2->sk_reuseport ||
-			     rcu_access_pointer(sk->sk_reuseport_cb) ||
-			     (sk2->sk_state != TCP_TIME_WAIT &&
-			     !uid_eq(uid, sock_i_uid(sk2))))) {
-				if (inet_rcv_saddr_equal(sk, sk2, true))
-					break;
-			}
-			if (!relax && reuse && sk2->sk_reuse &&
+			if (reuse && sk2->sk_reuse &&
 			    sk2->sk_state != TCP_LISTEN) {
+				if (!relax &&
+				    inet_rcv_saddr_equal(sk, sk2, true))
+					break;
+			} else if (!reuseport || !sk2->sk_reuseport ||
+				   rcu_access_pointer(sk->sk_reuseport_cb) ||
+				   (sk2->sk_state != TCP_TIME_WAIT &&
+				    !uid_eq(uid, sock_i_uid(sk2)))) {
 				if (inet_rcv_saddr_equal(sk, sk2, true))
 					break;
 			}
-- 
2.17.2 (Apple Git-113)


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 net-next 2/3] tcp: bind(addr, 0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted.
  2020-02-26  7:46 [PATCH v2 net-next 0/3] Improve bind(addr, 0) behaviour Kuniyuki Iwashima
  2020-02-26  7:46 ` [PATCH v2 net-next 1/3] tcp: Remove unnecessary conditions in inet_csk_bind_conflict() Kuniyuki Iwashima
@ 2020-02-26  7:46 ` Kuniyuki Iwashima
  2020-02-26  7:46 ` [PATCH v2 net-next 3/3] tcp: Prevent port hijacking when " Kuniyuki Iwashima
  2 siblings, 0 replies; 6+ messages in thread
From: Kuniyuki Iwashima @ 2020-02-26  7:46 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, edumazet
  Cc: kuniyu, kuni1840, netdev, osa-contribution-log

Commit aacd9289af8b82f5fb01bcdd53d0e3406d1333c7 ("tcp: bind() use stronger
condition for bind_conflict") introduced a restriction to forbid to bind
SO_REUSEADDR enabled sockets to the same (addr, port) tuple in order to
assign ports dispersedly so that we can connect to the same remote host.

The change results in accelerating port depletion so that we fail to bind
sockets to the same local port even if we want to connect to the different
remote hosts.

You can reproduce this issue by following instructions below.
  1. # sysctl -w net.ipv4.ip_local_port_range="32768 32768"
  2. set SO_REUSEADDR to two sockets.
  3. bind two sockets to (address, 0) and the latter fails.

Therefore, when ephemeral ports are exhausted, bind(addr, 0) should
fallback to the legacy behaviour to enable the SO_REUSEADDR option and make
it possible to connect to different remote (addr, port) tuples.

This patch allows us to bind SO_REUSEADDR enabled sockets to the same
(addr, port) only when all ephemeral ports are exhausted.

The only notable thing is that if all sockets bound to the same port have
both SO_REUSEADDR and SO_REUSEPORT enabled, we can bind sockets to an
ephemeral port and also do listen().

Fixes: aacd9289af8b ("tcp: bind() use stronger condition for bind_conflict")

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
---
 net/ipv4/inet_connection_sock.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 2e9549f49a82..cddeab240ea6 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -174,12 +174,14 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
 	int port = 0;
 	struct inet_bind_hashbucket *head;
 	struct net *net = sock_net(sk);
+	bool relax = false;
 	int i, low, high, attempt_half;
 	struct inet_bind_bucket *tb;
 	u32 remaining, offset;
 	int l3mdev;
 
 	l3mdev = inet_sk_bound_l3mdev(sk);
+ports_exhausted:
 	attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 other_half_scan:
 	inet_get_local_port_range(net, &low, &high);
@@ -217,7 +219,7 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
 		inet_bind_bucket_for_each(tb, &head->chain)
 			if (net_eq(ib_net(tb), net) && tb->l3mdev == l3mdev &&
 			    tb->port == port) {
-				if (!inet_csk_bind_conflict(sk, tb, false, false))
+				if (!inet_csk_bind_conflict(sk, tb, relax, false))
 					goto success;
 				goto next_port;
 			}
@@ -237,6 +239,12 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
 		attempt_half = 2;
 		goto other_half_scan;
 	}
+
+	if (!relax) {
+		/* We still have a chance to connect to different destinations */
+		relax = true;
+		goto ports_exhausted;
+	}
 	return NULL;
 success:
 	*port_ret = port;
-- 
2.17.2 (Apple Git-113)


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 net-next 3/3] tcp: Prevent port hijacking when ports are exhausted.
  2020-02-26  7:46 [PATCH v2 net-next 0/3] Improve bind(addr, 0) behaviour Kuniyuki Iwashima
  2020-02-26  7:46 ` [PATCH v2 net-next 1/3] tcp: Remove unnecessary conditions in inet_csk_bind_conflict() Kuniyuki Iwashima
  2020-02-26  7:46 ` [PATCH v2 net-next 2/3] tcp: bind(addr, 0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted Kuniyuki Iwashima
@ 2020-02-26  7:46 ` Kuniyuki Iwashima
  2020-02-26 17:47   ` Eric Dumazet
  2 siblings, 1 reply; 6+ messages in thread
From: Kuniyuki Iwashima @ 2020-02-26  7:46 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, edumazet
  Cc: kuniyu, kuni1840, netdev, osa-contribution-log

If all of the sockets bound to the same port have SO_REUSEADDR and
SO_REUSEPORT enabled, any other user can hijack the port by exhausting all
ephemeral ports, binding sockets to (addr, 0) and calling listen().

If both of SO_REUSEADDR and SO_REUSEPORT are enabled, the restriction of
SO_REUSEPORT should be taken into account so that can only one socket be in
TCP_LISTEN.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
---
 net/ipv4/inet_connection_sock.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index cddeab240ea6..d27ed5fe7147 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -131,7 +131,7 @@ static int inet_csk_bind_conflict(const struct sock *sk,
 {
 	struct sock *sk2;
 	bool reuse = sk->sk_reuse;
-	bool reuseport = !!sk->sk_reuseport && reuseport_ok;
+	bool reuseport = !!sk->sk_reuseport;
 	kuid_t uid = sock_i_uid((struct sock *)sk);
 
 	/*
@@ -148,10 +148,16 @@ static int inet_csk_bind_conflict(const struct sock *sk,
 		     sk->sk_bound_dev_if == sk2->sk_bound_dev_if)) {
 			if (reuse && sk2->sk_reuse &&
 			    sk2->sk_state != TCP_LISTEN) {
-				if (!relax &&
+				if ((!relax ||
+				     (!reuseport_ok &&
+				      reuseport && sk2->sk_reuseport &&
+				      !rcu_access_pointer(sk->sk_reuseport_cb) &&
+				      (sk2->sk_state == TCP_TIME_WAIT ||
+				       uid_eq(uid, sock_i_uid(sk2))))) &&
 				    inet_rcv_saddr_equal(sk, sk2, true))
 					break;
-			} else if (!reuseport || !sk2->sk_reuseport ||
+			} else if (!reuseport_ok ||
+				   !reuseport || !sk2->sk_reuseport ||
 				   rcu_access_pointer(sk->sk_reuseport_cb) ||
 				   (sk2->sk_state != TCP_TIME_WAIT &&
 				    !uid_eq(uid, sock_i_uid(sk2)))) {
-- 
2.17.2 (Apple Git-113)


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 net-next 3/3] tcp: Prevent port hijacking when ports are exhausted.
  2020-02-26  7:46 ` [PATCH v2 net-next 3/3] tcp: Prevent port hijacking when " Kuniyuki Iwashima
@ 2020-02-26 17:47   ` Eric Dumazet
  2020-02-29 18:37     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2020-02-26 17:47 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, kuni1840,
	netdev, osa-contribution-log

On Tue, Feb 25, 2020 at 11:46 PM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
>
> If all of the sockets bound to the same port have SO_REUSEADDR and
> SO_REUSEPORT enabled, any other user can hijack the port by exhausting all
> ephemeral ports, binding sockets to (addr, 0) and calling listen().
>

Yes, an user (application) can steal all ports by opening many
sockets, bind to (addr, 0) and calling listen().

This changelog is rather confusing, and your patch does not solve this
precise problem.
Patch titles are important, you are claiming something, but I fail to
see how the patch solves the problem stated in the title.

Please be more specific, and add tests officially, in tools/testing/selftests/


> If both of SO_REUSEADDR and SO_REUSEPORT are enabled, the restriction of
> SO_REUSEPORT should be taken into account so that can only one socket be in
> TCP_LISTEN.

Sorry, I do not understand this. If I do not understand the sentence,
I do not read the patch
changing one piece of code that has been very often broken in the past.

Please spend time on the changelog to give the exact outcome and goals.

Thanks.

>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
> ---
>  net/ipv4/inet_connection_sock.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index cddeab240ea6..d27ed5fe7147 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -131,7 +131,7 @@ static int inet_csk_bind_conflict(const struct sock *sk,
>  {
>         struct sock *sk2;
>         bool reuse = sk->sk_reuse;
> -       bool reuseport = !!sk->sk_reuseport && reuseport_ok;
> +       bool reuseport = !!sk->sk_reuseport;
>         kuid_t uid = sock_i_uid((struct sock *)sk);
>
>         /*
> @@ -148,10 +148,16 @@ static int inet_csk_bind_conflict(const struct sock *sk,
>                      sk->sk_bound_dev_if == sk2->sk_bound_dev_if)) {
>                         if (reuse && sk2->sk_reuse &&
>                             sk2->sk_state != TCP_LISTEN) {
> -                               if (!relax &&
> +                               if ((!relax ||
> +                                    (!reuseport_ok &&
> +                                     reuseport && sk2->sk_reuseport &&
> +                                     !rcu_access_pointer(sk->sk_reuseport_cb) &&
> +                                     (sk2->sk_state == TCP_TIME_WAIT ||
> +                                      uid_eq(uid, sock_i_uid(sk2))))) &&
>                                     inet_rcv_saddr_equal(sk, sk2, true))
>                                         break;
> -                       } else if (!reuseport || !sk2->sk_reuseport ||
> +                       } else if (!reuseport_ok ||
> +                                  !reuseport || !sk2->sk_reuseport ||
>                                    rcu_access_pointer(sk->sk_reuseport_cb) ||
>                                    (sk2->sk_state != TCP_TIME_WAIT &&
>                                     !uid_eq(uid, sock_i_uid(sk2)))) {
> --
> 2.17.2 (Apple Git-113)
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 net-next 3/3] tcp: Prevent port hijacking when ports are exhausted.
  2020-02-26 17:47   ` Eric Dumazet
@ 2020-02-29 18:37     ` Kuniyuki Iwashima
  0 siblings, 0 replies; 6+ messages in thread
From: Kuniyuki Iwashima @ 2020-02-29 18:37 UTC (permalink / raw)
  To: edumazet
  Cc: davem, kuni1840, kuniyu, kuznet, netdev, osa-contribution-log, yoshfuji

From:   Eric Dumazet <edumazet@google.com>
Date:   Wed, 26 Feb 2020 09:47:26 -0800
> This changelog is rather confusing, and your patch does not solve this
> precise problem.
> Patch titles are important, you are claiming something, but I fail to
> see how the patch solves the problem stated in the title.
> 
> Please be more specific, and add tests officially, in tools/testing/selftests/
> 
> 
> > If both of SO_REUSEADDR and SO_REUSEPORT are enabled, the restriction of
> > SO_REUSEPORT should be taken into account so that can only one socket be in
> > TCP_LISTEN.
> 
> Sorry, I do not understand this. If I do not understand the sentence,
> I do not read the patch
> changing one piece of code that has been very often broken in the past.
> 
> Please spend time on the changelog to give the exact outcome and goals.

I am so sorry that I wrote the changelog roughly. I appreciate for your
kind advice. I could rewrite more precise description of this issue and
respin these patches with tests.

  [PATCH v3 net-next 0/4] Improve bind(addr, 0) behaviour.
  https://lore.kernel.org/netdev/20200229113554.78338-1-kuniyu@amazon.co.jp/

Thank you so much!

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-02-29 18:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-26  7:46 [PATCH v2 net-next 0/3] Improve bind(addr, 0) behaviour Kuniyuki Iwashima
2020-02-26  7:46 ` [PATCH v2 net-next 1/3] tcp: Remove unnecessary conditions in inet_csk_bind_conflict() Kuniyuki Iwashima
2020-02-26  7:46 ` [PATCH v2 net-next 2/3] tcp: bind(addr, 0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted Kuniyuki Iwashima
2020-02-26  7:46 ` [PATCH v2 net-next 3/3] tcp: Prevent port hijacking when " Kuniyuki Iwashima
2020-02-26 17:47   ` Eric Dumazet
2020-02-29 18:37     ` Kuniyuki Iwashima

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).