Stable Archive on lore.kernel.org
 help / color / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Willem de Bruijn <willemb@google.com>,
	Benjamin Herrenschmidt <benh@amazon.com>,
	Kuniyuki Iwashima <kuniyu@amazon.co.jp>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 5.4 16/19] udp: Improve load balancing for SO_REUSEPORT.
Date: Thu, 30 Jul 2020 10:04:18 +0200
Message-ID: <20200730074421.306947214@linuxfoundation.org> (raw)
In-Reply-To: <20200730074420.502923740@linuxfoundation.org>

From: Kuniyuki Iwashima <kuniyu@amazon.co.jp>

[ Upstream commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace ]

Currently, SO_REUSEPORT does not work well if connected sockets are in a
UDP reuseport group.

Then reuseport_has_conns() returns true and the result of
reuseport_select_sock() is discarded. Also, unconnected sockets have the
same score, hence only does the first unconnected socket in udp_hslot
always receive all packets sent to unconnected sockets.

So, the result of reuseport_select_sock() should be used for load
balancing.

The noteworthy point is that the unconnected sockets placed after
connected sockets in sock_reuseport.socks will receive more packets than
others because of the algorithm in reuseport_select_sock().

    index | connected | reciprocal_scale | result
    ---------------------------------------------
    0     | no        | 20%              | 40%
    1     | no        | 20%              | 20%
    2     | yes       | 20%              | 0%
    3     | no        | 20%              | 40%
    4     | yes       | 20%              | 0%

If most of the sockets are connected, this can be a problem, but it still
works better than now.

Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn <willemb@google.com>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/udp.c |   15 +++++++++------
 net/ipv6/udp.c |   15 +++++++++------
 2 files changed, 18 insertions(+), 12 deletions(-)

--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -413,7 +413,7 @@ static struct sock *udp4_lib_lookup2(str
 				     struct udp_hslot *hslot2,
 				     struct sk_buff *skb)
 {
-	struct sock *sk, *result;
+	struct sock *sk, *result, *reuseport_result;
 	int score, badness;
 	u32 hash = 0;
 
@@ -423,17 +423,20 @@ static struct sock *udp4_lib_lookup2(str
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
+			reuseport_result = NULL;
+
 			if (sk->sk_reuseport &&
 			    sk->sk_state != TCP_ESTABLISHED) {
 				hash = udp_ehashfn(net, daddr, hnum,
 						   saddr, sport);
-				result = reuseport_select_sock(sk, hash, skb,
-							sizeof(struct udphdr));
-				if (result && !reuseport_has_conns(sk, false))
-					return result;
+				reuseport_result = reuseport_select_sock(sk, hash, skb,
+									 sizeof(struct udphdr));
+				if (reuseport_result && !reuseport_has_conns(sk, false))
+					return reuseport_result;
 			}
+
+			result = reuseport_result ? : sk;
 			badness = score;
-			result = sk;
 		}
 	}
 	return result;
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -148,7 +148,7 @@ static struct sock *udp6_lib_lookup2(str
 		int dif, int sdif, struct udp_hslot *hslot2,
 		struct sk_buff *skb)
 {
-	struct sock *sk, *result;
+	struct sock *sk, *result, *reuseport_result;
 	int score, badness;
 	u32 hash = 0;
 
@@ -158,17 +158,20 @@ static struct sock *udp6_lib_lookup2(str
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
+			reuseport_result = NULL;
+
 			if (sk->sk_reuseport &&
 			    sk->sk_state != TCP_ESTABLISHED) {
 				hash = udp6_ehashfn(net, daddr, hnum,
 						    saddr, sport);
 
-				result = reuseport_select_sock(sk, hash, skb,
-							sizeof(struct udphdr));
-				if (result && !reuseport_has_conns(sk, false))
-					return result;
+				reuseport_result = reuseport_select_sock(sk, hash, skb,
+									 sizeof(struct udphdr));
+				if (reuseport_result && !reuseport_has_conns(sk, false))
+					return reuseport_result;
 			}
-			result = sk;
+
+			result = reuseport_result ? : sk;
 			badness = score;
 		}
 	}



  parent reply index

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-30  8:04 [PATCH 5.4 00/19] 5.4.55-rc1 review Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 01/19] AX.25: Fix out-of-bounds read in ax25_connect() Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 02/19] AX.25: Prevent out-of-bounds read in ax25_sendmsg() Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 03/19] dev: Defer free of skbs in flush_backlog Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 04/19] drivers/net/wan/x25_asy: Fix to make it work Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 05/19] ip6_gre: fix null-ptr-deref in ip6gre_init_net() Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 06/19] net-sysfs: add a newline when printing tx_timeout by sysfs Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 07/19] net: udp: Fix wrong clean up for IS_UDPLITE macro Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 08/19] qrtr: orphan socket in qrtr_release() Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 09/19] rtnetlink: Fix memory(net_device) leak when ->newlink fails Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 10/19] rxrpc: Fix sendmsg() returning EPIPE due to recvmsg() returning ENODATA Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 11/19] tcp: allow at most one TLP probe per flight Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 12/19] AX.25: Prevent integer overflows in connect and sendmsg Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 13/19] sctp: shrink stream outq only when new outcnt < old outcnt Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 14/19] sctp: shrink stream outq when fails to do addstream reconf Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 15/19] udp: Copy has_conns in reuseport_grow() Greg Kroah-Hartman
2020-07-30  8:04 ` Greg Kroah-Hartman [this message]
2020-07-30  8:04 ` [PATCH 5.4 17/19] regmap: debugfs: check count when read regmap file Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 18/19] PM: wakeup: Show statistics for deleted wakeup sources again Greg Kroah-Hartman
2020-07-30  8:04 ` [PATCH 5.4 19/19] Revert "dpaa_eth: fix usage as DSA master, try 3" Greg Kroah-Hartman
2020-07-30 16:47 ` [PATCH 5.4 00/19] 5.4.55-rc1 review Guenter Roeck
2020-07-31 10:32 ` Naresh Kamboju
2020-07-31 11:48   ` Arnd Bergmann
2020-07-31 17:15   ` Greg Kroah-Hartman
2020-07-31 12:53 ` Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200730074421.306947214@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=benh@amazon.com \
    --cc=davem@davemloft.net \
    --cc=kuniyu@amazon.co.jp \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Stable Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/stable/0 stable/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 stable stable/ https://lore.kernel.org/stable \
		stable@vger.kernel.org
	public-inbox-index stable

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.stable


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git