netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH AUTOSEL 5.15 10/19] net: emaclite: Don't advertise 1000BASE-T and do auto negotiation
       [not found] <20220510154429.153677-1-sashal@kernel.org>
@ 2022-05-10 15:44 ` Sasha Levin
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 11/19] net: sfp: Add tx-fault workaround for Huawei MA5671A SFP ONT Sasha Levin
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2022-05-10 15:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Shravya Kumbham, Radhey Shyam Pandey, Andrew Lunn, Paolo Abeni,
	Sasha Levin, davem, edumazet, kuba, michal.simek, linmq006,
	yuehaibing, trix, prabhakar.mahadev-lad.rj, netdev,
	linux-arm-kernel

From: Shravya Kumbham <shravya.kumbham@xilinx.com>

[ Upstream commit b800528b97d0adc3a5ba42d78a8b0d3f07a31f44 ]

In xemaclite_open() function we are setting the max speed of
emaclite to 100Mb using phy_set_max_speed() function so,
there is no need to write the advertising registers to stop
giga-bit speed and the phy_start() function starts the
auto-negotiation so, there is no need to handle it separately
using advertising registers. Remove the phy_read and phy_write
of advertising registers in xemaclite_open() function.

Signed-off-by: Shravya Kumbham <shravya.kumbham@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/xilinx/xilinx_emaclite.c | 15 ---------------
 1 file changed, 15 deletions(-)

diff --git a/drivers/net/ethernet/xilinx/xilinx_emaclite.c b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
index 5524ac4fae80..cc15d4c53a12 100644
--- a/drivers/net/ethernet/xilinx/xilinx_emaclite.c
+++ b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
@@ -925,8 +925,6 @@ static int xemaclite_open(struct net_device *dev)
 	xemaclite_disable_interrupts(lp);
 
 	if (lp->phy_node) {
-		u32 bmcr;
-
 		lp->phy_dev = of_phy_connect(lp->ndev, lp->phy_node,
 					     xemaclite_adjust_link, 0,
 					     PHY_INTERFACE_MODE_MII);
@@ -937,19 +935,6 @@ static int xemaclite_open(struct net_device *dev)
 
 		/* EmacLite doesn't support giga-bit speeds */
 		phy_set_max_speed(lp->phy_dev, SPEED_100);
-
-		/* Don't advertise 1000BASE-T Full/Half duplex speeds */
-		phy_write(lp->phy_dev, MII_CTRL1000, 0);
-
-		/* Advertise only 10 and 100mbps full/half duplex speeds */
-		phy_write(lp->phy_dev, MII_ADVERTISE, ADVERTISE_ALL |
-			  ADVERTISE_CSMA);
-
-		/* Restart auto negotiation */
-		bmcr = phy_read(lp->phy_dev, MII_BMCR);
-		bmcr |= (BMCR_ANENABLE | BMCR_ANRESTART);
-		phy_write(lp->phy_dev, MII_BMCR, bmcr);
-
 		phy_start(lp->phy_dev);
 	}
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH AUTOSEL 5.15 11/19] net: sfp: Add tx-fault workaround for Huawei MA5671A SFP ONT
       [not found] <20220510154429.153677-1-sashal@kernel.org>
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 10/19] net: emaclite: Don't advertise 1000BASE-T and do auto negotiation Sasha Levin
@ 2022-05-10 15:44 ` Sasha Levin
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 12/19] secure_seq: use the 64 bits of the siphash for port offset calculation Sasha Levin
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2022-05-10 15:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Matthew Hagan, Andrew Lunn, Jakub Kicinski, Sasha Levin, linux,
	hkallweit1, davem, edumazet, pabeni, netdev

From: Matthew Hagan <mnhagan88@gmail.com>

[ Upstream commit 2069624dac19d62c558bb6468fe03678553ab01d ]

As noted elsewhere, various GPON SFP modules exhibit non-standard
TX-fault behaviour. In the tested case, the Huawei MA5671A, when used
in combination with a Marvell mv88e6085 switch, was found to
persistently assert TX-fault, resulting in the module being disabled.

This patch adds a quirk to ignore the SFP_F_TX_FAULT state, allowing the
module to function.

Change from v1: removal of erroneous return statment (Andrew Lunn)

Signed-off-by: Matthew Hagan <mnhagan88@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220502223315.1973376-1-mnhagan88@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/phy/sfp.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index 4720b24ca51b..90dfefc1f5f8 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -250,6 +250,7 @@ struct sfp {
 	struct sfp_eeprom_id id;
 	unsigned int module_power_mW;
 	unsigned int module_t_start_up;
+	bool tx_fault_ignore;
 
 #if IS_ENABLED(CONFIG_HWMON)
 	struct sfp_diag diag;
@@ -1945,6 +1946,12 @@ static int sfp_sm_mod_probe(struct sfp *sfp, bool report)
 	else
 		sfp->module_t_start_up = T_START_UP;
 
+	if (!memcmp(id.base.vendor_name, "HUAWEI          ", 16) &&
+	    !memcmp(id.base.vendor_pn, "MA5671A         ", 16))
+		sfp->tx_fault_ignore = true;
+	else
+		sfp->tx_fault_ignore = false;
+
 	return 0;
 }
 
@@ -2397,7 +2404,10 @@ static void sfp_check_state(struct sfp *sfp)
 	mutex_lock(&sfp->st_mutex);
 	state = sfp_get_state(sfp);
 	changed = state ^ sfp->state;
-	changed &= SFP_F_PRESENT | SFP_F_LOS | SFP_F_TX_FAULT;
+	if (sfp->tx_fault_ignore)
+		changed &= SFP_F_PRESENT | SFP_F_LOS;
+	else
+		changed &= SFP_F_PRESENT | SFP_F_LOS | SFP_F_TX_FAULT;
 
 	for (i = 0; i < GPIO_MAX; i++)
 		if (changed & BIT(i))
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH AUTOSEL 5.15 12/19] secure_seq: use the 64 bits of the siphash for port offset calculation
       [not found] <20220510154429.153677-1-sashal@kernel.org>
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 10/19] net: emaclite: Don't advertise 1000BASE-T and do auto negotiation Sasha Levin
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 11/19] net: sfp: Add tx-fault workaround for Huawei MA5671A SFP ONT Sasha Levin
@ 2022-05-10 15:44 ` Sasha Levin
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 13/19] tcp: use different parts of the port_offset for index and offset Sasha Levin
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2022-05-10 15:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Willy Tarreau, Jason A . Donenfeld, Moshe Kol, Yossi Gilad,
	Amit Klein, Eric Dumazet, Jakub Kicinski, Sasha Levin, davem,
	pabeni, yoshfuji, dsahern, netdev

From: Willy Tarreau <w@1wt.eu>

[ Upstream commit b2d057560b8107c633b39aabe517ff9d93f285e3 ]

SipHash replaced MD5 in secure_ipv{4,6}_port_ephemeral() via commit
7cd23e5300c1 ("secure_seq: use SipHash in place of MD5"), but the output
remained truncated to 32-bit only. In order to exploit more bits from the
hash, let's make the functions return the full 64-bit of siphash_3u32().
We also make sure the port offset calculation in __inet_hash_connect()
remains done on 32-bit to avoid the need for div_u64_rem() and an extra
cost on 32-bit systems.

Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/net/inet_hashtables.h |  2 +-
 include/net/secure_seq.h      |  4 ++--
 net/core/secure_seq.c         |  4 ++--
 net/ipv4/inet_hashtables.c    | 10 ++++++----
 net/ipv6/inet6_hashtables.c   |  4 ++--
 5 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index f72ec113ae56..98e1ec1a14f0 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -425,7 +425,7 @@ static inline void sk_rcv_saddr_set(struct sock *sk, __be32 addr)
 }
 
 int __inet_hash_connect(struct inet_timewait_death_row *death_row,
-			struct sock *sk, u32 port_offset,
+			struct sock *sk, u64 port_offset,
 			int (*check_established)(struct inet_timewait_death_row *,
 						 struct sock *, __u16,
 						 struct inet_timewait_sock **));
diff --git a/include/net/secure_seq.h b/include/net/secure_seq.h
index d7d2495f83c2..dac91aa38c5a 100644
--- a/include/net/secure_seq.h
+++ b/include/net/secure_seq.h
@@ -4,8 +4,8 @@
 
 #include <linux/types.h>
 
-u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
-u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
+u64 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
+u64 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport);
 u32 secure_tcp_seq(__be32 saddr, __be32 daddr,
 		   __be16 sport, __be16 dport);
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index b5bc680d4755..444cce0184c3 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -94,7 +94,7 @@ u32 secure_tcpv6_seq(const __be32 *saddr, const __be32 *daddr,
 }
 EXPORT_SYMBOL(secure_tcpv6_seq);
 
-u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
+u64 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport)
 {
 	const struct {
@@ -142,7 +142,7 @@ u32 secure_tcp_seq(__be32 saddr, __be32 daddr,
 }
 EXPORT_SYMBOL_GPL(secure_tcp_seq);
 
-u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
+u64 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
 {
 	net_secret_init();
 	return siphash_3u32((__force u32)saddr, (__force u32)daddr,
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 7bd1e10086f0..606a4220ebb9 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -504,7 +504,7 @@ static int __inet_check_established(struct inet_timewait_death_row *death_row,
 	return -EADDRNOTAVAIL;
 }
 
-static u32 inet_sk_port_offset(const struct sock *sk)
+static u64 inet_sk_port_offset(const struct sock *sk)
 {
 	const struct inet_sock *inet = inet_sk(sk);
 
@@ -734,7 +734,7 @@ EXPORT_SYMBOL_GPL(inet_unhash);
 static u32 table_perturb[1 << INET_TABLE_PERTURB_SHIFT];
 
 int __inet_hash_connect(struct inet_timewait_death_row *death_row,
-		struct sock *sk, u32 port_offset,
+		struct sock *sk, u64 port_offset,
 		int (*check_established)(struct inet_timewait_death_row *,
 			struct sock *, __u16, struct inet_timewait_sock **))
 {
@@ -777,7 +777,9 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	net_get_random_once(table_perturb, sizeof(table_perturb));
 	index = hash_32(port_offset, INET_TABLE_PERTURB_SHIFT);
 
-	offset = (READ_ONCE(table_perturb[index]) + port_offset) % remaining;
+	offset = READ_ONCE(table_perturb[index]) + port_offset;
+	offset %= remaining;
+
 	/* In first pass we try ports of @low parity.
 	 * inet_csk_get_port() does the opposite choice.
 	 */
@@ -859,7 +861,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 int inet_hash_connect(struct inet_timewait_death_row *death_row,
 		      struct sock *sk)
 {
-	u32 port_offset = 0;
+	u64 port_offset = 0;
 
 	if (!inet_sk(sk)->inet_num)
 		port_offset = inet_sk_port_offset(sk);
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 0a2e7f228391..40203255ed88 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -308,7 +308,7 @@ static int __inet6_check_established(struct inet_timewait_death_row *death_row,
 	return -EADDRNOTAVAIL;
 }
 
-static u32 inet6_sk_port_offset(const struct sock *sk)
+static u64 inet6_sk_port_offset(const struct sock *sk)
 {
 	const struct inet_sock *inet = inet_sk(sk);
 
@@ -320,7 +320,7 @@ static u32 inet6_sk_port_offset(const struct sock *sk)
 int inet6_hash_connect(struct inet_timewait_death_row *death_row,
 		       struct sock *sk)
 {
-	u32 port_offset = 0;
+	u64 port_offset = 0;
 
 	if (!inet_sk(sk)->inet_num)
 		port_offset = inet6_sk_port_offset(sk);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH AUTOSEL 5.15 13/19] tcp: use different parts of the port_offset for index and offset
       [not found] <20220510154429.153677-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 12/19] secure_seq: use the 64 bits of the siphash for port offset calculation Sasha Levin
@ 2022-05-10 15:44 ` Sasha Levin
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 14/19] tcp: resalt the secret every 10 seconds Sasha Levin
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2022-05-10 15:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Willy Tarreau, Jason A . Donenfeld, Moshe Kol, Yossi Gilad,
	Amit Klein, Eric Dumazet, Jakub Kicinski, Sasha Levin, davem,
	yoshfuji, dsahern, pabeni, netdev

From: Willy Tarreau <w@1wt.eu>

[ Upstream commit 9e9b70ae923baf2b5e8a0ea4fd0c8451801ac526 ]

Amit Klein suggests that we use different parts of port_offset for the
table's index and the port offset so that there is no direct relation
between them.

Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/ipv4/inet_hashtables.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 606a4220ebb9..81a33af8393d 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -777,7 +777,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	net_get_random_once(table_perturb, sizeof(table_perturb));
 	index = hash_32(port_offset, INET_TABLE_PERTURB_SHIFT);
 
-	offset = READ_ONCE(table_perturb[index]) + port_offset;
+	offset = READ_ONCE(table_perturb[index]) + (port_offset >> 32);
 	offset %= remaining;
 
 	/* In first pass we try ports of @low parity.
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH AUTOSEL 5.15 14/19] tcp: resalt the secret every 10 seconds
       [not found] <20220510154429.153677-1-sashal@kernel.org>
                   ` (3 preceding siblings ...)
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 13/19] tcp: use different parts of the port_offset for index and offset Sasha Levin
@ 2022-05-10 15:44 ` Sasha Levin
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 15/19] tcp: add small random increments to the source port Sasha Levin
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2022-05-10 15:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Dumazet, Moshe Kol, Yossi Gilad, Amit Klein,
	Jason A . Donenfeld, Willy Tarreau, Jakub Kicinski, Sasha Levin,
	davem, pabeni, netdev

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 4dfa9b438ee34caca4e6a4e5e961641807367f6f ]

In order to limit the ability for an observer to recognize the source
ports sequence used to contact a set of destinations, we should
periodically shuffle the secret. 10 seconds looks effective enough
without causing particular issues.

Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/core/secure_seq.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 444cce0184c3..7131cd1fb2ad 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -22,6 +22,8 @@
 static siphash_key_t net_secret __read_mostly;
 static siphash_key_t ts_secret __read_mostly;
 
+#define EPHEMERAL_PORT_SHUFFLE_PERIOD (10 * HZ)
+
 static __always_inline void net_secret_init(void)
 {
 	net_get_random_once(&net_secret, sizeof(net_secret));
@@ -100,11 +102,13 @@ u64 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 	const struct {
 		struct in6_addr saddr;
 		struct in6_addr daddr;
+		unsigned int timeseed;
 		__be16 dport;
 	} __aligned(SIPHASH_ALIGNMENT) combined = {
 		.saddr = *(struct in6_addr *)saddr,
 		.daddr = *(struct in6_addr *)daddr,
-		.dport = dport
+		.timeseed = jiffies / EPHEMERAL_PORT_SHUFFLE_PERIOD,
+		.dport = dport,
 	};
 	net_secret_init();
 	return siphash(&combined, offsetofend(typeof(combined), dport),
@@ -145,8 +149,10 @@ EXPORT_SYMBOL_GPL(secure_tcp_seq);
 u64 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
 {
 	net_secret_init();
-	return siphash_3u32((__force u32)saddr, (__force u32)daddr,
-			    (__force u16)dport, &net_secret);
+	return siphash_4u32((__force u32)saddr, (__force u32)daddr,
+			    (__force u16)dport,
+			    jiffies / EPHEMERAL_PORT_SHUFFLE_PERIOD,
+			    &net_secret);
 }
 EXPORT_SYMBOL_GPL(secure_ipv4_port_ephemeral);
 #endif
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH AUTOSEL 5.15 15/19] tcp: add small random increments to the source port
       [not found] <20220510154429.153677-1-sashal@kernel.org>
                   ` (4 preceding siblings ...)
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 14/19] tcp: resalt the secret every 10 seconds Sasha Levin
@ 2022-05-10 15:44 ` Sasha Levin
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 16/19] tcp: dynamically allocate the perturb table used by source ports Sasha Levin
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2022-05-10 15:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Willy Tarreau, Moshe Kol, Yossi Gilad, Amit Klein, Eric Dumazet,
	Jakub Kicinski, Sasha Levin, davem, yoshfuji, dsahern, pabeni,
	netdev

From: Willy Tarreau <w@1wt.eu>

[ Upstream commit ca7af0402550f9a0b3316d5f1c30904e42ed257d ]

Here we're randomly adding between 0 and 7 random increments to the
selected source port in order to add some noise in the source port
selection that will make the next port less predictable.

With the default port range of 32768-60999 this means a worst case
reuse scenario of 14116/8=1764 connections between two consecutive
uses of the same port, with an average of 14116/4.5=3137. This code
was stressed at more than 800000 connections per second to a fixed
target with all connections closed by the client using RSTs (worst
condition) and only 2 connections failed among 13 billion, despite
the hash being reseeded every 10 seconds, indicating a perfectly
safe situation.

Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/ipv4/inet_hashtables.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 81a33af8393d..573a7e66ebc8 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -833,11 +833,12 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	return -EADDRNOTAVAIL;
 
 ok:
-	/* If our first attempt found a candidate, skip next candidate
-	 * in 1/16 of cases to add some noise.
+	/* Here we want to add a little bit of randomness to the next source
+	 * port that will be chosen. We use a max() with a random here so that
+	 * on low contention the randomness is maximal and on high contention
+	 * it may be inexistent.
 	 */
-	if (!i && !(prandom_u32() % 16))
-		i = 2;
+	i = max_t(int, i, (prandom_u32() & 7) * 2);
 	WRITE_ONCE(table_perturb[index], READ_ONCE(table_perturb[index]) + i + 2);
 
 	/* Head lock still held and bh's disabled */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH AUTOSEL 5.15 16/19] tcp: dynamically allocate the perturb table used by source ports
       [not found] <20220510154429.153677-1-sashal@kernel.org>
                   ` (5 preceding siblings ...)
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 15/19] tcp: add small random increments to the source port Sasha Levin
@ 2022-05-10 15:44 ` Sasha Levin
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 17/19] tcp: increase source port perturb table to 2^16 Sasha Levin
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 18/19] tcp: drop the hash_32() part from the index calculation Sasha Levin
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2022-05-10 15:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Willy Tarreau, Moshe Kol, Yossi Gilad, Amit Klein, Eric Dumazet,
	Jakub Kicinski, Sasha Levin, davem, yoshfuji, dsahern, pabeni,
	netdev

From: Willy Tarreau <w@1wt.eu>

[ Upstream commit e9261476184be1abd486c9434164b2acbe0ed6c2 ]

We'll need to further increase the size of this table and it's likely
that at some point its size will not be suitable anymore for a static
table. Let's allocate it on boot from inet_hashinfo2_init(), which is
called from tcp_init().

Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/ipv4/inet_hashtables.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 573a7e66ebc8..763395e30c77 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -731,7 +731,8 @@ EXPORT_SYMBOL_GPL(inet_unhash);
  * privacy, this only consumes 1 KB of kernel memory.
  */
 #define INET_TABLE_PERTURB_SHIFT 8
-static u32 table_perturb[1 << INET_TABLE_PERTURB_SHIFT];
+#define INET_TABLE_PERTURB_SIZE (1 << INET_TABLE_PERTURB_SHIFT)
+static u32 *table_perturb;
 
 int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 		struct sock *sk, u64 port_offset,
@@ -774,7 +775,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	if (likely(remaining > 1))
 		remaining &= ~1U;
 
-	net_get_random_once(table_perturb, sizeof(table_perturb));
+	net_get_random_once(table_perturb,
+			    INET_TABLE_PERTURB_SIZE * sizeof(*table_perturb));
 	index = hash_32(port_offset, INET_TABLE_PERTURB_SHIFT);
 
 	offset = READ_ONCE(table_perturb[index]) + (port_offset >> 32);
@@ -912,6 +914,12 @@ void __init inet_hashinfo2_init(struct inet_hashinfo *h, const char *name,
 					    low_limit,
 					    high_limit);
 	init_hashinfo_lhash2(h);
+
+	/* this one is used for source ports of outgoing connections */
+	table_perturb = kmalloc_array(INET_TABLE_PERTURB_SIZE,
+				      sizeof(*table_perturb), GFP_KERNEL);
+	if (!table_perturb)
+		panic("TCP: failed to alloc table_perturb");
 }
 
 int inet_hashinfo2_init_mod(struct inet_hashinfo *h)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH AUTOSEL 5.15 17/19] tcp: increase source port perturb table to 2^16
       [not found] <20220510154429.153677-1-sashal@kernel.org>
                   ` (6 preceding siblings ...)
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 16/19] tcp: dynamically allocate the perturb table used by source ports Sasha Levin
@ 2022-05-10 15:44 ` Sasha Levin
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 18/19] tcp: drop the hash_32() part from the index calculation Sasha Levin
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2022-05-10 15:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Willy Tarreau, Moshe Kol, Yossi Gilad, Amit Klein, Eric Dumazet,
	Jakub Kicinski, Sasha Levin, davem, yoshfuji, dsahern, pabeni,
	netdev

From: Willy Tarreau <w@1wt.eu>

[ Upstream commit 4c2c8f03a5ab7cb04ec64724d7d176d00bcc91e5 ]

Moshe Kol, Amit Klein, and Yossi Gilad reported being able to accurately
identify a client by forcing it to emit only 40 times more connections
than there are entries in the table_perturb[] table. The previous two
improvements consisting in resalting the secret every 10s and adding
randomness to each port selection only slightly improved the situation,
and the current value of 2^8 was too small as it's not very difficult
to make a client emit 10k connections in less than 10 seconds.

Thus we're increasing the perturb table from 2^8 to 2^16 so that the
same precision now requires 2.6M connections, which is more difficult in
this time frame and harder to hide as a background activity. The impact
is that the table now uses 256 kB instead of 1 kB, which could mostly
affect devices making frequent outgoing connections. However such
components usually target a small set of destinations (load balancers,
database clients, perf assessment tools), and in practice only a few
entries will be visited, like before.

A live test at 1 million connections per second showed no performance
difference from the previous value.

Reported-by: Moshe Kol <moshe.kol@mail.huji.ac.il>
Reported-by: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/ipv4/inet_hashtables.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 763395e30c77..f76e4ac1ba3a 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -726,11 +726,12 @@ EXPORT_SYMBOL_GPL(inet_unhash);
  * Note that we use 32bit integers (vs RFC 'short integers')
  * because 2^16 is not a multiple of num_ephemeral and this
  * property might be used by clever attacker.
- * RFC claims using TABLE_LENGTH=10 buckets gives an improvement,
- * we use 256 instead to really give more isolation and
- * privacy, this only consumes 1 KB of kernel memory.
+ * RFC claims using TABLE_LENGTH=10 buckets gives an improvement, though
+ * attacks were since demonstrated, thus we use 65536 instead to really
+ * give more isolation and privacy, at the expense of 256kB of kernel
+ * memory.
  */
-#define INET_TABLE_PERTURB_SHIFT 8
+#define INET_TABLE_PERTURB_SHIFT 16
 #define INET_TABLE_PERTURB_SIZE (1 << INET_TABLE_PERTURB_SHIFT)
 static u32 *table_perturb;
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH AUTOSEL 5.15 18/19] tcp: drop the hash_32() part from the index calculation
       [not found] <20220510154429.153677-1-sashal@kernel.org>
                   ` (7 preceding siblings ...)
  2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 17/19] tcp: increase source port perturb table to 2^16 Sasha Levin
@ 2022-05-10 15:44 ` Sasha Levin
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2022-05-10 15:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Willy Tarreau, Amit Klein, Eric Dumazet, Jakub Kicinski,
	Sasha Levin, davem, yoshfuji, dsahern, pabeni, netdev

From: Willy Tarreau <w@1wt.eu>

[ Upstream commit e8161345ddbb66e449abde10d2fdce93f867eba9 ]

In commit 190cc82489f4 ("tcp: change source port randomizarion at
connect() time"), the table_perturb[] array was introduced and an
index was taken from the port_offset via hash_32(). But it turns
out that hash_32() performs a multiplication while the input here
comes from the output of SipHash in secure_seq, that is well
distributed enough to avoid the need for yet another hash.

Suggested-by: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/ipv4/inet_hashtables.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index f76e4ac1ba3a..ee9c587031b4 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -778,7 +778,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 
 	net_get_random_once(table_perturb,
 			    INET_TABLE_PERTURB_SIZE * sizeof(*table_perturb));
-	index = hash_32(port_offset, INET_TABLE_PERTURB_SHIFT);
+	index = port_offset & (INET_TABLE_PERTURB_SIZE - 1);
 
 	offset = READ_ONCE(table_perturb[index]) + (port_offset >> 32);
 	offset %= remaining;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-05-10 15:49 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20220510154429.153677-1-sashal@kernel.org>
2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 10/19] net: emaclite: Don't advertise 1000BASE-T and do auto negotiation Sasha Levin
2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 11/19] net: sfp: Add tx-fault workaround for Huawei MA5671A SFP ONT Sasha Levin
2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 12/19] secure_seq: use the 64 bits of the siphash for port offset calculation Sasha Levin
2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 13/19] tcp: use different parts of the port_offset for index and offset Sasha Levin
2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 14/19] tcp: resalt the secret every 10 seconds Sasha Levin
2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 15/19] tcp: add small random increments to the source port Sasha Levin
2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 16/19] tcp: dynamically allocate the perturb table used by source ports Sasha Levin
2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 17/19] tcp: increase source port perturb table to 2^16 Sasha Levin
2022-05-10 15:44 ` [PATCH AUTOSEL 5.15 18/19] tcp: drop the hash_32() part from the index calculation Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).