All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 net 0/7] insufficient TCP source port randomness
@ 2022-04-28 12:39 Willy Tarreau
  2022-04-28 12:39 ` [PATCH v2 net 1/7] secure_seq: use the 64 bits of the siphash for port offset calculation Willy Tarreau
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Willy Tarreau @ 2022-04-28 12:39 UTC (permalink / raw)
  To: netdev
  Cc: David Miller, Jakub Kicinski, Eric Dumazet, Moshe Kol,
	Yossi Gilad, Amit Klein, Jason A . Donenfeld, linux-kernel,
	Willy Tarreau

Hi,

In a not-yet published paper, Moshe Kol, Amit Klein, and Yossi Gilad
report being able to accurately identify a client by forcing it to emit
only 40 times more connections than the number of entries in the
table_perturb[] table, which is indexed by hashing the connection tuple.
The current 2^8 setting allows them to perform that attack with only 10k
connections, which is not hard to achieve in a few seconds.

Eric, Amit and I have been working on this for a few weeks now imagining,
testing and eliminating a number of approaches that Amit and his team were
still able to break or that were found to be too risky or too expensive,
and ended up with the simple improvements in this series that resists to
the attack, doesn't degrade the performance, and preserves a reliable port
selection algorithm to avoid connection failures, including the odd/even
port selection preference that allows bind() to always find a port quickly
even under strong connect() stress.

The approach relies on several factors:
  - resalting the hash secret that's used to choose the table_perturb[]
    entry every 10 seconds to eliminate slow attacks and force the
    attacker to forget everything that was learned after this delay.
    This already eliminates most of the problem because if a client
    stays silent for more than 10 seconds there's no link between the
    previous and the next patterns, and 10s isn't yet frequent enough
    to cause too frequent repetition of a same port that may induce a
    connection failure ;

  - adding small random increments to the source port. Previously, a
    random 0 or 1 was added every 16 ports. Now a random 0 to 7 is
    added after each port. This means that with the default 32768-60999
    range, a worst case rollover happens after 1764 connections, and
    an average of 3137. This doesn't stop statistical attacks but
    requires significantly more iterations of the same attack to
    confirm a guess.

  - increasing the table_perturb[] size from 2^8 to 2^16, which Amit
    says will require 2.6 million connections to be attacked with the
    changes above, making it pointless to get a fingerprint that will
    only last 10 seconds. Due to the size, the table was made dynamic.

  - a few minor improvements on the bits used from the hash, to eliminate
    some unfortunate correlations that may possibly have been exploited
    to design future attack models.

These changes were tested under the most extreme conditions, up to
1.1 million connections per second to one and a few targets, showing no
performance regression, and only 2 connection failures within 13 billion,
which is less than 2^-32 and perfectly within usual values.

The series is split into small reviewable changes and was already reviewed
by Amit and Eric.

Regards,
Willy

---
v2:
  - fixed build issue reported by lkp@intel.com on 32-bit archs due
    to the 64-by-32 divide; it's now correctly 32-by-32 as suggested
    by Eric
  - addressed the IPv6 hash size as well that was overlooked, as
    reported by Jason

This version was built for i386, armv7, x86_64, and was stress-tested on
x86_64 under both IPv4 and IPv6. I'm reasonably confident that this time
nothing else is missing.

---

Eric Dumazet (1):
  tcp: resalt the secret every 10 seconds

Willy Tarreau (6):
  secure_seq: use the 64 bits of the siphash for port offset calculation
  tcp: use different parts of the port_offset for index and offset
  tcp: add small random increments to the source port
  tcp: dynamically allocate the perturb table used by source ports
  tcp: increase source port perturb table to 2^16
  tcp: drop the hash_32() part from the index calculation

 include/net/inet_hashtables.h |  2 +-
 include/net/secure_seq.h      |  4 ++--
 net/core/secure_seq.c         | 16 ++++++++-----
 net/ipv4/inet_hashtables.c    | 42 ++++++++++++++++++++++-------------
 net/ipv6/inet6_hashtables.c   |  4 ++--
 5 files changed, 43 insertions(+), 25 deletions(-)

-- 
2.17.5


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 net 1/7] secure_seq: use the 64 bits of the siphash for port offset calculation
  2022-04-28 12:39 [PATCH v2 net 0/7] insufficient TCP source port randomness Willy Tarreau
@ 2022-04-28 12:39 ` Willy Tarreau
  2022-04-29 14:38   ` Jason A. Donenfeld
  2022-04-28 12:39 ` [PATCH v2 net 2/7] tcp: use different parts of the port_offset for index and offset Willy Tarreau
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Willy Tarreau @ 2022-04-28 12:39 UTC (permalink / raw)
  To: netdev
  Cc: David Miller, Jakub Kicinski, Eric Dumazet, Moshe Kol,
	Yossi Gilad, Amit Klein, Jason A . Donenfeld, linux-kernel,
	Willy Tarreau

SipHash replaced MD5 in secure_ipv{4,6}_port_ephemeral() via commit
7cd23e5300c1 ("secure_seq: use SipHash in place of MD5"), but the output
remained truncated to 32-bit only. In order to exploit more bits from the
hash, let's make the functions return the full 64-bit of siphash_3u32().
We also make sure the port offset calculation in __inet_hash_connect()
remains done on 32-bit to avoid the need for div_u64_rem() and an extra
cost on 32-bit systems.

Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/net/inet_hashtables.h |  2 +-
 include/net/secure_seq.h      |  4 ++--
 net/core/secure_seq.c         |  4 ++--
 net/ipv4/inet_hashtables.c    | 10 ++++++----
 net/ipv6/inet6_hashtables.c   |  4 ++--
 5 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index f72ec113ae56..98e1ec1a14f0 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -425,7 +425,7 @@ static inline void sk_rcv_saddr_set(struct sock *sk, __be32 addr)
 }
 
 int __inet_hash_connect(struct inet_timewait_death_row *death_row,
-			struct sock *sk, u32 port_offset,
+			struct sock *sk, u64 port_offset,
 			int (*check_established)(struct inet_timewait_death_row *,
 						 struct sock *, __u16,
 						 struct inet_timewait_sock **));
diff --git a/include/net/secure_seq.h b/include/net/secure_seq.h
index d7d2495f83c2..dac91aa38c5a 100644
--- a/include/net/secure_seq.h
+++ b/include/net/secure_seq.h
@@ -4,8 +4,8 @@
 
 #include <linux/types.h>
 
-u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
-u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
+u64 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
+u64 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport);
 u32 secure_tcp_seq(__be32 saddr, __be32 daddr,
 		   __be16 sport, __be16 dport);
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 9b8443774449..55aa5cc258e3 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -94,7 +94,7 @@ u32 secure_tcpv6_seq(const __be32 *saddr, const __be32 *daddr,
 }
 EXPORT_SYMBOL(secure_tcpv6_seq);
 
-u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
+u64 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport)
 {
 	const struct {
@@ -142,7 +142,7 @@ u32 secure_tcp_seq(__be32 saddr, __be32 daddr,
 }
 EXPORT_SYMBOL_GPL(secure_tcp_seq);
 
-u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
+u64 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
 {
 	net_secret_init();
 	return siphash_3u32((__force u32)saddr, (__force u32)daddr,
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 17440840a791..9d24d9319f3d 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -504,7 +504,7 @@ static int __inet_check_established(struct inet_timewait_death_row *death_row,
 	return -EADDRNOTAVAIL;
 }
 
-static u32 inet_sk_port_offset(const struct sock *sk)
+static u64 inet_sk_port_offset(const struct sock *sk)
 {
 	const struct inet_sock *inet = inet_sk(sk);
 
@@ -734,7 +734,7 @@ EXPORT_SYMBOL_GPL(inet_unhash);
 static u32 table_perturb[1 << INET_TABLE_PERTURB_SHIFT];
 
 int __inet_hash_connect(struct inet_timewait_death_row *death_row,
-		struct sock *sk, u32 port_offset,
+		struct sock *sk, u64 port_offset,
 		int (*check_established)(struct inet_timewait_death_row *,
 			struct sock *, __u16, struct inet_timewait_sock **))
 {
@@ -777,7 +777,9 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	net_get_random_once(table_perturb, sizeof(table_perturb));
 	index = hash_32(port_offset, INET_TABLE_PERTURB_SHIFT);
 
-	offset = (READ_ONCE(table_perturb[index]) + port_offset) % remaining;
+	offset = READ_ONCE(table_perturb[index]) + port_offset;
+	offset %= remaining;
+
 	/* In first pass we try ports of @low parity.
 	 * inet_csk_get_port() does the opposite choice.
 	 */
@@ -859,7 +861,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 int inet_hash_connect(struct inet_timewait_death_row *death_row,
 		      struct sock *sk)
 {
-	u32 port_offset = 0;
+	u64 port_offset = 0;
 
 	if (!inet_sk(sk)->inet_num)
 		port_offset = inet_sk_port_offset(sk);
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 4740afecf7c6..32ccac10bd62 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -308,7 +308,7 @@ static int __inet6_check_established(struct inet_timewait_death_row *death_row,
 	return -EADDRNOTAVAIL;
 }
 
-static u32 inet6_sk_port_offset(const struct sock *sk)
+static u64 inet6_sk_port_offset(const struct sock *sk)
 {
 	const struct inet_sock *inet = inet_sk(sk);
 
@@ -320,7 +320,7 @@ static u32 inet6_sk_port_offset(const struct sock *sk)
 int inet6_hash_connect(struct inet_timewait_death_row *death_row,
 		       struct sock *sk)
 {
-	u32 port_offset = 0;
+	u64 port_offset = 0;
 
 	if (!inet_sk(sk)->inet_num)
 		port_offset = inet6_sk_port_offset(sk);
-- 
2.17.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 net 2/7] tcp: use different parts of the port_offset for index and offset
  2022-04-28 12:39 [PATCH v2 net 0/7] insufficient TCP source port randomness Willy Tarreau
  2022-04-28 12:39 ` [PATCH v2 net 1/7] secure_seq: use the 64 bits of the siphash for port offset calculation Willy Tarreau
@ 2022-04-28 12:39 ` Willy Tarreau
  2022-04-28 12:39 ` [PATCH v2 net 3/7] tcp: resalt the secret every 10 seconds Willy Tarreau
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Willy Tarreau @ 2022-04-28 12:39 UTC (permalink / raw)
  To: netdev
  Cc: David Miller, Jakub Kicinski, Eric Dumazet, Moshe Kol,
	Yossi Gilad, Amit Klein, Jason A . Donenfeld, linux-kernel,
	Willy Tarreau

Amit Klein suggests that we use different parts of port_offset for the
table's index and the port offset so that there is no direct relation
between them.

Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/inet_hashtables.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 9d24d9319f3d..29c701cd8312 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -777,7 +777,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	net_get_random_once(table_perturb, sizeof(table_perturb));
 	index = hash_32(port_offset, INET_TABLE_PERTURB_SHIFT);
 
-	offset = READ_ONCE(table_perturb[index]) + port_offset;
+	offset = READ_ONCE(table_perturb[index]) + (port_offset >> 32);
 	offset %= remaining;
 
 	/* In first pass we try ports of @low parity.
-- 
2.17.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 net 3/7] tcp: resalt the secret every 10 seconds
  2022-04-28 12:39 [PATCH v2 net 0/7] insufficient TCP source port randomness Willy Tarreau
  2022-04-28 12:39 ` [PATCH v2 net 1/7] secure_seq: use the 64 bits of the siphash for port offset calculation Willy Tarreau
  2022-04-28 12:39 ` [PATCH v2 net 2/7] tcp: use different parts of the port_offset for index and offset Willy Tarreau
@ 2022-04-28 12:39 ` Willy Tarreau
  2022-04-29 14:37   ` Jason A. Donenfeld
  2022-04-29 14:48   ` Jason A. Donenfeld
  2022-04-28 12:39 ` [PATCH v2 net 4/7] tcp: add small random increments to the source port Willy Tarreau
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 13+ messages in thread
From: Willy Tarreau @ 2022-04-28 12:39 UTC (permalink / raw)
  To: netdev
  Cc: David Miller, Jakub Kicinski, Eric Dumazet, Moshe Kol,
	Yossi Gilad, Amit Klein, Jason A . Donenfeld, linux-kernel

From: Eric Dumazet <edumazet@google.com>

In order to limit the ability for an observer to recognize the source
ports sequence used to contact a set of destinations, we should
periodically shuffle the secret. 10 seconds looks effective enough
without causing particular issues.

Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/secure_seq.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 55aa5cc258e3..2c76aab20403 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -22,6 +22,8 @@
 static siphash_aligned_key_t net_secret;
 static siphash_aligned_key_t ts_secret;
 
+#define EPHEMERAL_PORT_SHUFFLE_PERIOD (10 * HZ)
+
 static __always_inline void net_secret_init(void)
 {
 	net_get_random_once(&net_secret, sizeof(net_secret));
@@ -101,10 +103,12 @@ u64 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 		struct in6_addr saddr;
 		struct in6_addr daddr;
 		__be16 dport;
+		unsigned int timeseed;
 	} __aligned(SIPHASH_ALIGNMENT) combined = {
 		.saddr = *(struct in6_addr *)saddr,
 		.daddr = *(struct in6_addr *)daddr,
-		.dport = dport
+		.dport = dport,
+		.timeseed = jiffies / EPHEMERAL_PORT_SHUFFLE_PERIOD,
 	};
 	net_secret_init();
 	return siphash(&combined, offsetofend(typeof(combined), dport),
@@ -145,8 +149,10 @@ EXPORT_SYMBOL_GPL(secure_tcp_seq);
 u64 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
 {
 	net_secret_init();
-	return siphash_3u32((__force u32)saddr, (__force u32)daddr,
-			    (__force u16)dport, &net_secret);
+	return siphash_4u32((__force u32)saddr, (__force u32)daddr,
+			    (__force u16)dport,
+			    jiffies / EPHEMERAL_PORT_SHUFFLE_PERIOD,
+			    &net_secret);
 }
 EXPORT_SYMBOL_GPL(secure_ipv4_port_ephemeral);
 #endif
-- 
2.17.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 net 4/7] tcp: add small random increments to the source port
  2022-04-28 12:39 [PATCH v2 net 0/7] insufficient TCP source port randomness Willy Tarreau
                   ` (2 preceding siblings ...)
  2022-04-28 12:39 ` [PATCH v2 net 3/7] tcp: resalt the secret every 10 seconds Willy Tarreau
@ 2022-04-28 12:39 ` Willy Tarreau
  2022-04-28 12:39 ` [PATCH v2 net 5/7] tcp: dynamically allocate the perturb table used by source ports Willy Tarreau
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Willy Tarreau @ 2022-04-28 12:39 UTC (permalink / raw)
  To: netdev
  Cc: David Miller, Jakub Kicinski, Eric Dumazet, Moshe Kol,
	Yossi Gilad, Amit Klein, Jason A . Donenfeld, linux-kernel,
	Willy Tarreau

Here we're randomly adding between 0 and 7 random increments to the
selected source port in order to add some noise in the source port
selection that will make the next port less predictable.

With the default port range of 32768-60999 this means a worst case
reuse scenario of 14116/8=1764 connections between two consecutive
uses of the same port, with an average of 14116/4.5=3137. This code
was stressed at more than 800000 connections per second to a fixed
target with all connections closed by the client using RSTs (worst
condition) and only 2 connections failed among 13 billion, despite
the hash being reseeded every 10 seconds, indicating a perfectly
safe situation.

Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/inet_hashtables.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 29c701cd8312..63bb4902f018 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -833,11 +833,12 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	return -EADDRNOTAVAIL;
 
 ok:
-	/* If our first attempt found a candidate, skip next candidate
-	 * in 1/16 of cases to add some noise.
+	/* Here we want to add a little bit of randomness to the next source
+	 * port that will be chosen. We use a max() with a random here so that
+	 * on low contention the randomness is maximal and on high contention
+	 * it may be inexistent.
 	 */
-	if (!i && !(prandom_u32() % 16))
-		i = 2;
+	i = max_t(int, i, (prandom_u32() & 7) * 2);
 	WRITE_ONCE(table_perturb[index], READ_ONCE(table_perturb[index]) + i + 2);
 
 	/* Head lock still held and bh's disabled */
-- 
2.17.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 net 5/7] tcp: dynamically allocate the perturb table used by source ports
  2022-04-28 12:39 [PATCH v2 net 0/7] insufficient TCP source port randomness Willy Tarreau
                   ` (3 preceding siblings ...)
  2022-04-28 12:39 ` [PATCH v2 net 4/7] tcp: add small random increments to the source port Willy Tarreau
@ 2022-04-28 12:39 ` Willy Tarreau
  2022-04-28 12:40 ` [PATCH v2 net 6/7] tcp: increase source port perturb table to 2^16 Willy Tarreau
  2022-04-28 12:40 ` [PATCH v2 net 7/7] tcp: drop the hash_32() part from the index calculation Willy Tarreau
  6 siblings, 0 replies; 13+ messages in thread
From: Willy Tarreau @ 2022-04-28 12:39 UTC (permalink / raw)
  To: netdev
  Cc: David Miller, Jakub Kicinski, Eric Dumazet, Moshe Kol,
	Yossi Gilad, Amit Klein, Jason A . Donenfeld, linux-kernel,
	Willy Tarreau

We'll need to further increase the size of this table and it's likely
that at some point its size will not be suitable anymore for a static
table. Let's allocate it on boot from inet_hashinfo2_init(), which is
called from tcp_init().

Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/inet_hashtables.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 63bb4902f018..48ca07853068 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -731,7 +731,8 @@ EXPORT_SYMBOL_GPL(inet_unhash);
  * privacy, this only consumes 1 KB of kernel memory.
  */
 #define INET_TABLE_PERTURB_SHIFT 8
-static u32 table_perturb[1 << INET_TABLE_PERTURB_SHIFT];
+#define INET_TABLE_PERTURB_SIZE (1 << INET_TABLE_PERTURB_SHIFT)
+static u32 *table_perturb;
 
 int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 		struct sock *sk, u64 port_offset,
@@ -774,7 +775,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	if (likely(remaining > 1))
 		remaining &= ~1U;
 
-	net_get_random_once(table_perturb, sizeof(table_perturb));
+	net_get_random_once(table_perturb,
+			    INET_TABLE_PERTURB_SIZE * sizeof(*table_perturb));
 	index = hash_32(port_offset, INET_TABLE_PERTURB_SHIFT);
 
 	offset = READ_ONCE(table_perturb[index]) + (port_offset >> 32);
@@ -912,6 +914,12 @@ void __init inet_hashinfo2_init(struct inet_hashinfo *h, const char *name,
 					    low_limit,
 					    high_limit);
 	init_hashinfo_lhash2(h);
+
+	/* this one is used for source ports of outgoing connections */
+	table_perturb = kmalloc_array(INET_TABLE_PERTURB_SIZE,
+				      sizeof(*table_perturb), GFP_KERNEL);
+	if (!table_perturb)
+		panic("TCP: failed to alloc table_perturb");
 }
 
 int inet_hashinfo2_init_mod(struct inet_hashinfo *h)
-- 
2.17.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 net 6/7] tcp: increase source port perturb table to 2^16
  2022-04-28 12:39 [PATCH v2 net 0/7] insufficient TCP source port randomness Willy Tarreau
                   ` (4 preceding siblings ...)
  2022-04-28 12:39 ` [PATCH v2 net 5/7] tcp: dynamically allocate the perturb table used by source ports Willy Tarreau
@ 2022-04-28 12:40 ` Willy Tarreau
  2022-04-28 12:40 ` [PATCH v2 net 7/7] tcp: drop the hash_32() part from the index calculation Willy Tarreau
  6 siblings, 0 replies; 13+ messages in thread
From: Willy Tarreau @ 2022-04-28 12:40 UTC (permalink / raw)
  To: netdev
  Cc: David Miller, Jakub Kicinski, Eric Dumazet, Moshe Kol,
	Yossi Gilad, Amit Klein, Jason A . Donenfeld, linux-kernel,
	Willy Tarreau

Moshe Kol, Amit Klein, and Yossi Gilad reported being able to accurately
identify a client by forcing it to emit only 40 times more connections
than there are entries in the table_perturb[] table. The previous two
improvements consisting in resalting the secret every 10s and adding
randomness to each port selection only slightly improved the situation,
and the current value of 2^8 was too small as it's not very difficult
to make a client emit 10k connections in less than 10 seconds.

Thus we're increasing the perturb table from 2^8 to 2^16 so that the the
same precision now requires 2.6M connections, which is more difficult in
this time frame and harder to hide as a background activity. The impact
is that the table now uses 256 kB instead of 1 kB, which could mostly
affect devices making frequent outgoing connections. However such
components usually target a small set of destinations (load balancers,
database clients, perf assessment tools), and in practice only a few
entries will be visited, like before.

A live test at 1 million connections per second showed no performance
difference from the previous value.

Reported-by: Moshe Kol <moshe.kol@mail.huji.ac.il>
Reported-by: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/inet_hashtables.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 48ca07853068..cc5f66328b47 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -726,11 +726,12 @@ EXPORT_SYMBOL_GPL(inet_unhash);
  * Note that we use 32bit integers (vs RFC 'short integers')
  * because 2^16 is not a multiple of num_ephemeral and this
  * property might be used by clever attacker.
- * RFC claims using TABLE_LENGTH=10 buckets gives an improvement,
- * we use 256 instead to really give more isolation and
- * privacy, this only consumes 1 KB of kernel memory.
+ * RFC claims using TABLE_LENGTH=10 buckets gives an improvement, though
+ * attacks were since demonstrated, thus we use 65536 instead to really
+ * give more isolation and privacy, at the expense of 256kB of kernel
+ * memory.
  */
-#define INET_TABLE_PERTURB_SHIFT 8
+#define INET_TABLE_PERTURB_SHIFT 16
 #define INET_TABLE_PERTURB_SIZE (1 << INET_TABLE_PERTURB_SHIFT)
 static u32 *table_perturb;
 
-- 
2.17.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 net 7/7] tcp: drop the hash_32() part from the index calculation
  2022-04-28 12:39 [PATCH v2 net 0/7] insufficient TCP source port randomness Willy Tarreau
                   ` (5 preceding siblings ...)
  2022-04-28 12:40 ` [PATCH v2 net 6/7] tcp: increase source port perturb table to 2^16 Willy Tarreau
@ 2022-04-28 12:40 ` Willy Tarreau
  6 siblings, 0 replies; 13+ messages in thread
From: Willy Tarreau @ 2022-04-28 12:40 UTC (permalink / raw)
  To: netdev
  Cc: David Miller, Jakub Kicinski, Eric Dumazet, Moshe Kol,
	Yossi Gilad, Amit Klein, Jason A . Donenfeld, linux-kernel,
	Willy Tarreau

In commit 190cc82489f4 ("tcp: change source port randomizarion at
connect() time"), the table_perturb[] array was introduced and an
index was taken from the port_offset via hash_32(). But it turns
out that hash_32() performs a multiplication while the input here
comes from the output of SipHash in secure_seq, that is well
distributed enough to avoid the need for yet another hash.

Suggested-by: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/inet_hashtables.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index cc5f66328b47..a5d57fa679ca 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -778,7 +778,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 
 	net_get_random_once(table_perturb,
 			    INET_TABLE_PERTURB_SIZE * sizeof(*table_perturb));
-	index = hash_32(port_offset, INET_TABLE_PERTURB_SHIFT);
+	index = port_offset & (INET_TABLE_PERTURB_SIZE - 1);
 
 	offset = READ_ONCE(table_perturb[index]) + (port_offset >> 32);
 	offset %= remaining;
-- 
2.17.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net 3/7] tcp: resalt the secret every 10 seconds
  2022-04-28 12:39 ` [PATCH v2 net 3/7] tcp: resalt the secret every 10 seconds Willy Tarreau
@ 2022-04-29 14:37   ` Jason A. Donenfeld
  2022-04-29 15:29     ` Willy Tarreau
  2022-04-29 14:48   ` Jason A. Donenfeld
  1 sibling, 1 reply; 13+ messages in thread
From: Jason A. Donenfeld @ 2022-04-29 14:37 UTC (permalink / raw)
  To: Willy Tarreau, edumazet
  Cc: netdev, David Miller, Jakub Kicinski, Eric Dumazet, Moshe Kol,
	Yossi Gilad, Amit Klein, linux-kernel

Hi Eric,

On Thu, Apr 28, 2022 at 02:39:57PM +0200, Willy Tarreau wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> In order to limit the ability for an observer to recognize the source
> ports sequence used to contact a set of destinations, we should
> periodically shuffle the secret. 10 seconds looks effective enough
Nit: "periodically re-salt the input".
> without causing particular issues.

Just FYI, moving from siphash_3u32 to siphash_4u32 is not free, as it
bumps us up from siphash_3u32 to siphash_2u64, which does two more
siphash rounds. Maybe this doesn't matter much, but just FYI.

I wonder, though, about your "10 seconds looks effective enough without
causing particular issues." I surmise from that sentence that a lower
value might cause particular issues, but that you found 10 seconds to be
okay in practice. Fine. But what happens if one caller hits this at
second 9 and the next caller hits it at second 0? In that case, the
interval might have been 1 second, not 10. In other words, if you need
a certain minimum quantization for this to not cause "particular
issues", it might not work the way you wanted it to.

Additionally, that problem aside, if you round EPHEMERAL_PORT_SHUFFLE_PERIOD
to the nearest power of two, you can turn the expensive division into a
bit shift right.

Regards,
Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net 1/7] secure_seq: use the 64 bits of the siphash for port offset calculation
  2022-04-28 12:39 ` [PATCH v2 net 1/7] secure_seq: use the 64 bits of the siphash for port offset calculation Willy Tarreau
@ 2022-04-29 14:38   ` Jason A. Donenfeld
  0 siblings, 0 replies; 13+ messages in thread
From: Jason A. Donenfeld @ 2022-04-29 14:38 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: netdev, David Miller, Jakub Kicinski, Eric Dumazet, Moshe Kol,
	Yossi Gilad, Amit Klein, linux-kernel

Hi Willy,

On Thu, Apr 28, 2022 at 02:39:55PM +0200, Willy Tarreau wrote:
> SipHash replaced MD5 in secure_ipv{4,6}_port_ephemeral() via commit
> 7cd23e5300c1 ("secure_seq: use SipHash in place of MD5"), but the output
> remained truncated to 32-bit only. In order to exploit more bits from the
> hash, let's make the functions return the full 64-bit of siphash_3u32().
> We also make sure the port offset calculation in __inet_hash_connect()
> remains done on 32-bit to avoid the need for div_u64_rem() and an extra
> cost on 32-bit systems.
> 
> Cc: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
> Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
> Cc: Amit Klein <aksecurity@gmail.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  include/net/inet_hashtables.h |  2 +-
>  include/net/secure_seq.h      |  4 ++--
>  net/core/secure_seq.c         |  4 ++--
>  net/ipv4/inet_hashtables.c    | 10 ++++++----
>  net/ipv6/inet6_hashtables.c   |  4 ++--
>  5 files changed, 13 insertions(+), 11 deletions(-)

For the secure_seq parts:

Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> # For secure_seq.[ch]

Thanks,
Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net 3/7] tcp: resalt the secret every 10 seconds
  2022-04-28 12:39 ` [PATCH v2 net 3/7] tcp: resalt the secret every 10 seconds Willy Tarreau
  2022-04-29 14:37   ` Jason A. Donenfeld
@ 2022-04-29 14:48   ` Jason A. Donenfeld
  2022-04-29 15:30     ` Willy Tarreau
  1 sibling, 1 reply; 13+ messages in thread
From: Jason A. Donenfeld @ 2022-04-29 14:48 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Netdev, David Miller, Jakub Kicinski, Eric Dumazet, Moshe Kol,
	Yossi Gilad, Amit Klein, LKML

On Thu, Apr 28, 2022 at 2:40 PM Willy Tarreau <w@1wt.eu> wrote:
> @@ -101,10 +103,12 @@ u64 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
>                 struct in6_addr saddr;
>                 struct in6_addr daddr;
>                 __be16 dport;
> +               unsigned int timeseed;

Also, does the struct packing (or lack thereof) lead to problems here?
Uninitialized bytes might not make a stable hash.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net 3/7] tcp: resalt the secret every 10 seconds
  2022-04-29 14:37   ` Jason A. Donenfeld
@ 2022-04-29 15:29     ` Willy Tarreau
  0 siblings, 0 replies; 13+ messages in thread
From: Willy Tarreau @ 2022-04-29 15:29 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: edumazet, netdev, David Miller, Jakub Kicinski, Moshe Kol,
	Yossi Gilad, Amit Klein, linux-kernel

Hi Jason,

On Fri, Apr 29, 2022 at 04:37:12PM +0200, Jason A. Donenfeld wrote:
> Hi Eric,
> 
> On Thu, Apr 28, 2022 at 02:39:57PM +0200, Willy Tarreau wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > In order to limit the ability for an observer to recognize the source
> > ports sequence used to contact a set of destinations, we should
> > periodically shuffle the secret. 10 seconds looks effective enough
> Nit: "periodically re-salt the input".
> > without causing particular issues.
> 
> Just FYI, moving from siphash_3u32 to siphash_4u32 is not free, as it
> bumps us up from siphash_3u32 to siphash_2u64, which does two more
> siphash rounds. Maybe this doesn't matter much, but just FYI.
> 
> I wonder, though, about your "10 seconds looks effective enough without
> causing particular issues." I surmise from that sentence that a lower
> value might cause particular issues, but that you found 10 seconds to be
> okay in practice. Fine. But what happens if one caller hits this at
> second 9 and the next caller hits it at second 0? In that case, the
> interval might have been 1 second, not 10. In other words, if you need
> a certain minimum quantization for this to not cause "particular
> issues", it might not work the way you wanted it to.

I can partially respond on this one because as you've probably seen at
first I had some reserves on this one, which vanished.

The problem caused by reusing a source port too early almost essentially
concerns clients closing first, either with a regular shutdown() where
a FIN is sent first, or by forcing an RST by disabling lingering / breaking
the association with connect(AF_UNSPEC). In both cases, the risk is that
the packet supposed to finish to close the connection (e.g. last ACK that
turns the socket to TIME_WAIT or the RST) is lost, releasing the local
entry too early and making the client send a SYN from a port that's still
seen as in use on the other side.

For local TIME_WAIT, there's no issue in practice because 1) the port is
not yet usable, and 2) clients closing first cannot sustain high connection
rates anyway, they're limited to #ports/60s.

For those dealing with high connection rates (e.g. proxies) which seldom
have to abort connections, they have no other choice but closing using an
RST to avoid blocking the port in TIME_WAIT. These ones already face the
risk that the RST is lost on the path and that a subsequent SYN from the
same port is delivered to a connection that's in ESTABLISHED or FIN_WAIT
state anyway. This *does* happen due to losses, albeit at quite a low
rate in general. In such cases, the server returns an out-of-window ACK,
the client sends an RST and a SYN again (which roughly looks like a
retransmit as the RST). Such cases may occasionally fail through firewalls
if the RST is lost between the firewall and the server. The firewall will
then block the response ACK from the server and the client will have to
time out on the SYN and try again. This is the current situation even
without the patch (hence why closing with RST must be avoided as much as
possible).

The change of the salting here can slightly increase the risk that this
happens during the transition from the 9th to the 10th second as you said,
and only for the case above. However this will only be a problem if the
SYN arrives before the RST or if the RST is lost. The second case already
exists and the difference is that before the patch, we could have hoped
that the other side would finally have closed on timeout before sending a
new SYN (which is already not granted, e.g. ESTABLISHED) or unlikely to
happen above #ports/60s connections per second (e.g.  FIN_WAIT with a
longer timeout).  The first case (SYN arrives before RST) may happen on
randomly load-balanced links (where TCP already doesn't work well), and
while this does increase the risk, the risk only concerns just a few SYN
packets every 10s which each have a chance over #port of reusing the same
port as the last one, or ~#port/N of reusing the same port as one of the
last N ones for small values of N. This remains extremely low and is not
higher than the risk of losing the RST before the patch anyway.

As such all these risks are very low but they increase as the timer
decreases. This would work well at 60s and could probably still work well
enough at 1s. But 10s is enough to make Amit's attack impractical, so
better not risk to degrade anything. And the tests I ran showed only 2
failed connections after 13 billion closed by the client, which is
perfectly within what I'm used to observe in such cases.

> Additionally, that problem aside, if you round EPHEMERAL_PORT_SHUFFLE_PERIOD
> to the nearest power of two, you can turn the expensive division into a
> bit shift right.

I've been wondering if it would help to use 8 or 16 here, but that doesn't
change anything, it's a constant so it's converted to a reciprocal divide
by the compiler, i.e. just a 32-bit multiply and a shift.

Thanks!
Willy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net 3/7] tcp: resalt the secret every 10 seconds
  2022-04-29 14:48   ` Jason A. Donenfeld
@ 2022-04-29 15:30     ` Willy Tarreau
  0 siblings, 0 replies; 13+ messages in thread
From: Willy Tarreau @ 2022-04-29 15:30 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Netdev, David Miller, Jakub Kicinski, Eric Dumazet, Moshe Kol,
	Yossi Gilad, Amit Klein, LKML

On Fri, Apr 29, 2022 at 04:48:52PM +0200, Jason A. Donenfeld wrote:
> On Thu, Apr 28, 2022 at 2:40 PM Willy Tarreau <w@1wt.eu> wrote:
> > @@ -101,10 +103,12 @@ u64 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
> >                 struct in6_addr saddr;
> >                 struct in6_addr daddr;
> >                 __be16 dport;
> > +               unsigned int timeseed;
> 
> Also, does the struct packing (or lack thereof) lead to problems here?
> Uninitialized bytes might not make a stable hash.

Hmmm, I didn't notice, and I think you're right indeed. I did test in IPv6
without noticing any problem but it doesn't mean that the hash is perfectly
stable.

I'll send an update for this one, thank you!
Willy

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-04-29 15:31 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-28 12:39 [PATCH v2 net 0/7] insufficient TCP source port randomness Willy Tarreau
2022-04-28 12:39 ` [PATCH v2 net 1/7] secure_seq: use the 64 bits of the siphash for port offset calculation Willy Tarreau
2022-04-29 14:38   ` Jason A. Donenfeld
2022-04-28 12:39 ` [PATCH v2 net 2/7] tcp: use different parts of the port_offset for index and offset Willy Tarreau
2022-04-28 12:39 ` [PATCH v2 net 3/7] tcp: resalt the secret every 10 seconds Willy Tarreau
2022-04-29 14:37   ` Jason A. Donenfeld
2022-04-29 15:29     ` Willy Tarreau
2022-04-29 14:48   ` Jason A. Donenfeld
2022-04-29 15:30     ` Willy Tarreau
2022-04-28 12:39 ` [PATCH v2 net 4/7] tcp: add small random increments to the source port Willy Tarreau
2022-04-28 12:39 ` [PATCH v2 net 5/7] tcp: dynamically allocate the perturb table used by source ports Willy Tarreau
2022-04-28 12:40 ` [PATCH v2 net 6/7] tcp: increase source port perturb table to 2^16 Willy Tarreau
2022-04-28 12:40 ` [PATCH v2 net 7/7] tcp: drop the hash_32() part from the index calculation Willy Tarreau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.