Re: [PATCH net] net/ipv6: Reduce chance of collisions in inet6_hashfn()

From: Eric Dumazet <edumazet@google.com>
To: Samuel Mendoza-Jonas <samjonas@amazon.com>
Cc: netdev@vger.kernel.org, Stewart Smith <trawets@amazon.com>,
	 "David S . Miller" <davem@davemloft.net>,
	David Ahern <dsahern@kernel.org>,
	 Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	linux-kernel@vger.kernel.org,  benh@amazon.com,
	stable@vger.kernel.org
Subject: Re: [PATCH net] net/ipv6: Reduce chance of collisions in inet6_hashfn()
Date: Thu, 29 Jun 2023 12:05:29 +0200	[thread overview]
Message-ID: <CANn89i+6d9K1VwNK1Joc-Yb_4jAfV_YFzk=z_K2_Oy+xJHSn_g@mail.gmail.com> (raw)
In-Reply-To: <20230629015844.800280-1-samjonas@amazon.com>

On Thu, Jun 29, 2023 at 3:59 AM Samuel Mendoza-Jonas
<samjonas@amazon.com> wrote:
>
> From: Stewart Smith <trawets@amazon.com>
>
> For both IPv4 and IPv6 incoming TCP connections are tracked in a hash
> table with a hash over the source & destination addresses and ports.
> However, the IPv6 hash is insufficient and can lead to a high rate of
> collisions.
>
> The IPv6 hash used an XOR to fit everything into the 96 bits for the
> fast jenkins hash, meaning it is possible for an external entity to
> ensure the hash collides, thus falling back to a linear search in the
> bucket, which is slow.
>
> We take the approach of hash half the data; hash the other half; and
> then hash them together. We do this with 3x jenkins hashes rather than
> 2x to calculate the hashing value for the connection covering the full
> length of the addresses and ports.
>

...

> While this may look like it adds overhead, the reality of modern CPUs
> means that this is unmeasurable in real world scenarios.
>
> In simulating with llvm-mca, the increase in cycles for the hashing code
> was ~5 cycles on Skylake (from a base of ~50), and an extra ~9 on
> Nehalem (base of ~62).
>
> In commit dd6d2910c5e0 ("netfilter: conntrack: switch to siphash")
> netfilter switched from a jenkins hash to a siphash, but even the faster
> hsiphash is a more significant overhead (~20-30%) in some preliminary
> testing. So, in this patch, we keep to the more conservative approach to
> ensure we don't add much overhead per SYN.
>
> In testing, this results in a consistently even spread across the
> connection buckets. In both testing and real-world scenarios, we have
> not found any measurable performance impact.
>
> Cc: stable@vger.kernel.org
> Fixes: 08dcdbf6a7b9 ("ipv6: use a stronger hash for tcp")
> Fixes: b3da2cf37c5c ("[INET]: Use jhash + random secret for ehash.")
> Signed-off-by: Stewart Smith <trawets@amazon.com>
> Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
> ---
>  include/net/ipv6.h          | 4 +---
>  net/ipv6/inet6_hashtables.c | 5 ++++-
>  2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index 7332296eca44..f9bb54869d82 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -752,9 +752,7 @@ static inline u32 ipv6_addr_hash(const struct in6_addr *a)
>  /* more secured version of ipv6_addr_hash() */
>  static inline u32 __ipv6_addr_jhash(const struct in6_addr *a, const u32 initval)
>  {
> -       u32 v = (__force u32)a->s6_addr32[0] ^ (__force u32)a->s6_addr32[1];
> -
> -       return jhash_3words(v,
> +       return jhash_3words((__force u32)a->s6_addr32[1],
>                             (__force u32)a->s6_addr32[2],
>                             (__force u32)a->s6_addr32[3],
>                             initval);

Hmmm... see my following comment.

> diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
> index b64b49012655..bb7198081974 100644
> --- a/net/ipv6/inet6_hashtables.c
> +++ b/net/ipv6/inet6_hashtables.c
> @@ -33,7 +33,10 @@ u32 inet6_ehashfn(const struct net *net,
>         net_get_random_once(&inet6_ehash_secret, sizeof(inet6_ehash_secret));
>         net_get_random_once(&ipv6_hash_secret, sizeof(ipv6_hash_secret));
>
> -       lhash = (__force u32)laddr->s6_addr32[3];
> +       lhash = jhash_3words((__force u32)laddr->s6_addr32[3],
> +                           (((u32)lport) << 16) | (__force u32)fport,
> +                           (__force u32)faddr->s6_addr32[0],
> +                           ipv6_hash_secret);

This seems wrong to me.

Reusing ipv6_hash_secret and other keys twice is not good, I am sure
some security researchers
would love this...

Please just change __ipv6_addr_jhash(), so that all users can benefit
from a more secure version ?
It also leaves lhash / fhash names relevant here.

We will probably have to switch to sip (or other stronger hash than
jhash)  at some point, it is a tradeoff.

We might also add a break in the loop when a bucket exceeds a given
safety length,
because attackers can eventually exploit hashes after some point.

The following patch looks much saner to me.

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 7332296eca44b84dca1bbecb545f6824a0e8ed3d..2acc4c808d45d1c1bb1c5076e79842e136203e4c
100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -752,12 +752,8 @@ static inline u32 ipv6_addr_hash(const struct in6_addr *a)
 /* more secured version of ipv6_addr_hash() */
 static inline u32 __ipv6_addr_jhash(const struct in6_addr *a, const
u32 initval)
 {
-       u32 v = (__force u32)a->s6_addr32[0] ^ (__force u32)a->s6_addr32[1];
-
-       return jhash_3words(v,
-                           (__force u32)a->s6_addr32[2],
-                           (__force u32)a->s6_addr32[3],
-                           initval);
+       return jhash2((__force const u32 *)a->s6_addr32,
+                     ARRAY_SIZE(a->s6_addr32), initval);
 }