* [PATCH bpf-next v6 0/8] Add SO_REUSEPORT support for TC bpf_sk_assign
@ 2023-07-20 15:30 Lorenz Bauer
From: Lorenz Bauer @ 2023-07-20 15:30 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Joe Stringer, Mykola Lysenko, Shuah Khan,
	Kuniyuki Iwashima
  Cc: Hemanth Malla, netdev, linux-kernel, bpf, linux-kselftest,
	Lorenz Bauer, Joe Stringer

We want to replace iptables TPROXY with a BPF program at TC ingress.
To make this work in all cases we need to assign a SO_REUSEPORT socket
to an skb, which is currently prohibited. This series adds support for
such sockets to bpf_sk_assign.
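
For illustration, steering with the helper from a TC ingress program
looks roughly like this (a minimal sketch, not part of this series;
the hard-coded tuple, section and program names are made up):

    // SPDX-License-Identifier: GPL-2.0
    #include <linux/bpf.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    SEC("tc")
    int steer(struct __sk_buff *skb)
    {
        struct bpf_sock_tuple tuple = {};
        struct bpf_sock *sk;

        /* A real program would parse the packet headers here. */
        tuple.ipv4.daddr = bpf_htonl(0x7f000001); /* 127.0.0.1 */
        tuple.ipv4.dport = bpf_htons(4321);

        sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple.ipv4),
                               BPF_F_CURRENT_NETNS, 0);
        if (!sk)
            return TC_ACT_OK;

        /* With this series, sk may belong to a reuseport group. */
        bpf_sk_assign(skb, sk, 0);
        bpf_sk_release(sk);
        return TC_ACT_OK;
    }

    char _license[] SEC("license") = "GPL";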

I did some refactoring to cut down on the amount of duplicate code. The
key to this is to use the INDIRECT_CALL infrastructure in the reuseport
helpers. To show that this approach benefits more than just TC sk_assign,
I removed the duplicated code for bpf_sk_lookup as well.

Joint work with Daniel Borkmann.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
---
Changes in v6:
- Reject unhashed UDP sockets in bpf_sk_assign to avoid ref leak
- Link to v5: https://lore.kernel.org/r/20230613-so-reuseport-v5-0-f6686a0dbce0@isovalent.com

Changes in v5:
- Drop reuse_sk == sk check in inet[6]_steal_sock (Kuniyuki)
- Link to v4: https://lore.kernel.org/r/20230613-so-reuseport-v4-0-4ece76708bba@isovalent.com

Changes in v4:
- WARN_ON_ONCE if reuseport socket is refcounted (Kuniyuki)
- Use inet[6]_ehashfn_t to shorten function declarations (Kuniyuki)
- Shuffle documentation patch around (Kuniyuki)
- Update commit message to explain why IPv6 needs EXPORT_SYMBOL
- Link to v3: https://lore.kernel.org/r/20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent.com

Changes in v3:
- Fix warning re udp_ehashfn and udp6_ehashfn (Simon)
- Return higher scoring connected UDP reuseport sockets (Kuniyuki)
- Fix ipv6 module builds
- Link to v2: https://lore.kernel.org/r/20230613-so-reuseport-v2-0-b7c69a342613@isovalent.com

Changes in v2:
- Correct commit abbrev length (Kuniyuki)
- Reduce duplication (Kuniyuki)
- Add checks on sk_state (Martin)
- Split exporting inet[6]_lookup_reuseport into separate patch (Eric)

---
Daniel Borkmann (1):
      selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper

Lorenz Bauer (7):
      udp: re-score reuseport groups when connected sockets are present
      bpf: reject unhashed sockets in bpf_sk_assign
      net: export inet_lookup_reuseport and inet6_lookup_reuseport
      net: remove duplicate reuseport_lookup functions
      net: document inet[6]_lookup_reuseport sk_state requirements
      net: remove duplicate sk_lookup helpers
      bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign

 include/net/inet6_hashtables.h                     |  81 ++++++++-
 include/net/inet_hashtables.h                      |  74 +++++++-
 include/net/sock.h                                 |   7 +-
 include/uapi/linux/bpf.h                           |   3 -
 net/core/filter.c                                  |   4 +-
 net/ipv4/inet_hashtables.c                         |  68 ++++---
 net/ipv4/udp.c                                     |  88 ++++-----
 net/ipv6/inet6_hashtables.c                        |  71 +++++---
 net/ipv6/udp.c                                     |  98 ++++------
 tools/include/uapi/linux/bpf.h                     |   3 -
 tools/testing/selftests/bpf/network_helpers.c      |   3 +
 .../selftests/bpf/prog_tests/assign_reuse.c        | 197 +++++++++++++++++++++
 .../selftests/bpf/progs/test_assign_reuse.c        | 142 +++++++++++++++
 13 files changed, 660 insertions(+), 179 deletions(-)
---
base-commit: 6f5a630d7c57cd79b1f526a95e757311e32d41e5
change-id: 20230613-so-reuseport-e92c526173ee

Best regards,
-- 
Lorenz Bauer <lmb@isovalent.com>


* [PATCH bpf-next v6 1/8] udp: re-score reuseport groups when connected sockets are present
From: Lorenz Bauer @ 2023-07-20 15:30 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Joe Stringer, Mykola Lysenko, Shuah Khan,
	Kuniyuki Iwashima
  Cc: Hemanth Malla, netdev, linux-kernel, bpf, linux-kselftest, Lorenz Bauer

Unlike TCP, UDP reuseport groups can contain TCP_ESTABLISHED
sockets. To support these properly, we remember whether a group has
a connected socket and skip the fast reuseport early-return. In
effect we continue scoring all reuseport sockets and then choose the
one with the highest score.

The current code fails to re-calculate the score for the result of
lookup_reuseport. According to Kuniyuki Iwashima:

    1) SO_INCOMING_CPU is set
       -> selected sk might have +1 score

    2) BPF prog returns ESTABLISHED and/or SO_INCOMING_CPU sk
       -> selected sk will have a score of more than 8

  Using the old score could trigger more lookups depending on the
  order in which sockets are created.

    sk -> sk (SO_INCOMING_CPU) -> sk (ESTABLISHED)
    |     |
    `-> select the next SO_INCOMING_CPU sk
          |
          `-> select itself (We should save this lookup)

Fixes: efc6b6f6c311 ("udp: Improve load balancing for SO_REUSEPORT.")
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
---
 net/ipv4/udp.c | 20 +++++++++++++++-----
 net/ipv6/udp.c | 19 ++++++++++++++-----
 2 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 42a96b3547c9..c62d5e1c6675 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -451,14 +451,24 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
-			result = lookup_reuseport(net, sk, skb,
-						  saddr, sport, daddr, hnum);
+			badness = score;
+			result = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum);
+			if (!result) {
+				result = sk;
+				continue;
+			}
+
 			/* Fall back to scoring if group has connections */
-			if (result && !reuseport_has_conns(sk))
+			if (!reuseport_has_conns(sk))
 				return result;
 
-			result = result ? : sk;
-			badness = score;
+			/* Reuseport logic returned an error, keep original score. */
+			if (IS_ERR(result))
+				continue;
+
+			badness = compute_score(result, net, saddr, sport,
+						daddr, hnum, dif, sdif);
+
 		}
 	}
 	return result;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index b7c972aa09a7..dec69f0379e9 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -194,14 +194,23 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
-			result = lookup_reuseport(net, sk, skb,
-						  saddr, sport, daddr, hnum);
+			badness = score;
+			result = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum);
+			if (!result) {
+				result = sk;
+				continue;
+			}
+
 			/* Fall back to scoring if group has connections */
-			if (result && !reuseport_has_conns(sk))
+			if (!reuseport_has_conns(sk))
 				return result;
 
-			result = result ? : sk;
-			badness = score;
+			/* Reuseport logic returned an error, keep original score. */
+			if (IS_ERR(result))
+				continue;
+
+			badness = compute_score(result, net, saddr, sport,
+						daddr, hnum, dif, sdif);
 		}
 	}
 	return result;

-- 
2.41.0


* [PATCH bpf-next v6 2/8] bpf: reject unhashed sockets in bpf_sk_assign
From: Lorenz Bauer @ 2023-07-20 15:30 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Joe Stringer, Mykola Lysenko, Shuah Khan,
	Kuniyuki Iwashima
  Cc: Hemanth Malla, netdev, linux-kernel, bpf, linux-kselftest,
	Lorenz Bauer, Joe Stringer

The semantics for bpf_sk_assign are as follows:

    sk = some_lookup_func()
    bpf_sk_assign(skb, sk)
    bpf_sk_release(sk)

That is, the sk is not consumed by bpf_sk_assign. The function
therefore needs to make sure that sk lives long enough to be
consumed by __inet_lookup_skb. The path through the stack for a
TCPv4 packet is roughly:

  netif_receive_skb_core: takes RCU read lock
    __netif_receive_skb_core:
      sch_handle_ingress:
        tcf_classify:
          bpf_sk_assign()
      deliver_ptype_list_skb:
        deliver_skb:
          ip_packet_type->func == ip_rcv:
            ip_rcv_core:
            ip_rcv_finish_core:
              dst_input:
                ip_local_deliver:
                  ip_local_deliver_finish:
                    ip_protocol_deliver_rcu:
                      tcp_v4_rcv:
                        __inet_lookup_skb:
                          skb_steal_sock

The existing helper takes advantage of the fact that everything
happens in the same RCU critical section: for sockets with
SOCK_RCU_FREE set, bpf_sk_assign never takes a reference.
skb_steal_sock then checks SOCK_RCU_FREE again and calls sock_put
if necessary.
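
For reference, this is the pre-series logic in skb_steal_sock() that
the last patch modifies (see the diff in patch 7):

    static inline struct sock *
    skb_steal_sock(struct sk_buff *skb, bool *refcounted)
    {
        if (skb->sk) {
            struct sock *sk = skb->sk;

            *refcounted = true;
            if (skb_sk_is_prefetched(skb))
                /* For full sockets this boils down to
                 * !sock_flag(sk, SOCK_RCU_FREE).
                 */
                *refcounted = sk_is_refcounted(sk);
            skb->destructor = NULL;
            skb->sk = NULL;
            return sk;
        }
        *refcounted = false;
        return NULL;
    }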

This approach assumes that SOCK_RCU_FREE is never set on a sk
between bpf_sk_assign and skb_steal_sock, but this invariant is
violated by unhashed UDP sockets. A new UDP socket is created
in TCP_CLOSE state but without SOCK_RCU_FREE set. That flag is only
added in udp_lib_get_port(), which is called when the socket is bound.

When bpf_sk_assign was added it wasn't possible to access unhashed
UDP sockets from BPF, so this wasn't a problem. This changed
in commit 0c48eefae712 ("sock_map: Lift socket state restriction
for datagram sockets"), but the helper wasn't adjusted accordingly.
The following sequence of events will therefore lead to a refcount
leak:

1. Add socket(AF_INET, SOCK_DGRAM) to a sockmap.
2. Pull socket out of sockmap and bpf_sk_assign it. Since
   SOCK_RCU_FREE is not set we increment the refcount.
3. bind() or connect() the socket, setting SOCK_RCU_FREE.
4. skb_steal_sock will now set refcounted = false due to
   SOCK_RCU_FREE.
5. tcp_v4_rcv() skips sock_put().

Fix the problem by rejecting unhashed sockets in bpf_sk_assign().
This matches the behaviour of __inet_lookup_skb, which never returns
unhashed sockets and is ultimately what bpf_sk_assign() emulates.

Fixes: cf7fbe660f2d ("bpf: Add socket assign support")
Cc: Joe Stringer <joe@cilium.io>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
---
 net/core/filter.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 797e8f039696..b5b51ef48c5f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7353,6 +7353,8 @@ BPF_CALL_3(bpf_sk_assign, struct sk_buff *, skb, struct sock *, sk, u64, flags)
 		return -ENETUNREACH;
 	if (unlikely(sk_fullsock(sk) && sk->sk_reuseport))
 		return -ESOCKTNOSUPPORT;
+	if (sk_unhashed(sk))
+		return -EOPNOTSUPP;
 	if (sk_is_refcounted(sk) &&
 	    unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
 		return -ENOENT;

-- 
2.41.0


* [PATCH bpf-next v6 3/8] net: export inet_lookup_reuseport and inet6_lookup_reuseport
From: Lorenz Bauer @ 2023-07-20 15:30 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Joe Stringer, Mykola Lysenko, Shuah Khan,
	Kuniyuki Iwashima
  Cc: Hemanth Malla, netdev, linux-kernel, bpf, linux-kselftest, Lorenz Bauer

Rename the existing reuseport helpers for IPv4 and IPv6 so that they
can be invoked in the follow-up commit. Export them so that building
DCCP and IPv6 as a module works.

No change in functionality.

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
---
 include/net/inet6_hashtables.h |  7 +++++++
 include/net/inet_hashtables.h  |  5 +++++
 net/ipv4/inet_hashtables.c     | 15 ++++++++-------
 net/ipv6/inet6_hashtables.c    | 19 ++++++++++---------
 4 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index 56f1286583d3..032ddab48f8f 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -48,6 +48,13 @@ struct sock *__inet6_lookup_established(struct net *net,
 					const u16 hnum, const int dif,
 					const int sdif);
 
+struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk,
+				    struct sk_buff *skb, int doff,
+				    const struct in6_addr *saddr,
+				    __be16 sport,
+				    const struct in6_addr *daddr,
+				    unsigned short hnum);
+
 struct sock *inet6_lookup_listener(struct net *net,
 				   struct inet_hashinfo *hashinfo,
 				   struct sk_buff *skb, int doff,
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 99bd823e97f6..8734f3488f5d 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -379,6 +379,11 @@ struct sock *__inet_lookup_established(struct net *net,
 				       const __be32 daddr, const u16 hnum,
 				       const int dif, const int sdif);
 
+struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk,
+				   struct sk_buff *skb, int doff,
+				   __be32 saddr, __be16 sport,
+				   __be32 daddr, unsigned short hnum);
+
 static inline struct sock *
 	inet_lookup_established(struct net *net, struct inet_hashinfo *hashinfo,
 				const __be32 saddr, const __be16 sport,
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index e7391bf310a7..920131e4a65d 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -332,10 +332,10 @@ static inline int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
-static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
-					    struct sk_buff *skb, int doff,
-					    __be32 saddr, __be16 sport,
-					    __be32 daddr, unsigned short hnum)
+struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk,
+				   struct sk_buff *skb, int doff,
+				   __be32 saddr, __be16 sport,
+				   __be32 daddr, unsigned short hnum)
 {
 	struct sock *reuse_sk = NULL;
 	u32 phash;
@@ -346,6 +346,7 @@ static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
 	}
 	return reuse_sk;
 }
+EXPORT_SYMBOL_GPL(inet_lookup_reuseport);
 
 /*
  * Here are some nice properties to exploit here. The BSD API
@@ -369,8 +370,8 @@ static struct sock *inet_lhash2_lookup(struct net *net,
 	sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) {
 		score = compute_score(sk, net, hnum, daddr, dif, sdif);
 		if (score > hiscore) {
-			result = lookup_reuseport(net, sk, skb, doff,
-						  saddr, sport, daddr, hnum);
+			result = inet_lookup_reuseport(net, sk, skb, doff,
+						       saddr, sport, daddr, hnum);
 			if (result)
 				return result;
 
@@ -399,7 +400,7 @@ static inline struct sock *inet_lookup_run_bpf(struct net *net,
 	if (no_reuseport || IS_ERR_OR_NULL(sk))
 		return sk;
 
-	reuse_sk = lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum);
+	reuse_sk = inet_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum);
 	if (reuse_sk)
 		sk = reuse_sk;
 	return sk;
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index b64b49012655..b7c56867314e 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -111,12 +111,12 @@ static inline int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
-static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
-					    struct sk_buff *skb, int doff,
-					    const struct in6_addr *saddr,
-					    __be16 sport,
-					    const struct in6_addr *daddr,
-					    unsigned short hnum)
+struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk,
+				    struct sk_buff *skb, int doff,
+				    const struct in6_addr *saddr,
+				    __be16 sport,
+				    const struct in6_addr *daddr,
+				    unsigned short hnum)
 {
 	struct sock *reuse_sk = NULL;
 	u32 phash;
@@ -127,6 +127,7 @@ static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
 	}
 	return reuse_sk;
 }
+EXPORT_SYMBOL_GPL(inet6_lookup_reuseport);
 
 /* called with rcu_read_lock() */
 static struct sock *inet6_lhash2_lookup(struct net *net,
@@ -143,8 +144,8 @@ static struct sock *inet6_lhash2_lookup(struct net *net,
 	sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) {
 		score = compute_score(sk, net, hnum, daddr, dif, sdif);
 		if (score > hiscore) {
-			result = lookup_reuseport(net, sk, skb, doff,
-						  saddr, sport, daddr, hnum);
+			result = inet6_lookup_reuseport(net, sk, skb, doff,
+							saddr, sport, daddr, hnum);
 			if (result)
 				return result;
 
@@ -175,7 +176,7 @@ static inline struct sock *inet6_lookup_run_bpf(struct net *net,
 	if (no_reuseport || IS_ERR_OR_NULL(sk))
 		return sk;
 
-	reuse_sk = lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum);
+	reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum);
 	if (reuse_sk)
 		sk = reuse_sk;
 	return sk;

-- 
2.41.0


* [PATCH bpf-next v6 4/8] net: remove duplicate reuseport_lookup functions
From: Lorenz Bauer @ 2023-07-20 15:30 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Joe Stringer, Mykola Lysenko, Shuah Khan,
	Kuniyuki Iwashima
  Cc: Hemanth Malla, netdev, linux-kernel, bpf, linux-kselftest, Lorenz Bauer

There are currently four copies of reuseport_lookup: one each for
(TCP, UDP)x(IPv4, IPv6). This forces us to duplicate all callers of
those functions as well. This is already the case for sk_lookup
helpers (inet,inet6,udp4,udp6)_lookup_run_bpf.

There are two differences between the reuseport_lookup helpers:

1. They call different hash functions depending on protocol
2. UDP reuseport_lookup checks that sk_state != TCP_ESTABLISHED

Move the check for sk_state into the caller and use the INDIRECT_CALL
infrastructure to cut down the helpers to one per IP version.
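
For context, INDIRECT_CALL_2() compares the function pointer against
the expected targets and turns matches into direct calls, avoiding a
retpoline on the hot path. Roughly (paraphrasing the CONFIG_RETPOLINE
definitions in include/linux/indirect_call_wrapper.h):

    #define INDIRECT_CALL_1(f, f1, ...)                              \
        ({                                                           \
            likely(f == f1) ? f1(__VA_ARGS__) : f(__VA_ARGS__);      \
        })
    #define INDIRECT_CALL_2(f, f2, f1, ...)                          \
        ({                                                           \
            likely(f == f2) ? f2(__VA_ARGS__) :                      \
                              INDIRECT_CALL_1(f, f1, __VA_ARGS__);   \
        })

This keeps inet[6]_lookup_reuseport() as cheap as the open-coded
variants when ehashfn is one of the expected hash functions.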

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
---
 include/net/inet6_hashtables.h | 11 ++++++++++-
 include/net/inet_hashtables.h  | 15 ++++++++++-----
 net/ipv4/inet_hashtables.c     | 20 +++++++++++++-------
 net/ipv4/udp.c                 | 34 +++++++++++++---------------------
 net/ipv6/inet6_hashtables.c    | 14 ++++++++++----
 net/ipv6/udp.c                 | 41 ++++++++++++++++-------------------------
 6 files changed, 72 insertions(+), 63 deletions(-)

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index 032ddab48f8f..f89320b6fee3 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -48,12 +48,21 @@ struct sock *__inet6_lookup_established(struct net *net,
 					const u16 hnum, const int dif,
 					const int sdif);
 
+typedef u32 (inet6_ehashfn_t)(const struct net *net,
+			       const struct in6_addr *laddr, const u16 lport,
+			       const struct in6_addr *faddr, const __be16 fport);
+
+inet6_ehashfn_t inet6_ehashfn;
+
+INDIRECT_CALLABLE_DECLARE(inet6_ehashfn_t udp6_ehashfn);
+
 struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk,
 				    struct sk_buff *skb, int doff,
 				    const struct in6_addr *saddr,
 				    __be16 sport,
 				    const struct in6_addr *daddr,
-				    unsigned short hnum);
+				    unsigned short hnum,
+				    inet6_ehashfn_t *ehashfn);
 
 struct sock *inet6_lookup_listener(struct net *net,
 				   struct inet_hashinfo *hashinfo,
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 8734f3488f5d..ddfa2e67fdb5 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -379,10 +379,19 @@ struct sock *__inet_lookup_established(struct net *net,
 				       const __be32 daddr, const u16 hnum,
 				       const int dif, const int sdif);
 
+typedef u32 (inet_ehashfn_t)(const struct net *net,
+			      const __be32 laddr, const __u16 lport,
+			      const __be32 faddr, const __be16 fport);
+
+inet_ehashfn_t inet_ehashfn;
+
+INDIRECT_CALLABLE_DECLARE(inet_ehashfn_t udp_ehashfn);
+
 struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk,
 				   struct sk_buff *skb, int doff,
 				   __be32 saddr, __be16 sport,
-				   __be32 daddr, unsigned short hnum);
+				   __be32 daddr, unsigned short hnum,
+				   inet_ehashfn_t *ehashfn);
 
 static inline struct sock *
 	inet_lookup_established(struct net *net, struct inet_hashinfo *hashinfo,
@@ -453,10 +462,6 @@ static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo,
 			     refcounted);
 }
 
-u32 inet6_ehashfn(const struct net *net,
-		  const struct in6_addr *laddr, const u16 lport,
-		  const struct in6_addr *faddr, const __be16 fport);
-
 static inline void sk_daddr_set(struct sock *sk, __be32 addr)
 {
 	sk->sk_daddr = addr; /* alias of inet_daddr */
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 920131e4a65d..352eb371c93b 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -28,9 +28,9 @@
 #include <net/tcp.h>
 #include <net/sock_reuseport.h>
 
-static u32 inet_ehashfn(const struct net *net, const __be32 laddr,
-			const __u16 lport, const __be32 faddr,
-			const __be16 fport)
+u32 inet_ehashfn(const struct net *net, const __be32 laddr,
+		 const __u16 lport, const __be32 faddr,
+		 const __be16 fport)
 {
 	static u32 inet_ehash_secret __read_mostly;
 
@@ -39,6 +39,7 @@ static u32 inet_ehashfn(const struct net *net, const __be32 laddr,
 	return __inet_ehashfn(laddr, lport, faddr, fport,
 			      inet_ehash_secret + net_hash_mix(net));
 }
+EXPORT_SYMBOL_GPL(inet_ehashfn);
 
 /* This function handles inet_sock, but also timewait and request sockets
  * for IPv4/IPv6.
@@ -332,16 +333,20 @@ static inline int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
+INDIRECT_CALLABLE_DECLARE(inet_ehashfn_t udp_ehashfn);
+
 struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk,
 				   struct sk_buff *skb, int doff,
 				   __be32 saddr, __be16 sport,
-				   __be32 daddr, unsigned short hnum)
+				   __be32 daddr, unsigned short hnum,
+				   inet_ehashfn_t *ehashfn)
 {
 	struct sock *reuse_sk = NULL;
 	u32 phash;
 
 	if (sk->sk_reuseport) {
-		phash = inet_ehashfn(net, daddr, hnum, saddr, sport);
+		phash = INDIRECT_CALL_2(ehashfn, udp_ehashfn, inet_ehashfn,
+					net, daddr, hnum, saddr, sport);
 		reuse_sk = reuseport_select_sock(sk, phash, skb, doff);
 	}
 	return reuse_sk;
@@ -371,7 +376,7 @@ static struct sock *inet_lhash2_lookup(struct net *net,
 		score = compute_score(sk, net, hnum, daddr, dif, sdif);
 		if (score > hiscore) {
 			result = inet_lookup_reuseport(net, sk, skb, doff,
-						       saddr, sport, daddr, hnum);
+						       saddr, sport, daddr, hnum, inet_ehashfn);
 			if (result)
 				return result;
 
@@ -400,7 +405,8 @@ static inline struct sock *inet_lookup_run_bpf(struct net *net,
 	if (no_reuseport || IS_ERR_OR_NULL(sk))
 		return sk;
 
-	reuse_sk = inet_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum);
+	reuse_sk = inet_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum,
+					 inet_ehashfn);
 	if (reuse_sk)
 		sk = reuse_sk;
 	return sk;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c62d5e1c6675..55f683b31c93 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -406,9 +406,9 @@ static int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
-static u32 udp_ehashfn(const struct net *net, const __be32 laddr,
-		       const __u16 lport, const __be32 faddr,
-		       const __be16 fport)
+INDIRECT_CALLABLE_SCOPE
+u32 udp_ehashfn(const struct net *net, const __be32 laddr, const __u16 lport,
+		const __be32 faddr, const __be16 fport)
 {
 	static u32 udp_ehash_secret __read_mostly;
 
@@ -418,22 +418,6 @@ static u32 udp_ehashfn(const struct net *net, const __be32 laddr,
 			      udp_ehash_secret + net_hash_mix(net));
 }
 
-static struct sock *lookup_reuseport(struct net *net, struct sock *sk,
-				     struct sk_buff *skb,
-				     __be32 saddr, __be16 sport,
-				     __be32 daddr, unsigned short hnum)
-{
-	struct sock *reuse_sk = NULL;
-	u32 hash;
-
-	if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) {
-		hash = udp_ehashfn(net, daddr, hnum, saddr, sport);
-		reuse_sk = reuseport_select_sock(sk, hash, skb,
-						 sizeof(struct udphdr));
-	}
-	return reuse_sk;
-}
-
 /* called with rcu_read_lock() */
 static struct sock *udp4_lib_lookup2(struct net *net,
 				     __be32 saddr, __be16 sport,
@@ -452,7 +436,14 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
 			badness = score;
-			result = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum);
+
+			if (sk->sk_state == TCP_ESTABLISHED) {
+				result = sk;
+				continue;
+			}
+
+			result = inet_lookup_reuseport(net, sk, skb, sizeof(struct udphdr),
+						       saddr, sport, daddr, hnum, udp_ehashfn);
 			if (!result) {
 				result = sk;
 				continue;
@@ -491,7 +482,8 @@ static struct sock *udp4_lookup_run_bpf(struct net *net,
 	if (no_reuseport || IS_ERR_OR_NULL(sk))
 		return sk;
 
-	reuse_sk = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum);
+	reuse_sk = inet_lookup_reuseport(net, sk, skb, sizeof(struct udphdr),
+					 saddr, sport, daddr, hnum, udp_ehashfn);
 	if (reuse_sk)
 		sk = reuse_sk;
 	return sk;
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index b7c56867314e..3616225c89ef 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -39,6 +39,7 @@ u32 inet6_ehashfn(const struct net *net,
 	return __inet6_ehashfn(lhash, lport, fhash, fport,
 			       inet6_ehash_secret + net_hash_mix(net));
 }
+EXPORT_SYMBOL_GPL(inet6_ehashfn);
 
 /*
  * Sockets in TCP_CLOSE state are _always_ taken out of the hash, so
@@ -111,18 +112,22 @@ static inline int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
+INDIRECT_CALLABLE_DECLARE(inet6_ehashfn_t udp6_ehashfn);
+
 struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk,
 				    struct sk_buff *skb, int doff,
 				    const struct in6_addr *saddr,
 				    __be16 sport,
 				    const struct in6_addr *daddr,
-				    unsigned short hnum)
+				    unsigned short hnum,
+				    inet6_ehashfn_t *ehashfn)
 {
 	struct sock *reuse_sk = NULL;
 	u32 phash;
 
 	if (sk->sk_reuseport) {
-		phash = inet6_ehashfn(net, daddr, hnum, saddr, sport);
+		phash = INDIRECT_CALL_INET(ehashfn, udp6_ehashfn, inet6_ehashfn,
+					   net, daddr, hnum, saddr, sport);
 		reuse_sk = reuseport_select_sock(sk, phash, skb, doff);
 	}
 	return reuse_sk;
@@ -145,7 +150,7 @@ static struct sock *inet6_lhash2_lookup(struct net *net,
 		score = compute_score(sk, net, hnum, daddr, dif, sdif);
 		if (score > hiscore) {
 			result = inet6_lookup_reuseport(net, sk, skb, doff,
-							saddr, sport, daddr, hnum);
+							saddr, sport, daddr, hnum, inet6_ehashfn);
 			if (result)
 				return result;
 
@@ -176,7 +181,8 @@ static inline struct sock *inet6_lookup_run_bpf(struct net *net,
 	if (no_reuseport || IS_ERR_OR_NULL(sk))
 		return sk;
 
-	reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum);
+	reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff,
+					  saddr, sport, daddr, hnum, inet6_ehashfn);
 	if (reuse_sk)
 		sk = reuse_sk;
 	return sk;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index dec69f0379e9..a65f8d72543d 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -71,11 +71,12 @@ int udpv6_init_sock(struct sock *sk)
 	return 0;
 }
 
-static u32 udp6_ehashfn(const struct net *net,
-			const struct in6_addr *laddr,
-			const u16 lport,
-			const struct in6_addr *faddr,
-			const __be16 fport)
+INDIRECT_CALLABLE_SCOPE
+u32 udp6_ehashfn(const struct net *net,
+		 const struct in6_addr *laddr,
+		 const u16 lport,
+		 const struct in6_addr *faddr,
+		 const __be16 fport)
 {
 	static u32 udp6_ehash_secret __read_mostly;
 	static u32 udp_ipv6_hash_secret __read_mostly;
@@ -160,24 +161,6 @@ static int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
-static struct sock *lookup_reuseport(struct net *net, struct sock *sk,
-				     struct sk_buff *skb,
-				     const struct in6_addr *saddr,
-				     __be16 sport,
-				     const struct in6_addr *daddr,
-				     unsigned int hnum)
-{
-	struct sock *reuse_sk = NULL;
-	u32 hash;
-
-	if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) {
-		hash = udp6_ehashfn(net, daddr, hnum, saddr, sport);
-		reuse_sk = reuseport_select_sock(sk, hash, skb,
-						 sizeof(struct udphdr));
-	}
-	return reuse_sk;
-}
-
 /* called with rcu_read_lock() */
 static struct sock *udp6_lib_lookup2(struct net *net,
 		const struct in6_addr *saddr, __be16 sport,
@@ -195,7 +178,14 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
 			badness = score;
-			result = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum);
+
+			if (sk->sk_state == TCP_ESTABLISHED) {
+				result = sk;
+				continue;
+			}
+
+			result = inet6_lookup_reuseport(net, sk, skb, sizeof(struct udphdr),
+							saddr, sport, daddr, hnum, udp6_ehashfn);
 			if (!result) {
 				result = sk;
 				continue;
@@ -235,7 +225,8 @@ static inline struct sock *udp6_lookup_run_bpf(struct net *net,
 	if (no_reuseport || IS_ERR_OR_NULL(sk))
 		return sk;
 
-	reuse_sk = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum);
+	reuse_sk = inet6_lookup_reuseport(net, sk, skb, sizeof(struct udphdr),
+					  saddr, sport, daddr, hnum, udp6_ehashfn);
 	if (reuse_sk)
 		sk = reuse_sk;
 	return sk;

-- 
2.41.0


* [PATCH bpf-next v6 5/8] net: document inet[6]_lookup_reuseport sk_state requirements
From: Lorenz Bauer @ 2023-07-20 15:30 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Joe Stringer, Mykola Lysenko, Shuah Khan,
	Kuniyuki Iwashima
  Cc: Hemanth Malla, netdev, linux-kernel, bpf, linux-kselftest, Lorenz Bauer

The current implementation was extracted from inet[6]_lhash2_lookup
in commit 80b373f74f9e ("inet: Extract helper for selecting socket
from reuseport group") and commit 5df6531292b5 ("inet6: Extract helper
for selecting socket from reuseport group"). In the original context,
sk is always in TCP_LISTEN state, so no separate sk_state check was
needed.

Add documentation that specifies which sk_state are valid to pass to
the function.

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
---
 net/ipv4/inet_hashtables.c  | 15 +++++++++++++++
 net/ipv6/inet6_hashtables.c | 15 +++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 352eb371c93b..64fc1bd3fb63 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -335,6 +335,21 @@ static inline int compute_score(struct sock *sk, struct net *net,
 
 INDIRECT_CALLABLE_DECLARE(inet_ehashfn_t udp_ehashfn);
 
+/**
+ * inet_lookup_reuseport() - execute reuseport logic on AF_INET socket if necessary.
+ * @net: network namespace.
+ * @sk: AF_INET socket, must be in TCP_LISTEN state for TCP or TCP_CLOSE for UDP.
+ * @skb: context for a potential SK_REUSEPORT program.
+ * @doff: header offset.
+ * @saddr: source address.
+ * @sport: source port.
+ * @daddr: destination address.
+ * @hnum: destination port in host byte order.
+ * @ehashfn: hash function used to generate the fallback hash.
+ *
+ * Return: NULL if sk doesn't have SO_REUSEPORT set, otherwise a pointer to
+ *         the selected sock or an error.
+ */
 struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk,
 				   struct sk_buff *skb, int doff,
 				   __be32 saddr, __be16 sport,
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 3616225c89ef..f76dbbb29332 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -114,6 +114,21 @@ static inline int compute_score(struct sock *sk, struct net *net,
 
 INDIRECT_CALLABLE_DECLARE(inet6_ehashfn_t udp6_ehashfn);
 
+/**
+ * inet6_lookup_reuseport() - execute reuseport logic on AF_INET6 socket if necessary.
+ * @net: network namespace.
+ * @sk: AF_INET6 socket, must be in TCP_LISTEN state for TCP or TCP_CLOSE for UDP.
+ * @skb: context for a potential SK_REUSEPORT program.
+ * @doff: header offset.
+ * @saddr: source address.
+ * @sport: source port.
+ * @daddr: destination address.
+ * @hnum: destination port in host byte order.
+ * @ehashfn: hash function used to generate the fallback hash.
+ *
+ * Return: NULL if sk doesn't have SO_REUSEPORT set, otherwise a pointer to
+ *         the selected sock or an error.
+ */
 struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk,
 				    struct sk_buff *skb, int doff,
 				    const struct in6_addr *saddr,

-- 
2.41.0


* [PATCH bpf-next v6 6/8] net: remove duplicate sk_lookup helpers
From: Lorenz Bauer @ 2023-07-20 15:30 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Joe Stringer, Mykola Lysenko, Shuah Khan,
	Kuniyuki Iwashima
  Cc: Hemanth Malla, netdev, linux-kernel, bpf, linux-kselftest, Lorenz Bauer

Now that inet[6]_lookup_reuseport is parameterised on the ehashfn,
we can remove the two duplicated sk_lookup helpers.

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
---
 include/net/inet6_hashtables.h |  9 +++++++++
 include/net/inet_hashtables.h  |  7 +++++++
 net/ipv4/inet_hashtables.c     | 26 +++++++++++++-------------
 net/ipv4/udp.c                 | 32 +++++---------------------------
 net/ipv6/inet6_hashtables.c    | 31 ++++++++++++++++---------------
 net/ipv6/udp.c                 | 34 +++++-----------------------------
 6 files changed, 55 insertions(+), 84 deletions(-)

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index f89320b6fee3..a6722d6ef80f 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -73,6 +73,15 @@ struct sock *inet6_lookup_listener(struct net *net,
 				   const unsigned short hnum,
 				   const int dif, const int sdif);
 
+struct sock *inet6_lookup_run_sk_lookup(struct net *net,
+					int protocol,
+					struct sk_buff *skb, int doff,
+					const struct in6_addr *saddr,
+					const __be16 sport,
+					const struct in6_addr *daddr,
+					const u16 hnum, const int dif,
+					inet6_ehashfn_t *ehashfn);
+
 static inline struct sock *__inet6_lookup(struct net *net,
 					  struct inet_hashinfo *hashinfo,
 					  struct sk_buff *skb, int doff,
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index ddfa2e67fdb5..c0532cc7587f 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -393,6 +393,13 @@ struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk,
 				   __be32 daddr, unsigned short hnum,
 				   inet_ehashfn_t *ehashfn);
 
+struct sock *inet_lookup_run_sk_lookup(struct net *net,
+				       int protocol,
+				       struct sk_buff *skb, int doff,
+				       __be32 saddr, __be16 sport,
+				       __be32 daddr, u16 hnum, const int dif,
+				       inet_ehashfn_t *ehashfn);
+
 static inline struct sock *
 	inet_lookup_established(struct net *net, struct inet_hashinfo *hashinfo,
 				const __be32 saddr, const __be16 sport,
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 64fc1bd3fb63..75b1ed1b89f2 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -403,25 +403,23 @@ static struct sock *inet_lhash2_lookup(struct net *net,
 	return result;
 }
 
-static inline struct sock *inet_lookup_run_bpf(struct net *net,
-					       struct inet_hashinfo *hashinfo,
-					       struct sk_buff *skb, int doff,
-					       __be32 saddr, __be16 sport,
-					       __be32 daddr, u16 hnum, const int dif)
+struct sock *inet_lookup_run_sk_lookup(struct net *net,
+				       int protocol,
+				       struct sk_buff *skb, int doff,
+				       __be32 saddr, __be16 sport,
+				       __be32 daddr, u16 hnum, const int dif,
+				       inet_ehashfn_t *ehashfn)
 {
 	struct sock *sk, *reuse_sk;
 	bool no_reuseport;
 
-	if (hashinfo != net->ipv4.tcp_death_row.hashinfo)
-		return NULL; /* only TCP is supported */
-
-	no_reuseport = bpf_sk_lookup_run_v4(net, IPPROTO_TCP, saddr, sport,
+	no_reuseport = bpf_sk_lookup_run_v4(net, protocol, saddr, sport,
 					    daddr, hnum, dif, &sk);
 	if (no_reuseport || IS_ERR_OR_NULL(sk))
 		return sk;
 
 	reuse_sk = inet_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum,
-					 inet_ehashfn);
+					 ehashfn);
 	if (reuse_sk)
 		sk = reuse_sk;
 	return sk;
@@ -439,9 +437,11 @@ struct sock *__inet_lookup_listener(struct net *net,
 	unsigned int hash2;
 
 	/* Lookup redirect from BPF */
-	if (static_branch_unlikely(&bpf_sk_lookup_enabled)) {
-		result = inet_lookup_run_bpf(net, hashinfo, skb, doff,
-					     saddr, sport, daddr, hnum, dif);
+	if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
+	    hashinfo == net->ipv4.tcp_death_row.hashinfo) {
+		result = inet_lookup_run_sk_lookup(net, IPPROTO_TCP, skb, doff,
+						   saddr, sport, daddr, hnum, dif,
+						   inet_ehashfn);
 		if (result)
 			goto done;
 	}
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 55f683b31c93..045eca6ed177 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -465,30 +465,6 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 	return result;
 }
 
-static struct sock *udp4_lookup_run_bpf(struct net *net,
-					struct udp_table *udptable,
-					struct sk_buff *skb,
-					__be32 saddr, __be16 sport,
-					__be32 daddr, u16 hnum, const int dif)
-{
-	struct sock *sk, *reuse_sk;
-	bool no_reuseport;
-
-	if (udptable != net->ipv4.udp_table)
-		return NULL; /* only UDP is supported */
-
-	no_reuseport = bpf_sk_lookup_run_v4(net, IPPROTO_UDP, saddr, sport,
-					    daddr, hnum, dif, &sk);
-	if (no_reuseport || IS_ERR_OR_NULL(sk))
-		return sk;
-
-	reuse_sk = inet_lookup_reuseport(net, sk, skb, sizeof(struct udphdr),
-					 saddr, sport, daddr, hnum, udp_ehashfn);
-	if (reuse_sk)
-		sk = reuse_sk;
-	return sk;
-}
-
 /* UDP is nearly always wildcards out the wazoo, it makes no sense to try
  * harder than this. -DaveM
  */
@@ -513,9 +489,11 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 		goto done;
 
 	/* Lookup redirect from BPF */
-	if (static_branch_unlikely(&bpf_sk_lookup_enabled)) {
-		sk = udp4_lookup_run_bpf(net, udptable, skb,
-					 saddr, sport, daddr, hnum, dif);
+	if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
+	    udptable == net->ipv4.udp_table) {
+		sk = inet_lookup_run_sk_lookup(net, IPPROTO_UDP, skb, sizeof(struct udphdr),
+					       saddr, sport, daddr, hnum, dif,
+					       udp_ehashfn);
 		if (sk) {
 			result = sk;
 			goto done;
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index f76dbbb29332..7c9700c7c9c8 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -177,31 +177,30 @@ static struct sock *inet6_lhash2_lookup(struct net *net,
 	return result;
 }
 
-static inline struct sock *inet6_lookup_run_bpf(struct net *net,
-						struct inet_hashinfo *hashinfo,
-						struct sk_buff *skb, int doff,
-						const struct in6_addr *saddr,
-						const __be16 sport,
-						const struct in6_addr *daddr,
-						const u16 hnum, const int dif)
+struct sock *inet6_lookup_run_sk_lookup(struct net *net,
+					int protocol,
+					struct sk_buff *skb, int doff,
+					const struct in6_addr *saddr,
+					const __be16 sport,
+					const struct in6_addr *daddr,
+					const u16 hnum, const int dif,
+					inet6_ehashfn_t *ehashfn)
 {
 	struct sock *sk, *reuse_sk;
 	bool no_reuseport;
 
-	if (hashinfo != net->ipv4.tcp_death_row.hashinfo)
-		return NULL; /* only TCP is supported */
-
-	no_reuseport = bpf_sk_lookup_run_v6(net, IPPROTO_TCP, saddr, sport,
+	no_reuseport = bpf_sk_lookup_run_v6(net, protocol, saddr, sport,
 					    daddr, hnum, dif, &sk);
 	if (no_reuseport || IS_ERR_OR_NULL(sk))
 		return sk;
 
 	reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff,
-					  saddr, sport, daddr, hnum, inet6_ehashfn);
+					  saddr, sport, daddr, hnum, ehashfn);
 	if (reuse_sk)
 		sk = reuse_sk;
 	return sk;
 }
+EXPORT_SYMBOL_GPL(inet6_lookup_run_sk_lookup);
 
 struct sock *inet6_lookup_listener(struct net *net,
 		struct inet_hashinfo *hashinfo,
@@ -215,9 +214,11 @@ struct sock *inet6_lookup_listener(struct net *net,
 	unsigned int hash2;
 
 	/* Lookup redirect from BPF */
-	if (static_branch_unlikely(&bpf_sk_lookup_enabled)) {
-		result = inet6_lookup_run_bpf(net, hashinfo, skb, doff,
-					      saddr, sport, daddr, hnum, dif);
+	if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
+	    hashinfo == net->ipv4.tcp_death_row.hashinfo) {
+		result = inet6_lookup_run_sk_lookup(net, IPPROTO_TCP, skb, doff,
+						    saddr, sport, daddr, hnum, dif,
+						    inet6_ehashfn);
 		if (result)
 			goto done;
 	}
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index a65f8d72543d..109f14b17a09 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -206,32 +206,6 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 	return result;
 }
 
-static inline struct sock *udp6_lookup_run_bpf(struct net *net,
-					       struct udp_table *udptable,
-					       struct sk_buff *skb,
-					       const struct in6_addr *saddr,
-					       __be16 sport,
-					       const struct in6_addr *daddr,
-					       u16 hnum, const int dif)
-{
-	struct sock *sk, *reuse_sk;
-	bool no_reuseport;
-
-	if (udptable != net->ipv4.udp_table)
-		return NULL; /* only UDP is supported */
-
-	no_reuseport = bpf_sk_lookup_run_v6(net, IPPROTO_UDP, saddr, sport,
-					    daddr, hnum, dif, &sk);
-	if (no_reuseport || IS_ERR_OR_NULL(sk))
-		return sk;
-
-	reuse_sk = inet6_lookup_reuseport(net, sk, skb, sizeof(struct udphdr),
-					  saddr, sport, daddr, hnum, udp6_ehashfn);
-	if (reuse_sk)
-		sk = reuse_sk;
-	return sk;
-}
-
 /* rcu_read_lock() must be held */
 struct sock *__udp6_lib_lookup(struct net *net,
 			       const struct in6_addr *saddr, __be16 sport,
@@ -256,9 +230,11 @@ struct sock *__udp6_lib_lookup(struct net *net,
 		goto done;
 
 	/* Lookup redirect from BPF */
-	if (static_branch_unlikely(&bpf_sk_lookup_enabled)) {
-		sk = udp6_lookup_run_bpf(net, udptable, skb,
-					 saddr, sport, daddr, hnum, dif);
+	if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
+	    udptable == net->ipv4.udp_table) {
+		sk = inet6_lookup_run_sk_lookup(net, IPPROTO_UDP, skb, sizeof(struct udphdr),
+						saddr, sport, daddr, hnum, dif,
+						udp6_ehashfn);
 		if (sk) {
 			result = sk;
 			goto done;

-- 
2.41.0


* [PATCH bpf-next v6 7/8] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
From: Lorenz Bauer @ 2023-07-20 15:30 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Joe Stringer, Mykola Lysenko, Shuah Khan,
	Kuniyuki Iwashima
  Cc: Hemanth Malla, netdev, linux-kernel, bpf, linux-kselftest,
	Lorenz Bauer, Joe Stringer

Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT
sockets. This means we can't use the helper to steer traffic to Envoy,
which configures SO_REUSEPORT on its sockets. In turn, we're blocked
from removing TPROXY from our setup.

The reason that bpf_sk_assign refuses such sockets is that the
bpf_sk_lookup helpers don't execute SK_REUSEPORT programs. Instead,
one of the reuseport sockets is selected by hash. This could cause
dispatch to the "wrong" socket:

    sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
    bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed

Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup
helpers unfortunately. In the tc context, L2 headers are at the start
of the skb, while SK_REUSEPORT expects L3 headers instead.

Instead, we execute the SK_REUSEPORT program when the assigned socket
is pulled out of the skb, further up the stack. This creates some
trickiness with regard to refcounting, as bpf_sk_assign will put both
refcounted and RCU-freed sockets in skb->sk. Reuseport sockets are RCU
freed. We can infer that the sk_assigned socket is RCU freed if the
reuseport lookup succeeds, but convincing yourself of this fact isn't
straightforward. Therefore we defensively check refcounting on the
sk_assign sock even though it's probably not required in practice.
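
From the receive path's point of view the steal helpers added below
have three outcomes (simplified usage sketch; the real __udp4_lib_rcv()
hunk also handles dst caching and encapsulated sockets):

    sk = inet_steal_sock(net, skb, sizeof(struct udphdr),
                         saddr, uh->source, daddr, uh->dest,
                         &refcounted, udp_ehashfn);
    if (IS_ERR(sk))     /* SK_REUSEPORT prog returned an error */
        goto no_sk;     /* treat as "no socket found" */
    if (sk)             /* BPF-assigned or early-demuxed socket */
        return udp_unicast_rcv_skb(sk, skb, uh);
    /* otherwise fall back to the regular socket lookup */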

Fixes: 8e368dc72e86 ("bpf: Fix use of sk->sk_reuseport from sk_assign")
Fixes: cf7fbe660f2d ("bpf: Add socket assign support")
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@cilium.io>
Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
---
 include/net/inet6_hashtables.h | 56 ++++++++++++++++++++++++++++++++++++++----
 include/net/inet_hashtables.h  | 49 ++++++++++++++++++++++++++++++++++--
 include/net/sock.h             |  7 ++++--
 include/uapi/linux/bpf.h       |  3 ---
 net/core/filter.c              |  2 --
 net/ipv4/udp.c                 |  8 ++++--
 net/ipv6/udp.c                 | 10 +++++---
 tools/include/uapi/linux/bpf.h |  3 ---
 8 files changed, 116 insertions(+), 22 deletions(-)

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index a6722d6ef80f..284b5ce7205d 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -103,6 +103,46 @@ static inline struct sock *__inet6_lookup(struct net *net,
 				     daddr, hnum, dif, sdif);
 }
 
+static inline
+struct sock *inet6_steal_sock(struct net *net, struct sk_buff *skb, int doff,
+			      const struct in6_addr *saddr, const __be16 sport,
+			      const struct in6_addr *daddr, const __be16 dport,
+			      bool *refcounted, inet6_ehashfn_t *ehashfn)
+{
+	struct sock *sk, *reuse_sk;
+	bool prefetched;
+
+	sk = skb_steal_sock(skb, refcounted, &prefetched);
+	if (!sk)
+		return NULL;
+
+	if (!prefetched)
+		return sk;
+
+	if (sk->sk_protocol == IPPROTO_TCP) {
+		if (sk->sk_state != TCP_LISTEN)
+			return sk;
+	} else if (sk->sk_protocol == IPPROTO_UDP) {
+		if (sk->sk_state != TCP_CLOSE)
+			return sk;
+	} else {
+		return sk;
+	}
+
+	reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff,
+					  saddr, sport, daddr, ntohs(dport),
+					  ehashfn);
+	if (!reuse_sk)
+		return sk;
+
+	/* We've chosen a new reuseport sock which is never refcounted. This
+	 * implies that sk also isn't refcounted.
+	 */
+	WARN_ON_ONCE(*refcounted);
+
+	return reuse_sk;
+}
+
 static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo,
 					      struct sk_buff *skb, int doff,
 					      const __be16 sport,
@@ -110,14 +150,20 @@ static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo,
 					      int iif, int sdif,
 					      bool *refcounted)
 {
-	struct sock *sk = skb_steal_sock(skb, refcounted);
-
+	struct net *net = dev_net(skb_dst(skb)->dev);
+	const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+	struct sock *sk;
+
+	sk = inet6_steal_sock(net, skb, doff, &ip6h->saddr, sport, &ip6h->daddr, dport,
+			      refcounted, inet6_ehashfn);
+	if (IS_ERR(sk))
+		return NULL;
 	if (sk)
 		return sk;
 
-	return __inet6_lookup(dev_net(skb_dst(skb)->dev), hashinfo, skb,
-			      doff, &ipv6_hdr(skb)->saddr, sport,
-			      &ipv6_hdr(skb)->daddr, ntohs(dport),
+	return __inet6_lookup(net, hashinfo, skb,
+			      doff, &ip6h->saddr, sport,
+			      &ip6h->daddr, ntohs(dport),
 			      iif, sdif, refcounted);
 }
 
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index c0532cc7587f..1177effabed3 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -449,6 +449,46 @@ static inline struct sock *inet_lookup(struct net *net,
 	return sk;
 }
 
+static inline
+struct sock *inet_steal_sock(struct net *net, struct sk_buff *skb, int doff,
+			     const __be32 saddr, const __be16 sport,
+			     const __be32 daddr, const __be16 dport,
+			     bool *refcounted, inet_ehashfn_t *ehashfn)
+{
+	struct sock *sk, *reuse_sk;
+	bool prefetched;
+
+	sk = skb_steal_sock(skb, refcounted, &prefetched);
+	if (!sk)
+		return NULL;
+
+	if (!prefetched)
+		return sk;
+
+	if (sk->sk_protocol == IPPROTO_TCP) {
+		if (sk->sk_state != TCP_LISTEN)
+			return sk;
+	} else if (sk->sk_protocol == IPPROTO_UDP) {
+		if (sk->sk_state != TCP_CLOSE)
+			return sk;
+	} else {
+		return sk;
+	}
+
+	reuse_sk = inet_lookup_reuseport(net, sk, skb, doff,
+					 saddr, sport, daddr, ntohs(dport),
+					 ehashfn);
+	if (!reuse_sk)
+		return sk;
+
+	/* We've chosen a new reuseport sock which is never refcounted. This
+	 * implies that sk also isn't refcounted.
+	 */
+	WARN_ON_ONCE(*refcounted);
+
+	return reuse_sk;
+}
+
 static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo,
 					     struct sk_buff *skb,
 					     int doff,
@@ -457,13 +497,18 @@ static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo,
 					     const int sdif,
 					     bool *refcounted)
 {
-	struct sock *sk = skb_steal_sock(skb, refcounted);
+	struct net *net = dev_net(skb_dst(skb)->dev);
 	const struct iphdr *iph = ip_hdr(skb);
+	struct sock *sk;
 
+	sk = inet_steal_sock(net, skb, doff, iph->saddr, sport, iph->daddr, dport,
+			     refcounted, inet_ehashfn);
+	if (IS_ERR(sk))
+		return NULL;
 	if (sk)
 		return sk;
 
-	return __inet_lookup(dev_net(skb_dst(skb)->dev), hashinfo, skb,
+	return __inet_lookup(net, hashinfo, skb,
 			     doff, iph->saddr, sport,
 			     iph->daddr, dport, inet_iif(skb), sdif,
 			     refcounted);
diff --git a/include/net/sock.h b/include/net/sock.h
index 2eb916d1ff64..320f00e69ae9 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2814,20 +2814,23 @@ sk_is_refcounted(struct sock *sk)
  * skb_steal_sock - steal a socket from an sk_buff
  * @skb: sk_buff to steal the socket from
  * @refcounted: is set to true if the socket is reference-counted
+ * @prefetched: is set to true if the socket was assigned from bpf
  */
 static inline struct sock *
-skb_steal_sock(struct sk_buff *skb, bool *refcounted)
+skb_steal_sock(struct sk_buff *skb, bool *refcounted, bool *prefetched)
 {
 	if (skb->sk) {
 		struct sock *sk = skb->sk;
 
 		*refcounted = true;
-		if (skb_sk_is_prefetched(skb))
+		*prefetched = skb_sk_is_prefetched(skb);
+		if (*prefetched)
 			*refcounted = sk_is_refcounted(sk);
 		skb->destructor = NULL;
 		skb->sk = NULL;
 		return sk;
 	}
+	*prefetched = false;
 	*refcounted = false;
 	return NULL;
 }
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 739c15906a65..7fc98f4b63e9 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4198,9 +4198,6 @@ union bpf_attr {
  *		**-EOPNOTSUPP** if the operation is not supported, for example
  *		a call from outside of TC ingress.
  *
- *		**-ESOCKTNOSUPPORT** if the socket type is not supported
- *		(reuseport).
- *
  * long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags)
  *	Description
  *		Helper is overloaded depending on BPF program type. This
diff --git a/net/core/filter.c b/net/core/filter.c
index b5b51ef48c5f..7c37f4646c20 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7351,8 +7351,6 @@ BPF_CALL_3(bpf_sk_assign, struct sk_buff *, skb, struct sock *, sk, u64, flags)
 		return -EOPNOTSUPP;
 	if (unlikely(dev_net(skb->dev) != sock_net(sk)))
 		return -ENETUNREACH;
-	if (unlikely(sk_fullsock(sk) && sk->sk_reuseport))
-		return -ESOCKTNOSUPPORT;
 	if (sk_unhashed(sk))
 		return -EOPNOTSUPP;
 	if (sk_is_refcounted(sk) &&
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 045eca6ed177..ec1a5f8a2eca 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2388,7 +2388,11 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	if (udp4_csum_init(skb, uh, proto))
 		goto csum_error;
 
-	sk = skb_steal_sock(skb, &refcounted);
+	sk = inet_steal_sock(net, skb, sizeof(struct udphdr), saddr, uh->source, daddr, uh->dest,
+			     &refcounted, udp_ehashfn);
+	if (IS_ERR(sk))
+		goto no_sk;
+
 	if (sk) {
 		struct dst_entry *dst = skb_dst(skb);
 		int ret;
@@ -2409,7 +2413,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
 	if (sk)
 		return udp_unicast_rcv_skb(sk, skb, uh);
-
+no_sk:
 	if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
 		goto drop;
 	nf_reset_ct(skb);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 109f14b17a09..f6fc75edfa23 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -925,9 +925,9 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED;
 	const struct in6_addr *saddr, *daddr;
 	struct net *net = dev_net(skb->dev);
+	bool refcounted;
 	struct udphdr *uh;
 	struct sock *sk;
-	bool refcounted;
 	u32 ulen = 0;
 
 	if (!pskb_may_pull(skb, sizeof(struct udphdr)))
@@ -964,7 +964,11 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		goto csum_error;
 
 	/* Check if the socket is already available, e.g. due to early demux */
-	sk = skb_steal_sock(skb, &refcounted);
+	sk = inet6_steal_sock(net, skb, sizeof(struct udphdr), saddr, uh->source, daddr, uh->dest,
+			      &refcounted, udp6_ehashfn);
+	if (IS_ERR(sk))
+		goto no_sk;
+
 	if (sk) {
 		struct dst_entry *dst = skb_dst(skb);
 		int ret;
@@ -998,7 +1002,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 			goto report_csum_error;
 		return udp6_unicast_rcv_skb(sk, skb, uh);
 	}
-
+no_sk:
 	reason = SKB_DROP_REASON_NO_SOCKET;
 
 	if (!uh->check)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 739c15906a65..7fc98f4b63e9 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4198,9 +4198,6 @@ union bpf_attr {
  *		**-EOPNOTSUPP** if the operation is not supported, for example
  *		a call from outside of TC ingress.
  *
- *		**-ESOCKTNOSUPPORT** if the socket type is not supported
- *		(reuseport).
- *
  * long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags)
  *	Description
  *		Helper is overloaded depending on BPF program type. This

-- 
2.41.0
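
For illustration: with the -ESOCKTNOSUPPORT check gone, a TC program can
assign a SO_REUSEPORT socket like any other. A minimal sketch (map layout
and names are illustrative; the complete version is the selftest in patch
8/8):

struct {
	__uint(type, BPF_MAP_TYPE_SOCKMAP);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} sk_map SEC(".maps");

SEC("tc")
int steer_to_reuseport(struct __sk_buff *skb)
{
	const int zero = 0;
	struct bpf_sock *sk;
	long err;

	/* sk_map is assumed to hold one SO_REUSEPORT socket. */
	sk = bpf_map_lookup_elem(&sk_map, &zero);
	if (!sk)
		return TC_ACT_SHOT;

	/* bpf_sk_assign() does not consume the reference; the
	 * SK_REUSEPORT program runs later, from inet[6]_steal_sock().
	 */
	err = bpf_sk_assign(skb, sk, 0);
	bpf_sk_release(sk);

	return err ? TC_ACT_SHOT : TC_ACT_OK;
}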



* [PATCH bpf-next v6 8/8] selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper
  2023-07-20 15:30 [PATCH bpf-next v6 0/8] Add SO_REUSEPORT support for TC bpf_sk_assign Lorenz Bauer
                   ` (6 preceding siblings ...)
  2023-07-20 15:30 ` [PATCH bpf-next v6 7/8] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign Lorenz Bauer
@ 2023-07-20 15:30 ` Lorenz Bauer
  2023-07-25  0:42   ` Martin KaFai Lau
  2023-07-25 21:20 ` [PATCH bpf-next v6 0/8] Add SO_REUSEPORT support for TC bpf_sk_assign patchwork-bot+netdevbpf
  8 siblings, 1 reply; 19+ messages in thread
From: Lorenz Bauer @ 2023-07-20 15:30 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Joe Stringer, Mykola Lysenko, Shuah Khan,
	Kuniyuki Iwashima
  Cc: Hemanth Malla, netdev, linux-kernel, bpf, linux-kselftest,
	Lorenz Bauer, Joe Stringer

From: Daniel Borkmann <daniel@iogearbox.net>

We use two programs to check that the new reuseport logic is executed
appropriately.

The first is a TC clsact program which bpf_sk_assigns
the skb to a UDP or TCP socket created by user space. Since the test
communicates via lo, we see both directions of packets in the eBPF program.
Traffic ingressing to the reuseport socket is identified by looking
at the destination port. For TCP, we additionally need to make sure
that we only assign the initial SYN packets towards our listening
socket. The network stack then creates a request socket which
transitions to ESTABLISHED after the 3WHS.

The second is a reuseport program which makes the fact that it has
been executed visible to user space. This tells us that the delayed
lookup mechanism is working.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Cc: Joe Stringer <joe@cilium.io>
---
 tools/testing/selftests/bpf/network_helpers.c      |   3 +
 .../selftests/bpf/prog_tests/assign_reuse.c        | 197 +++++++++++++++++++++
 .../selftests/bpf/progs/test_assign_reuse.c        | 142 +++++++++++++++
 3 files changed, 342 insertions(+)

diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
index a105c0cd008a..8a33bcea97de 100644
--- a/tools/testing/selftests/bpf/network_helpers.c
+++ b/tools/testing/selftests/bpf/network_helpers.c
@@ -423,6 +423,9 @@ struct nstoken *open_netns(const char *name)
 
 void close_netns(struct nstoken *token)
 {
+	if (!token)
+		return;
+
 	ASSERT_OK(setns(token->orig_netns_fd, CLONE_NEWNET), "setns");
 	close(token->orig_netns_fd);
 	free(token);
diff --git a/tools/testing/selftests/bpf/prog_tests/assign_reuse.c b/tools/testing/selftests/bpf/prog_tests/assign_reuse.c
new file mode 100644
index 000000000000..622f123410f4
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/assign_reuse.c
@@ -0,0 +1,197 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+#include <uapi/linux/if_link.h>
+#include <test_progs.h>
+
+#include <netinet/tcp.h>
+#include <netinet/udp.h>
+
+#include "network_helpers.h"
+#include "test_assign_reuse.skel.h"
+
+#define NS_TEST "assign_reuse"
+#define LOOPBACK 1
+#define PORT 4443
+
+static int attach_reuseport(int sock_fd, int prog_fd)
+{
+	return setsockopt(sock_fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF,
+			  &prog_fd, sizeof(prog_fd));
+}
+
+static __u64 cookie(int fd)
+{
+	__u64 cookie = 0;
+	socklen_t cookie_len = sizeof(cookie);
+	int ret;
+
+	ret = getsockopt(fd, SOL_SOCKET, SO_COOKIE, &cookie, &cookie_len);
+	ASSERT_OK(ret, "cookie");
+	ASSERT_GT(cookie, 0, "cookie_invalid");
+
+	return cookie;
+}
+
+static int echo_test_udp(int fd_sv)
+{
+	struct sockaddr_storage addr = {};
+	socklen_t len = sizeof(addr);
+	char buff[1] = {};
+	int fd_cl = -1, ret;
+
+	fd_cl = connect_to_fd(fd_sv, 100);
+	ASSERT_GT(fd_cl, 0, "create_client");
+	ASSERT_EQ(getsockname(fd_cl, (void *)&addr, &len), 0, "getsockname");
+
+	ASSERT_EQ(send(fd_cl, buff, sizeof(buff), 0), 1, "send_client");
+
+	ret = recv(fd_sv, buff, sizeof(buff), 0);
+	if (ret < 0)
+		return errno;
+
+	ASSERT_EQ(ret, 1, "recv_server");
+	ASSERT_EQ(sendto(fd_sv, buff, sizeof(buff), 0, (void *)&addr, len), 1, "send_server");
+	ASSERT_EQ(recv(fd_cl, buff, sizeof(buff), 0), 1, "recv_client");
+	close(fd_cl);
+	return 0;
+}
+
+static int echo_test_tcp(int fd_sv)
+{
+	char buff[1] = {};
+	int fd_cl = -1, fd_sv_cl = -1;
+
+	fd_cl = connect_to_fd(fd_sv, 100);
+	if (fd_cl < 0)
+		return errno;
+
+	fd_sv_cl = accept(fd_sv, NULL, NULL);
+	ASSERT_GE(fd_sv_cl, 0, "accept_fd");
+
+	ASSERT_EQ(send(fd_cl, buff, sizeof(buff), 0), 1, "send_client");
+	ASSERT_EQ(recv(fd_sv_cl, buff, sizeof(buff), 0), 1, "recv_server");
+	ASSERT_EQ(send(fd_sv_cl, buff, sizeof(buff), 0), 1, "send_server");
+	ASSERT_EQ(recv(fd_cl, buff, sizeof(buff), 0), 1, "recv_client");
+	close(fd_sv_cl);
+	close(fd_cl);
+	return 0;
+}
+
+static void run_assign_reuse(int family, int sotype, const char *ip, __u16 port)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook,
+		.ifindex = LOOPBACK,
+		.attach_point = BPF_TC_INGRESS,
+	);
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, tc_opts,
+		.handle = 1,
+		.priority = 1,
+	);
+	bool hook_created = false, tc_attached = false;
+	int ret, fd_tc, fd_accept, fd_drop, fd_map;
+	int *fd_sv = NULL;
+	__u64 fd_val;
+	struct test_assign_reuse *skel;
+	const int zero = 0;
+
+	skel = test_assign_reuse__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	skel->rodata->dest_port = port;
+
+	ret = test_assign_reuse__load(skel);
+	if (!ASSERT_OK(ret, "skel_load"))
+		goto cleanup;
+
+	ASSERT_EQ(skel->bss->sk_cookie_seen, 0, "cookie_init");
+
+	fd_tc = bpf_program__fd(skel->progs.tc_main);
+	fd_accept = bpf_program__fd(skel->progs.reuse_accept);
+	fd_drop = bpf_program__fd(skel->progs.reuse_drop);
+	fd_map = bpf_map__fd(skel->maps.sk_map);
+
+	fd_sv = start_reuseport_server(family, sotype, ip, port, 100, 1);
+	if (!ASSERT_NEQ(fd_sv, NULL, "start_reuseport_server"))
+		goto cleanup;
+
+	ret = attach_reuseport(*fd_sv, fd_drop);
+	if (!ASSERT_OK(ret, "attach_reuseport"))
+		goto cleanup;
+
+	fd_val = *fd_sv;
+	ret = bpf_map_update_elem(fd_map, &zero, &fd_val, BPF_NOEXIST);
+	if (!ASSERT_OK(ret, "bpf_sk_map"))
+		goto cleanup;
+
+	ret = bpf_tc_hook_create(&tc_hook);
+	if (ret == 0)
+		hook_created = true;
+	ret = ret == -EEXIST ? 0 : ret;
+	if (!ASSERT_OK(ret, "bpf_tc_hook_create"))
+		goto cleanup;
+
+	tc_opts.prog_fd = fd_tc;
+	ret = bpf_tc_attach(&tc_hook, &tc_opts);
+	if (!ASSERT_OK(ret, "bpf_tc_attach"))
+		goto cleanup;
+	tc_attached = true;
+
+	if (sotype == SOCK_STREAM)
+		ASSERT_EQ(echo_test_tcp(*fd_sv), ECONNREFUSED, "drop_tcp");
+	else
+		ASSERT_EQ(echo_test_udp(*fd_sv), EAGAIN, "drop_udp");
+	ASSERT_EQ(skel->bss->reuseport_executed, 1, "program executed once");
+
+	skel->bss->sk_cookie_seen = 0;
+	skel->bss->reuseport_executed = 0;
+	ASSERT_OK(attach_reuseport(*fd_sv, fd_accept), "attach_reuseport(accept)");
+
+	if (sotype == SOCK_STREAM)
+		ASSERT_EQ(echo_test_tcp(*fd_sv), 0, "echo_tcp");
+	else
+		ASSERT_EQ(echo_test_udp(*fd_sv), 0, "echo_udp");
+
+	ASSERT_EQ(skel->bss->sk_cookie_seen, cookie(*fd_sv),
+		  "cookie_mismatch");
+	ASSERT_EQ(skel->bss->reuseport_executed, 1, "program executed once");
+cleanup:
+	if (tc_attached) {
+		tc_opts.flags = tc_opts.prog_fd = tc_opts.prog_id = 0;
+		ret = bpf_tc_detach(&tc_hook, &tc_opts);
+		ASSERT_OK(ret, "bpf_tc_detach");
+	}
+	if (hook_created) {
+		tc_hook.attach_point = BPF_TC_INGRESS | BPF_TC_EGRESS;
+		bpf_tc_hook_destroy(&tc_hook);
+	}
+	test_assign_reuse__destroy(skel);
+	free_fds(fd_sv, 1);
+}
+
+void test_assign_reuse(void)
+{
+	struct nstoken *tok = NULL;
+
+	SYS(out, "ip netns add %s", NS_TEST);
+	SYS(cleanup, "ip -net %s link set dev lo up", NS_TEST);
+
+	tok = open_netns(NS_TEST);
+	if (!ASSERT_OK_PTR(tok, "netns token"))
+		return;
+
+	if (test__start_subtest("tcpv4"))
+		run_assign_reuse(AF_INET, SOCK_STREAM, "127.0.0.1", PORT);
+	if (test__start_subtest("tcpv6"))
+		run_assign_reuse(AF_INET6, SOCK_STREAM, "::1", PORT);
+	if (test__start_subtest("udpv4"))
+		run_assign_reuse(AF_INET, SOCK_DGRAM, "127.0.0.1", PORT);
+	if (test__start_subtest("udpv6"))
+		run_assign_reuse(AF_INET6, SOCK_DGRAM, "::1", PORT);
+
+cleanup:
+	close_netns(tok);
+	SYS_NOFAIL("ip netns delete %s", NS_TEST);
+out:
+	return;
+}
diff --git a/tools/testing/selftests/bpf/progs/test_assign_reuse.c b/tools/testing/selftests/bpf/progs/test_assign_reuse.c
new file mode 100644
index 000000000000..4f2e2321ea06
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_assign_reuse.c
@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+#include <stdbool.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
+#include <bpf/bpf_endian.h>
+#include <bpf/bpf_helpers.h>
+#include <linux/pkt_cls.h>
+
+char LICENSE[] SEC("license") = "GPL";
+
+__u64 sk_cookie_seen;
+__u64 reuseport_executed;
+union {
+	struct tcphdr tcp;
+	struct udphdr udp;
+} headers;
+
+const volatile __u16 dest_port;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __u64);
+} sk_map SEC(".maps");
+
+SEC("sk_reuseport")
+int reuse_accept(struct sk_reuseport_md *ctx)
+{
+	reuseport_executed++;
+
+	if (ctx->ip_protocol == IPPROTO_TCP) {
+		if (ctx->data + sizeof(headers.tcp) > ctx->data_end)
+			return SK_DROP;
+
+		if (__builtin_memcmp(&headers.tcp, ctx->data, sizeof(headers.tcp)) != 0)
+			return SK_DROP;
+	} else if (ctx->ip_protocol == IPPROTO_UDP) {
+		if (ctx->data + sizeof(headers.udp) > ctx->data_end)
+			return SK_DROP;
+
+		if (__builtin_memcmp(&headers.udp, ctx->data, sizeof(headers.udp)) != 0)
+			return SK_DROP;
+	} else {
+		return SK_DROP;
+	}
+
+	sk_cookie_seen = bpf_get_socket_cookie(ctx->sk);
+	return SK_PASS;
+}
+
+SEC("sk_reuseport")
+int reuse_drop(struct sk_reuseport_md *ctx)
+{
+	reuseport_executed++;
+	sk_cookie_seen = 0;
+	return SK_DROP;
+}
+
+static int
+assign_sk(struct __sk_buff *skb)
+{
+	int zero = 0, ret = 0;
+	struct bpf_sock *sk;
+
+	sk = bpf_map_lookup_elem(&sk_map, &zero);
+	if (!sk)
+		return TC_ACT_SHOT;
+	ret = bpf_sk_assign(skb, sk, 0);
+	bpf_sk_release(sk);
+	return ret ? TC_ACT_SHOT : TC_ACT_OK;
+}
+
+static int
+maybe_assign_tcp(struct __sk_buff *skb, struct tcphdr *th)
+{
+	if (th + 1 > (void *)(long)(skb->data_end))
+		return TC_ACT_SHOT;
+
+	if (!th->syn || th->ack || th->dest != bpf_htons(dest_port))
+		return TC_ACT_OK;
+
+	__builtin_memcpy(&headers.tcp, th, sizeof(headers.tcp));
+	return assign_sk(skb);
+}
+
+static int
+maybe_assign_udp(struct __sk_buff *skb, struct udphdr *uh)
+{
+	if (uh + 1 > (void *)(long)(skb->data_end))
+		return TC_ACT_SHOT;
+
+	if (uh->dest != bpf_htons(dest_port))
+		return TC_ACT_OK;
+
+	__builtin_memcpy(&headers.udp, uh, sizeof(headers.udp));
+	return assign_sk(skb);
+}
+
+SEC("tc")
+int tc_main(struct __sk_buff *skb)
+{
+	void *data_end = (void *)(long)skb->data_end;
+	void *data = (void *)(long)skb->data;
+	struct ethhdr *eth;
+
+	eth = (struct ethhdr *)(data);
+	if (eth + 1 > data_end)
+		return TC_ACT_SHOT;
+
+	if (eth->h_proto == bpf_htons(ETH_P_IP)) {
+		struct iphdr *iph = (struct iphdr *)(data + sizeof(*eth));
+
+		if (iph + 1 > data_end)
+			return TC_ACT_SHOT;
+
+		if (iph->protocol == IPPROTO_TCP)
+			return maybe_assign_tcp(skb, (struct tcphdr *)(iph + 1));
+		else if (iph->protocol == IPPROTO_UDP)
+			return maybe_assign_udp(skb, (struct udphdr *)(iph + 1));
+		else
+			return TC_ACT_SHOT;
+	} else {
+		struct ipv6hdr *ip6h = (struct ipv6hdr *)(data + sizeof(*eth));
+
+		if (ip6h + 1 > data_end)
+			return TC_ACT_SHOT;
+
+		if (ip6h->nexthdr == IPPROTO_TCP)
+			return maybe_assign_tcp(skb, (struct tcphdr *)(ip6h + 1));
+		else if (ip6h->nexthdr == IPPROTO_UDP)
+			return maybe_assign_udp(skb, (struct udphdr *)(ip6h + 1));
+		else
+			return TC_ACT_SHOT;
+	}
+}

-- 
2.41.0
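
Assuming the usual selftest workflow, the new test should be runnable on
its own via something like ./test_progs -t assign_reuse from
tools/testing/selftests/bpf, which exercises the four subtests (tcpv4,
tcpv6, udpv4, udpv6) registered above.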



* Re: [PATCH bpf-next v6 2/8] bpf: reject unhashed sockets in bpf_sk_assign
  2023-07-20 15:30 ` [PATCH bpf-next v6 2/8] bpf: reject unhashed sockets in bpf_sk_assign Lorenz Bauer
@ 2023-07-20 21:16   ` Kuniyuki Iwashima
  2023-07-24  8:01     ` Lorenz Bauer
  0 siblings, 1 reply; 19+ messages in thread
From: Kuniyuki Iwashima @ 2023-07-20 21:16 UTC (permalink / raw)
  To: lmb
  Cc: andrii, ast, bpf, daniel, davem, dsahern, edumazet, haoluo,
	hemanthmalla, joe, joe, john.fastabend, jolsa, kpsingh, kuba,
	kuniyu, linux-kernel, linux-kselftest, martin.lau, mykolal,
	netdev, pabeni, sdf, shuah, song, willemdebruijn.kernel, yhs

From: Lorenz Bauer <lmb@isovalent.com>
Date: Thu, 20 Jul 2023 17:30:06 +0200
> The semantics for bpf_sk_assign are as follows:
> 
>     sk = some_lookup_func()
>     bpf_sk_assign(skb, sk)
>     bpf_sk_release(sk)
> 
> That is, the sk is not consumed by bpf_sk_assign. The function
> therefore needs to make sure that sk lives long enough to be
> consumed from __inet_lookup_skb. The path through the stack for a
> TCPv4 packet is roughly:
> 
>   netif_receive_skb_core: takes RCU read lock
>     __netif_receive_skb_core:
>       sch_handle_ingress:
>         tcf_classify:
>           bpf_sk_assign()
>       deliver_ptype_list_skb:
>         deliver_skb:
>           ip_packet_type->func == ip_rcv:
>             ip_rcv_core:
>             ip_rcv_finish_core:
>               dst_input:
>                 ip_local_deliver:
>                   ip_local_deliver_finish:
>                     ip_protocol_deliver_rcu:
>                       tcp_v4_rcv:
>                         __inet_lookup_skb:
>                           skb_steal_sock
> 
> The existing helper takes advantage of the fact that everything
> happens in the same RCU critical section: for sockets with
> SOCK_RCU_FREE set bpf_sk_assign never takes a reference.
> skb_steal_sock then checks SOCK_RCU_FREE again and does sock_put
> if necessary.
> 
> This approach assumes that SOCK_RCU_FREE is never set on a sk
> between bpf_sk_assign and skb_steal_sock, but this invariant is
> violated by unhashed UDP sockets. A new UDP socket is created
> in TCP_CLOSE state but without SOCK_RCU_FREE set. That flag is only
> added in udp_lib_get_port() which happens when a socket is bound.
> 
> When bpf_sk_assign was added it wasn't possible to access unhashed
> UDP sockets from BPF, so this wasn't a problem. This changed
> in commit 0c48eefae712 ("sock_map: Lift socket state restriction
> for datagram sockets"), but the helper wasn't adjusted accordingly.
> The following sequence of events will therefore lead to a refcount
> leak:
> 
> 1. Add socket(AF_INET, SOCK_DGRAM) to a sockmap.
> 2. Pull socket out of sockmap and bpf_sk_assign it. Since
>    SOCK_RCU_FREE is not set we increment the refcount.
> 3. bind() or connect() the socket, setting SOCK_RCU_FREE.
> 4. skb_steal_sock will now set refcounted = false due to
>    SOCK_RCU_FREE.
> 5. tcp_v4_rcv() skips sock_put().
> 
> Fix the problem by rejecting unhashed sockets in bpf_sk_assign().
> This matches the behaviour of __inet_lookup_skb which is ultimately
> the goal of bpf_sk_assign().
> 
> Fixes: cf7fbe660f2d ("bpf: Add socket assign support")

Should this be 0c48eefae712 then ?


> Cc: Joe Stringer <joe@cilium.io>
> Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

The change itself looks good.

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>

Thanks!


> ---
>  net/core/filter.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 797e8f039696..b5b51ef48c5f 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -7353,6 +7353,8 @@ BPF_CALL_3(bpf_sk_assign, struct sk_buff *, skb, struct sock *, sk, u64, flags)
>  		return -ENETUNREACH;
>  	if (unlikely(sk_fullsock(sk) && sk->sk_reuseport))
>  		return -ESOCKTNOSUPPORT;
> +	if (sk_unhashed(sk))
> +		return -EOPNOTSUPP;
>  	if (sk_is_refcounted(sk) &&
>  	    unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
>  		return -ENOENT;
> 
> -- 
> 2.41.0
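
For reference, the leak sequence above maps onto calls roughly as follows
(a sketch; the TC program in step 2 is assumed, error handling omitted):

	int map_fd = bpf_map_create(BPF_MAP_TYPE_SOCKMAP, NULL,
				    sizeof(int), sizeof(__u64), 1, NULL);
	int fd = socket(AF_INET, SOCK_DGRAM, 0); /* TCP_CLOSE, no SOCK_RCU_FREE */
	int zero = 0;
	__u64 val = fd;

	/* 1. Put the unbound UDP socket into a sockmap. */
	bpf_map_update_elem(map_fd, &zero, &val, BPF_NOEXIST);

	/* 2. A TC program pulls it out of the sockmap and bpf_sk_assign()s
	 *    it; SOCK_RCU_FREE is unset, so a reference is taken.
	 */

	/* 3. bind() reaches udp_lib_get_port(), which sets SOCK_RCU_FREE. */
	struct sockaddr_in addr = { .sin_family = AF_INET };
	bind(fd, (struct sockaddr *)&addr, sizeof(addr));

	/* 4./5. skb_steal_sock() now reports refcounted = false and the
	 *       receive path skips sock_put(): the reference from step 2
	 *       is leaked.
	 */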


* Re: [PATCH bpf-next v6 7/8] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
  2023-07-20 15:30 ` [PATCH bpf-next v6 7/8] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign Lorenz Bauer
@ 2023-07-20 21:34   ` Kuniyuki Iwashima
  2023-08-08  4:22   ` Kumar Kartikeya Dwivedi
  1 sibling, 0 replies; 19+ messages in thread
From: Kuniyuki Iwashima @ 2023-07-20 21:34 UTC (permalink / raw)
  To: lmb
  Cc: andrii, ast, bpf, daniel, davem, dsahern, edumazet, haoluo,
	hemanthmalla, joe, joe, john.fastabend, jolsa, kpsingh, kuba,
	kuniyu, linux-kernel, linux-kselftest, martin.lau, mykolal,
	netdev, pabeni, sdf, shuah, song, willemdebruijn.kernel, yhs

From: Lorenz Bauer <lmb@isovalent.com>
Date: Thu, 20 Jul 2023 17:30:11 +0200
> Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT
> sockets. This means we can't use the helper to steer traffic to Envoy,
> which configures SO_REUSEPORT on its sockets. In turn, we're blocked
> from removing TPROXY from our setup.
> 
> The reason that bpf_sk_assign refuses such sockets is that the
> bpf_sk_lookup helpers don't execute SK_REUSEPORT programs. Instead,
> one of the reuseport sockets is selected by hash. This could cause
> dispatch to the "wrong" socket:
> 
>     sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
>     bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed
> 
> Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup
> helpers unfortunately. In the tc context, L2 headers are at the start
> of the skb, while SK_REUSEPORT expects L3 headers instead.
> 
> Instead, we execute the SK_REUSEPORT program when the assigned socket
> is pulled out of the skb, further up the stack. This creates some
> trickiness with regards to refcounting as bpf_sk_assign will put both
> refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU
> freed. We can infer that the sk_assigned socket is RCU freed if the
> reuseport lookup succeeds, but convincing yourself of this fact isn't
> straightforward. Therefore we defensively check refcounting on the
> sk_assign sock even though it's probably not required in practice.
> 
> Fixes: 8e368dc72e86 ("bpf: Fix use of sk->sk_reuseport from sk_assign")
> Fixes: cf7fbe660f2d ("bpf: Add socket assign support")
> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Joe Stringer <joe@cilium.io>
> Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/
> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
> ---
>  include/net/inet6_hashtables.h | 56 ++++++++++++++++++++++++++++++++++++++----
>  include/net/inet_hashtables.h  | 49 ++++++++++++++++++++++++++++++++++--
>  include/net/sock.h             |  7 ++++--
>  include/uapi/linux/bpf.h       |  3 ---
>  net/core/filter.c              |  2 --
>  net/ipv4/udp.c                 |  8 ++++--
>  net/ipv6/udp.c                 | 10 +++++---
>  tools/include/uapi/linux/bpf.h |  3 ---
>  8 files changed, 116 insertions(+), 22 deletions(-)
> 
> diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
> index a6722d6ef80f..284b5ce7205d 100644
> --- a/include/net/inet6_hashtables.h
> +++ b/include/net/inet6_hashtables.h
> @@ -103,6 +103,46 @@ static inline struct sock *__inet6_lookup(struct net *net,
>  				     daddr, hnum, dif, sdif);
>  }
>  
> +static inline
> +struct sock *inet6_steal_sock(struct net *net, struct sk_buff *skb, int doff,
> +			      const struct in6_addr *saddr, const __be16 sport,
> +			      const struct in6_addr *daddr, const __be16 dport,
> +			      bool *refcounted, inet6_ehashfn_t *ehashfn)
> +{
> +	struct sock *sk, *reuse_sk;
> +	bool prefetched;
> +
> +	sk = skb_steal_sock(skb, refcounted, &prefetched);
> +	if (!sk)
> +		return NULL;
> +
> +	if (!prefetched)
> +		return sk;
> +
> +	if (sk->sk_protocol == IPPROTO_TCP) {
> +		if (sk->sk_state != TCP_LISTEN)
> +			return sk;
> +	} else if (sk->sk_protocol == IPPROTO_UDP) {
> +		if (sk->sk_state != TCP_CLOSE)
> +			return sk;
> +	} else {
> +		return sk;
> +	}
> +
> +	reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff,
> +					  saddr, sport, daddr, ntohs(dport),
> +					  ehashfn);
> +	if (!reuse_sk)
> +		return sk;
> +
> +	/* We've chosen a new reuseport sock which is never refcounted. This
> +	 * implies that sk also isn't refcounted.
> +	 */
> +	WARN_ON_ONCE(*refcounted);
> +
> +	return reuse_sk;
> +}
> +
>  static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo,
>  					      struct sk_buff *skb, int doff,
>  					      const __be16 sport,
> @@ -110,14 +150,20 @@ static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo,
>  					      int iif, int sdif,
>  					      bool *refcounted)
>  {
> -	struct sock *sk = skb_steal_sock(skb, refcounted);
> -
> +	struct net *net = dev_net(skb_dst(skb)->dev);
> +	const struct ipv6hdr *ip6h = ipv6_hdr(skb);
> +	struct sock *sk;
> +
> +	sk = inet6_steal_sock(net, skb, doff, &ip6h->saddr, sport, &ip6h->daddr, dport,
> +			      refcounted, inet6_ehashfn);
> +	if (IS_ERR(sk))
> +		return NULL;
>  	if (sk)
>  		return sk;
>  
> -	return __inet6_lookup(dev_net(skb_dst(skb)->dev), hashinfo, skb,
> -			      doff, &ipv6_hdr(skb)->saddr, sport,
> -			      &ipv6_hdr(skb)->daddr, ntohs(dport),
> +	return __inet6_lookup(net, hashinfo, skb,
> +			      doff, &ip6h->saddr, sport,
> +			      &ip6h->daddr, ntohs(dport),
>  			      iif, sdif, refcounted);
>  }
>  
> diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
> index c0532cc7587f..1177effabed3 100644
> --- a/include/net/inet_hashtables.h
> +++ b/include/net/inet_hashtables.h
> @@ -449,6 +449,46 @@ static inline struct sock *inet_lookup(struct net *net,
>  	return sk;
>  }
>  
> +static inline
> +struct sock *inet_steal_sock(struct net *net, struct sk_buff *skb, int doff,
> +			     const __be32 saddr, const __be16 sport,
> +			     const __be32 daddr, const __be16 dport,
> +			     bool *refcounted, inet_ehashfn_t *ehashfn)
> +{
> +	struct sock *sk, *reuse_sk;
> +	bool prefetched;
> +
> +	sk = skb_steal_sock(skb, refcounted, &prefetched);
> +	if (!sk)
> +		return NULL;
> +
> +	if (!prefetched)
> +		return sk;
> +
> +	if (sk->sk_protocol == IPPROTO_TCP) {
> +		if (sk->sk_state != TCP_LISTEN)
> +			return sk;
> +	} else if (sk->sk_protocol == IPPROTO_UDP) {
> +		if (sk->sk_state != TCP_CLOSE)
> +			return sk;
> +	} else {
> +		return sk;
> +	}
> +
> +	reuse_sk = inet_lookup_reuseport(net, sk, skb, doff,
> +					 saddr, sport, daddr, ntohs(dport),
> +					 ehashfn);
> +	if (!reuse_sk)
> +		return sk;
> +
> +	/* We've chosen a new reuseport sock which is never refcounted. This
> +	 * implies that sk also isn't refcounted.
> +	 */
> +	WARN_ON_ONCE(*refcounted);
> +
> +	return reuse_sk;
> +}
> +
>  static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo,
>  					     struct sk_buff *skb,
>  					     int doff,
> @@ -457,13 +497,18 @@ static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo,
>  					     const int sdif,
>  					     bool *refcounted)
>  {
> -	struct sock *sk = skb_steal_sock(skb, refcounted);
> +	struct net *net = dev_net(skb_dst(skb)->dev);
>  	const struct iphdr *iph = ip_hdr(skb);
> +	struct sock *sk;
>  
> +	sk = inet_steal_sock(net, skb, doff, iph->saddr, sport, iph->daddr, dport,
> +			     refcounted, inet_ehashfn);
> +	if (IS_ERR(sk))
> +		return NULL;
>  	if (sk)
>  		return sk;
>  
> -	return __inet_lookup(dev_net(skb_dst(skb)->dev), hashinfo, skb,
> +	return __inet_lookup(net, hashinfo, skb,
>  			     doff, iph->saddr, sport,
>  			     iph->daddr, dport, inet_iif(skb), sdif,
>  			     refcounted);
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 2eb916d1ff64..320f00e69ae9 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -2814,20 +2814,23 @@ sk_is_refcounted(struct sock *sk)
>   * skb_steal_sock - steal a socket from an sk_buff
>   * @skb: sk_buff to steal the socket from
>   * @refcounted: is set to true if the socket is reference-counted
> + * @prefetched: is set to true if the socket was assigned from bpf
>   */
>  static inline struct sock *
> -skb_steal_sock(struct sk_buff *skb, bool *refcounted)
> +skb_steal_sock(struct sk_buff *skb, bool *refcounted, bool *prefetched)
>  {
>  	if (skb->sk) {
>  		struct sock *sk = skb->sk;
>  
>  		*refcounted = true;
> -		if (skb_sk_is_prefetched(skb))
> +		*prefetched = skb_sk_is_prefetched(skb);
> +		if (*prefetched)
>  			*refcounted = sk_is_refcounted(sk);
>  		skb->destructor = NULL;
>  		skb->sk = NULL;
>  		return sk;
>  	}
> +	*prefetched = false;
>  	*refcounted = false;
>  	return NULL;
>  }
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 739c15906a65..7fc98f4b63e9 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -4198,9 +4198,6 @@ union bpf_attr {
>   *		**-EOPNOTSUPP** if the operation is not supported, for example
>   *		a call from outside of TC ingress.
>   *
> - *		**-ESOCKTNOSUPPORT** if the socket type is not supported
> - *		(reuseport).
> - *
>   * long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags)
>   *	Description
>   *		Helper is overloaded depending on BPF program type. This
> diff --git a/net/core/filter.c b/net/core/filter.c
> index b5b51ef48c5f..7c37f4646c20 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -7351,8 +7351,6 @@ BPF_CALL_3(bpf_sk_assign, struct sk_buff *, skb, struct sock *, sk, u64, flags)
>  		return -EOPNOTSUPP;
>  	if (unlikely(dev_net(skb->dev) != sock_net(sk)))
>  		return -ENETUNREACH;
> -	if (unlikely(sk_fullsock(sk) && sk->sk_reuseport))
> -		return -ESOCKTNOSUPPORT;
>  	if (sk_unhashed(sk))
>  		return -EOPNOTSUPP;
>  	if (sk_is_refcounted(sk) &&
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 045eca6ed177..ec1a5f8a2eca 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -2388,7 +2388,11 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
>  	if (udp4_csum_init(skb, uh, proto))
>  		goto csum_error;
>  
> -	sk = skb_steal_sock(skb, &refcounted);
> +	sk = inet_steal_sock(net, skb, sizeof(struct udphdr), saddr, uh->source, daddr, uh->dest,
> +			     &refcounted, udp_ehashfn);
> +	if (IS_ERR(sk))
> +		goto no_sk;
> +
>  	if (sk) {
>  		struct dst_entry *dst = skb_dst(skb);
>  		int ret;
> @@ -2409,7 +2413,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
>  	sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
>  	if (sk)
>  		return udp_unicast_rcv_skb(sk, skb, uh);
> -
> +no_sk:
>  	if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
>  		goto drop;
>  	nf_reset_ct(skb);
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 109f14b17a09..f6fc75edfa23 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -925,9 +925,9 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
>  	enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED;
>  	const struct in6_addr *saddr, *daddr;
>  	struct net *net = dev_net(skb->dev);
> +	bool refcounted;
>  	struct udphdr *uh;
>  	struct sock *sk;
> -	bool refcounted;
>  	u32 ulen = 0;
>  
>  	if (!pskb_may_pull(skb, sizeof(struct udphdr)))

This chunk is unnecessary. If there are no other comments from anyone,
it would be good to drop this while merging.

Thanks!


> @@ -964,7 +964,11 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
>  		goto csum_error;
>  
>  	/* Check if the socket is already available, e.g. due to early demux */
> -	sk = skb_steal_sock(skb, &refcounted);
> +	sk = inet6_steal_sock(net, skb, sizeof(struct udphdr), saddr, uh->source, daddr, uh->dest,
> +			      &refcounted, udp6_ehashfn);
> +	if (IS_ERR(sk))
> +		goto no_sk;
> +
>  	if (sk) {
>  		struct dst_entry *dst = skb_dst(skb);
>  		int ret;
> @@ -998,7 +1002,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
>  			goto report_csum_error;
>  		return udp6_unicast_rcv_skb(sk, skb, uh);
>  	}
> -
> +no_sk:
>  	reason = SKB_DROP_REASON_NO_SOCKET;
>  
>  	if (!uh->check)
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 739c15906a65..7fc98f4b63e9 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -4198,9 +4198,6 @@ union bpf_attr {
>   *		**-EOPNOTSUPP** if the operation is not supported, for example
>   *		a call from outside of TC ingress.
>   *
> - *		**-ESOCKTNOSUPPORT** if the socket type is not supported
> - *		(reuseport).
> - *
>   * long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags)
>   *	Description
>   *		Helper is overloaded depending on BPF program type. This
> 
> -- 
> 2.41.0


* Re: [PATCH bpf-next v6 2/8] bpf: reject unhashed sockets in bpf_sk_assign
  2023-07-20 21:16   ` Kuniyuki Iwashima
@ 2023-07-24  8:01     ` Lorenz Bauer
  0 siblings, 0 replies; 19+ messages in thread
From: Lorenz Bauer @ 2023-07-24  8:01 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: andrii, ast, bpf, daniel, davem, dsahern, edumazet, haoluo,
	hemanthmalla, joe, joe, john.fastabend, jolsa, kpsingh, kuba,
	linux-kernel, linux-kselftest, martin.lau, mykolal, netdev,
	pabeni, sdf, shuah, song, willemdebruijn.kernel, yhs

On Thu, Jul 20, 2023 at 11:17 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:

> > Fix the problem by rejecting unhashed sockets in bpf_sk_assign().
> > This matches the behaviour of __inet_lookup_skb which is ultimately
> > the goal of bpf_sk_assign().
> >
> > Fixes: cf7fbe660f2d ("bpf: Add socket assign support")
>
> Should this be 0c48eefae712 then ?

I think it makes sense to target it at the original helper add, since
we really should've done the unhashed check back then. Relying on
unhashed sockets not being available is too subtle.


* Re: [PATCH bpf-next v6 4/8] net: remove duplicate reuseport_lookup functions
  2023-07-20 15:30 ` [PATCH bpf-next v6 4/8] net: remove duplicate reuseport_lookup functions Lorenz Bauer
@ 2023-07-24 22:55   ` Martin KaFai Lau
  2023-07-25  0:53   ` Martin KaFai Lau
  1 sibling, 0 replies; 19+ messages in thread
From: Martin KaFai Lau @ 2023-07-24 22:55 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Hemanth Malla, netdev, linux-kernel, bpf, linux-kselftest,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Joe Stringer, Mykola Lysenko, Shuah Khan, Kuniyuki Iwashima

On 7/20/23 8:30 AM, Lorenz Bauer wrote:
> @@ -452,7 +436,14 @@ static struct sock *udp4_lib_lookup2(struct net *net,
>   				      daddr, hnum, dif, sdif);
>   		if (score > badness) {
>   			badness = score;
> -			result = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum);
> +
> +			if (sk->sk_state == TCP_ESTABLISHED) {
> +				result = sk;
> +				continue;
> +			}

Thanks for the cleanup. I also found moving the TCP_ESTABLISHED check here made 
the score logic easier to reason about.
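
The effect of the hunk on the inner loop, paraphrased (not the verbatim
kernel code):

	if (score > badness) {
		badness = score;

		/* A connected socket already matched on the full 4-tuple;
		 * reuseport selection must not displace it.
		 */
		if (sk->sk_state == TCP_ESTABLISHED) {
			result = sk;
			continue;
		}

		/* ... otherwise fall through to the reuseport lookup. */
	}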


* Re: [PATCH bpf-next v6 8/8] selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper
  2023-07-20 15:30 ` [PATCH bpf-next v6 8/8] selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper Lorenz Bauer
@ 2023-07-25  0:42   ` Martin KaFai Lau
  0 siblings, 0 replies; 19+ messages in thread
From: Martin KaFai Lau @ 2023-07-25  0:42 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Hemanth Malla, netdev, linux-kernel, bpf, linux-kselftest,
	Joe Stringer, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, David Ahern, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Joe Stringer, Mykola Lysenko, Shuah Khan, Kuniyuki Iwashima

On 7/20/23 8:30 AM, Lorenz Bauer wrote:
> +static int echo_test_udp(int fd_sv)
> +{
> +	struct sockaddr_storage addr = {};
> +	socklen_t len = sizeof(addr);
> +	char buff[1] = {};
> +	int fd_cl = -1, ret;
> +
> +	fd_cl = connect_to_fd(fd_sv, 100);
> +	ASSERT_GT(fd_cl, 0, "create_client");
> +	ASSERT_EQ(getsockname(fd_cl, (void *)&addr, &len), 0, "getsockname");
> +
> +	ASSERT_EQ(send(fd_cl, buff, sizeof(buff), 0), 1, "send_client");
> +
> +	ret = recv(fd_sv, buff, sizeof(buff), 0);
> +	if (ret < 0)

I think this needs a close(fd_cl).

> +		return errno;



* Re: [PATCH bpf-next v6 4/8] net: remove duplicate reuseport_lookup functions
  2023-07-20 15:30 ` [PATCH bpf-next v6 4/8] net: remove duplicate reuseport_lookup functions Lorenz Bauer
  2023-07-24 22:55   ` Martin KaFai Lau
@ 2023-07-25  0:53   ` Martin KaFai Lau
  2023-07-25 21:19     ` Martin KaFai Lau
  1 sibling, 1 reply; 19+ messages in thread
From: Martin KaFai Lau @ 2023-07-25  0:53 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Hemanth Malla, netdev, linux-kernel, bpf, linux-kselftest,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Joe Stringer, Mykola Lysenko, Shuah Khan, Kuniyuki Iwashima

On 7/20/23 8:30 AM, Lorenz Bauer wrote:
> diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
> index 032ddab48f8f..f89320b6fee3 100644
> --- a/include/net/inet6_hashtables.h
> +++ b/include/net/inet6_hashtables.h
> @@ -48,12 +48,21 @@ struct sock *__inet6_lookup_established(struct net *net,
>   					const u16 hnum, const int dif,
>   					const int sdif);
>   
> +typedef u32 (inet6_ehashfn_t)(const struct net *net,
> +			       const struct in6_addr *laddr, const u16 lport,
> +			       const struct in6_addr *faddr, const __be16 fport);
> +
> +inet6_ehashfn_t inet6_ehashfn;
> +
> +INDIRECT_CALLABLE_DECLARE(inet6_ehashfn_t udp6_ehashfn);
> +

[ ... ]

> diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
> index b7c56867314e..3616225c89ef 100644
> --- a/net/ipv6/inet6_hashtables.c
> +++ b/net/ipv6/inet6_hashtables.c
> @@ -39,6 +39,7 @@ u32 inet6_ehashfn(const struct net *net,
>   	return __inet6_ehashfn(lhash, lport, fhash, fport,
>   			       inet6_ehash_secret + net_hash_mix(net));
>   }
> +EXPORT_SYMBOL_GPL(inet6_ehashfn);
>   
>   /*
>    * Sockets in TCP_CLOSE state are _always_ taken out of the hash, so
> @@ -111,18 +112,22 @@ static inline int compute_score(struct sock *sk, struct net *net,
>   	return score;
>   }
>   
> +INDIRECT_CALLABLE_DECLARE(inet6_ehashfn_t udp6_ehashfn);

The same INDIRECT_CALLABLE_DECLARE is also added to inet6_hashtables.h. Is this 
one still needed here?

The same goes for inet_hashtables.c.




* Re: [PATCH bpf-next v6 4/8] net: remove duplicate reuseport_lookup functions
  2023-07-25  0:53   ` Martin KaFai Lau
@ 2023-07-25 21:19     ` Martin KaFai Lau
  0 siblings, 0 replies; 19+ messages in thread
From: Martin KaFai Lau @ 2023-07-25 21:19 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Hemanth Malla, netdev, linux-kernel, bpf, linux-kselftest,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Joe Stringer, Mykola Lysenko, Shuah Khan, Kuniyuki Iwashima

On 7/24/23 5:53 PM, Martin KaFai Lau wrote:
> On 7/20/23 8:30 AM, Lorenz Bauer wrote:
>> diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
>> index 032ddab48f8f..f89320b6fee3 100644
>> --- a/include/net/inet6_hashtables.h
>> +++ b/include/net/inet6_hashtables.h
>> @@ -48,12 +48,21 @@ struct sock *__inet6_lookup_established(struct net *net,
>>                       const u16 hnum, const int dif,
>>                       const int sdif);
>> +typedef u32 (inet6_ehashfn_t)(const struct net *net,
>> +                   const struct in6_addr *laddr, const u16 lport,
>> +                   const struct in6_addr *faddr, const __be16 fport);
>> +
>> +inet6_ehashfn_t inet6_ehashfn;
>> +
>> +INDIRECT_CALLABLE_DECLARE(inet6_ehashfn_t udp6_ehashfn);
>> +
> 
> [ ... ]
> 
>> diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
>> index b7c56867314e..3616225c89ef 100644
>> --- a/net/ipv6/inet6_hashtables.c
>> +++ b/net/ipv6/inet6_hashtables.c
>> @@ -39,6 +39,7 @@ u32 inet6_ehashfn(const struct net *net,
>>       return __inet6_ehashfn(lhash, lport, fhash, fport,
>>                      inet6_ehash_secret + net_hash_mix(net));
>>   }
>> +EXPORT_SYMBOL_GPL(inet6_ehashfn);
>>   /*
>>    * Sockets in TCP_CLOSE state are _always_ taken out of the hash, so
>> @@ -111,18 +112,22 @@ static inline int compute_score(struct sock *sk, struct 
>> net *net,
>>       return score;
>>   }
>> +INDIRECT_CALLABLE_DECLARE(inet6_ehashfn_t udp6_ehashfn);
> 
> The same INDIRECT_CALLABLE_DECLARE is also added to inet6_hashtables.h. Is this 
> one still needed here?
> 
> The same goes for the inet_hashtables.c.

Please follow up if one of the INDIRECT_CALLABLE_DECLARE instances makes sense to remove.

I have applied the set after fixing up patch 7 and 8.



* Re: [PATCH bpf-next v6 0/8] Add SO_REUSEPORT support for TC bpf_sk_assign
  2023-07-20 15:30 [PATCH bpf-next v6 0/8] Add SO_REUSEPORT support for TC bpf_sk_assign Lorenz Bauer
                   ` (7 preceding siblings ...)
  2023-07-20 15:30 ` [PATCH bpf-next v6 8/8] selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper Lorenz Bauer
@ 2023-07-25 21:20 ` patchwork-bot+netdevbpf
  8 siblings, 0 replies; 19+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-07-25 21:20 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, joe, mykolal, shuah, kuniyu,
	hemanthmalla, netdev, linux-kernel, bpf, linux-kselftest, joe

Hello:

This series was applied to bpf/bpf-next.git (master)
by Martin KaFai Lau <martin.lau@kernel.org>:

On Thu, 20 Jul 2023 17:30:04 +0200 you wrote:
> We want to replace iptables TPROXY with a BPF program at TC ingress.
> To make this work in all cases we need to assign a SO_REUSEPORT socket
> to an skb, which is currently prohibited. This series adds support for
> such sockets to bpf_sk_assing.
> 
> I did some refactoring to cut down on the amount of duplicate code. The
> key to this is to use INDIRECT_CALL in the reuseport helpers. To show
> that this approach is not just beneficial to TC sk_assign I removed
> duplicate code for bpf_sk_lookup as well.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v6,1/8] udp: re-score reuseport groups when connected sockets are present
    https://git.kernel.org/bpf/bpf-next/c/f0ea27e7bfe1
  - [bpf-next,v6,2/8] bpf: reject unhashed sockets in bpf_sk_assign
    https://git.kernel.org/bpf/bpf-next/c/67312adc96b5
  - [bpf-next,v6,3/8] net: export inet_lookup_reuseport and inet6_lookup_reuseport
    https://git.kernel.org/bpf/bpf-next/c/ce796e60b3b1
  - [bpf-next,v6,4/8] net: remove duplicate reuseport_lookup functions
    https://git.kernel.org/bpf/bpf-next/c/0f495f761722
  - [bpf-next,v6,5/8] net: document inet[6]_lookup_reuseport sk_state requirements
    https://git.kernel.org/bpf/bpf-next/c/2a61776366bd
  - [bpf-next,v6,6/8] net: remove duplicate sk_lookup helpers
    https://git.kernel.org/bpf/bpf-next/c/6c886db2e78c
  - [bpf-next,v6,7/8] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
    (no matching commit)
  - [bpf-next,v6,8/8] selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper
    https://git.kernel.org/bpf/bpf-next/c/22408d58a42c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH bpf-next v6 7/8] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
  2023-07-20 15:30 ` [PATCH bpf-next v6 7/8] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign Lorenz Bauer
  2023-07-20 21:34   ` Kuniyuki Iwashima
@ 2023-08-08  4:22   ` Kumar Kartikeya Dwivedi
  2023-08-08 16:35     ` Lorenz Bauer
  1 sibling, 1 reply; 19+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-08  4:22 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Joe Stringer, Mykola Lysenko, Shuah Khan,
	Kuniyuki Iwashima, Hemanth Malla, netdev, linux-kernel, bpf,
	linux-kselftest, Joe Stringer

On Thu, 20 Jul 2023 at 21:04, Lorenz Bauer <lmb@isovalent.com> wrote:
>
> Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT
> sockets. This means we can't use the helper to steer traffic to Envoy,
> which configures SO_REUSEPORT on its sockets. In turn, we're blocked
> from removing TPROXY from our setup.
>
> The reason that bpf_sk_assign refuses such sockets is that the
> bpf_sk_lookup helpers don't execute SK_REUSEPORT programs. Instead,
> one of the reuseport sockets is selected by hash. This could cause
> dispatch to the "wrong" socket:
>
>     sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
>     bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed
>
> Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup
> helpers unfortunately. In the tc context, L2 headers are at the start
> of the skb, while SK_REUSEPORT expects L3 headers instead.
>
> Instead, we execute the SK_REUSEPORT program when the assigned socket
> is pulled out of the skb, further up the stack. This creates some
> trickiness with regards to refcounting as bpf_sk_assign will put both
> refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU
> freed. We can infer that the sk_assigned socket is RCU freed if the
> reuseport lookup succeeds, but convincing yourself of this fact isn't
> straightforward. Therefore we defensively check refcounting on the
> sk_assign sock even though it's probably not required in practice.
>
> Fixes: 8e368dc72e86 ("bpf: Fix use of sk->sk_reuseport from sk_assign")
> Fixes: cf7fbe660f2d ("bpf: Add socket assign support")
> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Joe Stringer <joe@cilium.io>
> Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/
> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
> ---

Hi Lorenz, Kuniyuki, Martin,

I am getting a KASAN (inline mode) splat on bpf-next when I run
./test_progs -t btf_skc_cls_ingress, and I bisected it to this commit:
9c02bec95954 ("bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign")

bash-5.2# ./test_progs -t btf_skc_cls_ingress
[   51.611015] bpf_testmod: loading out-of-tree module taints kernel.
[   51.611434] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
[   51.809645] ==================================================================
[   51.810085] BUG: KASAN: slab-out-of-bounds in tcp_v6_rcv+0x2d7d/0x3440
[   51.810458] Read of size 2 at addr ffff8881053f038c by task test_progs/226
[   51.810848]
[   51.810955] CPU: 4 PID: 226 Comm: test_progs Tainted: G           OE      6.5.0-rc2-g9c02bec95954 #51
[   51.811467] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-2.fc39 04/01/2014
[   51.811968] Call Trace:
[   51.812118]  <IRQ>
[   51.812239]  dump_stack_lvl+0x4a/0x80
[   51.812452]  print_report+0xcf/0x670
[   51.812660]  ? __pfx___lock_acquire+0x10/0x10
[   51.812930]  kasan_report+0xda/0x110
[   51.813140]  ? tcp_v6_rcv+0x2d7d/0x3440
[   51.813363]  ? tcp_v6_rcv+0x2d7d/0x3440
[   51.813585]  tcp_v6_rcv+0x2d7d/0x3440
[   51.813808]  ? __pfx_tcp_v6_rcv+0x10/0x10
[   51.814051]  ? __pfx_raw6_local_deliver+0x10/0x10
[   51.814322]  ? lock_acquire+0x18e/0x4b0
[   51.814543]  ? ip6_input_finish+0xda/0x240
[   51.814783]  ip6_protocol_deliver_rcu+0x12a/0x1310
[   51.815067]  ? find_held_lock+0x2d/0x110
[   51.815293]  ip6_input_finish+0x118/0x240
[   51.815526]  ? __pfx_lock_release+0x10/0x10
[   51.815770]  ip6_input+0xc4/0x300
[   51.815972]  ? __pfx_ip6_input+0x10/0x10
[   51.816204]  ? ip6_rcv_core+0x996/0x1990
[   51.816431]  ipv6_rcv+0x3bf/0x780
[   51.816625]  ? __pfx_ipv6_rcv+0x10/0x10
[   51.816853]  ? lock_acquire+0x18e/0x4b0
[   51.817088]  ? process_backlog+0x1d8/0x620
[   51.817325]  ? __pfx_ipv6_rcv+0x10/0x10
[   51.817546]  __netif_receive_skb_one_core+0x11a/0x1b0
[   51.817837]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[   51.818166]  ? mark_held_locks+0x94/0xe0
[   51.818392]  process_backlog+0xd3/0x620
[   51.818614]  __napi_poll.constprop.0+0xa3/0x540
[   51.818890]  net_rx_action+0x3b4/0xa80
[   51.819113]  ? __pfx_net_rx_action+0x10/0x10
[   51.819365]  ? reacquire_held_locks+0x490/0x4b0
[   51.819631]  __do_softirq+0x202/0x83f
[   51.819852]  ? __pfx___do_softirq+0x10/0x10
[   51.820103]  ? tick_nohz_stop_idle+0x19e/0x280
[   51.820360]  ? __dev_queue_xmit+0x818/0x2d50
[   51.820608]  do_softirq+0x47/0xa0
[   51.820817]  </IRQ>
[   51.820954]  <TASK>
[   51.821080]  __local_bh_enable_ip+0xf2/0x120
[   51.821324]  __dev_queue_xmit+0x83c/0x2d50
[   51.821561]  ? __pfx_mark_lock+0x10/0x10
[   51.821795]  ? __pfx___dev_queue_xmit+0x10/0x10
[   51.822064]  ? lock_acquire+0x18e/0x4b0
[   51.822285]  ? find_held_lock+0x2d/0x110
[   51.822511]  ? lock_release+0x1de/0x620
[   51.822737]  ? ip6_finish_output+0x666/0x1250
[   51.822998]  ? __pfx_lock_release+0x10/0x10
[   51.823238]  ? mark_held_locks+0x94/0xe0
[   51.823467]  ip6_finish_output2+0xd8d/0x1bb0
[   51.823717]  ? ip6_mtu+0x13f/0x350
[   51.823928]  ? __pfx_ip6_finish_output2+0x10/0x10
[   51.824195]  ? find_held_lock+0x2d/0x110
[   51.824427]  ? lock_release+0x1de/0x620
[   51.824650]  ip6_finish_output+0x666/0x1250
[   51.824906]  ip6_output+0x1ed/0x6c0
[   51.825113]  ? __pfx_ip6_output+0x10/0x10
[   51.825343]  ? ip6_xmit+0xed5/0x1f40
[   51.825551]  ? __pfx_ip6_finish_output+0x10/0x10
[   51.825821]  ? nf_hook_slow+0xa9/0x180
[   51.826048]  ip6_xmit+0xbd2/0x1f40
[   51.826248]  ? lock_acquire+0x181/0x4b0
[   51.826470]  ? __pfx_ip6_xmit+0x10/0x10
[   51.826696]  ? __pfx_dst_output+0x10/0x10
[   51.826937]  ? __pfx_lock_acquire+0x10/0x10
[   51.827176]  ? release_sock+0x53/0x180
[   51.827393]  ? __pfx_inet6_csk_route_socket+0x10/0x10
[   51.827684]  ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   51.827993]  inet6_csk_xmit+0x31c/0x570
[   51.828215]  ? __pfx_inet6_csk_xmit+0x10/0x10
[   51.828467]  __tcp_transmit_skb+0x1708/0x3630
[   51.828732]  ? __pfx___tcp_transmit_skb+0x10/0x10
[   51.829017]  ? __alloc_skb+0x110/0x280
[   51.829238]  ? __pfx___alloc_skb+0x10/0x10
[   51.829474]  ? __tcp_send_ack.part.0+0x64/0x590
[   51.829743]  tcp_rcv_state_process+0xeba/0x4ff0
[   51.830015]  ? __pfx_tcp_rcv_state_process+0x10/0x10
[   51.830298]  ? __pfx_mark_lock+0x10/0x10
[   51.830524]  ? find_held_lock+0x2d/0x110
[   51.830756]  ? lock_release+0x1de/0x620
[   51.830988]  ? __release_sock+0xd8/0x2b0
[   51.831215]  ? tcp_v6_do_rcv+0x34e/0x13a0
[   51.831443]  tcp_v6_do_rcv+0x34e/0x13a0
[   51.831665]  ? __local_bh_enable_ip+0xa6/0x120
[   51.831935]  __release_sock+0x12c/0x2b0
[   51.832158]  release_sock+0x53/0x180
[   51.832365]  __inet_stream_connect+0x6a2/0x1100
[   51.832634]  ? __pfx___inet_stream_connect+0x10/0x10
[   51.832934]  ? __pfx_woken_wake_function+0x10/0x10
[   51.833209]  ? lockdep_hardirqs_on_prepare+0x27b/0x3f0
[   51.833501]  inet_stream_connect+0x57/0xa0
[   51.833743]  __sys_connect+0x106/0x130
[   51.833973]  ? __sys_setsockopt+0x19d/0x410
[   51.834211]  ? __pfx___sys_connect+0x10/0x10
[   51.834460]  ? __pfx_blkcg_maybe_throttle_current+0x10/0x10
[   51.834780]  __x64_sys_connect+0x72/0xb0
[   51.835019]  ? syscall_enter_from_user_mode+0x20/0x50
[   51.835312]  do_syscall_64+0x3c/0x90
[   51.835523]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   51.835818] RIP: 0033:0x7f99941965b4
[   51.836034] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d b5 8d 12 00 00 74 13 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 10 89 55
[   51.837071] RSP: 002b:00007ffc9a4a3ea8 EFLAGS: 00000202 ORIG_RAX: 000000000000002a
[   51.837507] RAX: ffffffffffffffda RBX: 00007ffc9a4a42a8 RCX: 00007f99941965b4
[   51.837920] RDX: 000000000000001c RSI: 00007ffc9a4a3f10 RDI: 0000000000000008
[   51.838318] RBP: 00007ffc9a4a3ee0 R08: 0000000000000010 R09: 0000000000000000
[   51.838721] R10: 00007f999408a730 R11: 0000000000000202 R12: 0000000000000003
[   51.839126] R13: 0000000000000000 R14: 00007f999433c000 R15: 0000000000ce2d90
[   51.839527]  </TASK>
[   51.839658]
[   51.839759] The buggy address belongs to the object at ffff8881053f0330
[   51.839759]  which belongs to the cache request_sock_TCPv6 of size 344
[   51.840491] The buggy address is located 92 bytes inside of
[   51.840491]  allocated 344-byte region [ffff8881053f0330, ffff8881053f0488)
[   51.841199]
[   51.841298] The buggy address belongs to the physical page:
[   51.841663] page:00000000e92b0bb6 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1053f0
[   51.842213] head:00000000e92b0bb6 order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[   51.842741] flags: 0x2fffe0000010200(slab|head|node=0|zone=2|lastcpupid=0x7fff)
[   51.843170] page_type: 0xffffffff()
[   51.843374] raw: 02fffe0000010200 ffff8881016463c0 dead000000000122 0000000000000000
[   51.843819] raw: 0000000000000000 0000000080140014 00000001ffffffff 0000000000000000
[   51.844263] page dumped because: kasan: bad access detected
[   51.844578]
[   51.844691] Memory state around the buggy address:
[   51.845125]  ffff8881053f0280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   51.845599]  ffff8881053f0300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   51.846069] >ffff8881053f0380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   51.846476]                       ^
[   51.846684]  ffff8881053f0400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   51.847109]  ffff8881053f0480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   51.847526] ==================================================================
[   51.847961] Disabling lock debugging due to kernel taint
#30/1    btf_skc_cls_ingress/conn:OK
#30/2    btf_skc_cls_ingress/syncookie:OK
#30      btf_skc_cls_ingress:OK
Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED

KASAN is pointing to this part of the commit:
0xffffffff8305164d is in tcp_v6_rcv (../include/net/inet6_hashtables.h:122).
117		return NULL;
118
119	if (!prefetched)
120		return sk;
121
122	if (sk->sk_protocol == IPPROTO_TCP) { // bad access
123		if (sk->sk_state != TCP_LISTEN)
124			return sk;
125	} else if (sk->sk_protocol == IPPROTO_UDP) {

Let me know if you have trouble reproducing or need any other information.


* Re: [PATCH bpf-next v6 7/8] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
  2023-08-08  4:22   ` Kumar Kartikeya Dwivedi
@ 2023-08-08 16:35     ` Lorenz Bauer
  0 siblings, 0 replies; 19+ messages in thread
From: Lorenz Bauer @ 2023-08-08 16:35 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Joe Stringer, Mykola Lysenko, Shuah Khan,
	Kuniyuki Iwashima, Hemanth Malla, netdev, linux-kernel, bpf,
	linux-kselftest, Joe Stringer

On Tue, Aug 8, 2023 at 5:23 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> Hi Lorenz, Kuniyuki, Martin,
>
> I am getting a KASAN (inline mode) splat on bpf-next when I run
> ./test_progs -t btf_skc_cls_ingress, and I bisected it to this commit:
> 9c02bec95954 ("bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign")

Thanks for the report. I forgot about struct request_sock again...
I'll have a fix shortly, I hope.

Lorenz
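
For context, one plausible shape for such a fix (a sketch under the
assumption that the bad read is the sk_protocol access on a request sock;
not necessarily the patch that will land) is an early bail-out in
inet[6]_steal_sock() for anything that isn't a full socket:

	sk = skb_steal_sock(skb, refcounted, &prefetched);
	if (!sk)
		return NULL;

	/* Hypothetical guard: request socks are not full sockets and
	 * sk_protocol lives outside struct sock_common, so don't touch
	 * it for them.
	 */
	if (!prefetched || !sk_fullsock(sk))
		return sk;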


end of thread

Thread overview: 19+ messages
2023-07-20 15:30 [PATCH bpf-next v6 0/8] Add SO_REUSEPORT support for TC bpf_sk_assign Lorenz Bauer
2023-07-20 15:30 ` [PATCH bpf-next v6 1/8] udp: re-score reuseport groups when connected sockets are present Lorenz Bauer
2023-07-20 15:30 ` [PATCH bpf-next v6 2/8] bpf: reject unhashed sockets in bpf_sk_assign Lorenz Bauer
2023-07-20 21:16   ` Kuniyuki Iwashima
2023-07-24  8:01     ` Lorenz Bauer
2023-07-20 15:30 ` [PATCH bpf-next v6 3/8] net: export inet_lookup_reuseport and inet6_lookup_reuseport Lorenz Bauer
2023-07-20 15:30 ` [PATCH bpf-next v6 4/8] net: remove duplicate reuseport_lookup functions Lorenz Bauer
2023-07-24 22:55   ` Martin KaFai Lau
2023-07-25  0:53   ` Martin KaFai Lau
2023-07-25 21:19     ` Martin KaFai Lau
2023-07-20 15:30 ` [PATCH bpf-next v6 5/8] net: document inet[6]_lookup_reuseport sk_state requirements Lorenz Bauer
2023-07-20 15:30 ` [PATCH bpf-next v6 6/8] net: remove duplicate sk_lookup helpers Lorenz Bauer
2023-07-20 15:30 ` [PATCH bpf-next v6 7/8] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign Lorenz Bauer
2023-07-20 21:34   ` Kuniyuki Iwashima
2023-08-08  4:22   ` Kumar Kartikeya Dwivedi
2023-08-08 16:35     ` Lorenz Bauer
2023-07-20 15:30 ` [PATCH bpf-next v6 8/8] selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper Lorenz Bauer
2023-07-25  0:42   ` Martin KaFai Lau
2023-07-25 21:20 ` [PATCH bpf-next v6 0/8] Add SO_REUSEPORT support for TC bpf_sk_assign patchwork-bot+netdevbpf
