All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lorenz Bauer <lmb@isovalent.com>
To: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	David Ahern <dsahern@kernel.org>,
	Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>, Joe Stringer <joe@wand.net.nz>,
	Mykola Lysenko <mykolal@fb.com>, Shuah Khan <shuah@kernel.org>,
	Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Hemanth Malla <hemanthmalla@gmail.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Lorenz Bauer <lmb@isovalent.com>, Joe Stringer <joe@cilium.io>
Subject: [PATCH bpf-next v6 2/8] bpf: reject unhashed sockets in bpf_sk_assign
Date: Thu, 20 Jul 2023 17:30:06 +0200	[thread overview]
Message-ID: <20230720-so-reuseport-v6-2-7021b683cdae@isovalent.com> (raw)
In-Reply-To: <20230720-so-reuseport-v6-0-7021b683cdae@isovalent.com>

The semantics for bpf_sk_assign are as follows:

    sk = some_lookup_func()
    bpf_sk_assign(skb, sk)
    bpf_sk_release(sk)

That is, the sk is not consumed by bpf_sk_assign. The function
therefore needs to make sure that sk lives long enough to be
consumed from __inet_lookup_skb. The path through the stack for a
TCPv4 packet is roughly:

  netif_receive_skb_core: takes RCU read lock
    __netif_receive_skb_core:
      sch_handle_ingress:
        tcf_classify:
          bpf_sk_assign()
      deliver_ptype_list_skb:
        deliver_skb:
          ip_packet_type->func == ip_rcv:
            ip_rcv_core:
            ip_rcv_finish_core:
              dst_input:
                ip_local_deliver:
                  ip_local_deliver_finish:
                    ip_protocol_deliver_rcu:
                      tcp_v4_rcv:
                        __inet_lookup_skb:
                          skb_steal_sock

The existing helper takes advantage of the fact that everything
happens in the same RCU critical section: for sockets with
SOCK_RCU_FREE set bpf_sk_assign never takes a reference.
skb_steal_sock then checks SOCK_RCU_FREE again and does sock_put
if necessary.

This approach assumes that SOCK_RCU_FREE is never set on a sk
between bpf_sk_assign and skb_steal_sock, but this invariant is
violated by unhashed UDP sockets. A new UDP socket is created
in TCP_CLOSE state but without SOCK_RCU_FREE set. That flag is only
added in udp_lib_get_port() which happens when a socket is bound.

When bpf_sk_assign was added it wasn't possible to access unhashed
UDP sockets from BPF, so this wasn't a problem. This changed
in commit 0c48eefae712 ("sock_map: Lift socket state restriction
for datagram sockets"), but the helper wasn't adjusted accordingly.
The following sequence of events will therefore lead to a refcount
leak:

1. Add socket(AF_INET, SOCK_DGRAM) to a sockmap.
2. Pull socket out of sockmap and bpf_sk_assign it. Since
   SOCK_RCU_FREE is not set we increment the refcount.
3. bind() or connect() the socket, setting SOCK_RCU_FREE.
4. skb_steal_sock will now set refcounted = false due to
   SOCK_RCU_FREE.
5. tcp_v4_rcv() skips sock_put().

Fix the problem by rejecting unhashed sockets in bpf_sk_assign().
This matches the behaviour of __inet_lookup_skb which is ultimately
the goal of bpf_sk_assign().

Fixes: cf7fbe660f2d ("bpf: Add socket assign support")
Cc: Joe Stringer <joe@cilium.io>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
---
 net/core/filter.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 797e8f039696..b5b51ef48c5f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7353,6 +7353,8 @@ BPF_CALL_3(bpf_sk_assign, struct sk_buff *, skb, struct sock *, sk, u64, flags)
 		return -ENETUNREACH;
 	if (unlikely(sk_fullsock(sk) && sk->sk_reuseport))
 		return -ESOCKTNOSUPPORT;
+	if (sk_unhashed(sk))
+		return -EOPNOTSUPP;
 	if (sk_is_refcounted(sk) &&
 	    unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
 		return -ENOENT;

-- 
2.41.0


  parent reply	other threads:[~2023-07-20 15:31 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-20 15:30 [PATCH bpf-next v6 0/8] Add SO_REUSEPORT support for TC bpf_sk_assign Lorenz Bauer
2023-07-20 15:30 ` [PATCH bpf-next v6 1/8] udp: re-score reuseport groups when connected sockets are present Lorenz Bauer
2023-07-20 15:30 ` Lorenz Bauer [this message]
2023-07-20 21:16   ` [PATCH bpf-next v6 2/8] bpf: reject unhashed sockets in bpf_sk_assign Kuniyuki Iwashima
2023-07-24  8:01     ` Lorenz Bauer
2023-07-20 15:30 ` [PATCH bpf-next v6 3/8] net: export inet_lookup_reuseport and inet6_lookup_reuseport Lorenz Bauer
2023-07-20 15:30 ` [PATCH bpf-next v6 4/8] net: remove duplicate reuseport_lookup functions Lorenz Bauer
2023-07-24 22:55   ` Martin KaFai Lau
2023-07-25  0:53   ` Martin KaFai Lau
2023-07-25 21:19     ` Martin KaFai Lau
2023-07-20 15:30 ` [PATCH bpf-next v6 5/8] net: document inet[6]_lookup_reuseport sk_state requirements Lorenz Bauer
2023-07-20 15:30 ` [PATCH bpf-next v6 6/8] net: remove duplicate sk_lookup helpers Lorenz Bauer
2023-07-20 15:30 ` [PATCH bpf-next v6 7/8] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign Lorenz Bauer
2023-07-20 21:34   ` Kuniyuki Iwashima
2023-08-08  4:22   ` Kumar Kartikeya Dwivedi
2023-08-08 16:35     ` Lorenz Bauer
2023-07-20 15:30 ` [PATCH bpf-next v6 8/8] selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper Lorenz Bauer
2023-07-25  0:42   ` Martin KaFai Lau
2023-07-25 21:20 ` [PATCH bpf-next v6 0/8] Add SO_REUSEPORT support for TC bpf_sk_assign patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230720-so-reuseport-v6-2-7021b683cdae@isovalent.com \
    --to=lmb@isovalent.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=haoluo@google.com \
    --cc=hemanthmalla@gmail.com \
    --cc=joe@cilium.io \
    --cc=joe@wand.net.nz \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=kuniyu@amazon.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=mykolal@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@google.com \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=willemdebruijn.kernel@gmail.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.