[PATCH bpf v2 2/5] bpf, sockmap: Remove unhash handler for BPF sockmap usage

From: John Fastabend <john.fastabend@gmail.com>
To: bpf@vger.kernel.org, netdev@vger.kernel.org
Cc: daniel@iogearbox.net, joamaki@gmail.com,
	xiyou.wangcong@gmail.com, jakub@cloudflare.com,
	john.fastabend@gmail.com
Subject: [PATCH bpf v2 2/5] bpf, sockmap: Remove unhash handler for BPF sockmap usage
Date: Wed,  3 Nov 2021 13:47:33 -0700	[thread overview]
Message-ID: <20211103204736.248403-3-john.fastabend@gmail.com> (raw)
In-Reply-To: <20211103204736.248403-1-john.fastabend@gmail.com>

We do not need to handle unhash from BPF side we can simply wait for the
close to happen. The original concern was a socket could transition from
ESTABLISHED state to a new state while the BPF hook was still attached.
But, we convinced ourself this is no longer possible and we also
improved BPF sockmap to handle listen sockets so this is no longer a
problem.

More importantly though there are cases where unhash is called when data is
in the receive queue. The BPF unhash logic will flush this data which is
wrong. To be correct it should keep the data in the receive queue and allow
a receiving application to continue reading the data. This may happen when
tcp_abort is received for example. Instead of complicating the logic in
unhash simply moving all this to tcp_close hook solves this.

Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Tested-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/ipv4/tcp_bpf.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 5f4d6f45d87f..246f725b78c9 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -475,7 +475,6 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
 				   struct proto *base)
 {
 	prot[TCP_BPF_BASE]			= *base;
-	prot[TCP_BPF_BASE].unhash		= sock_map_unhash;
 	prot[TCP_BPF_BASE].close		= sock_map_close;
 	prot[TCP_BPF_BASE].recvmsg		= tcp_bpf_recvmsg;
 	prot[TCP_BPF_BASE].sock_is_readable	= sk_msg_is_readable;
-- 
2.33.0