bpf.vger.kernel.org archive mirror
* [PATCH bpf v3 0/3] bpf, sockmap complete fixes for avail bytes
@ 2023-09-26  3:52 John Fastabend
  2023-09-26  3:52 ` [PATCH bpf v3 1/3] bpf: tcp_read_skb needs to pop skb regardless of seq John Fastabend
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: John Fastabend @ 2023-09-26  3:52 UTC (permalink / raw)
  To: daniel, ast, andrii, jakub; +Cc: john.fastabend, bpf, netdev, edumazet

With e5c6de5fa0258 ("bpf, sockmap: Incorrectly handling copied_seq") we
started fixing the available bytes accounting by moving copied_seq to
where the user actually reads the bytes.

However, we missed handling MSG_PEEK correctly, and we need to ensure
that we don't kfree_skb() an skb off the receive_queue when the
copied_seq number is not incremented by user reads for some time.

v2: drop the seq var in tcp_read_skb; it's no longer necessary per Jakub's
    suggestion
v3: drop tcp_sock as well; it's also no longer used. Sorry for the extra
    noise there.

John Fastabend (3):
  bpf: tcp_read_skb needs to pop skb regardless of seq
  bpf: sockmap, do not inc copied_seq when PEEK flag set
  bpf: sockmap, add tests for MSG_F_PEEK

 net/ipv4/tcp.c                                | 10 +---
 net/ipv4/tcp_bpf.c                            |  4 +-
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 52 +++++++++++++++++++
 3 files changed, 57 insertions(+), 9 deletions(-)

-- 
2.33.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH bpf v3 1/3] bpf: tcp_read_skb needs to pop skb regardless of seq
  2023-09-26  3:52 [PATCH bpf v3 0/3] bpf, sockmap complete fixes for avail bytes John Fastabend
@ 2023-09-26  3:52 ` John Fastabend
  2023-09-26  3:52 ` [PATCH bpf v3 2/3] bpf: sockmap, do not inc copied_seq when PEEK flag set John Fastabend
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: John Fastabend @ 2023-09-26  3:52 UTC (permalink / raw)
  To: daniel, ast, andrii, jakub; +Cc: john.fastabend, bpf, netdev, edumazet

Before fix e5c6de5fa0258, tcp_read_skb() would increment the tp->copied_seq
value. As described in that commit, this caused errors for applications:
once copied_seq is incremented, an application may believe there is no
data left to read, so some apps would stall or abort.

However, the fix is incomplete because it introduces another issue in
the skb dequeue. The loop calls tcp_recv_skb() repeatedly to consume
as many skbs as possible. The problem is the call,

  tcp_recv_skb(sk, seq, &offset)

where 'seq' is

  u32 seq = tp->copied_seq;

Now we can hit a case where copied_seq has not yet been incremented from
the BPF side, but tcp_recv_skb() then fails this test,

 if (offset < skb->len || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))

so that instead of returning the skb we call tcp_eat_recv_skb(), which
frees the skb. This is because the routine believes the SKB has been
collapsed, per the comment,

 /* This looks weird, but this can happen if TCP collapsing
  * splitted a fat GRO packet, while we released socket lock
  * in skb_splice_bits()
  */

This can't happen here; we've unlinked the full SKB and orphaned it.
In any case, it would confuse any BPF programs if the data were suddenly
moved underneath them.

To fix this situation, do the simpler operation of just skb_peek()'ing
the head of the queue followed by the unlink. There is no need to check
the condition above, and since tcp_read_skb() reads entire skbs there is
no need to handle the 'offset != 0' case as we would see in
tcp_read_sock().

Fixes: e5c6de5fa0258 ("bpf, sockmap: Incorrectly handling copied_seq")
Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/ipv4/tcp.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0c3040a63ebd..3f66cdeef7de 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1621,16 +1621,13 @@ EXPORT_SYMBOL(tcp_read_sock);
 
 int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 {
-	struct tcp_sock *tp = tcp_sk(sk);
-	u32 seq = tp->copied_seq;
 	struct sk_buff *skb;
 	int copied = 0;
-	u32 offset;
 
 	if (sk->sk_state == TCP_LISTEN)
 		return -ENOTCONN;
 
-	while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) {
+	while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
 		u8 tcp_flags;
 		int used;
 
@@ -1643,13 +1640,10 @@ int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 				copied = used;
 			break;
 		}
-		seq += used;
 		copied += used;
 
-		if (tcp_flags & TCPHDR_FIN) {
-			++seq;
+		if (tcp_flags & TCPHDR_FIN)
 			break;
-		}
 	}
 	return copied;
 }
-- 
2.33.0



* [PATCH bpf v3 2/3] bpf: sockmap, do not inc copied_seq when PEEK flag set
  2023-09-26  3:52 [PATCH bpf v3 0/3] bpf, sockmap complete fixes for avail bytes John Fastabend
  2023-09-26  3:52 ` [PATCH bpf v3 1/3] bpf: tcp_read_skb needs to pop skb regardless of seq John Fastabend
@ 2023-09-26  3:52 ` John Fastabend
  2023-09-26  3:53 ` [PATCH bpf v3 3/3] bpf: sockmap, add tests for MSG_F_PEEK John Fastabend
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: John Fastabend @ 2023-09-26  3:52 UTC (permalink / raw)
  To: daniel, ast, andrii, jakub; +Cc: john.fastabend, bpf, netdev, edumazet

When data is peeked off the receive queue we shouldn't consider it
copied from the tcp_sock side. If we increment copied_seq here it will
confuse tcp_data_ready(), because copied_seq can be arbitrarily
increased. From the application side it results in poll() operations
not waking up when expected.

Notice that the TCP stack, without BPF recvmsg programs, also does not
increment copied_seq on a peek.

We broke this when we moved the copied_seq update into recvmsg so that
it only happens when an actual copy takes place. But it wasn't working
correctly before that either, because tcp_data_ready() tried to use the
copied_seq value to see if data had been read by the user yet. See the
Fixes tags.

Fixes: e5c6de5fa0258 ("bpf, sockmap: Incorrectly handling copied_seq")
Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/ipv4/tcp_bpf.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 81f0dff69e0b..327268203001 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -222,6 +222,7 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk,
 				  int *addr_len)
 {
 	struct tcp_sock *tcp = tcp_sk(sk);
+	int peek = flags & MSG_PEEK;
 	u32 seq = tcp->copied_seq;
 	struct sk_psock *psock;
 	int copied = 0;
@@ -311,7 +312,8 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk,
 		copied = -EAGAIN;
 	}
 out:
-	WRITE_ONCE(tcp->copied_seq, seq);
+	if (!peek)
+		WRITE_ONCE(tcp->copied_seq, seq);
 	tcp_rcv_space_adjust(sk);
 	if (copied > 0)
 		__tcp_cleanup_rbuf(sk, copied);
-- 
2.33.0



* [PATCH bpf v3 3/3] bpf: sockmap, add tests for MSG_F_PEEK
  2023-09-26  3:52 [PATCH bpf v3 0/3] bpf, sockmap complete fixes for avail bytes John Fastabend
  2023-09-26  3:52 ` [PATCH bpf v3 1/3] bpf: tcp_read_skb needs to pop skb regardless of seq John Fastabend
  2023-09-26  3:52 ` [PATCH bpf v3 2/3] bpf: sockmap, do not inc copied_seq when PEEK flag set John Fastabend
@ 2023-09-26  3:53 ` John Fastabend
  2023-09-29  8:02 ` [PATCH bpf v3 0/3] bpf, sockmap complete fixes for avail bytes Jakub Sitnicki
  2023-09-29 15:10 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 6+ messages in thread
From: John Fastabend @ 2023-09-26  3:53 UTC (permalink / raw)
  To: daniel, ast, andrii, jakub; +Cc: john.fastabend, bpf, netdev, edumazet

Test that we can read with MSG_F_PEEK and then still get the correct
number of available bytes through FIONREAD. The recv() (without PEEK)
then returns the bytes as expected. The recv() always worked, though; it
was just the available-bytes reporting that was broken before the latest
fixes.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 52 +++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index 064cc5e8d9ad..e8eee2b8901e 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -475,6 +475,56 @@ static void test_sockmap_skb_verdict_fionread(bool pass_prog)
 		test_sockmap_drop_prog__destroy(drop);
 }
 
+
+static void test_sockmap_skb_verdict_peek(void)
+{
+	int err, map, verdict, s, c1, p1, zero = 0, sent, recvd, avail;
+	struct test_sockmap_pass_prog *pass;
+	char snd[256] = "0123456789";
+	char rcv[256] = "0";
+
+	pass = test_sockmap_pass_prog__open_and_load();
+	if (!ASSERT_OK_PTR(pass, "open_and_load"))
+		return;
+	verdict = bpf_program__fd(pass->progs.prog_skb_verdict);
+	map = bpf_map__fd(pass->maps.sock_map_rx);
+
+	err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach"))
+		goto out;
+
+	s = socket_loopback(AF_INET, SOCK_STREAM);
+	if (!ASSERT_GT(s, -1, "socket_loopback(s)"))
+		goto out;
+
+	err = create_pair(s, AF_INET, SOCK_STREAM, &c1, &p1);
+	if (!ASSERT_OK(err, "create_pairs(s)"))
+		goto out;
+
+	err = bpf_map_update_elem(map, &zero, &c1, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(c1)"))
+		goto out_close;
+
+	sent = xsend(p1, snd, sizeof(snd), 0);
+	ASSERT_EQ(sent, sizeof(snd), "xsend(p1)");
+	recvd = recv(c1, rcv, sizeof(rcv), MSG_PEEK);
+	ASSERT_EQ(recvd, sizeof(rcv), "recv(c1)");
+	err = ioctl(c1, FIONREAD, &avail);
+	ASSERT_OK(err, "ioctl(FIONREAD) error");
+	ASSERT_EQ(avail, sizeof(snd), "after peek ioctl(FIONREAD)");
+	recvd = recv(c1, rcv, sizeof(rcv), 0);
+	ASSERT_EQ(recvd, sizeof(rcv), "recv(p0)");
+	err = ioctl(c1, FIONREAD, &avail);
+	ASSERT_OK(err, "ioctl(FIONREAD) error");
+	ASSERT_EQ(avail, 0, "after read ioctl(FIONREAD)");
+
+out_close:
+	close(c1);
+	close(p1);
+out:
+	test_sockmap_pass_prog__destroy(pass);
+}
+
 void test_sockmap_basic(void)
 {
 	if (test__start_subtest("sockmap create_update_free"))
@@ -515,4 +565,6 @@ void test_sockmap_basic(void)
 		test_sockmap_skb_verdict_fionread(true);
 	if (test__start_subtest("sockmap skb_verdict fionread on drop"))
 		test_sockmap_skb_verdict_fionread(false);
+	if (test__start_subtest("sockmap skb_verdict msg_f_peek"))
+		test_sockmap_skb_verdict_peek();
 }
-- 
2.33.0



* Re: [PATCH bpf v3 0/3] bpf, sockmap complete fixes for avail bytes
  2023-09-26  3:52 [PATCH bpf v3 0/3] bpf, sockmap complete fixes for avail bytes John Fastabend
                   ` (2 preceding siblings ...)
  2023-09-26  3:53 ` [PATCH bpf v3 3/3] bpf: sockmap, add tests for MSG_F_PEEK John Fastabend
@ 2023-09-29  8:02 ` Jakub Sitnicki
  2023-09-29 15:10 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2023-09-29  8:02 UTC (permalink / raw)
  To: John Fastabend; +Cc: daniel, ast, andrii, bpf, netdev, edumazet

On Mon, Sep 25, 2023 at 08:52 PM -07, John Fastabend wrote:
> With e5c6de5fa0258 ("bpf, sockmap: Incorrectly handling copied_seq") we
> started fixing the available bytes accounting by moving copied_seq to
> where the user actually reads the bytes.
>
> However we missed handling MSG_PEEK correctly and we need to ensure
> that we don't kfree_skb() a skb off the receive_queue when the
> copied_seq number is not incremented by user reads for some time.
>
> v2: drop seq var in tcp_read_skb its no longer necessary per Jakub's
>     suggestion

Credit goes to Simon Horman.

> v3: drop tcp_sock as well its also not used anymore. sorry for the extra
>     noise there.
>
> John Fastabend (3):
>   bpf: tcp_read_skb needs to pop skb regardless of seq
>   bpf: sockmap, do not inc copied_seq when PEEK flag set
>   bpf: sockmap, add tests for MSG_F_PEEK
>
>  net/ipv4/tcp.c                                | 10 +---
>  net/ipv4/tcp_bpf.c                            |  4 +-
>  .../selftests/bpf/prog_tests/sockmap_basic.c  | 52 +++++++++++++++++++
>  3 files changed, 57 insertions(+), 9 deletions(-)

For the series:

Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>


* Re: [PATCH bpf v3 0/3] bpf, sockmap complete fixes for avail bytes
  2023-09-26  3:52 [PATCH bpf v3 0/3] bpf, sockmap complete fixes for avail bytes John Fastabend
                   ` (3 preceding siblings ...)
  2023-09-29  8:02 ` [PATCH bpf v3 0/3] bpf, sockmap complete fixes for avail bytes Jakub Sitnicki
@ 2023-09-29 15:10 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-09-29 15:10 UTC (permalink / raw)
  To: John Fastabend; +Cc: daniel, ast, andrii, jakub, bpf, netdev, edumazet

Hello:

This series was applied to bpf/bpf.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Mon, 25 Sep 2023 20:52:57 -0700 you wrote:
> With e5c6de5fa0258 ("bpf, sockmap: Incorrectly handling copied_seq") we
> started fixing the available bytes accounting by moving copied_seq to
> where the user actually reads the bytes.
> 
> However we missed handling MSG_PEEK correctly and we need to ensure
> that we don't kfree_skb() a skb off the receive_queue when the
> copied_seq number is not incremented by user reads for some time.
> 
> [...]

Here is the summary with links:
  - [bpf,v3,1/3] bpf: tcp_read_skb needs to pop skb regardless of seq
    https://git.kernel.org/bpf/bpf/c/9b7177b1df64
  - [bpf,v3,2/3] bpf: sockmap, do not inc copied_seq when PEEK flag set
    https://git.kernel.org/bpf/bpf/c/da9e915eaf5d
  - [bpf,v3,3/3] bpf: sockmap, add tests for MSG_F_PEEK
    https://git.kernel.org/bpf/bpf/c/5f405c0c0c46

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




