bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Patch net v2 0/4] tcp: some bug fixes for tcp_read_skb()
@ 2022-08-08  3:31 Cong Wang
  2022-08-08  3:31 ` [Patch net v2 1/4] tcp: fix sock skb accounting in tcp_read_skb() Cong Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Cong Wang @ 2022-08-08  3:31 UTC (permalink / raw)
  To: netdev; +Cc: bpf, Cong Wang

From: Cong Wang <cong.wang@bytedance.com>

This patchset contains 3 bug fixes and 1 minor refactor patch for
tcp_read_skb(). V1 only had the first patch, as Eric prefers to fix all
of them together, I have to group them together. Please see each patch
description for more details.

---

Cong Wang (4):
  tcp: fix sock skb accounting in tcp_read_skb()
  tcp: fix tcp_cleanup_rbuf() for tcp_read_skb()
  tcp: refactor tcp_read_skb() a bit
  tcp: handle pure FIN case correctly

 net/core/skmsg.c |  5 +++--
 net/ipv4/tcp.c   | 46 +++++++++++++++++++++-------------------------
 2 files changed, 24 insertions(+), 27 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Patch net v2 1/4] tcp: fix sock skb accounting in tcp_read_skb()
  2022-08-08  3:31 [Patch net v2 0/4] tcp: some bug fixes for tcp_read_skb() Cong Wang
@ 2022-08-08  3:31 ` Cong Wang
  2022-08-08  3:31 ` [Patch net v2 2/4] tcp: fix tcp_cleanup_rbuf() for tcp_read_skb() Cong Wang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Cong Wang @ 2022-08-08  3:31 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, syzbot+a0e6f8738b58f7654417, Stanislav Fomichev,
	Eric Dumazet, John Fastabend, Jakub Sitnicki

From: Cong Wang <cong.wang@bytedance.com>

Before commit 965b57b469a5 ("net: Introduce a new proto_ops
->read_skb()"), skb was not dequeued from receive queue hence
when we close TCP socket skb can be just flushed synchronously.

After this commit, we have to uncharge skb immediately after being
dequeued, otherwise it is still charged in the original sock. And we
still need to retain skb->sk, as eBPF programs may extract sock
information from skb->sk. Therefore, we have to call
skb_set_owner_sk_safe() here.

Fixes: 965b57b469a5 ("net: Introduce a new proto_ops ->read_skb()")
Reported-and-tested-by: syzbot+a0e6f8738b58f7654417@syzkaller.appspotmail.com
Tested-by: Stanislav Fomichev <sdf@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/ipv4/tcp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 970e9a2cca4a..05da5cac080b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1760,6 +1760,7 @@ int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 		int used;
 
 		__skb_unlink(skb, &sk->sk_receive_queue);
+		WARN_ON(!skb_set_owner_sk_safe(skb, sk));
 		used = recv_actor(sk, skb);
 		if (used <= 0) {
 			if (!copied)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Patch net v2 2/4] tcp: fix tcp_cleanup_rbuf() for tcp_read_skb()
  2022-08-08  3:31 [Patch net v2 0/4] tcp: some bug fixes for tcp_read_skb() Cong Wang
  2022-08-08  3:31 ` [Patch net v2 1/4] tcp: fix sock skb accounting in tcp_read_skb() Cong Wang
@ 2022-08-08  3:31 ` Cong Wang
  2022-08-08  3:31 ` [Patch net v2 3/4] tcp: refactor tcp_read_skb() a bit Cong Wang
  2022-08-08  3:31 ` [Patch net v2 4/4] tcp: handle pure FIN case correctly Cong Wang
  3 siblings, 0 replies; 6+ messages in thread
From: Cong Wang @ 2022-08-08  3:31 UTC (permalink / raw)
  To: netdev; +Cc: bpf, Cong Wang, Eric Dumazet, John Fastabend, Jakub Sitnicki

From: Cong Wang <cong.wang@bytedance.com>

tcp_cleanup_rbuf() retrieves the skb from sk_receive_queue, it
assumes the skb is not yet dequeued. This is no longer true for
tcp_read_skb() case where we dequeue the skb first.

Fix this by introducing a helper __tcp_cleanup_rbuf() which does
not require any skb and calling it in tcp_read_skb().

Fixes: 04919bed948d ("tcp: Introduce tcp_read_skb()")
Cc: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/ipv4/tcp.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 05da5cac080b..181a0d350123 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1567,17 +1567,11 @@ static int tcp_peek_sndq(struct sock *sk, struct msghdr *msg, int len)
  * calculation of whether or not we must ACK for the sake of
  * a window update.
  */
-void tcp_cleanup_rbuf(struct sock *sk, int copied)
+static void __tcp_cleanup_rbuf(struct sock *sk, int copied)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	bool time_to_ack = false;
 
-	struct sk_buff *skb = skb_peek(&sk->sk_receive_queue);
-
-	WARN(skb && !before(tp->copied_seq, TCP_SKB_CB(skb)->end_seq),
-	     "cleanup rbuf bug: copied %X seq %X rcvnxt %X\n",
-	     tp->copied_seq, TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt);
-
 	if (inet_csk_ack_scheduled(sk)) {
 		const struct inet_connection_sock *icsk = inet_csk(sk);
 
@@ -1623,6 +1617,17 @@ void tcp_cleanup_rbuf(struct sock *sk, int copied)
 		tcp_send_ack(sk);
 }
 
+void tcp_cleanup_rbuf(struct sock *sk, int copied)
+{
+	struct sk_buff *skb = skb_peek(&sk->sk_receive_queue);
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	WARN(skb && !before(tp->copied_seq, TCP_SKB_CB(skb)->end_seq),
+	     "cleanup rbuf bug: copied %X seq %X rcvnxt %X\n",
+	     tp->copied_seq, TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt);
+	__tcp_cleanup_rbuf(sk, copied);
+}
+
 static void tcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	__skb_unlink(skb, &sk->sk_receive_queue);
@@ -1771,20 +1776,19 @@ int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 		copied += used;
 
 		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
-			consume_skb(skb);
 			++seq;
 			break;
 		}
-		consume_skb(skb);
 		break;
 	}
+	consume_skb(skb);
 	WRITE_ONCE(tp->copied_seq, seq);
 
 	tcp_rcv_space_adjust(sk);
 
 	/* Clean up data we have read: This will do ACK frames. */
 	if (copied > 0)
-		tcp_cleanup_rbuf(sk, copied);
+		__tcp_cleanup_rbuf(sk, copied);
 
 	return copied;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Patch net v2 3/4] tcp: refactor tcp_read_skb() a bit
  2022-08-08  3:31 [Patch net v2 0/4] tcp: some bug fixes for tcp_read_skb() Cong Wang
  2022-08-08  3:31 ` [Patch net v2 1/4] tcp: fix sock skb accounting in tcp_read_skb() Cong Wang
  2022-08-08  3:31 ` [Patch net v2 2/4] tcp: fix tcp_cleanup_rbuf() for tcp_read_skb() Cong Wang
@ 2022-08-08  3:31 ` Cong Wang
  2022-08-12 23:55   ` Jakub Kicinski
  2022-08-08  3:31 ` [Patch net v2 4/4] tcp: handle pure FIN case correctly Cong Wang
  3 siblings, 1 reply; 6+ messages in thread
From: Cong Wang @ 2022-08-08  3:31 UTC (permalink / raw)
  To: netdev; +Cc: bpf, Cong Wang, Eric Dumazet, John Fastabend, Jakub Sitnicki

From: Cong Wang <cong.wang@bytedance.com>

As tcp_read_skb() only reads one skb at a time, the while loop is
unnecessary, we can turn it into an if. This also simplifies the
code logic.

Cc: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/ipv4/tcp.c | 25 ++++++++-----------------
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 181a0d350123..5212a7512269 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1761,27 +1761,18 @@ int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 	if (sk->sk_state == TCP_LISTEN)
 		return -ENOTCONN;
 
-	while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) {
-		int used;
-
+	skb = tcp_recv_skb(sk, seq, &offset);
+	if (skb) {
 		__skb_unlink(skb, &sk->sk_receive_queue);
 		WARN_ON(!skb_set_owner_sk_safe(skb, sk));
-		used = recv_actor(sk, skb);
-		if (used <= 0) {
-			if (!copied)
-				copied = used;
-			break;
-		}
-		seq += used;
-		copied += used;
-
-		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
-			++seq;
-			break;
+		copied = recv_actor(sk, skb);
+		if (copied > 0) {
+			seq += copied;
+			if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
+				++seq;
 		}
-		break;
+		consume_skb(skb);
 	}
-	consume_skb(skb);
 	WRITE_ONCE(tp->copied_seq, seq);
 
 	tcp_rcv_space_adjust(sk);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Patch net v2 4/4] tcp: handle pure FIN case correctly
  2022-08-08  3:31 [Patch net v2 0/4] tcp: some bug fixes for tcp_read_skb() Cong Wang
                   ` (2 preceding siblings ...)
  2022-08-08  3:31 ` [Patch net v2 3/4] tcp: refactor tcp_read_skb() a bit Cong Wang
@ 2022-08-08  3:31 ` Cong Wang
  3 siblings, 0 replies; 6+ messages in thread
From: Cong Wang @ 2022-08-08  3:31 UTC (permalink / raw)
  To: netdev; +Cc: bpf, Cong Wang, Eric Dumazet, John Fastabend, Jakub Sitnicki

From: Cong Wang <cong.wang@bytedance.com>

When skb->len==0, the recv_actor() returns 0 too, but we also use 0
for error conditions. This patch amends this by propagating the errors
to tcp_read_skb() so that we can distinguish skb->len==0 case from
error cases.

Fixes: 04919bed948d ("tcp: Introduce tcp_read_skb()")
Reported-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/core/skmsg.c | 5 +++--
 net/ipv4/tcp.c   | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 81627892bdd4..f0fa915cfe16 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1193,8 +1193,9 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
 		ret = bpf_prog_run_pin_on_cpu(prog, skb);
 		ret = sk_psock_map_verd(ret, skb_bpf_redirect_fetch(skb));
 	}
-	if (sk_psock_verdict_apply(psock, skb, ret) < 0)
-		len = 0;
+	ret = sk_psock_verdict_apply(psock, skb, ret);
+	if (ret < 0)
+		len = ret;
 out:
 	rcu_read_unlock();
 	return len;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5212a7512269..5e99ecf5515f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1766,7 +1766,7 @@ int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 		__skb_unlink(skb, &sk->sk_receive_queue);
 		WARN_ON(!skb_set_owner_sk_safe(skb, sk));
 		copied = recv_actor(sk, skb);
-		if (copied > 0) {
+		if (copied >= 0) {
 			seq += copied;
 			if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
 				++seq;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Patch net v2 3/4] tcp: refactor tcp_read_skb() a bit
  2022-08-08  3:31 ` [Patch net v2 3/4] tcp: refactor tcp_read_skb() a bit Cong Wang
@ 2022-08-12 23:55   ` Jakub Kicinski
  0 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2022-08-12 23:55 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, bpf, Cong Wang, Eric Dumazet, John Fastabend, Jakub Sitnicki

On Sun,  7 Aug 2022 20:31:05 -0700 Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> As tcp_read_skb() only reads one skb at a time, the while loop is
> unnecessary, we can turn it into an if. This also simplifies the
> code logic.

I think Eric is AFK so we should just apply these, they LGTM.
One minor nit below.

> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1761,27 +1761,18 @@ int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
>  	if (sk->sk_state == TCP_LISTEN)
>  		return -ENOTCONN;
>  
> -	while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) {
> -		int used;
> -
> +	skb = tcp_recv_skb(sk, seq, &offset);
> +	if (skb) {

if (!skb)
	return 0;

?


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-08-12 23:55 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-08  3:31 [Patch net v2 0/4] tcp: some bug fixes for tcp_read_skb() Cong Wang
2022-08-08  3:31 ` [Patch net v2 1/4] tcp: fix sock skb accounting in tcp_read_skb() Cong Wang
2022-08-08  3:31 ` [Patch net v2 2/4] tcp: fix tcp_cleanup_rbuf() for tcp_read_skb() Cong Wang
2022-08-08  3:31 ` [Patch net v2 3/4] tcp: refactor tcp_read_skb() a bit Cong Wang
2022-08-12 23:55   ` Jakub Kicinski
2022-08-08  3:31 ` [Patch net v2 4/4] tcp: handle pure FIN case correctly Cong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).