Re: [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to ingress_skb

From: Jakub Sitnicki <jakub@cloudflare.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, Cong Wang <cong.wang@bytedance.com>,
	Jiang Wang <jiang.wang@bytedance.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Lorenz Bauer <lmb@cloudflare.com>
Subject: Re: [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to ingress_skb
Date: Mon, 05 Jul 2021 10:24:15 +0200	[thread overview]
Message-ID: <8735strwwg.fsf@cloudflare.com> (raw)
In-Reply-To: <CAM_iQpW69PGfp_Y8mZoqznwCk2axask5qJLB7ntZjFgGO+Eizg@mail.gmail.com>

On Sun, Jul 04, 2021 at 09:53 PM CEST, Cong Wang wrote:
> On Sat, Jul 3, 2021 at 10:52 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> When running with just the verdict prog attached, the -EIO error from
>> sk_psock_verdict_apply is propagated up to tcp_read_sock. That is, it
>> maps to 0 bytes used by recv_actor. sk_psock_verdict_recv in this case.
>>
>> tcp_read_sock, if 0 bytes were used = copied, won't sk_eat_skb. It stays
>> on sk_receive_queue.
>
> Are you sure?
>
> When recv_actor() returns 0, the while loop breaks:
>
> 1661                         used = recv_actor(desc, skb, offset, len);
> 1662                         if (used <= 0) {
> 1663                                 if (!copied)
> 1664                                         copied = used;
> 1665                                 break;
>
> then it calls sk_eat_skb() a few lines after the loop:
> ...
> 1690                 sk_eat_skb(sk, skb);

This sk_eat_skb is still within the loop:

1636:int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
1637-		  sk_read_actor_t recv_actor)
1638-{
	…
1643-	int copied = 0;
        …
1647-	while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) {
1648-		if (offset < skb->len) {
			…
1661-			used = recv_actor(desc, skb, offset, len);
1662-			if (used <= 0) {
1663-				if (!copied)
1664-					copied = used;
1665-				break;
1666-			} else if (used <= len) {
1667-				seq += used;
1668-				copied += used;
1669-				offset += used;
1670-			}
			…
1684-		}
		…
1690-		sk_eat_skb(sk, skb);
		…
1694-	}
	…
1699-	/* Clean up data we have read: This will do ACK frames. */
1700-	if (copied > 0) {
1701-		tcp_recv_skb(sk, seq, &offset);
1702-		tcp_cleanup_rbuf(sk, copied);
1703-	}
1704-	return copied;
1705-}

sk_eat_skb could get called by tcp_recv_skb → sk_eat_skb if recv_actor
returned > 0 (the case when we have parser attached).

>
>>
>>   sk->sk_data_ready
>>     sk_psock_verdict_data_ready
>>       ->read_sock(..., sk_psock_verdict_recv)
>>         tcp_read_sock (used = copied = 0)
>>           sk_psock_verdict_recv -> ret = 0
>>             sk_psock_verdict_apply -> -EIO
>>               sk_psock_skb_redirect -> -EIO
>>
>> However, I think this gets us stuck. What if no more data gets queued,
>> and sk_data_ready doesn't get called again?
>
> I think it is dropped by sk_eat_skb() in TCP case and of course the
> sender will retransmit it after detecting this loss. So from this point of
> view, there is no difference between drops due to overlimit and drops
> due to eBPF program policy.

I'm not sure the retransmit will happen.

We update tp->rcv_nxt (tcp_rcv_nxt_update) when skb gets pushed onto
sk_receive_queue in either:

 - tcp_rcv_established -> tcp_queue_rcv, or
 - tcp_rcv_established -> tcp_data_queue -> tcp_queue_rcv

... and schedule ACK (tcp_event_data_recv) to be sent.

Say we are in quickack mode, then
tcp_ack_snd_check()/__tcp_ack_snd_check() would cause ACK to be sent
out.