bpf.vger.kernel.org archive mirror
From: Jakub Sitnicki <jakub@cloudflare.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, Cong Wang <cong.wang@bytedance.com>,
	Jiang Wang <jiang.wang@bytedance.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Lorenz Bauer <lmb@cloudflare.com>
Subject: Re: [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to ingress_skb
Date: Mon, 05 Jul 2021 10:24:15 +0200
Message-ID: <8735strwwg.fsf@cloudflare.com>
In-Reply-To: <CAM_iQpW69PGfp_Y8mZoqznwCk2axask5qJLB7ntZjFgGO+Eizg@mail.gmail.com>

On Sun, Jul 04, 2021 at 09:53 PM CEST, Cong Wang wrote:
> On Sat, Jul 3, 2021 at 10:52 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> When running with just the verdict prog attached, the -EIO error from
>> sk_psock_verdict_apply is propagated up to tcp_read_sock. That is, it
>> maps to 0 bytes used by recv_actor. sk_psock_verdict_recv in this case.
>>
>> tcp_read_sock, if 0 bytes were used = copied, won't sk_eat_skb. It stays
>> on sk_receive_queue.
>
> Are you sure?
>
> When recv_actor() returns 0, the while loop breaks:
>
> 1661                         used = recv_actor(desc, skb, offset, len);
> 1662                         if (used <= 0) {
> 1663                                 if (!copied)
> 1664                                         copied = used;
> 1665                                 break;
>
> then it calls sk_eat_skb() a few lines after the loop:
> ...
> 1690                 sk_eat_skb(sk, skb);

This sk_eat_skb is still within the loop:

1636:int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
1637-		  sk_read_actor_t recv_actor)
1638-{
	…
1643-	int copied = 0;
	…
1647-	while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) {
1648-		if (offset < skb->len) {
			…
1661-			used = recv_actor(desc, skb, offset, len);
1662-			if (used <= 0) {
1663-				if (!copied)
1664-					copied = used;
1665-				break;
1666-			} else if (used <= len) {
1667-				seq += used;
1668-				copied += used;
1669-				offset += used;
1670-			}
			…
1684-		}
		…
1690-		sk_eat_skb(sk, skb);
		…
1694-	}
	…
1699-	/* Clean up data we have read: This will do ACK frames. */
1700-	if (copied > 0) {
1701-		tcp_recv_skb(sk, seq, &offset);
1702-		tcp_cleanup_rbuf(sk, copied);
1703-	}
1704-	return copied;
1705-}

sk_eat_skb could still get called after the loop, via tcp_recv_skb ->
sk_eat_skb (line 1701), if recv_actor returned > 0 (the case when we
have a parser attached).
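
To make the control flow above concrete, here is a minimal userspace
sketch of the same branch structure. It is a toy model, not kernel code:
every toy_* name is hypothetical, an "skb" is reduced to a length, and
the actor is assumed to consume either all of an skb or none of it. The
only point is that the break on used <= 0 is taken before the step that
stands in for sk_eat_skb.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All toy_* names are hypothetical. */
struct toy_queue {
	const int *skb_len;	/* length of each queued "skb" */
	size_t head;		/* index of the first skb still queued */
	size_t count;		/* number of skbs ever queued */
};

typedef int (*toy_actor_t)(int offset, int len);

/* Actor that consumes nothing, like a verdict path that ends in -EIO. */
static int toy_actor_none(int offset, int len)
{
	(void)offset;
	(void)len;
	return 0;
}

/* Actor that consumes everything it is offered. */
static int toy_actor_all(int offset, int len)
{
	(void)offset;
	return len;
}

/* Same branch structure as the excerpt: the break precedes the "eat" step. */
static int toy_read_sock(struct toy_queue *q, toy_actor_t recv_actor)
{
	int copied = 0;

	while (q->head < q->count) {
		int offset = 0;
		int len = q->skb_len[q->head] - offset;

		if (len > 0) {
			int used = recv_actor(offset, len);

			if (used <= 0) {
				if (!copied)
					copied = used;
				break;	/* leaves the skb on the queue */
			} else if (used <= len) {
				copied += used;
			}
		}
		q->head++;	/* stands in for sk_eat_skb() */
	}
	return copied;
}

/* Number of skbs left on the toy queue after one read pass. */
static size_t toy_left_after_read(toy_actor_t recv_actor)
{
	static const int lens[] = { 100, 200 };
	struct toy_queue q = { lens, 0, 2 };
	int copied = toy_read_sock(&q, recv_actor);

	assert(copied >= 0);
	return q.count - q.head;
}
```

With toy_actor_none both skbs remain queued after the pass, while with
toy_actor_all the queue is drained.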

>
>>
>>   sk->sk_data_ready
>>     sk_psock_verdict_data_ready
>>       ->read_sock(..., sk_psock_verdict_recv)
>>         tcp_read_sock (used = copied = 0)
>>           sk_psock_verdict_recv -> ret = 0
>>             sk_psock_verdict_apply -> -EIO
>>               sk_psock_skb_redirect -> -EIO
>>
>> However, I think this gets us stuck. What if no more data gets queued,
>> and sk_data_ready doesn't get called again?
>
> I think it is dropped by sk_eat_skb() in TCP case and of course the
> sender will retransmit it after detecting this loss. So from this point of
> view, there is no difference between drops due to overlimit and drops
> due to eBPF program policy.

I'm not sure the retransmit will happen.

We update tp->rcv_nxt (tcp_rcv_nxt_update) when skb gets pushed onto
sk_receive_queue in either:

 - tcp_rcv_established -> tcp_queue_rcv, or
 - tcp_rcv_established -> tcp_data_queue -> tcp_queue_rcv

... and schedule an ACK (tcp_event_data_recv) to be sent.

Say we are in quickack mode, then
tcp_ack_snd_check()/__tcp_ack_snd_check() would cause the ACK to be
sent out.
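
The concern can be illustrated with a small userspace sketch. It is a
toy model under stated assumptions, not the kernel's TCP code: all toy_*
names are hypothetical, the receiver ACKs as soon as an in-order segment
is queued, and dropping queued data later does not roll rcv_nxt back.
Under those assumptions the sender has already seen its data ACKed by
the time of the drop, so it has nothing left to retransmit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, NOT kernel code. All toy_* names are hypothetical. */
struct toy_receiver {
	uint32_t rcv_nxt;	/* next sequence number expected */
	uint32_t queued_bytes;	/* bytes sitting on the receive queue */
};

struct toy_sender {
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t snd_nxt;	/* next sequence number to send */
};

/* Queue an in-order segment and return the cumulative ACK to send. */
static uint32_t toy_queue_rcv(struct toy_receiver *r, uint32_t seq,
			      uint32_t len)
{
	if (seq != r->rcv_nxt)
		return r->rcv_nxt;	/* out of order: repeat the old ACK */
	r->rcv_nxt += len;
	r->queued_bytes += len;
	return r->rcv_nxt;
}

/* Drop queued data after the fact; rcv_nxt is deliberately untouched. */
static void toy_drop_queued(struct toy_receiver *r)
{
	r->queued_bytes = 0;
}

/* Sender side: advance snd_una on a newer cumulative ACK. */
static void toy_ack_rcv(struct toy_sender *s, uint32_t ack)
{
	if ((int32_t)(ack - s->snd_una) > 0)
		s->snd_una = ack;
}

/* The sender only retransmits while something is still unacknowledged. */
static bool toy_needs_retransmit(const struct toy_sender *s)
{
	return s->snd_una != s->snd_nxt;
}

/* Queue 100 bytes, ACK them, then drop them; report the sender's view. */
static bool toy_retransmit_after_late_drop(void)
{
	struct toy_receiver r = { 1000, 0 };
	struct toy_sender s = { 1000, 1100 };	/* 100 bytes in flight */
	uint32_t ack = toy_queue_rcv(&r, 1000, 100);

	assert(ack == 1100);
	toy_ack_rcv(&s, ack);
	toy_drop_queued(&r);
	return toy_needs_retransmit(&s);
}
```

In this model toy_retransmit_after_late_drop() returns false: the drop
happens after the ACK, so the sender never learns about it.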
