From: Jakub Sitnicki <jakub@cloudflare.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
Linux Kernel Network Developers <netdev@vger.kernel.org>,
bpf <bpf@vger.kernel.org>, Cong Wang <cong.wang@bytedance.com>,
Jiang Wang <jiang.wang@bytedance.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Lorenz Bauer <lmb@cloudflare.com>
Subject: Re: [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to ingress_skb
Date: Mon, 05 Jul 2021 10:24:15 +0200 [thread overview]
Message-ID: <8735strwwg.fsf@cloudflare.com> (raw)
In-Reply-To: <CAM_iQpW69PGfp_Y8mZoqznwCk2axask5qJLB7ntZjFgGO+Eizg@mail.gmail.com>
On Sun, Jul 04, 2021 at 09:53 PM CEST, Cong Wang wrote:
> On Sat, Jul 3, 2021 at 10:52 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> When running with just the verdict prog attached, the -EIO error from
>> sk_psock_verdict_apply is propagated up to tcp_read_sock. That is, it
>> maps to 0 bytes used by recv_actor. sk_psock_verdict_recv in this case.
>>
>> tcp_read_sock, if 0 bytes were used = copied, won't sk_eat_skb. It stays
>> on sk_receive_queue.
>
> Are you sure?
>
> When recv_actor() returns 0, the while loop breaks:
>
> 1661 used = recv_actor(desc, skb, offset, len);
> 1662 if (used <= 0) {
> 1663 if (!copied)
> 1664 copied = used;
> 1665 break;
>
> then it calls sk_eat_skb() a few lines after the loop:
> ...
> 1690 sk_eat_skb(sk, skb);
This sk_eat_skb is still within the loop:
1636:int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
1637- sk_read_actor_t recv_actor)
1638-{
…
1643- int copied = 0;
…
1647- while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) {
1648- if (offset < skb->len) {
…
1661- used = recv_actor(desc, skb, offset, len);
1662- if (used <= 0) {
1663- if (!copied)
1664- copied = used;
1665- break;
1666- } else if (used <= len) {
1667- seq += used;
1668- copied += used;
1669- offset += used;
1670- }
…
1684- }
…
1690- sk_eat_skb(sk, skb);
…
1694- }
…
1699- /* Clean up data we have read: This will do ACK frames. */
1700- if (copied > 0) {
1701- tcp_recv_skb(sk, seq, &offset);
1702- tcp_cleanup_rbuf(sk, copied);
1703- }
1704- return copied;
1705-}
sk_eat_skb could get called by tcp_recv_skb → sk_eat_skb if recv_actor
returned > 0 (the case when we have parser attached).
>
>>
>> sk->sk_data_ready
>> sk_psock_verdict_data_ready
>> ->read_sock(..., sk_psock_verdict_recv)
>> tcp_read_sock (used = copied = 0)
>> sk_psock_verdict_recv -> ret = 0
>> sk_psock_verdict_apply -> -EIO
>> sk_psock_skb_redirect -> -EIO
>>
>> However, I think this gets us stuck. What if no more data gets queued,
>> and sk_data_ready doesn't get called again?
>
> I think it is dropped by sk_eat_skb() in TCP case and of course the
> sender will retransmit it after detecting this loss. So from this point of
> view, there is no difference between drops due to overlimit and drops
> due to eBPF program policy.
I'm not sure the retransmit will happen.
We update tp->rcv_nxt (tcp_rcv_nxt_update) when skb gets pushed onto
sk_receive_queue in either:
- tcp_rcv_established -> tcp_queue_rcv, or
- tcp_rcv_established -> tcp_data_queue -> tcp_queue_rcv
... and schedule ACK (tcp_event_data_recv) to be sent.
Say we are in quickack mode, then
tcp_ack_snd_check()/__tcp_ack_snd_check() would cause ACK to be sent
out.
next prev parent reply other threads:[~2021-07-05 8:24 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-01 6:16 [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to ingress_skb Cong Wang
2021-07-01 15:56 ` Jakub Sitnicki
2021-07-01 16:26 ` John Fastabend
2021-07-01 16:23 ` John Fastabend
2021-07-01 18:00 ` Cong Wang
2021-07-02 19:33 ` Cong Wang
2021-07-03 17:52 ` Jakub Sitnicki
2021-07-04 13:10 ` Jakub Sitnicki
2021-07-04 19:53 ` Cong Wang
2021-07-05 8:24 ` Jakub Sitnicki [this message]
2021-07-05 16:24 ` John Fastabend
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8735strwwg.fsf@cloudflare.com \
--to=jakub@cloudflare.com \
--cc=bpf@vger.kernel.org \
--cc=cong.wang@bytedance.com \
--cc=daniel@iogearbox.net \
--cc=jiang.wang@bytedance.com \
--cc=john.fastabend@gmail.com \
--cc=lmb@cloudflare.com \
--cc=netdev@vger.kernel.org \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).