All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	Cong Wang <cong.wang@bytedance.com>,
	Jiang Wang <jiang.wang@bytedance.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Lorenz Bauer <lmb@cloudflare.com>
Subject: Re: [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to ingress_skb
Date: Sat, 03 Jul 2021 19:52:35 +0200	[thread overview]
Message-ID: <875yxrs2sc.fsf@cloudflare.com> (raw)
In-Reply-To: <60ddec01c651b_3fe24208dc@john-XPS-13-9370.notmuch>

On Thu, Jul 01, 2021 at 06:23 PM CEST, John Fastabend wrote:

[...]

>> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
>> index 9b6160a191f8..a5185c781332 100644
>> --- a/net/core/skmsg.c
>> +++ b/net/core/skmsg.c
>> @@ -854,7 +854,8 @@ static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb)
>>  		return -EIO;
>>  	}
>>  	spin_lock_bh(&psock_other->ingress_lock);
>> -	if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
>> +	if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED) ||
>> +	    atomic_read(&sk_other->sk_rmem_alloc) > READ_ONCE(sk_other->sk_rcvbuf)) {
>>  		spin_unlock_bh(&psock_other->ingress_lock);
>>  		skb_bpf_redirect_clear(skb);
>>  		sock_drop(from->sk, skb);
>> @@ -930,7 +931,8 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
>>  		}
>>  		if (err < 0) {
>>  			spin_lock_bh(&psock->ingress_lock);
>> -			if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
>> +			if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED) &&
>> +			    atomic_read(&sk_other->sk_rmem_alloc) <= READ_ONCE(sk_other->sk_rcvbuf)) {
>>  				skb_queue_tail(&psock->ingress_skb, skb);
>
> We can't just drop the packet in the memory overrun case here. This will
> break TCP because the data will be gone and no one will retransmit.

I don't think it's always the case that data will be gone. But you're
right that it breaks TCP. I was too quick to Ack this patch.

When running with just the verdict prog attached, the -EIO error from
sk_psock_verdict_apply is propagated up to tcp_read_sock. That is, it
maps to 0 bytes used by recv_actor. sk_psock_verdict_recv in this case.

tcp_read_sock, if 0 bytes were used = copied, won't sk_eat_skb. It stays
on sk_receive_queue.

  sk->sk_data_ready
    sk_psock_verdict_data_ready
      ->read_sock(..., sk_psock_verdict_recv)
        tcp_read_sock (used = copied = 0)
          sk_psock_verdict_recv -> ret = 0
            sk_psock_verdict_apply -> -EIO
              sk_psock_skb_redirect -> -EIO

However, I think this gets us stuck. What if no more data gets queued,
and sk_data_ready doesn't get called again?


Then there is the case when a parser prog is attached. In this case the
skb is really gone if we drop it on redirect.

In sk_psock_strp_read, we ignore the -EIO error from
sk_psock_verdict_apply, and return to tcp_read_sock how many bytes have
been parsed.

  sk->sk_data_ready
    sk_psock_verdict_data_ready
      ->read_sock(..., sk_psock_verdict_recv)
        tcp_read_sock (used = copied = eaten)
          strp_recv -> ret = eaten
            __strp_recv -> ret = eaten
              strp->cb.rcv_msg -> -EIO
                sk_psock_verdict_apply -> -EIO
                  sk_psock_redirect -> -EIO

Maybe we could put the skb back on strp->skb_head list on error, though?

But again some notification would need to trigger a re-read, or we are
stuck.

  parent reply	other threads:[~2021-07-03 17:52 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-01  6:16 [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to ingress_skb Cong Wang
2021-07-01 15:56 ` Jakub Sitnicki
2021-07-01 16:26   ` John Fastabend
2021-07-01 16:23 ` John Fastabend
2021-07-01 18:00   ` Cong Wang
2021-07-02 19:33     ` Cong Wang
2021-07-03 17:52   ` Jakub Sitnicki [this message]
2021-07-04 13:10     ` Jakub Sitnicki
2021-07-04 19:53     ` Cong Wang
2021-07-05  8:24       ` Jakub Sitnicki
2021-07-05 16:24         ` John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875yxrs2sc.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=jiang.wang@bytedance.com \
    --cc=john.fastabend@gmail.com \
    --cc=lmb@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.