bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Cong Wang <xiyou.wangcong@gmail.com>, netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, Cong Wang <cong.wang@bytedance.com>,
	Jiang Wang <jiang.wang@bytedance.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	Lorenz Bauer <lmb@cloudflare.com>,
	Jakub Sitnicki <jakub@cloudflare.com>
Subject: RE: [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to ingress_skb
Date: Thu, 01 Jul 2021 09:23:29 -0700	[thread overview]
Message-ID: <60ddec01c651b_3fe24208dc@john-XPS-13-9370.notmuch> (raw)
In-Reply-To: <20210701061656.34150-1-xiyou.wangcong@gmail.com>

Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> Jiang observed OOM frequently when testing our AF_UNIX/UDP
> proxy. This is due to the fact that we do not actually limit
> the socket memory before queueing skb to ingress_skb. We
> charge the skb memory later when handling the psock backlog,
> but it is not limited either.

Right, its not limiting but charging it should push back on
the stack so it stops feeding skbs to us. Maybe this doesn't
happen in UDP side?

> 
> This patch adds checks for sk->sk_rcvbuf right before queuing
> to ingress_skb and drops packets if this limit exceeds. This
> is very similar to UDP receive path. Ideally we should set the
> skb owner before this check too, but it is hard to make TCP
> happy about sk_forward_alloc.

But it breaks TCP side see below.

> 
> Reported-by: Jiang Wang <jiang.wang@bytedance.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---
>  net/core/skmsg.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 9b6160a191f8..a5185c781332 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -854,7 +854,8 @@ static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb)
>  		return -EIO;
>  	}
>  	spin_lock_bh(&psock_other->ingress_lock);
> -	if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
> +	if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED) ||
> +	    atomic_read(&sk_other->sk_rmem_alloc) > READ_ONCE(sk_other->sk_rcvbuf)) {
>  		spin_unlock_bh(&psock_other->ingress_lock);
>  		skb_bpf_redirect_clear(skb);
>  		sock_drop(from->sk, skb);
> @@ -930,7 +931,8 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
>  		}
>  		if (err < 0) {
>  			spin_lock_bh(&psock->ingress_lock);
> -			if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
> +			if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED) &&
> +			    atomic_read(&sk_other->sk_rmem_alloc) <= READ_ONCE(sk_other->sk_rcvbuf)) {
>  				skb_queue_tail(&psock->ingress_skb, skb);

We can't just drop the packet in the memory overrun case here. This will
break TCP because the data will be gone and no one will retransmit.

Thats why in the current scheme on redirect we can push back when we
move it to the other queues ingress message queue or redirect into
the other socket via send.

At one point I considered charging the data sitting in the ingress_skb?
Would that solve the problem here? I think it would cause the enqueue
at the UDP to start dropping packets from __udp_enqueue_schedule_skb()?

>  				schedule_work(&psock->work);
>  				err = 0;
> -- 
> 2.27.0
> 

  parent reply	other threads:[~2021-07-01 16:23 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-01  6:16 [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to ingress_skb Cong Wang
2021-07-01 15:56 ` Jakub Sitnicki
2021-07-01 16:26   ` John Fastabend
2021-07-01 16:23 ` John Fastabend [this message]
2021-07-01 18:00   ` Cong Wang
2021-07-02 19:33     ` Cong Wang
2021-07-03 17:52   ` Jakub Sitnicki
2021-07-04 13:10     ` Jakub Sitnicki
2021-07-04 19:53     ` Cong Wang
2021-07-05  8:24       ` Jakub Sitnicki
2021-07-05 16:24         ` John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=60ddec01c651b_3fe24208dc@john-XPS-13-9370.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=jakub@cloudflare.com \
    --cc=jiang.wang@bytedance.com \
    --cc=lmb@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).