From: Jakub Sitnicki <jakub@cloudflare.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: ast@kernel.org, daniel@iogearbox.net, bpf@vger.kernel.org,
	netdev@vger.kernel.org
Subject: Re: [bpf PATCH v2 5/6] bpf, sockmap: Handle memory acct if skb_verdict prog redirects to self
Date: Mon, 16 Nov 2020 15:31:32 +0100
Message-ID: <87blfxweyj.fsf@cloudflare.com>
In-Reply-To: <160522367856.135009.17304729578208922913.stgit@john-XPS-13-9370>

On Fri, Nov 13, 2020 at 12:27 AM CET, John Fastabend wrote:
> If the skb_verdict_prog redirects an skb knowingly to itself, fix your
> BPF program this is not optimal and an abuse of the API please use
> SK_PASS. That said there may be cases, such as socket load balancing,
> where picking the socket is hashed based or otherwise picks the same
> socket it was received on in some rare cases. If this happens we don't
> want to confuse userspace giving them an EAGAIN error if we can avoid
> it.
>
> To avoid double accounting in these cases. At the moment even if the
> skb has already been charged against the sockets rcvbuf and forward
> alloc we check it again and do set_owner_r() causing it to be orphaned
> and recharged. For one this is useless work, but more importantly we
> can have a case where the skb could be put on the ingress queue, but
> because we are under memory pressure we return EAGAIN. The trouble
> here is the skb has already been accounted for so any rcvbuf checks
> include the memory associated with the packet already. This rolls
> up and can result in unecessary EAGAIN errors in userspace read()
> calls.
>
> Fix by doing an unlikely check and skipping checks if skb->sk == sk.
>
> Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  net/core/skmsg.c |   17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 9aed5a2c7c5b..f747ee341fe8 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -404,11 +404,13 @@ static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk,
>  {
>  	struct sk_msg *msg;
>  
> -	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
> -		return NULL;
> +	if (likely(skb->sk != sk)) {
> +		if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
> +			return NULL;
>  
> -	if (!sk_rmem_schedule(sk, skb, skb->truesize))
> -		return NULL;
> +		if (!sk_rmem_schedule(sk, skb, skb->truesize))
> +			return NULL;
> +	}
>  
>  	msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC);
>  	if (unlikely(!msg))
> @@ -455,9 +457,12 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
>  	 * the BPF program was run initiating the redirect to the socket
>  	 * we will eventually receive this data on. The data will be released
>  	 * from skb_consume found in __tcp_bpf_recvmsg() after its been copied
> -	 * into user buffers.
> +	 * into user buffers. If we are receiving on the same sock skb->sk is
> +	 * already assigned, skip memory accounting and owner transition seeing
> +	 * it already set correctly.
>  	 */
> -	skb_set_owner_r(skb, sk);
> +	if (likely(skb->sk != sk))
> +		skb_set_owner_r(skb, sk);
>  	return sk_psock_skb_ingress_enqueue(skb, psock, sk, msg);
>  }
>  

I think all the added checks boil down to having:

	struct sock *sk = psock->sk;

	if (unlikely(skb->sk == sk))
		return sk_psock_skb_ingress_self(psock, skb);

... on entry to sk_psock_skb_ingress().
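
To make the suggestion a bit more concrete, here is a rough userspace
model of the control flow I have in mind. It is only a sketch: the
structs are stripped-down stand-ins for the kernel types, the model_*
names are made up for illustration, and sk_psock_skb_ingress_self() is
the helper I'm proposing, not something that exists in this patch. The
real sk_rmem_schedule() check and the sk_msg allocation are left out.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct sock {
	int rmem_alloc;		/* models sk->sk_rmem_alloc */
	int rcvbuf;		/* models sk->sk_rcvbuf */
};

struct sk_buff {
	struct sock *sk;	/* current owner, NULL if none */
	int truesize;
};

struct sk_psock {
	struct sock *sk;
};

/* Models skb_set_owner_r(): uncharge the old owner, charge the new one. */
static void model_set_owner_r(struct sk_buff *skb, struct sock *sk)
{
	if (skb->sk)
		skb->sk->rmem_alloc -= skb->truesize;
	skb->sk = sk;
	sk->rmem_alloc += skb->truesize;
}

/*
 * Hypothetical self-redirect path: the skb is already charged to
 * psock->sk, so there is no rcvbuf check and no owner transition.
 */
static int model_ingress_self(struct sk_psock *psock, struct sk_buff *skb)
{
	(void)psock;
	(void)skb;
	return 0;
}

/* Models sk_psock_skb_ingress() with a single check on entry. */
static int model_ingress(struct sk_psock *psock, struct sk_buff *skb)
{
	struct sock *sk = psock->sk;

	/* The kernel version would wrap this condition in unlikely(). */
	if (skb->sk == sk)
		return model_ingress_self(psock, skb);

	if (sk->rmem_alloc > sk->rcvbuf)
		return -EAGAIN;

	model_set_owner_r(skb, sk);
	return 0;
}
```

With one branch on entry, the common path keeps its accounting checks
unconditionally and the self-redirect case lives in one place, rather
than being tested separately in sk_psock_create_ingress_msg() and
again before skb_set_owner_r().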

Thread overview: 10+ messages
2020-11-12 23:26 [bpf PATCH v2 0/6] sockmap fixes John Fastabend
2020-11-12 23:26 ` [bpf PATCH v2 1/6] bpf, sockmap: fix partial copy_page_to_iter so progress can still be made John Fastabend
2020-11-12 23:27 ` [bpf PATCH v2 2/6] bpf, sockmap: Ensure SO_RCVBUF memory is observed on ingress redirect John Fastabend
2020-11-12 23:27 ` [bpf PATCH v2 3/6] bpf, sockmap: Use truesize with sk_rmem_schedule() John Fastabend
2020-11-12 23:27 ` [bpf PATCH v2 4/6] bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self John Fastabend
2020-11-12 23:27 ` [bpf PATCH v2 5/6] bpf, sockmap: Handle memory acct if skb_verdict prog redirects to self John Fastabend
2020-11-16 14:31   ` Jakub Sitnicki [this message]
2020-11-16 22:28     ` John Fastabend
2020-11-12 23:28 ` [bpf PATCH v2 6/6] bpf, sockmap: Avoid failures from skb_to_sgvec when skb has frag_list John Fastabend
2020-11-16 14:49 ` [bpf PATCH v2 0/6] sockmap fixes Jakub Sitnicki
