netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Jakub Sitnicki <jakub@cloudflare.com>, bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, kernel-team@cloudflare.com,
	Eric Dumazet <edumazet@google.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Lorenz Bauer <lmb@cloudflare.com>,
	Martin KaFai Lau <kafai@fb.com>
Subject: RE: [PATCH bpf-next v2 03/11] net, sk_msg: Clear sk_user_data pointer on clone if tagged
Date: Sat, 11 Jan 2020 15:38:35 -0800	[thread overview]
Message-ID: <5e1a5c7bb3cbd_1e7f2b0c859c45c0b4@john-XPS-13-9370.notmuch> (raw)
In-Reply-To: <20200110105027.257877-4-jakub@cloudflare.com>

Jakub Sitnicki wrote:
> sk_user_data can hold a pointer to an object that is not intended to be
> shared between the parent socket and the child that gets a pointer copy on
> clone. This is the case when sk_user_data points at reference-counted
> object, like struct sk_psock.
> 
> One way to resolve it is to tag the pointer with a no-copy flag by
> repurposing its lowest bit. Based on the bit-flag value we clear the child
> sk_user_data pointer after cloning the parent socket.
> 
> The no-copy flag is stored in the pointer itself as opposed to externally,
> say in socket flags, to guarantee that the pointer and the flag are copied
> from parent to child socket in an atomic fashion. Parent socket state is
> subject to change while copying, we don't hold any locks at that time.
> 
> This approach relies on an assumption that sk_user_data holds a pointer to
> an object aligned to 2 or more bytes. A manual audit of existing users of
> rcu_dereference_sk_user_data helper confirms it. Also, an RCU-protected
> sk_user_data is not likely to hold a pointer to a char value or a
> pathological case of "struct { char c; }". To be safe, warn when the
> flag-bit is set when setting sk_user_data to catch any future misuses.
> 
> It is worth considering why clearing sk_user_data unconditionally is not an
> option. There exist users, DRBD, NVMe, and Xen drivers being among them,
> that rely on the pointer being copied when cloning the listening socket.
> 
> Potentially we could distinguish these users by checking if the listening
> socket has been created in kernel-space via sock_create_kern, and hence has
> sk_kern_sock flag set. However, this is not the case for NVMe and Xen
> drivers, which create sockets without marking them as belonging to the
> kernel.
> 
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

LGTM.
Acked-by: John Fastabend <john.fastabend@gmail.com>

> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index e6ffdb47b619..f6c83747c71e 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -535,6 +535,10 @@ static void tcp_bpf_remove(struct sock *sk, struct sk_psock *psock)
>  {
>  	struct sk_psock_link *link;
>  
> +	/* Did a child socket inadvertently inherit parent's psock? */
> +	if (WARN_ON(sk != psock->sk))
> +		return;
> +

Not sure if this is needed. We would probably have hit problems before
we get here anyways for example if the parent sock was deleted while
the child is still around. I think I would just drop it.

>  	while ((link = sk_psock_link_pop(psock))) {
>  		sk_psock_unlink(sk, link);
>  		sk_psock_free_link(link);
> -- 
> 2.24.1
> 



  reply	other threads:[~2020-01-11 23:38 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-10 10:50 [PATCH bpf-next v2 00/11] Extend SOCKMAP to store listening sockets Jakub Sitnicki
2020-01-10 10:50 ` [PATCH bpf-next v2 01/11] bpf, sk_msg: Don't reset saved sock proto on restore Jakub Sitnicki
2020-01-11 22:50   ` John Fastabend
2020-01-10 10:50 ` [PATCH bpf-next v2 02/11] net, sk_msg: Annotate lockless access to sk_prot on clone Jakub Sitnicki
2020-01-11 23:14   ` John Fastabend
2020-01-13 15:09     ` Jakub Sitnicki
2020-01-14  3:14       ` John Fastabend
2020-01-20 17:00       ` John Fastabend
2020-01-20 18:11         ` Jakub Sitnicki
2020-01-21 12:42           ` Jakub Sitnicki
2020-01-10 10:50 ` [PATCH bpf-next v2 03/11] net, sk_msg: Clear sk_user_data pointer on clone if tagged Jakub Sitnicki
2020-01-11 23:38   ` John Fastabend [this message]
2020-01-12 12:55   ` kbuild test robot
2020-01-13 20:15   ` Martin Lau
2020-01-14 16:04     ` Jakub Sitnicki
2020-01-10 10:50 ` [PATCH bpf-next v2 04/11] tcp_bpf: Don't let child socket inherit parent protocol ops on copy Jakub Sitnicki
2020-01-11  2:42   ` kbuild test robot
2020-01-11  3:02   ` kbuild test robot
2020-01-11 23:48   ` John Fastabend
2020-01-13 22:31     ` Jakub Sitnicki
2020-01-13 22:23   ` Martin Lau
2020-01-13 22:42     ` Jakub Sitnicki
2020-01-13 23:23       ` Martin Lau
2020-01-10 10:50 ` [PATCH bpf-next v2 05/11] bpf, sockmap: Allow inserting listening TCP sockets into sockmap Jakub Sitnicki
2020-01-11 23:59   ` John Fastabend
2020-01-13 15:48     ` Jakub Sitnicki
2020-01-10 10:50 ` [PATCH bpf-next v2 06/11] bpf, sockmap: Don't set up sockmap progs for listening sockets Jakub Sitnicki
2020-01-12  0:51   ` John Fastabend
2020-01-12  1:07     ` John Fastabend
2020-01-13 17:59       ` Jakub Sitnicki
2020-01-10 10:50 ` [PATCH bpf-next v2 07/11] bpf, sockmap: Return socket cookie on lookup from syscall Jakub Sitnicki
2020-01-12  0:56   ` John Fastabend
2020-01-13 23:12   ` Martin Lau
2020-01-14  3:16     ` John Fastabend
2020-01-14 15:48       ` Jakub Sitnicki
2020-01-10 10:50 ` [PATCH bpf-next v2 08/11] bpf, sockmap: Let all kernel-land lookup values in SOCKMAP Jakub Sitnicki
2020-01-10 10:50 ` [PATCH bpf-next v2 09/11] bpf: Allow selecting reuseport socket from a SOCKMAP Jakub Sitnicki
2020-01-12  1:00   ` John Fastabend
2020-01-13 23:45   ` Martin Lau
2020-01-15 12:41     ` Jakub Sitnicki
2020-01-13 23:51   ` Martin Lau
2020-01-15 12:57     ` Jakub Sitnicki
2020-01-10 10:50 ` [PATCH bpf-next v2 10/11] selftests/bpf: Extend SK_REUSEPORT tests to cover SOCKMAP Jakub Sitnicki
2020-01-12  1:01   ` John Fastabend
2020-01-10 10:50 ` [PATCH bpf-next v2 11/11] selftests/bpf: Tests for SOCKMAP holding listening sockets Jakub Sitnicki
2020-01-12  1:06   ` John Fastabend
2020-01-13 15:58     ` Jakub Sitnicki
2020-01-11  0:18 ` [PATCH bpf-next v2 00/11] Extend SOCKMAP to store " Alexei Starovoitov
2020-01-11 22:47 ` John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5e1a5c7bb3cbd_1e7f2b0c859c45c0b4@john-XPS-13-9370.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=edumazet@google.com \
    --cc=jakub@cloudflare.com \
    --cc=kafai@fb.com \
    --cc=kernel-team@cloudflare.com \
    --cc=lmb@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).