From: John Fastabend <john.fastabend@gmail.com>
To: Jakub Sitnicki <jakub@cloudflare.com>,
	John Fastabend <john.fastabend@gmail.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
	kernel-team@cloudflare.com, Eric Dumazet <edumazet@google.com>,
	Lorenz Bauer <lmb@cloudflare.com>,
	Martin KaFai Lau <kafai@fb.com>
Subject: Re: [PATCH bpf-next v2 02/11] net, sk_msg: Annotate lockless access to sk_prot on clone
Date: Mon, 20 Jan 2020 09:00:09 -0800
Message-ID: <5e25dc995d7d_74082aaee6e465b441@john-XPS-13-9370.notmuch>
In-Reply-To: <87muars890.fsf@cloudflare.com>

Jakub Sitnicki wrote:
> On Sun, Jan 12, 2020 at 12:14 AM CET, John Fastabend wrote:
> > Jakub Sitnicki wrote:
> >> sk_msg and ULP frameworks override the protocol callbacks pointer in
> >> sk->sk_prot, while TCP accesses it locklessly when cloning the listening
> >> socket.
> >>
> >> Once we enable use of listening sockets with sockmap (and hence sk_msg),
> >> there can be shared access to sk->sk_prot if a socket is getting cloned while
> >> being inserted into or deleted from the sockmap on another CPU. Mark the
> >> shared access with READ_ONCE/WRITE_ONCE annotations.
> >>
> >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> >
> > On the sockmap side I fixed this by wrapping the access in a lock_sock[0]. So
> > do you think this is still needed with that in mind? The bpf_clone call
> > is using sk_prot_creator and also setting the newsk's proto field. Even
> > if the listening parent sock was being deleted in parallel, would that be
> > a problem? We don't touch sk_prot_creator from the teardown path. I've
> > only scanned patches 3..11, so maybe the answer is below. If that is
> > the case, an improved commit message would probably be helpful.
> 
> I think it is needed, though not because of tcp_bpf_clone or because we
> access the listener's sk_prot_creator from there, if I'm grasping your
> question.
> 
> Either way, I'm glad this came up. Let's go through my reasoning and
> verify it. The TCP stack accesses the listener's sk_prot while cloning it:
> 
> tcp_v4_rcv
>   sk = __inet_lookup_skb(...)
>   tcp_check_req(sk)
>     inet_csk(sk)->icsk_af_ops->syn_recv_sock
>       tcp_v4_syn_recv_sock
>         tcp_create_openreq_child
>           inet_csk_clone_lock
>             sk_clone_lock
>               READ_ONCE(sk->sk_prot)
> 
> It grabs a reference to the listener, but doesn't grab the sk_lock.
> 
> On another CPU we can be inserting/removing the listener socket from the
> sockmap and writing to its sk_prot. We have the update and the remove
> path:
> 
> sock_map_ops->map_update_elem
>   sock_map_update_elem
>     sock_map_update_common
>       sock_map_link_no_progs
>         tcp_bpf_init
>           tcp_bpf_update_sk_prot
>             sk_psock_update_proto
>               WRITE_ONCE(sk->sk_prot, ops)
> 
> sock_map_ops->map_delete_elem
>   sock_map_delete_elem
>     __sock_map_delete
>       sock_map_unref
>         sk_psock_put
>           sk_psock_drop
>             sk_psock_restore_proto
>               tcp_update_ulp
>                 WRITE_ONCE(sk->sk_prot, proto)
> 
> Following the guidelines from the KTSAN project [0], sk_prot looks like a
> candidate for annotation, at least on these three call paths.
> 
> If that sounds correct, I can add it to the patch description.
> 
> Thanks,
> -jkbs
> 
> [0] https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE

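As a rough userspace analog of the annotation pattern discussed above, the sketch below uses C11 <stdatomic.h> relaxed loads and stores in place of the kernel's READ_ONCE()/WRITE_ONCE() macros (which are volatile accesses, not C11 atomics). Both give the property the annotations document: the shared pointer is read or written in a single, untorn access that the compiler may not refetch or split. The struct sock and struct proto here are hypothetical one-field stand-ins, and sk_prot_read(), sk_prot_write() and sock_clone() are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct proto: just a name. */
struct proto {
	const char *name;
};

/* Hypothetical stand-in for struct sock. sk_prot is written by the
 * sockmap update/delete path and read locklessly by the clone path. */
struct sock {
	_Atomic(const struct proto *) sk_prot;
};

/* Analog of READ_ONCE(sk->sk_prot): one untorn load. */
static const struct proto *sk_prot_read(struct sock *sk)
{
	return atomic_load_explicit(&sk->sk_prot, memory_order_relaxed);
}

/* Analog of WRITE_ONCE(sk->sk_prot, p): one untorn store. */
static void sk_prot_write(struct sock *sk, const struct proto *p)
{
	assert(p != NULL); /* a socket always has protocol ops */
	atomic_store_explicit(&sk->sk_prot, p, memory_order_relaxed);
}

/* Clone path: load the parent's sk_prot exactly once and hand that
 * snapshot to the child, so a concurrent writer on another CPU cannot
 * make the clone observe two different values. */
static void sock_clone(struct sock *newsk, struct sock *sk)
{
	const struct proto *prot = sk_prot_read(sk);

	atomic_init(&newsk->sk_prot, prot);
}
```

The design choice mirrored here is that the reader takes a single snapshot into a local variable; annotating the access only helps if every later use goes through that one load rather than re-reading the shared field.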
Hi Jakub, can you push this to the bpf tree as well? There is another case
already in the kernel where this is needed: the map is removed while
a recvmsg is in flight.

 tcp_bpf_recvmsg()
  psock = sk_psock_get(sk)                         <- refcnt 2
  lock_sock(sk);
  ...                                
                                  sock_map_free()  <- refcnt 1
  release_sock(sk)
  sk_psock_put()                                   <- refcnt 0

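The interleaving above can be replayed serially in a small userspace model, assuming simplified hypothetical types: struct psock stands in for struct sk_psock, and a boolean `owned` stands in for "the recvmsg side holds lock_sock()". It only tracks the recvmsg context's lock state; whatever locking the real sock_map_free() path does is omitted. As in sk_psock_put(), whoever drops the last reference runs the teardown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: owned == "recvmsg context holds lock_sock()". */
struct sock {
	bool owned;
};

/* Hypothetical stand-in for struct sk_psock. */
struct psock {
	atomic_int refcnt;
	bool dropped;           /* teardown has run */
	bool dropped_with_lock; /* socket lock was held during teardown */
};

static void lock_sock(struct sock *sk)    { sk->owned = true; }
static void release_sock(struct sock *sk) { sk->owned = false; }

/* Models sk_psock_drop(): records whether the lock was held, which is
 * what a sock_owned_by_me() check would assert. */
static void psock_drop(struct sock *sk, struct psock *psock)
{
	psock->dropped = true;
	psock->dropped_with_lock = sk->owned;
}

static void psock_get(struct psock *psock)
{
	atomic_fetch_add(&psock->refcnt, 1);
}

/* Models sk_psock_put(): the last reference runs the teardown. */
static void psock_put(struct sock *sk, struct psock *psock)
{
	int old = atomic_fetch_sub(&psock->refcnt, 1);

	assert(old > 0); /* no underflow */
	if (old == 1)
		psock_drop(sk, psock);
}

/* Replays the diagram one step at a time. */
static void replay_recvmsg_vs_map_free(struct sock *sk, struct psock *psock)
{
	atomic_init(&psock->refcnt, 1); /* the map's reference */
	psock->dropped = false;
	psock->dropped_with_lock = false;
	sk->owned = false;

	psock_get(psock);     /* recvmsg: sk_psock_get(), refcnt 2 */
	lock_sock(sk);        /* recvmsg */
	psock_put(sk, psock); /* other CPU: sock_map_free(), refcnt 1 */
	release_sock(sk);     /* recvmsg */
	psock_put(sk, psock); /* recvmsg: sk_psock_put(), refcnt 0 */
}
```

After the replay the teardown has run with `dropped_with_lock` false: the final put comes from the recvmsg side after release_sock(), which is exactly the case where a sock_owned_by_me() check in sk_psock_drop() would warn on lockdep-enabled builds.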
Then can you add this diff as well? I got a bit too carried away with that
assertion. If you're busy, I can do it as well if you want. Thanks!

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 3866d7e20c07..ded2d5227678 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -594,8 +594,6 @@ EXPORT_SYMBOL_GPL(sk_psock_destroy);
 
 void sk_psock_drop(struct sock *sk, struct sk_psock *psock)
 {
-       sock_owned_by_me(sk);
-
        sk_psock_cork_free(psock);
        sk_psock_zap_ingress(psock);
