From: Jakub Sitnicki <jakub@cloudflare.com>
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, kernel-team@cloudflare.com,
Martin KaFai Lau <kafai@fb.com>
Subject: [PATCH bpf-next v6 02/12] net, sk_msg: Annotate lockless access to sk_prot on clone
Date: Mon, 27 Jan 2020 14:10:47 +0100 [thread overview]
Message-ID: <20200127131057.150941-3-jakub@cloudflare.com> (raw)
In-Reply-To: <20200127131057.150941-1-jakub@cloudflare.com>
sk_msg and ULP frameworks override protocol callbacks pointer in
sk->sk_prot, while tcp accesses it locklessly when cloning the listening
socket, that is with neither sk_lock nor sk_callback_lock held.
Once we enable use of listening sockets with sockmap (and hence sk_msg),
there will be shared access to sk->sk_prot if socket is getting cloned
while being inserted/deleted to/from the sockmap from another CPU:
Read side:
tcp_v4_rcv
sk = __inet_lookup_skb(...)
tcp_check_req(sk)
inet_csk(sk)->icsk_af_ops->syn_recv_sock
tcp_v4_syn_recv_sock
tcp_create_openreq_child
inet_csk_clone_lock
sk_clone_lock
READ_ONCE(sk->sk_prot)
Write side:
sock_map_ops->map_update_elem
sock_map_update_elem
sock_map_update_common
sock_map_link_no_progs
tcp_bpf_init
tcp_bpf_update_sk_prot
sk_psock_update_proto
WRITE_ONCE(sk->sk_prot, ops)
sock_map_ops->map_delete_elem
sock_map_delete_elem
__sock_map_delete
sock_map_unref
sk_psock_put
sk_psock_drop
sk_psock_restore_proto
tcp_update_ulp
WRITE_ONCE(sk->sk_prot, proto)
Mark the shared access with READ_ONCE/WRITE_ONCE annotations.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
include/linux/skmsg.h | 3 ++-
net/core/sock.c | 5 +++--
net/ipv4/tcp_bpf.c | 4 +++-
net/ipv4/tcp_ulp.c | 3 ++-
net/tls/tls_main.c | 3 ++-
5 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 6311838e7df8..8647937998f3 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -352,7 +352,8 @@ static inline void sk_psock_update_proto(struct sock *sk,
psock->saved_write_space = sk->sk_write_space;
psock->sk_proto = sk->sk_prot;
- sk->sk_prot = ops;
+ /* Pairs with lockless read in sk_clone_lock() */
+ WRITE_ONCE(sk->sk_prot, ops);
}
static inline void sk_psock_restore_proto(struct sock *sk,
diff --git a/net/core/sock.c b/net/core/sock.c
index a4c8fac781ff..3953bb23f4d0 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1792,16 +1792,17 @@ static void sk_init_common(struct sock *sk)
*/
struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
{
+ struct proto *prot = READ_ONCE(sk->sk_prot);
struct sock *newsk;
bool is_charged = true;
- newsk = sk_prot_alloc(sk->sk_prot, priority, sk->sk_family);
+ newsk = sk_prot_alloc(prot, priority, sk->sk_family);
if (newsk != NULL) {
struct sk_filter *filter;
sock_copy(newsk, sk);
- newsk->sk_prot_creator = sk->sk_prot;
+ newsk->sk_prot_creator = prot;
/* SANITY */
if (likely(newsk->sk_net_refcnt))
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 8a01428f80c1..dd183b050642 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -645,8 +645,10 @@ static void tcp_bpf_reinit_sk_prot(struct sock *sk, struct sk_psock *psock)
/* Reinit occurs when program types change e.g. TCP_BPF_TX is removed
* or added requiring sk_prot hook updates. We keep original saved
* hooks in this case.
+ *
+ * Pairs with lockless read in sk_clone_lock().
*/
- sk->sk_prot = &tcp_bpf_prots[family][config];
+ WRITE_ONCE(sk->sk_prot, &tcp_bpf_prots[family][config]);
}
static int tcp_bpf_assert_proto_ops(struct proto *ops)
diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c
index 38d3ad141161..6c43fa189195 100644
--- a/net/ipv4/tcp_ulp.c
+++ b/net/ipv4/tcp_ulp.c
@@ -106,7 +106,8 @@ void tcp_update_ulp(struct sock *sk, struct proto *proto,
if (!icsk->icsk_ulp_ops) {
sk->sk_write_space = write_space;
- sk->sk_prot = proto;
+ /* Pairs with lockless read in sk_clone_lock() */
+ WRITE_ONCE(sk->sk_prot, proto);
return;
}
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 94774c0e5ff3..b19a54f4cdcc 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -742,7 +742,8 @@ static void tls_update(struct sock *sk, struct proto *p,
ctx->sk_write_space = write_space;
ctx->sk_proto = p;
} else {
- sk->sk_prot = p;
+ /* Pairs with lockless read in sk_clone_lock() */
+ WRITE_ONCE(sk->sk_prot, p);
sk->sk_write_space = write_space;
}
}
--
2.24.1
next prev parent reply other threads:[~2020-01-27 13:11 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-27 13:10 [PATCH bpf-next v6 00/12] Extend SOCKMAP to store listening sockets Jakub Sitnicki
2020-01-27 13:10 ` [PATCH bpf-next v6 01/12] bpf, sk_msg: Don't clear saved sock proto on restore Jakub Sitnicki
2020-01-27 13:10 ` Jakub Sitnicki [this message]
2020-01-27 13:10 ` [PATCH bpf-next v6 03/12] net, sk_msg: Clear sk_user_data pointer on clone if tagged Jakub Sitnicki
2020-01-27 13:10 ` [PATCH bpf-next v6 04/12] tcp_bpf: Don't let child socket inherit parent protocol ops on copy Jakub Sitnicki
2020-01-27 13:10 ` [PATCH bpf-next v6 05/12] bpf, sockmap: Allow inserting listening TCP sockets into sockmap Jakub Sitnicki
2020-01-27 13:10 ` [PATCH bpf-next v6 06/12] bpf, sockmap: Don't set up sockmap progs for listening sockets Jakub Sitnicki
2020-01-27 13:10 ` [PATCH bpf-next v6 07/12] bpf, sockmap: Return socket cookie on lookup from syscall Jakub Sitnicki
2020-01-27 13:10 ` [PATCH bpf-next v6 08/12] bpf, sockmap: Let all kernel-land lookup values in SOCKMAP Jakub Sitnicki
2020-01-27 13:10 ` [PATCH bpf-next v6 09/12] bpf: Allow selecting reuseport socket from a SOCKMAP Jakub Sitnicki
2020-01-27 13:10 ` [PATCH bpf-next v6 10/12] net: Generate reuseport group ID on group creation Jakub Sitnicki
2020-01-27 13:10 ` [PATCH bpf-next v6 11/12] selftests/bpf: Extend SK_REUSEPORT tests to cover SOCKMAP Jakub Sitnicki
2020-01-27 13:10 ` [PATCH bpf-next v6 12/12] selftests/bpf: Tests for SOCKMAP holding listening sockets Jakub Sitnicki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200127131057.150941-3-jakub@cloudflare.com \
--to=jakub@cloudflare.com \
--cc=bpf@vger.kernel.org \
--cc=kafai@fb.com \
--cc=kernel-team@cloudflare.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).