All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets
@ 2020-02-28 11:53 Lorenz Bauer
  2020-02-28 11:53 ` [PATCH bpf-next v2 1/9] bpf: sockmap: only check ULP for TCP sockets Lorenz Bauer
                   ` (10 more replies)
  0 siblings, 11 replies; 20+ messages in thread
From: Lorenz Bauer @ 2020-02-28 11:53 UTC (permalink / raw)
  To: john.fastabend, Alexei Starovoitov, Daniel Borkmann
  Cc: kernel-team, Lorenz Bauer, netdev, bpf

Thanks for all the reviews so far! I've fixed the identified bug and addressed
feedback as much as possible.

I've not taken up Jakub's suggestion to get rid of sk_psock_hooks_init. My
intention is to encapsulate initializing both v4 and v6 protos. Really, it's
down to personal preference, and I'm happy to remove it if others prefer
Jakub's approach. I'm also still eager for a solution that requires less
machinery.

Changes since v1:
- Check newsk->sk_prot in tcp_bpf_clone
- Fix compilation with BPF_STREAM_PARSER disabled
- Use spin_lock_init instead of static initializer
- Elaborate on TCPF_SYN_RECV
- Cosmetic changes to TEST macros, and more tests
- Add Jakub and me as maintainers

Lorenz Bauer (9):
  bpf: sockmap: only check ULP for TCP sockets
  bpf: tcp: guard declarations with CONFIG_NET_SOCK_MSG
  bpf: sockmap: move generic sockmap hooks from BPF TCP
  skmsg: introduce sk_psock_hooks
  bpf: sockmap: allow UDP sockets
  selftests: bpf: don't listen() on UDP sockets
  selftests: bpf: add tests for UDP sockets in sockmap
  selftests: bpf: enable UDP sockmap reuseport tests
  bpf, doc: update maintainers for L7 BPF

 MAINTAINERS                                   |   3 +
 include/linux/bpf.h                           |   4 +-
 include/linux/skmsg.h                         |  72 +++----
 include/linux/udp.h                           |   4 +
 include/net/tcp.h                             |  18 +-
 net/core/skmsg.c                              |  55 +++++
 net/core/sock_map.c                           | 160 ++++++++++----
 net/ipv4/Makefile                             |   1 +
 net/ipv4/tcp_bpf.c                            | 169 +++------------
 net/ipv4/udp_bpf.c                            |  53 +++++
 .../bpf/prog_tests/select_reuseport.c         |   6 -
 .../selftests/bpf/prog_tests/sockmap_listen.c | 204 +++++++++++++-----
 12 files changed, 465 insertions(+), 284 deletions(-)
 create mode 100644 net/ipv4/udp_bpf.c

-- 
2.20.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 1/9] bpf: sockmap: only check ULP for TCP sockets
  2020-02-28 11:53 [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets Lorenz Bauer
@ 2020-02-28 11:53 ` Lorenz Bauer
  2020-03-03 17:35   ` John Fastabend
  2020-02-28 11:53 ` [PATCH bpf-next v2 2/9] bpf: tcp: guard declarations with CONFIG_NET_SOCK_MSG Lorenz Bauer
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Lorenz Bauer @ 2020-02-28 11:53 UTC (permalink / raw)
  To: john.fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer,
	David S. Miller, Jakub Kicinski, Alexei Starovoitov
  Cc: kernel-team, netdev, bpf, linux-kernel

The sock map code checks that a socket does not have an active upper
layer protocol before inserting it into the map. This requires casting
via inet_csk, which isn't valid for UDP sockets.

Guard checks for ULP by checking inet_sk(sk)->is_icsk first.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
---
 include/linux/skmsg.h |  8 +++++++-
 net/core/sock_map.c   | 11 +++++++----
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 112765bd146d..54a9a3e36b29 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -360,7 +360,13 @@ static inline void sk_psock_restore_proto(struct sock *sk,
 					  struct sk_psock *psock)
 {
 	sk->sk_prot->unhash = psock->saved_unhash;
-	tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space);
+	if (inet_sk(sk)->is_icsk) {
+		tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space);
+	} else {
+		sk->sk_write_space = psock->saved_write_space;
+		/* Pairs with lockless read in sk_clone_lock() */
+		WRITE_ONCE(sk->sk_prot, psock->sk_proto);
+	}
 }
 
 static inline void sk_psock_set_state(struct sk_psock *psock,
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 2e0f465295c3..695ecacc7afa 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -94,6 +94,11 @@ static void sock_map_sk_release(struct sock *sk)
 	release_sock(sk);
 }
 
+static bool sock_map_sk_has_ulp(struct sock *sk)
+{
+	return inet_sk(sk)->is_icsk && !!inet_csk(sk)->icsk_ulp_ops;
+}
+
 static void sock_map_add_link(struct sk_psock *psock,
 			      struct sk_psock_link *link,
 			      struct bpf_map *map, void *link_raw)
@@ -384,7 +389,6 @@ static int sock_map_update_common(struct bpf_map *map, u32 idx,
 				  struct sock *sk, u64 flags)
 {
 	struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
-	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct sk_psock_link *link;
 	struct sk_psock *psock;
 	struct sock *osk;
@@ -395,7 +399,7 @@ static int sock_map_update_common(struct bpf_map *map, u32 idx,
 		return -EINVAL;
 	if (unlikely(idx >= map->max_entries))
 		return -E2BIG;
-	if (unlikely(rcu_access_pointer(icsk->icsk_ulp_data)))
+	if (sock_map_sk_has_ulp(sk))
 		return -EINVAL;
 
 	link = sk_psock_init_link();
@@ -738,7 +742,6 @@ static int sock_hash_update_common(struct bpf_map *map, void *key,
 				   struct sock *sk, u64 flags)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
-	struct inet_connection_sock *icsk = inet_csk(sk);
 	u32 key_size = map->key_size, hash;
 	struct bpf_htab_elem *elem, *elem_new;
 	struct bpf_htab_bucket *bucket;
@@ -749,7 +752,7 @@ static int sock_hash_update_common(struct bpf_map *map, void *key,
 	WARN_ON_ONCE(!rcu_read_lock_held());
 	if (unlikely(flags > BPF_EXIST))
 		return -EINVAL;
-	if (unlikely(icsk->icsk_ulp_data))
+	if (sock_map_sk_has_ulp(sk))
 		return -EINVAL;
 
 	link = sk_psock_init_link();
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 2/9] bpf: tcp: guard declarations with CONFIG_NET_SOCK_MSG
  2020-02-28 11:53 [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets Lorenz Bauer
  2020-02-28 11:53 ` [PATCH bpf-next v2 1/9] bpf: sockmap: only check ULP for TCP sockets Lorenz Bauer
@ 2020-02-28 11:53 ` Lorenz Bauer
  2020-03-03 17:46   ` John Fastabend
  2020-02-28 11:53 ` [PATCH bpf-next v2 3/9] bpf: sockmap: move generic sockmap hooks from BPF TCP Lorenz Bauer
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Lorenz Bauer @ 2020-02-28 11:53 UTC (permalink / raw)
  To: john.fastabend, Eric Dumazet, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann
  Cc: kernel-team, Lorenz Bauer, netdev, linux-kernel, bpf

tcp_bpf.c is only included in the build if CONFIG_NET_SOCK_MSG is
selected. The declaration should therefore be guarded as such.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
---
 include/net/tcp.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 07f947cc80e6..a30022482dbc 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2195,6 +2195,7 @@ void tcp_update_ulp(struct sock *sk, struct proto *p,
 struct sk_msg;
 struct sk_psock;
 
+#ifdef CONFIG_NET_SOCK_MSG
 int tcp_bpf_init(struct sock *sk);
 void tcp_bpf_reinit(struct sock *sk);
 int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg, u32 bytes,
@@ -2203,13 +2204,12 @@ int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		    int nonblock, int flags, int *addr_len);
 int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
 		      struct msghdr *msg, int len, int flags);
-#ifdef CONFIG_NET_SOCK_MSG
 void tcp_bpf_clone(const struct sock *sk, struct sock *newsk);
 #else
 static inline void tcp_bpf_clone(const struct sock *sk, struct sock *newsk)
 {
 }
-#endif
+#endif /* CONFIG_NET_SOCK_MSG */
 
 /* Call BPF_SOCK_OPS program that returns an int. If the return value
  * is < 0, then the BPF op failed (for example if the loaded BPF
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 3/9] bpf: sockmap: move generic sockmap hooks from BPF TCP
  2020-02-28 11:53 [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets Lorenz Bauer
  2020-02-28 11:53 ` [PATCH bpf-next v2 1/9] bpf: sockmap: only check ULP for TCP sockets Lorenz Bauer
  2020-02-28 11:53 ` [PATCH bpf-next v2 2/9] bpf: tcp: guard declarations with CONFIG_NET_SOCK_MSG Lorenz Bauer
@ 2020-02-28 11:53 ` Lorenz Bauer
  2020-03-03 17:50   ` Martin KaFai Lau
  2020-03-03 17:52   ` John Fastabend
  2020-02-28 11:53 ` [PATCH bpf-next v2 4/9] skmsg: introduce sk_psock_hooks Lorenz Bauer
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 20+ messages in thread
From: Lorenz Bauer @ 2020-02-28 11:53 UTC (permalink / raw)
  To: john.fastabend, Alexei Starovoitov, Daniel Borkmann,
	Jakub Sitnicki, Lorenz Bauer, Eric Dumazet, David S. Miller,
	Jakub Kicinski, Alexey Kuznetsov, Hideaki YOSHIFUJI
  Cc: kernel-team, netdev, bpf, linux-kernel

The close, unhash and clone handlers from TCP sockmap are actually generic,
and can be reused by UDP sockmap. Move the helpers into the sockmap code
base and expose them. This requires tcp_bpf_(re)init and tcp_bpf_clone to
be conditional on BPF_STREAM_PARSER.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
---
 include/linux/bpf.h   |  4 ++-
 include/linux/skmsg.h | 28 ----------------
 include/net/tcp.h     | 15 +++++----
 net/core/sock_map.c   | 77 +++++++++++++++++++++++++++++++++++++++++--
 net/ipv4/tcp_bpf.c    | 59 ++++-----------------------------
 5 files changed, 92 insertions(+), 91 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1acd5bf70350..00bb3c59c2ae 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1385,6 +1385,8 @@ static inline void bpf_map_offload_map_free(struct bpf_map *map)
 #if defined(CONFIG_BPF_STREAM_PARSER)
 int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog, u32 which);
 int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog);
+void sock_map_unhash(struct sock *sk);
+void sock_map_close(struct sock *sk, long timeout);
 #else
 static inline int sock_map_prog_update(struct bpf_map *map,
 				       struct bpf_prog *prog, u32 which)
@@ -1397,7 +1399,7 @@ static inline int sock_map_get_from_fd(const union bpf_attr *attr,
 {
 	return -EINVAL;
 }
-#endif
+#endif /* CONFIG_BPF_STREAM_PARSER */
 
 #if defined(CONFIG_INET) && defined(CONFIG_BPF_SYSCALL)
 void bpf_sk_reuseport_detach(struct sock *sk);
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 54a9a3e36b29..c881094387db 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -323,14 +323,6 @@ static inline void sk_psock_free_link(struct sk_psock_link *link)
 }
 
 struct sk_psock_link *sk_psock_link_pop(struct sk_psock *psock);
-#if defined(CONFIG_BPF_STREAM_PARSER)
-void sk_psock_unlink(struct sock *sk, struct sk_psock_link *link);
-#else
-static inline void sk_psock_unlink(struct sock *sk,
-				   struct sk_psock_link *link)
-{
-}
-#endif
 
 void __sk_psock_purge_ingress_msg(struct sk_psock *psock);
 
@@ -387,26 +379,6 @@ static inline bool sk_psock_test_state(const struct sk_psock *psock,
 	return test_bit(bit, &psock->state);
 }
 
-static inline struct sk_psock *sk_psock_get_checked(struct sock *sk)
-{
-	struct sk_psock *psock;
-
-	rcu_read_lock();
-	psock = sk_psock(sk);
-	if (psock) {
-		if (sk->sk_prot->recvmsg != tcp_bpf_recvmsg) {
-			psock = ERR_PTR(-EBUSY);
-			goto out;
-		}
-
-		if (!refcount_inc_not_zero(&psock->refcnt))
-			psock = ERR_PTR(-EBUSY);
-	}
-out:
-	rcu_read_unlock();
-	return psock;
-}
-
 static inline struct sk_psock *sk_psock_get(struct sock *sk)
 {
 	struct sk_psock *psock;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index a30022482dbc..f5503b2c7bed 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2195,20 +2195,23 @@ void tcp_update_ulp(struct sock *sk, struct proto *p,
 struct sk_msg;
 struct sk_psock;
 
-#ifdef CONFIG_NET_SOCK_MSG
+#ifdef CONFIG_BPF_STREAM_PARSER
 int tcp_bpf_init(struct sock *sk);
 void tcp_bpf_reinit(struct sock *sk);
+void tcp_bpf_clone(const struct sock *sk, struct sock *newsk);
+#else
+static inline void tcp_bpf_clone(const struct sock *sk, struct sock *newsk)
+{
+}
+#endif /* CONFIG_BPF_STREAM_PARSER */
+
+#ifdef CONFIG_NET_SOCK_MSG
 int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg, u32 bytes,
 			  int flags);
 int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		    int nonblock, int flags, int *addr_len);
 int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
 		      struct msghdr *msg, int len, int flags);
-void tcp_bpf_clone(const struct sock *sk, struct sock *newsk);
-#else
-static inline void tcp_bpf_clone(const struct sock *sk, struct sock *newsk)
-{
-}
 #endif /* CONFIG_NET_SOCK_MSG */
 
 /* Call BPF_SOCK_OPS program that returns an int. If the return value
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 695ecacc7afa..459b3ba16023 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -146,6 +146,26 @@ static void sock_map_unref(struct sock *sk, void *link_raw)
 	}
 }
 
+static struct sk_psock *sock_map_psock_get_checked(struct sock *sk)
+{
+	struct sk_psock *psock;
+
+	rcu_read_lock();
+	psock = sk_psock(sk);
+	if (psock) {
+		if (sk->sk_prot->recvmsg != tcp_bpf_recvmsg) {
+			psock = ERR_PTR(-EBUSY);
+			goto out;
+		}
+
+		if (!refcount_inc_not_zero(&psock->refcnt))
+			psock = ERR_PTR(-EBUSY);
+	}
+out:
+	rcu_read_unlock();
+	return psock;
+}
+
 static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 			 struct sock *sk)
 {
@@ -177,7 +197,7 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 		}
 	}
 
-	psock = sk_psock_get_checked(sk);
+	psock = sock_map_psock_get_checked(sk);
 	if (IS_ERR(psock)) {
 		ret = PTR_ERR(psock);
 		goto out_progs;
@@ -240,7 +260,7 @@ static int sock_map_link_no_progs(struct bpf_map *map, struct sock *sk)
 	struct sk_psock *psock;
 	int ret;
 
-	psock = sk_psock_get_checked(sk);
+	psock = sock_map_psock_get_checked(sk);
 	if (IS_ERR(psock))
 		return PTR_ERR(psock);
 
@@ -1132,7 +1152,7 @@ int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
 	return 0;
 }
 
-void sk_psock_unlink(struct sock *sk, struct sk_psock_link *link)
+static void sock_map_unlink(struct sock *sk, struct sk_psock_link *link)
 {
 	switch (link->map->map_type) {
 	case BPF_MAP_TYPE_SOCKMAP:
@@ -1145,3 +1165,54 @@ void sk_psock_unlink(struct sock *sk, struct sk_psock_link *link)
 		break;
 	}
 }
+
+static void sock_map_remove_links(struct sock *sk, struct sk_psock *psock)
+{
+	struct sk_psock_link *link;
+
+	while ((link = sk_psock_link_pop(psock))) {
+		sock_map_unlink(sk, link);
+		sk_psock_free_link(link);
+	}
+}
+
+void sock_map_unhash(struct sock *sk)
+{
+	void (*saved_unhash)(struct sock *sk);
+	struct sk_psock *psock;
+
+	rcu_read_lock();
+	psock = sk_psock(sk);
+	if (unlikely(!psock)) {
+		rcu_read_unlock();
+		if (sk->sk_prot->unhash)
+			sk->sk_prot->unhash(sk);
+		return;
+	}
+
+	saved_unhash = psock->saved_unhash;
+	sock_map_remove_links(sk, psock);
+	rcu_read_unlock();
+	saved_unhash(sk);
+}
+
+void sock_map_close(struct sock *sk, long timeout)
+{
+	void (*saved_close)(struct sock *sk, long timeout);
+	struct sk_psock *psock;
+
+	lock_sock(sk);
+	rcu_read_lock();
+	psock = sk_psock(sk);
+	if (unlikely(!psock)) {
+		rcu_read_unlock();
+		release_sock(sk);
+		return sk->sk_prot->close(sk, timeout);
+	}
+
+	saved_close = psock->saved_close;
+	sock_map_remove_links(sk, psock);
+	rcu_read_unlock();
+	release_sock(sk);
+	saved_close(sk, timeout);
+}
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 7d6e1b75d4d4..3f9a50e54c1d 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -10,6 +10,7 @@
 #include <net/inet_common.h>
 #include <net/tls.h>
 
+#ifdef CONFIG_BPF_STREAM_PARSER
 static bool tcp_bpf_stream_read(const struct sock *sk)
 {
 	struct sk_psock *psock;
@@ -22,6 +23,7 @@ static bool tcp_bpf_stream_read(const struct sock *sk)
 	rcu_read_unlock();
 	return !empty;
 }
+#endif /* CONFIG_BPF_STREAM_PARSER */
 
 static int tcp_bpf_wait_data(struct sock *sk, struct sk_psock *psock,
 			     int flags, long timeo, int *err)
@@ -298,6 +300,7 @@ int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg,
 }
 EXPORT_SYMBOL_GPL(tcp_bpf_sendmsg_redir);
 
+#ifdef CONFIG_BPF_STREAM_PARSER
 static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 				struct sk_msg *msg, int *copied, int flags)
 {
@@ -528,57 +531,6 @@ static int tcp_bpf_sendpage(struct sock *sk, struct page *page, int offset,
 	return copied ? copied : err;
 }
 
-static void tcp_bpf_remove(struct sock *sk, struct sk_psock *psock)
-{
-	struct sk_psock_link *link;
-
-	while ((link = sk_psock_link_pop(psock))) {
-		sk_psock_unlink(sk, link);
-		sk_psock_free_link(link);
-	}
-}
-
-static void tcp_bpf_unhash(struct sock *sk)
-{
-	void (*saved_unhash)(struct sock *sk);
-	struct sk_psock *psock;
-
-	rcu_read_lock();
-	psock = sk_psock(sk);
-	if (unlikely(!psock)) {
-		rcu_read_unlock();
-		if (sk->sk_prot->unhash)
-			sk->sk_prot->unhash(sk);
-		return;
-	}
-
-	saved_unhash = psock->saved_unhash;
-	tcp_bpf_remove(sk, psock);
-	rcu_read_unlock();
-	saved_unhash(sk);
-}
-
-static void tcp_bpf_close(struct sock *sk, long timeout)
-{
-	void (*saved_close)(struct sock *sk, long timeout);
-	struct sk_psock *psock;
-
-	lock_sock(sk);
-	rcu_read_lock();
-	psock = sk_psock(sk);
-	if (unlikely(!psock)) {
-		rcu_read_unlock();
-		release_sock(sk);
-		return sk->sk_prot->close(sk, timeout);
-	}
-
-	saved_close = psock->saved_close;
-	tcp_bpf_remove(sk, psock);
-	rcu_read_unlock();
-	release_sock(sk);
-	saved_close(sk, timeout);
-}
-
 enum {
 	TCP_BPF_IPV4,
 	TCP_BPF_IPV6,
@@ -599,8 +551,8 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
 				   struct proto *base)
 {
 	prot[TCP_BPF_BASE]			= *base;
-	prot[TCP_BPF_BASE].unhash		= tcp_bpf_unhash;
-	prot[TCP_BPF_BASE].close		= tcp_bpf_close;
+	prot[TCP_BPF_BASE].unhash		= sock_map_unhash;
+	prot[TCP_BPF_BASE].close		= sock_map_close;
 	prot[TCP_BPF_BASE].recvmsg		= tcp_bpf_recvmsg;
 	prot[TCP_BPF_BASE].stream_memory_read	= tcp_bpf_stream_read;
 
@@ -707,3 +659,4 @@ void tcp_bpf_clone(const struct sock *sk, struct sock *newsk)
 	if (prot == &tcp_bpf_prots[family][TCP_BPF_BASE])
 		newsk->sk_prot = sk->sk_prot_creator;
 }
+#endif /* CONFIG_BPF_STREAM_PARSER */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 4/9] skmsg: introduce sk_psock_hooks
  2020-02-28 11:53 [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets Lorenz Bauer
                   ` (2 preceding siblings ...)
  2020-02-28 11:53 ` [PATCH bpf-next v2 3/9] bpf: sockmap: move generic sockmap hooks from BPF TCP Lorenz Bauer
@ 2020-02-28 11:53 ` Lorenz Bauer
  2020-02-28 11:53 ` [PATCH bpf-next v2 5/9] bpf: sockmap: allow UDP sockets Lorenz Bauer
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Lorenz Bauer @ 2020-02-28 11:53 UTC (permalink / raw)
  To: john.fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer,
	Eric Dumazet, David S. Miller, Jakub Kicinski, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Alexei Starovoitov
  Cc: kernel-team, netdev, bpf, linux-kernel

The sockmap works by overriding some of the callbacks in sk->sk_prot, while
leaving others untouched. This means that we need access to the struct proto
for any protocol we want to support. For IPv4 this is trivial, since both
TCP and UDP are always compiled in. IPv6 may be disabled or compiled as a
module, so the existing TCP sockmap hooks use some trickery to lazily
initialize the modified struct proto for TCPv6.

Pull this logic into a standalone struct sk_psock_hooks, so that it can be
re-used by UDP sockmap.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
---
 include/linux/skmsg.h |  36 +++++++++-----
 include/net/tcp.h     |   1 -
 net/core/skmsg.c      |  55 +++++++++++++++++++++
 net/core/sock_map.c   |  24 ++++-----
 net/ipv4/tcp_bpf.c    | 112 ++++++++++++------------------------------
 5 files changed, 118 insertions(+), 110 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index c881094387db..174c76c725fb 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -109,6 +109,16 @@ struct sk_psock {
 	};
 };
 
+struct sk_psock_hooks {
+	struct proto *base_ipv6;
+	struct proto *ipv4;
+	struct proto *ipv6;
+	spinlock_t ipv6_lock;
+	int (*rebuild_proto)(struct proto prot[], struct proto *base);
+	struct proto *(*choose_proto)(struct proto prot[],
+				      struct sk_psock *psock);
+};
+
 int sk_msg_alloc(struct sock *sk, struct sk_msg *msg, int len,
 		 int elem_first_coalesce);
 int sk_msg_clone(struct sock *sk, struct sk_msg *dst, struct sk_msg *src,
@@ -335,23 +345,14 @@ static inline void sk_psock_cork_free(struct sk_psock *psock)
 	}
 }
 
-static inline void sk_psock_update_proto(struct sock *sk,
-					 struct sk_psock *psock,
-					 struct proto *ops)
-{
-	psock->saved_unhash = sk->sk_prot->unhash;
-	psock->saved_close = sk->sk_prot->close;
-	psock->saved_write_space = sk->sk_write_space;
-
-	psock->sk_proto = sk->sk_prot;
-	/* Pairs with lockless read in sk_clone_lock() */
-	WRITE_ONCE(sk->sk_prot, ops);
-}
-
 static inline void sk_psock_restore_proto(struct sock *sk,
 					  struct sk_psock *psock)
 {
+	if (!psock->sk_proto)
+		return;
+
 	sk->sk_prot->unhash = psock->saved_unhash;
+
 	if (inet_sk(sk)->is_icsk) {
 		tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space);
 	} else {
@@ -424,4 +425,13 @@ static inline void psock_progs_drop(struct sk_psock_progs *progs)
 	psock_set_prog(&progs->skb_verdict, NULL);
 }
 
+static inline int sk_psock_hooks_init(struct sk_psock_hooks *hooks,
+				       struct proto *ipv4_base)
+{
+	spin_lock_init(&hooks->ipv6_lock);
+	return hooks->rebuild_proto(hooks->ipv4, ipv4_base);
+}
+
+int sk_psock_hooks_install(struct sk_psock_hooks *hooks, struct sock *sk);
+
 #endif /* _LINUX_SKMSG_H */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f5503b2c7bed..37f89ed7919c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2197,7 +2197,6 @@ struct sk_psock;
 
 #ifdef CONFIG_BPF_STREAM_PARSER
 int tcp_bpf_init(struct sock *sk);
-void tcp_bpf_reinit(struct sock *sk);
 void tcp_bpf_clone(const struct sock *sk, struct sock *newsk);
 #else
 static inline void tcp_bpf_clone(const struct sock *sk, struct sock *newsk)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index c479372f2cd2..8d46ded3b1a5 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -840,3 +840,58 @@ void sk_psock_stop_strp(struct sock *sk, struct sk_psock *psock)
 	strp_stop(&parser->strp);
 	parser->enabled = false;
 }
+
+static inline int sk_psock_hooks_init_ipv6(struct sk_psock_hooks *hooks,
+					    struct proto *base)
+{
+	int ret = 0;
+
+	if (likely(base == smp_load_acquire(&hooks->base_ipv6)))
+		return 0;
+
+	spin_lock_bh(&hooks->ipv6_lock);
+	if (likely(base != hooks->base_ipv6)) {
+		ret = hooks->rebuild_proto(hooks->ipv6, base);
+		if (!ret)
+			smp_store_release(&hooks->base_ipv6, base);
+	}
+	spin_unlock_bh(&hooks->ipv6_lock);
+	return ret;
+}
+
+int sk_psock_hooks_install(struct sk_psock_hooks *hooks, struct sock *sk)
+{
+	struct sk_psock *psock = sk_psock(sk);
+	struct proto *prot_base;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	if (unlikely(!psock))
+		return -EINVAL;
+
+	/* Initialize saved callbacks and original proto only once, since this
+	 * function may be called multiple times for a psock, e.g. when
+	 * psock->progs.msg_parser is updated.
+	 *
+	 * Since we've not installed the hooks, psock is not yet in use and
+	 * we can initialize it without synchronization.
+	 */
+	if (!psock->sk_proto) {
+		struct proto *prot = READ_ONCE(sk->sk_prot);
+
+		if (sk->sk_family == AF_INET6 &&
+		    sk_psock_hooks_init_ipv6(hooks, prot))
+			return -EINVAL;
+
+		psock->saved_unhash = prot->unhash;
+		psock->saved_close = prot->close;
+		psock->saved_write_space = sk->sk_write_space;
+
+		psock->sk_proto = prot;
+	}
+
+	/* Pairs with lockless read in sk_clone_lock() */
+	prot_base = sk->sk_family == AF_INET ? hooks->ipv4 : hooks->ipv6;
+	WRITE_ONCE(sk->sk_prot, hooks->choose_proto(prot_base, psock));
+	return 0;
+}
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 459b3ba16023..c84cc9fc7f6b 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -170,8 +170,8 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 			 struct sock *sk)
 {
 	struct bpf_prog *msg_parser, *skb_parser, *skb_verdict;
-	bool skb_progs, sk_psock_is_new = false;
 	struct sk_psock *psock;
+	bool skb_progs;
 	int ret;
 
 	skb_verdict = READ_ONCE(progs->skb_verdict);
@@ -206,9 +206,8 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 	if (psock) {
 		if ((msg_parser && READ_ONCE(psock->progs.msg_parser)) ||
 		    (skb_progs  && READ_ONCE(psock->progs.skb_parser))) {
-			sk_psock_put(sk, psock);
 			ret = -EBUSY;
-			goto out_progs;
+			goto out_drop;
 		}
 	} else {
 		psock = sk_psock_init(sk, map->numa_node);
@@ -216,18 +215,14 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 			ret = -ENOMEM;
 			goto out_progs;
 		}
-		sk_psock_is_new = true;
 	}
 
 	if (msg_parser)
 		psock_set_prog(&psock->progs.msg_parser, msg_parser);
-	if (sk_psock_is_new) {
-		ret = tcp_bpf_init(sk);
-		if (ret < 0)
-			goto out_drop;
-	} else {
-		tcp_bpf_reinit(sk);
-	}
+
+	ret = tcp_bpf_init(sk);
+	if (ret < 0)
+		goto out_drop;
 
 	write_lock_bh(&sk->sk_callback_lock);
 	if (skb_progs && !psock->parser.enabled) {
@@ -264,15 +259,14 @@ static int sock_map_link_no_progs(struct bpf_map *map, struct sock *sk)
 	if (IS_ERR(psock))
 		return PTR_ERR(psock);
 
-	if (psock) {
-		tcp_bpf_reinit(sk);
-		return 0;
-	}
+	if (psock)
+		goto init;
 
 	psock = sk_psock_init(sk, map->numa_node);
 	if (!psock)
 		return -ENOMEM;
 
+init:
 	ret = tcp_bpf_init(sk);
 	if (ret < 0)
 		sk_psock_put(sk, psock);
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 3f9a50e54c1d..390717eae10e 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -531,25 +531,23 @@ static int tcp_bpf_sendpage(struct sock *sk, struct page *page, int offset,
 	return copied ? copied : err;
 }
 
-enum {
-	TCP_BPF_IPV4,
-	TCP_BPF_IPV6,
-	TCP_BPF_NUM_PROTS,
-};
-
 enum {
 	TCP_BPF_BASE,
 	TCP_BPF_TX,
 	TCP_BPF_NUM_CFGS,
 };
 
-static struct proto *tcpv6_prot_saved __read_mostly;
-static DEFINE_SPINLOCK(tcpv6_prot_lock);
-static struct proto tcp_bpf_prots[TCP_BPF_NUM_PROTS][TCP_BPF_NUM_CFGS];
-
-static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
-				   struct proto *base)
+static int tcp_bpf_rebuild_proto(struct proto prot[], struct proto *base)
 {
+	/* In order to avoid retpoline, we make assumptions when we call
+	 * into ops if e.g. a psock is not present. Make sure they are
+	 * indeed valid assumptions.
+	 */
+	if (base->recvmsg  != tcp_recvmsg ||
+	    base->sendmsg  != tcp_sendmsg ||
+	    base->sendpage != tcp_sendpage)
+		return -ENOTSUPP;
+
 	prot[TCP_BPF_BASE]			= *base;
 	prot[TCP_BPF_BASE].unhash		= sock_map_unhash;
 	prot[TCP_BPF_BASE].close		= sock_map_close;
@@ -559,91 +557,42 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
 	prot[TCP_BPF_TX]			= prot[TCP_BPF_BASE];
 	prot[TCP_BPF_TX].sendmsg		= tcp_bpf_sendmsg;
 	prot[TCP_BPF_TX].sendpage		= tcp_bpf_sendpage;
-}
-
-static void tcp_bpf_check_v6_needs_rebuild(struct sock *sk, struct proto *ops)
-{
-	if (sk->sk_family == AF_INET6 &&
-	    unlikely(ops != smp_load_acquire(&tcpv6_prot_saved))) {
-		spin_lock_bh(&tcpv6_prot_lock);
-		if (likely(ops != tcpv6_prot_saved)) {
-			tcp_bpf_rebuild_protos(tcp_bpf_prots[TCP_BPF_IPV6], ops);
-			smp_store_release(&tcpv6_prot_saved, ops);
-		}
-		spin_unlock_bh(&tcpv6_prot_lock);
-	}
-}
-
-static int __init tcp_bpf_v4_build_proto(void)
-{
-	tcp_bpf_rebuild_protos(tcp_bpf_prots[TCP_BPF_IPV4], &tcp_prot);
 	return 0;
 }
-core_initcall(tcp_bpf_v4_build_proto);
 
-static void tcp_bpf_update_sk_prot(struct sock *sk, struct sk_psock *psock)
+static struct proto *tcp_bpf_choose_proto(struct proto prot[],
+					  struct sk_psock *psock)
 {
-	int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4;
-	int config = psock->progs.msg_parser   ? TCP_BPF_TX   : TCP_BPF_BASE;
+	int config = psock->progs.msg_parser ? TCP_BPF_TX : TCP_BPF_BASE;
 
-	sk_psock_update_proto(sk, psock, &tcp_bpf_prots[family][config]);
+	return &prot[config];
 }
 
-static void tcp_bpf_reinit_sk_prot(struct sock *sk, struct sk_psock *psock)
-{
-	int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4;
-	int config = psock->progs.msg_parser   ? TCP_BPF_TX   : TCP_BPF_BASE;
-
-	/* Reinit occurs when program types change e.g. TCP_BPF_TX is removed
-	 * or added requiring sk_prot hook updates. We keep original saved
-	 * hooks in this case.
-	 *
-	 * Pairs with lockless read in sk_clone_lock().
-	 */
-	WRITE_ONCE(sk->sk_prot, &tcp_bpf_prots[family][config]);
-}
-
-static int tcp_bpf_assert_proto_ops(struct proto *ops)
-{
-	/* In order to avoid retpoline, we make assumptions when we call
-	 * into ops if e.g. a psock is not present. Make sure they are
-	 * indeed valid assumptions.
-	 */
-	return ops->recvmsg  == tcp_recvmsg &&
-	       ops->sendmsg  == tcp_sendmsg &&
-	       ops->sendpage == tcp_sendpage ? 0 : -ENOTSUPP;
-}
+static struct proto tcp_bpf_ipv4[TCP_BPF_NUM_CFGS];
+static struct proto tcp_bpf_ipv6[TCP_BPF_NUM_CFGS];
+static struct sk_psock_hooks tcp_bpf_hooks __read_mostly = {
+	.ipv4 = &tcp_bpf_ipv4[0],
+	.ipv6 = &tcp_bpf_ipv6[0],
+	.rebuild_proto = tcp_bpf_rebuild_proto,
+	.choose_proto = tcp_bpf_choose_proto,
+};
 
-void tcp_bpf_reinit(struct sock *sk)
+static int __init tcp_bpf_init_psock_hooks(void)
 {
-	struct sk_psock *psock;
-
-	sock_owned_by_me(sk);
-
-	rcu_read_lock();
-	psock = sk_psock(sk);
-	tcp_bpf_reinit_sk_prot(sk, psock);
-	rcu_read_unlock();
+	return sk_psock_hooks_init(&tcp_bpf_hooks, &tcp_prot);
 }
+core_initcall(tcp_bpf_init_psock_hooks);
 
 int tcp_bpf_init(struct sock *sk)
 {
-	struct proto *ops = READ_ONCE(sk->sk_prot);
-	struct sk_psock *psock;
+	int ret;
 
 	sock_owned_by_me(sk);
 
 	rcu_read_lock();
-	psock = sk_psock(sk);
-	if (unlikely(!psock || psock->sk_proto ||
-		     tcp_bpf_assert_proto_ops(ops))) {
-		rcu_read_unlock();
-		return -EINVAL;
-	}
-	tcp_bpf_check_v6_needs_rebuild(sk, ops);
-	tcp_bpf_update_sk_prot(sk, psock);
+	ret = sk_psock_hooks_install(&tcp_bpf_hooks, sk);
 	rcu_read_unlock();
-	return 0;
+	return ret;
 }
 
 /* If a child got cloned from a listening socket that had tcp_bpf
@@ -653,10 +602,11 @@ int tcp_bpf_init(struct sock *sk)
  */
 void tcp_bpf_clone(const struct sock *sk, struct sock *newsk)
 {
-	int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4;
 	struct proto *prot = newsk->sk_prot;
 
-	if (prot == &tcp_bpf_prots[family][TCP_BPF_BASE])
+	/* TCP_LISTEN can only use TCP_BPF_BASE, so this is safe */
+	if (unlikely(prot == &tcp_bpf_ipv4[TCP_BPF_BASE] ||
+		     prot == &tcp_bpf_ipv6[TCP_BPF_BASE]))
 		newsk->sk_prot = sk->sk_prot_creator;
 }
 #endif /* CONFIG_BPF_STREAM_PARSER */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 5/9] bpf: sockmap: allow UDP sockets
  2020-02-28 11:53 [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets Lorenz Bauer
                   ` (3 preceding siblings ...)
  2020-02-28 11:53 ` [PATCH bpf-next v2 4/9] skmsg: introduce sk_psock_hooks Lorenz Bauer
@ 2020-02-28 11:53 ` Lorenz Bauer
  2020-03-03 17:56   ` Martin KaFai Lau
  2020-02-28 11:53 ` [PATCH bpf-next v2 6/9] selftests: bpf: don't listen() on " Lorenz Bauer
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Lorenz Bauer @ 2020-02-28 11:53 UTC (permalink / raw)
  To: john.fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer,
	David S. Miller, Jakub Kicinski, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Alexei Starovoitov
  Cc: kernel-team, linux-kernel, netdev, bpf

Add basic psock hooks for UDP sockets. This allows adding and
removing sockets, as well as automatic removal on unhash and close.

sock_map_sk_state_allowed is called from the syscall path, and
ensures that only established or listening sockets are added.
No such check exists for the BPF path: we rely on sockets being
in particular states when a BPF sock ops hook is executed.
For the passive open hook this means that sockets are actually in
TCP_SYN_RECV state (and unhashed) when they are added to the
sock map.

UDP sockets are not saddled with this inconsistency, and so the
checks for both syscall and BPF path should be identical. Rather
than duplicating the logic into sock_map_sk_state_allowed merge
it with sock_map_sk_is_suitable.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
---
 MAINTAINERS         |  1 +
 include/linux/udp.h |  4 ++++
 net/core/sock_map.c | 52 ++++++++++++++++++++++++++------------------
 net/ipv4/Makefile   |  1 +
 net/ipv4/udp_bpf.c  | 53 +++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 90 insertions(+), 21 deletions(-)
 create mode 100644 net/ipv4/udp_bpf.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 2af5fa73155e..495ba52038ad 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9358,6 +9358,7 @@ F:	include/linux/skmsg.h
 F:	net/core/skmsg.c
 F:	net/core/sock_map.c
 F:	net/ipv4/tcp_bpf.c
+F:	net/ipv4/udp_bpf.c
 
 LANTIQ / INTEL Ethernet drivers
 M:	Hauke Mehrtens <hauke@hauke-m.de>
diff --git a/include/linux/udp.h b/include/linux/udp.h
index aa84597bdc33..2485a35d113c 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -143,4 +143,8 @@ static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff *skb)
 
 #define IS_UDPLITE(__sk) (__sk->sk_protocol == IPPROTO_UDPLITE)
 
+#ifdef CONFIG_BPF_STREAM_PARSER
+int udp_bpf_init(struct sock *sk);
+#endif
+
 #endif	/* _LINUX_UDP_H */
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index c84cc9fc7f6b..d742e1538ae9 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -153,7 +153,7 @@ static struct sk_psock *sock_map_psock_get_checked(struct sock *sk)
 	rcu_read_lock();
 	psock = sk_psock(sk);
 	if (psock) {
-		if (sk->sk_prot->recvmsg != tcp_bpf_recvmsg) {
+		if (sk->sk_prot->close != sock_map_close) {
 			psock = ERR_PTR(-EBUSY);
 			goto out;
 		}
@@ -166,6 +166,14 @@ static struct sk_psock *sock_map_psock_get_checked(struct sock *sk)
 	return psock;
 }
 
+static int sock_map_init_hooks(struct sock *sk)
+{
+	if (sk->sk_type == SOCK_DGRAM)
+		return udp_bpf_init(sk);
+	else
+		return tcp_bpf_init(sk);
+}
+
 static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 			 struct sock *sk)
 {
@@ -220,7 +228,7 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 	if (msg_parser)
 		psock_set_prog(&psock->progs.msg_parser, msg_parser);
 
-	ret = tcp_bpf_init(sk);
+	ret = sock_map_init_hooks(sk);
 	if (ret < 0)
 		goto out_drop;
 
@@ -267,7 +275,7 @@ static int sock_map_link_no_progs(struct bpf_map *map, struct sock *sk)
 		return -ENOMEM;
 
 init:
-	ret = tcp_bpf_init(sk);
+	ret = sock_map_init_hooks(sk);
 	if (ret < 0)
 		sk_psock_put(sk, psock);
 	return ret;
@@ -394,9 +402,14 @@ static int sock_map_get_next_key(struct bpf_map *map, void *key, void *next)
 	return 0;
 }
 
+static bool sock_map_sk_is_tcp(const struct sock *sk)
+{
+	return sk->sk_type == SOCK_STREAM && sk->sk_protocol == IPPROTO_TCP;
+}
+
 static bool sock_map_redirect_allowed(const struct sock *sk)
 {
-	return sk->sk_state != TCP_LISTEN;
+	return sock_map_sk_is_tcp(sk) && sk->sk_state != TCP_LISTEN;
 }
 
 static int sock_map_update_common(struct bpf_map *map, u32 idx,
@@ -466,15 +479,20 @@ static bool sock_map_op_okay(const struct bpf_sock_ops_kern *ops)
 	       ops->op == BPF_SOCK_OPS_TCP_LISTEN_CB;
 }
 
-static bool sock_map_sk_is_suitable(const struct sock *sk)
+static bool sock_map_sk_is_udp(const struct sock *sk)
 {
-	return sk->sk_type == SOCK_STREAM &&
-	       sk->sk_protocol == IPPROTO_TCP;
+	return sk->sk_type == SOCK_DGRAM && sk->sk_protocol == IPPROTO_UDP;
 }
 
-static bool sock_map_sk_state_allowed(const struct sock *sk)
+static bool sock_map_sk_is_suitable(const struct sock *sk, bool from_bpf)
 {
-	return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_LISTEN);
+	int tcp_flags = TCPF_ESTABLISHED | TCPF_LISTEN;
+
+	if (from_bpf)
+		tcp_flags |= TCPF_SYN_RECV;
+
+	return (sock_map_sk_is_udp(sk) && sk_hashed(sk)) ||
+	       (sock_map_sk_is_tcp(sk) && (1 << sk->sk_state) & tcp_flags);
 }
 
 static int sock_map_update_elem(struct bpf_map *map, void *key,
@@ -501,13 +519,9 @@ static int sock_map_update_elem(struct bpf_map *map, void *key,
 		ret = -EINVAL;
 		goto out;
 	}
-	if (!sock_map_sk_is_suitable(sk)) {
-		ret = -EOPNOTSUPP;
-		goto out;
-	}
 
 	sock_map_sk_acquire(sk);
-	if (!sock_map_sk_state_allowed(sk))
+	if (!sock_map_sk_is_suitable(sk, false))
 		ret = -EOPNOTSUPP;
 	else
 		ret = sock_map_update_common(map, idx, sk, flags);
@@ -522,7 +536,7 @@ BPF_CALL_4(bpf_sock_map_update, struct bpf_sock_ops_kern *, sops,
 {
 	WARN_ON_ONCE(!rcu_read_lock_held());
 
-	if (likely(sock_map_sk_is_suitable(sops->sk) &&
+	if (likely(sock_map_sk_is_suitable(sops->sk, true) &&
 		   sock_map_op_okay(sops)))
 		return sock_map_update_common(map, *(u32 *)key, sops->sk,
 					      flags);
@@ -849,13 +863,9 @@ static int sock_hash_update_elem(struct bpf_map *map, void *key,
 		ret = -EINVAL;
 		goto out;
 	}
-	if (!sock_map_sk_is_suitable(sk)) {
-		ret = -EOPNOTSUPP;
-		goto out;
-	}
 
 	sock_map_sk_acquire(sk);
-	if (!sock_map_sk_state_allowed(sk))
+	if (!sock_map_sk_is_suitable(sk, false))
 		ret = -EOPNOTSUPP;
 	else
 		ret = sock_hash_update_common(map, key, sk, flags);
@@ -1023,7 +1033,7 @@ BPF_CALL_4(bpf_sock_hash_update, struct bpf_sock_ops_kern *, sops,
 {
 	WARN_ON_ONCE(!rcu_read_lock_held());
 
-	if (likely(sock_map_sk_is_suitable(sops->sk) &&
+	if (likely(sock_map_sk_is_suitable(sops->sk, true) &&
 		   sock_map_op_okay(sops)))
 		return sock_hash_update_common(map, key, sops->sk, flags);
 	return -EOPNOTSUPP;
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index 9d97bace13c8..9e1a186a3671 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o
 obj-$(CONFIG_TCP_CONG_YEAH) += tcp_yeah.o
 obj-$(CONFIG_TCP_CONG_ILLINOIS) += tcp_illinois.o
 obj-$(CONFIG_NET_SOCK_MSG) += tcp_bpf.o
+obj-$(CONFIG_BPF_STREAM_PARSER) += udp_bpf.o
 obj-$(CONFIG_NETLABEL) += cipso_ipv4.o
 
 obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o \
diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
new file mode 100644
index 000000000000..3ecf246076be
--- /dev/null
+++ b/net/ipv4/udp_bpf.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Cloudflare Ltd https://cloudflare.com */
+
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <linux/init.h>
+#include <linux/skmsg.h>
+#include <linux/wait.h>
+#include <net/udp.h>
+
+#include <net/inet_common.h>
+
+static int udp_bpf_rebuild_protos(struct proto *prot, struct proto *base)
+{
+	*prot        = *base;
+	prot->unhash = sock_map_unhash;
+	prot->close  = sock_map_close;
+	return 0;
+}
+
+static struct proto *udp_bpf_choose_proto(struct proto prot[],
+					  struct sk_psock *psock)
+{
+	return prot;
+}
+
+static struct proto udpv4_proto;
+static struct proto udpv6_proto;
+
+static struct sk_psock_hooks udp_bpf_hooks __read_mostly = {
+	.ipv4 = &udpv4_proto,
+	.ipv6 = &udpv6_proto,
+	.rebuild_proto = udp_bpf_rebuild_protos,
+	.choose_proto = udp_bpf_choose_proto,
+};
+
+static int __init udp_bpf_init_psock_hooks(void)
+{
+	return sk_psock_hooks_init(&udp_bpf_hooks, &udp_prot);
+}
+core_initcall(udp_bpf_init_psock_hooks);
+
+int udp_bpf_init(struct sock *sk)
+{
+	int ret;
+
+	sock_owned_by_me(sk);
+
+	rcu_read_lock();
+	ret = sk_psock_hooks_install(&udp_bpf_hooks, sk);
+	rcu_read_unlock();
+	return ret;
+}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 6/9] selftests: bpf: don't listen() on UDP sockets
  2020-02-28 11:53 [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets Lorenz Bauer
                   ` (4 preceding siblings ...)
  2020-02-28 11:53 ` [PATCH bpf-next v2 5/9] bpf: sockmap: allow UDP sockets Lorenz Bauer
@ 2020-02-28 11:53 ` Lorenz Bauer
  2020-02-28 11:53 ` [PATCH bpf-next v2 7/9] selftests: bpf: add tests for UDP sockets in sockmap Lorenz Bauer
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Lorenz Bauer @ 2020-02-28 11:53 UTC (permalink / raw)
  To: john.fastabend, Shuah Khan, Alexei Starovoitov, Daniel Borkmann
  Cc: kernel-team, Lorenz Bauer, linux-kselftest, netdev, bpf, linux-kernel

Most tests for TCP sockmap can be adapted to UDP sockmap if the
listen call is skipped. Rename listen_loopback, etc. to socket_loopback
and skip listen() for SOCK_DGRAM.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
---
 .../selftests/bpf/prog_tests/sockmap_listen.c | 47 ++++++++++---------
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index b1b2acea0638..4ba41dd26d6b 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -230,7 +230,7 @@ static int enable_reuseport(int s, int progfd)
 	return 0;
 }
 
-static int listen_loopback_reuseport(int family, int sotype, int progfd)
+static int socket_loopback_reuseport(int family, int sotype, int progfd)
 {
 	struct sockaddr_storage addr;
 	socklen_t len;
@@ -249,6 +249,9 @@ static int listen_loopback_reuseport(int family, int sotype, int progfd)
 	if (err)
 		goto close;
 
+	if (sotype == SOCK_DGRAM)
+		return s;
+
 	err = xlisten(s, SOMAXCONN);
 	if (err)
 		goto close;
@@ -259,9 +262,9 @@ static int listen_loopback_reuseport(int family, int sotype, int progfd)
 	return -1;
 }
 
-static int listen_loopback(int family, int sotype)
+static int socket_loopback(int family, int sotype)
 {
-	return listen_loopback_reuseport(family, sotype, -1);
+	return socket_loopback_reuseport(family, sotype, -1);
 }
 
 static void test_insert_invalid(int family, int sotype, int mapfd)
@@ -333,7 +336,7 @@ static void test_insert_listening(int family, int sotype, int mapfd)
 	u32 key;
 	int s;
 
-	s = listen_loopback(family, sotype);
+	s = socket_loopback(family, sotype);
 	if (s < 0)
 		return;
 
@@ -349,7 +352,7 @@ static void test_delete_after_insert(int family, int sotype, int mapfd)
 	u32 key;
 	int s;
 
-	s = listen_loopback(family, sotype);
+	s = socket_loopback(family, sotype);
 	if (s < 0)
 		return;
 
@@ -366,7 +369,7 @@ static void test_delete_after_close(int family, int sotype, int mapfd)
 	u64 value;
 	u32 key;
 
-	s = listen_loopback(family, sotype);
+	s = socket_loopback(family, sotype);
 	if (s < 0)
 		return;
 
@@ -390,7 +393,7 @@ static void test_lookup_after_insert(int family, int sotype, int mapfd)
 	u32 key;
 	int s;
 
-	s = listen_loopback(family, sotype);
+	s = socket_loopback(family, sotype);
 	if (s < 0)
 		return;
 
@@ -417,7 +420,7 @@ static void test_lookup_after_delete(int family, int sotype, int mapfd)
 	u64 value;
 	u32 key;
 
-	s = listen_loopback(family, sotype);
+	s = socket_loopback(family, sotype);
 	if (s < 0)
 		return;
 
@@ -439,7 +442,7 @@ static void test_lookup_32_bit_value(int family, int sotype, int mapfd)
 	u32 key, value32;
 	int err, s;
 
-	s = listen_loopback(family, sotype);
+	s = socket_loopback(family, sotype);
 	if (s < 0)
 		return;
 
@@ -470,11 +473,11 @@ static void test_update_listening(int family, int sotype, int mapfd)
 	u64 value;
 	u32 key;
 
-	s1 = listen_loopback(family, sotype);
+	s1 = socket_loopback(family, sotype);
 	if (s1 < 0)
 		return;
 
-	s2 = listen_loopback(family, sotype);
+	s2 = socket_loopback(family, sotype);
 	if (s2 < 0)
 		goto close_s1;
 
@@ -500,7 +503,7 @@ static void test_destroy_orphan_child(int family, int sotype, int mapfd)
 	u64 value;
 	u32 key;
 
-	s = listen_loopback(family, sotype);
+	s = socket_loopback(family, sotype);
 	if (s < 0)
 		return;
 
@@ -534,7 +537,7 @@ static void test_clone_after_delete(int family, int sotype, int mapfd)
 	u64 value;
 	u32 key;
 
-	s = listen_loopback(family, sotype);
+	s = socket_loopback(family, sotype);
 	if (s < 0)
 		return;
 
@@ -570,7 +573,7 @@ static void test_accept_after_delete(int family, int sotype, int mapfd)
 	socklen_t len;
 	u64 value;
 
-	s = listen_loopback(family, sotype);
+	s = socket_loopback(family, sotype);
 	if (s == -1)
 		return;
 
@@ -624,7 +627,7 @@ static void test_accept_before_delete(int family, int sotype, int mapfd)
 	socklen_t len;
 	u64 value;
 
-	s = listen_loopback(family, sotype);
+	s = socket_loopback(family, sotype);
 	if (s == -1)
 		return;
 
@@ -735,7 +738,7 @@ static void test_syn_recv_insert_delete(int family, int sotype, int mapfd)
 	int err, s;
 	u64 value;
 
-	s = listen_loopback(family, sotype | SOCK_NONBLOCK);
+	s = socket_loopback(family, sotype | SOCK_NONBLOCK);
 	if (s < 0)
 		return;
 
@@ -877,7 +880,7 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd,
 
 	zero_verdict_count(verd_mapfd);
 
-	s = listen_loopback(family, sotype | SOCK_NONBLOCK);
+	s = socket_loopback(family, sotype | SOCK_NONBLOCK);
 	if (s < 0)
 		return;
 
@@ -1009,7 +1012,7 @@ static void redir_to_listening(int family, int sotype, int sock_mapfd,
 
 	zero_verdict_count(verd_mapfd);
 
-	s = listen_loopback(family, sotype | SOCK_NONBLOCK);
+	s = socket_loopback(family, sotype | SOCK_NONBLOCK);
 	if (s < 0)
 		return;
 
@@ -1120,7 +1123,7 @@ static void test_reuseport_select_listening(int family, int sotype,
 
 	zero_verdict_count(verd_map);
 
-	s = listen_loopback_reuseport(family, sotype, reuseport_prog);
+	s = socket_loopback_reuseport(family, sotype, reuseport_prog);
 	if (s < 0)
 		return;
 
@@ -1174,7 +1177,7 @@ static void test_reuseport_select_connected(int family, int sotype,
 
 	zero_verdict_count(verd_map);
 
-	s = listen_loopback_reuseport(family, sotype, reuseport_prog);
+	s = socket_loopback_reuseport(family, sotype, reuseport_prog);
 	if (s < 0)
 		return;
 
@@ -1249,11 +1252,11 @@ static void test_reuseport_mixed_groups(int family, int sotype, int sock_map,
 	zero_verdict_count(verd_map);
 
 	/* Create two listeners, each in its own reuseport group */
-	s1 = listen_loopback_reuseport(family, sotype, reuseport_prog);
+	s1 = socket_loopback_reuseport(family, sotype, reuseport_prog);
 	if (s1 < 0)
 		return;
 
-	s2 = listen_loopback_reuseport(family, sotype, reuseport_prog);
+	s2 = socket_loopback_reuseport(family, sotype, reuseport_prog);
 	if (s2 < 0)
 		goto close_srv1;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 7/9] selftests: bpf: add tests for UDP sockets in sockmap
  2020-02-28 11:53 [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets Lorenz Bauer
                   ` (5 preceding siblings ...)
  2020-02-28 11:53 ` [PATCH bpf-next v2 6/9] selftests: bpf: don't listen() on " Lorenz Bauer
@ 2020-02-28 11:53 ` Lorenz Bauer
  2020-02-28 11:53 ` [PATCH bpf-next v2 8/9] selftests: bpf: enable UDP sockmap reuseport tests Lorenz Bauer
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Lorenz Bauer @ 2020-02-28 11:53 UTC (permalink / raw)
  To: john.fastabend, Shuah Khan, Alexei Starovoitov, Daniel Borkmann
  Cc: kernel-team, Lorenz Bauer, Jakub Sitnicki, linux-kselftest,
	netdev, bpf, linux-kernel

Expand the TCP sockmap test suite to also check UDP sockets.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
---
 .../selftests/bpf/prog_tests/sockmap_listen.c | 157 ++++++++++++++----
 1 file changed, 127 insertions(+), 30 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index 4ba41dd26d6b..52aa468bdccd 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -108,6 +108,22 @@
 		__ret;                                                         \
 	})
 
+#define xsend(fd, buf, len, flags)                                             \
+	({                                                                     \
+		ssize_t __ret = send((fd), (buf), (len), (flags));             \
+		if (__ret == -1)                                               \
+			FAIL_ERRNO("send");                                    \
+		__ret;                                                         \
+	})
+
+#define xrecv(fd, buf, len, flags)                                             \
+	({                                                                     \
+		ssize_t __ret = recv((fd), (buf), (len), (flags));             \
+		if (__ret == -1)                                               \
+			FAIL_ERRNO("recv");                                    \
+		__ret;                                                         \
+	})
+
 #define xsocket(family, sotype, flags)                                         \
 	({                                                                     \
 		int __ret = socket(family, sotype, flags);                     \
@@ -330,7 +346,7 @@ static void test_insert_bound(int family, int sotype, int mapfd)
 	xclose(s);
 }
 
-static void test_insert_listening(int family, int sotype, int mapfd)
+static void test_insert(int family, int sotype, int mapfd)
 {
 	u64 value;
 	u32 key;
@@ -467,7 +483,7 @@ static void test_lookup_32_bit_value(int family, int sotype, int mapfd)
 	xclose(s);
 }
 
-static void test_update_listening(int family, int sotype, int mapfd)
+static void test_update_existing(int family, int sotype, int mapfd)
 {
 	int s1, s2;
 	u64 value;
@@ -1116,7 +1132,7 @@ static void test_reuseport_select_listening(int family, int sotype,
 {
 	struct sockaddr_storage addr;
 	unsigned int pass;
-	int s, c, p, err;
+	int s, c, err;
 	socklen_t len;
 	u64 value;
 	u32 key;
@@ -1145,19 +1161,33 @@ static void test_reuseport_select_listening(int family, int sotype,
 	if (err)
 		goto close_cli;
 
-	p = xaccept(s, NULL, NULL);
-	if (p < 0)
-		goto close_cli;
+	if (sotype == SOCK_STREAM) {
+		int p;
+
+		p = xaccept(s, NULL, NULL);
+		if (p < 0)
+			goto close_cli;
+		xclose(p);
+	} else {
+		char b = 'a';
+		ssize_t n;
+
+		n = xsend(c, &b, sizeof(b), 0);
+		if (n == -1)
+			goto close_cli;
+
+		n = xrecv(s, &b, sizeof(b), 0);
+		if (n == -1)
+			goto close_cli;
+	}
 
 	key = SK_PASS;
 	err = xbpf_map_lookup_elem(verd_map, &key, &pass);
 	if (err)
-		goto close_peer;
+		goto close_cli;
 	if (pass != 1)
 		FAIL("want pass count 1, have %d", pass);
 
-close_peer:
-	xclose(p);
 close_cli:
 	xclose(c);
 close_srv:
@@ -1201,9 +1231,24 @@ static void test_reuseport_select_connected(int family, int sotype,
 	if (err)
 		goto close_cli0;
 
-	p0 = xaccept(s, NULL, NULL);
-	if (err)
-		goto close_cli0;
+	if (sotype == SOCK_STREAM) {
+		p0 = xaccept(s, NULL, NULL);
+		if (p0 < 0)
+			goto close_cli0;
+	} else {
+		p0 = xsocket(family, sotype, 0);
+		if (p0 < 0)
+			goto close_cli0;
+
+		len = sizeof(addr);
+		err = xgetsockname(c0, sockaddr(&addr), &len);
+		if (err)
+			goto close_cli0;
+
+		err = xconnect(p0, sockaddr(&addr), len);
+		if (err)
+			goto close_cli0;
+	}
 
 	/* Update sock_map[0] to redirect to a connected socket */
 	key = 0;
@@ -1216,8 +1261,24 @@ static void test_reuseport_select_connected(int family, int sotype,
 	if (c1 < 0)
 		goto close_peer0;
 
+	len = sizeof(addr);
+	err = xgetsockname(s, sockaddr(&addr), &len);
+	if (err)
+		goto close_srv;
+
 	errno = 0;
 	err = connect(c1, sockaddr(&addr), len);
+	if (sotype == SOCK_DGRAM) {
+		char b = 'a';
+		ssize_t n;
+
+		n = xsend(c1, &b, sizeof(b), 0);
+		if (n == -1)
+			goto close_cli1;
+
+		n = recv(c1, &b, sizeof(b), 0);
+		err = n == -1;
+	}
 	if (!err || errno != ECONNREFUSED)
 		FAIL_ERRNO("connect: expected ECONNREFUSED");
 
@@ -1281,7 +1342,18 @@ static void test_reuseport_mixed_groups(int family, int sotype, int sock_map,
 		goto close_srv2;
 
 	err = connect(c, sockaddr(&addr), len);
-	if (err && errno != ECONNREFUSED) {
+	if (sotype == SOCK_DGRAM) {
+		char b = 'a';
+		ssize_t n;
+
+		n = xsend(c, &b, sizeof(b), 0);
+		if (n == -1)
+			goto close_cli;
+
+		n = recv(c, &b, sizeof(b), 0);
+		err = n == -1;
+	}
+	if (!err || errno != ECONNREFUSED) {
 		FAIL_ERRNO("connect: expected ECONNREFUSED");
 		goto close_cli;
 	}
@@ -1302,9 +1374,9 @@ static void test_reuseport_mixed_groups(int family, int sotype, int sock_map,
 	xclose(s1);
 }
 
-#define TEST(fn)                                                               \
+#define TEST(fn, ...)                                                          \
 	{                                                                      \
-		fn, #fn                                                        \
+		fn, #fn, __VA_ARGS__                                           \
 	}
 
 static void test_ops_cleanup(const struct bpf_map *map)
@@ -1353,18 +1425,31 @@ static const char *map_type_str(const struct bpf_map *map)
 	}
 }
 
+static const char *sotype_str(int sotype)
+{
+	switch (sotype) {
+	case SOCK_DGRAM:
+		return "UDP";
+	case SOCK_STREAM:
+		return "TCP";
+	default:
+		return "unknown";
+	}
+}
+
 static void test_ops(struct test_sockmap_listen *skel, struct bpf_map *map,
 		     int family, int sotype)
 {
 	const struct op_test {
 		void (*fn)(int family, int sotype, int mapfd);
 		const char *name;
+		int sotype;
 	} tests[] = {
 		/* insert */
 		TEST(test_insert_invalid),
 		TEST(test_insert_opened),
-		TEST(test_insert_bound),
-		TEST(test_insert_listening),
+		TEST(test_insert_bound, SOCK_STREAM),
+		TEST(test_insert),
 		/* delete */
 		TEST(test_delete_after_insert),
 		TEST(test_delete_after_close),
@@ -1373,28 +1458,32 @@ static void test_ops(struct test_sockmap_listen *skel, struct bpf_map *map,
 		TEST(test_lookup_after_delete),
 		TEST(test_lookup_32_bit_value),
 		/* update */
-		TEST(test_update_listening),
+		TEST(test_update_existing),
 		/* races with insert/delete */
-		TEST(test_destroy_orphan_child),
-		TEST(test_syn_recv_insert_delete),
-		TEST(test_race_insert_listen),
+		TEST(test_destroy_orphan_child, SOCK_STREAM),
+		TEST(test_syn_recv_insert_delete, SOCK_STREAM),
+		TEST(test_race_insert_listen, SOCK_STREAM),
 		/* child clone */
-		TEST(test_clone_after_delete),
-		TEST(test_accept_after_delete),
-		TEST(test_accept_before_delete),
+		TEST(test_clone_after_delete, SOCK_STREAM),
+		TEST(test_accept_after_delete, SOCK_STREAM),
+		TEST(test_accept_before_delete, SOCK_STREAM),
 	};
-	const char *family_name, *map_name;
+	const char *family_name, *map_name, *sotype_name;
 	const struct op_test *t;
 	char s[MAX_TEST_NAME];
 	int map_fd;
 
 	family_name = family_str(family);
 	map_name = map_type_str(map);
+	sotype_name = sotype_str(sotype);
 	map_fd = bpf_map__fd(map);
 
 	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
-		snprintf(s, sizeof(s), "%s %s %s", map_name, family_name,
-			 t->name);
+		snprintf(s, sizeof(s), "%s %s %s %s", map_name, family_name,
+			 sotype_name, t->name);
+
+		if (t->sotype != 0 && t->sotype != sotype)
+			continue;
 
 		if (!test__start_subtest(s))
 			continue;
@@ -1427,6 +1516,7 @@ static void test_redir(struct test_sockmap_listen *skel, struct bpf_map *map,
 	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
 		snprintf(s, sizeof(s), "%s %s %s", map_name, family_name,
 			 t->name);
+
 		if (!test__start_subtest(s))
 			continue;
 
@@ -1441,26 +1531,31 @@ static void test_reuseport(struct test_sockmap_listen *skel,
 		void (*fn)(int family, int sotype, int socket_map,
 			   int verdict_map, int reuseport_prog);
 		const char *name;
+		int sotype;
 	} tests[] = {
 		TEST(test_reuseport_select_listening),
 		TEST(test_reuseport_select_connected),
 		TEST(test_reuseport_mixed_groups),
 	};
 	int socket_map, verdict_map, reuseport_prog;
-	const char *family_name, *map_name;
+	const char *family_name, *map_name, *sotype_name;
 	const struct reuseport_test *t;
 	char s[MAX_TEST_NAME];
 
 	family_name = family_str(family);
 	map_name = map_type_str(map);
+	sotype_name = sotype_str(sotype);
 
 	socket_map = bpf_map__fd(map);
 	verdict_map = bpf_map__fd(skel->maps.verdict_map);
 	reuseport_prog = bpf_program__fd(skel->progs.prog_reuseport);
 
 	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
-		snprintf(s, sizeof(s), "%s %s %s", map_name, family_name,
-			 t->name);
+		snprintf(s, sizeof(s), "%s %s %s %s", map_name, family_name,
+			 sotype_name, t->name);
+
+		if (t->sotype != 0 && t->sotype != sotype)
+			continue;
 
 		if (!test__start_subtest(s))
 			continue;
@@ -1473,8 +1568,10 @@ static void run_tests(struct test_sockmap_listen *skel, struct bpf_map *map,
 		      int family)
 {
 	test_ops(skel, map, family, SOCK_STREAM);
+	test_ops(skel, map, family, SOCK_DGRAM);
 	test_redir(skel, map, family, SOCK_STREAM);
 	test_reuseport(skel, map, family, SOCK_STREAM);
+	test_reuseport(skel, map, family, SOCK_DGRAM);
 }
 
 void test_sockmap_listen(void)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 8/9] selftests: bpf: enable UDP sockmap reuseport tests
  2020-02-28 11:53 [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets Lorenz Bauer
                   ` (6 preceding siblings ...)
  2020-02-28 11:53 ` [PATCH bpf-next v2 7/9] selftests: bpf: add tests for UDP sockets in sockmap Lorenz Bauer
@ 2020-02-28 11:53 ` Lorenz Bauer
  2020-02-28 11:53 ` [PATCH bpf-next v2 9/9] bpf, doc: update maintainers for L7 BPF Lorenz Bauer
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Lorenz Bauer @ 2020-02-28 11:53 UTC (permalink / raw)
  To: john.fastabend, Shuah Khan, Alexei Starovoitov, Daniel Borkmann
  Cc: kernel-team, Lorenz Bauer, linux-kselftest, netdev, bpf, linux-kernel

Remove the guard that disables UDP tests now that sockmap
has support for them.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
---
 tools/testing/selftests/bpf/prog_tests/select_reuseport.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
index a1dd13b34d4b..821b4146b7b6 100644
--- a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
+++ b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
@@ -805,12 +805,6 @@ static void test_config(int sotype, sa_family_t family, bool inany)
 	char s[MAX_TEST_NAME];
 	const struct test *t;
 
-	/* SOCKMAP/SOCKHASH don't support UDP yet */
-	if (sotype == SOCK_DGRAM &&
-	    (inner_map_type == BPF_MAP_TYPE_SOCKMAP ||
-	     inner_map_type == BPF_MAP_TYPE_SOCKHASH))
-		return;
-
 	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
 		if (t->need_sotype && t->need_sotype != sotype)
 			continue; /* test not compatible with socket type */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 9/9] bpf, doc: update maintainers for L7 BPF
  2020-02-28 11:53 [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets Lorenz Bauer
                   ` (7 preceding siblings ...)
  2020-02-28 11:53 ` [PATCH bpf-next v2 8/9] selftests: bpf: enable UDP sockmap reuseport tests Lorenz Bauer
@ 2020-02-28 11:53 ` Lorenz Bauer
  2020-03-03 19:45   ` John Fastabend
  2020-02-28 11:57 ` [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets Lorenz Bauer
  2020-03-01 21:26 ` Jakub Sitnicki
  10 siblings, 1 reply; 20+ messages in thread
From: Lorenz Bauer @ 2020-02-28 11:53 UTC (permalink / raw)
  To: john.fastabend, Alexei Starovoitov, Daniel Borkmann
  Cc: kernel-team, Lorenz Bauer, linux-kernel, netdev, bpf

Add Jakub and myself as maintainers for sockmap related code.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 495ba52038ad..8517965adde8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9351,6 +9351,8 @@ F:	include/net/l3mdev.h
 L7 BPF FRAMEWORK
 M:	John Fastabend <john.fastabend@gmail.com>
 M:	Daniel Borkmann <daniel@iogearbox.net>
+M:	Jakub Sitnicki <jakub@cloudflare.com>
+M:	Lorenz Bauer <lmb@cloudflare.com>
 L:	netdev@vger.kernel.org
 L:	bpf@vger.kernel.org
 S:	Maintained
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets
  2020-02-28 11:53 [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets Lorenz Bauer
                   ` (8 preceding siblings ...)
  2020-02-28 11:53 ` [PATCH bpf-next v2 9/9] bpf, doc: update maintainers for L7 BPF Lorenz Bauer
@ 2020-02-28 11:57 ` Lorenz Bauer
  2020-03-01 21:26 ` Jakub Sitnicki
  10 siblings, 0 replies; 20+ messages in thread
From: Lorenz Bauer @ 2020-02-28 11:57 UTC (permalink / raw)
  To: John Fastabend, Alexei Starovoitov, Daniel Borkmann
  Cc: kernel-team, Networking, bpf

On Fri, 28 Feb 2020 at 11:54, Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> Thanks for all the reviews so far! I've fixed the identified bug and addressed
> feedback as much as possible.

This should've been a reply to
https://lore.kernel.org/bpf/20200225135636.5768-1-lmb@cloudflare.com/
"[PATCH bpf-next 0/7] bpf: sockmap, sockhash: support storing UDP sockets"

Sorry!

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets
  2020-02-28 11:53 [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets Lorenz Bauer
                   ` (9 preceding siblings ...)
  2020-02-28 11:57 ` [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets Lorenz Bauer
@ 2020-03-01 21:26 ` Jakub Sitnicki
  10 siblings, 0 replies; 20+ messages in thread
From: Jakub Sitnicki @ 2020-03-01 21:26 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: john.fastabend, Alexei Starovoitov, Daniel Borkmann, kernel-team,
	netdev, bpf

On Fri, Feb 28, 2020 at 12:53 PM CET, Lorenz Bauer wrote:
> Thanks for all the reviews so far! I've fixed the identified bug and addressed
> feedback as much as possible.
>
> I've not taken up Jakub's suggestion to get rid of sk_psock_hooks_init. My
> intention is to encapsulate initializing both v4 and v6 protos. Really, it's
> down to personal preference, and I'm happy to remove it if others prefer
> Jakub's approach. I'm also still eager for a solution that requires less
> machinery.

I was going to do expenses but this seemed more fun. Challenge accepted.

I think we can massage tcp_bpf to extract sk_prot initialization bits
out of it into skmsg, and have skmsg call back into tcp/udp_bpf.

3 callbacks from skmsg back to tcp/udp_bpf would be needed to get there:

 - assert_proto_ops
 - check_v6_needs_rebuild
 - get_proto (akin to choose_proto in this series)

With that in place we could go for a direct dispatch based on sock type:

#define sk_psock_assert_proto_ops(sk, ops)		\
	(sk->sk_type == SOCK_STREAM			\
	 ? tcp_bpf_assert_proto_ops(ops)		\
	 : udp_bpf_assert_proto_ops(ops))

The steps to get there would be:

 1. extract tcp_bpf_get_proto
 2. fold tcp_bpf_update_sk_prot
 3. move tcp_bpf_init -> sk_psock_init_proto
 4. fold tcp_bpf_reinit_sk_prot
 5. move tcp_bpf_reinit -> sk_psock_reinit_proto
 6. add macros for callbacks into tcp_bpf
 7. add udp_bpf

... which I've given a shot at, mixing it into your patches:

  https://github.com/jsitnicki/linux/commits/extract-sk-psock-proto

Note, I didn't make an effort to share the code for rebuilding v6 proto
between tcp_bpf and udp_bpf. This construct seems hard to read already
as is without making it generic.

Final thought, I would probably place the bits common between tcp_bpf
and udp_bpf in skmsg under sk_psock_* namespace, instead of sockmap.

It is just my interpretation, but I think that was the idea outlined in
commit 604326b41a6f ("bpf, sockmap: convert to generic sk_msg
interface"):

    The code itself has been split and refactored into three bigger
    pieces: i) the generic sk_msg API which deals with managing the
    scatter gather ring, providing helpers for walking and mangling,
    transferring application data from user space into it, and preparing
    it for BPF pre/post-processing, ii) the plain sock map itself
    where sockets can be attached to or detached from; these bits
    are independent of i) which can now be used also without sock
    map, and iii) the integration with plain TCP as one protocol
    to be used for processing L7 application data (later this could
    e.g. also be extended to other protocols like UDP).

As skmsg already hosts some sk_psock_* functions used by tcp_bpf, and
they share the same build toggle NET_SK_MSG, that seems natural.

>
> Changes since v1:
> - Check newsk->sk_prot in tcp_bpf_clone
> - Fix compilation with BPF_STREAM_PARSER disabled
> - Use spin_lock_init instead of static initializer
> - Elaborate on TCPF_SYN_RECV
> - Cosmetic changes to TEST macros, and more tests
> - Add Jakub and me as maintainers
>
> Lorenz Bauer (9):
>   bpf: sockmap: only check ULP for TCP sockets
>   bpf: tcp: guard declarations with CONFIG_NET_SOCK_MSG
>   bpf: sockmap: move generic sockmap hooks from BPF TCP
>   skmsg: introduce sk_psock_hooks
>   bpf: sockmap: allow UDP sockets
>   selftests: bpf: don't listen() on UDP sockets
>   selftests: bpf: add tests for UDP sockets in sockmap
>   selftests: bpf: enable UDP sockmap reuseport tests
>   bpf, doc: update maintainers for L7 BPF
>
>  MAINTAINERS                                   |   3 +
>  include/linux/bpf.h                           |   4 +-
>  include/linux/skmsg.h                         |  72 +++----
>  include/linux/udp.h                           |   4 +
>  include/net/tcp.h                             |  18 +-
>  net/core/skmsg.c                              |  55 +++++
>  net/core/sock_map.c                           | 160 ++++++++++----
>  net/ipv4/Makefile                             |   1 +
>  net/ipv4/tcp_bpf.c                            | 169 +++------------
>  net/ipv4/udp_bpf.c                            |  53 +++++
>  .../bpf/prog_tests/select_reuseport.c         |   6 -
>  .../selftests/bpf/prog_tests/sockmap_listen.c | 204 +++++++++++++-----
>  12 files changed, 465 insertions(+), 284 deletions(-)
>  create mode 100644 net/ipv4/udp_bpf.c

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH bpf-next v2 1/9] bpf: sockmap: only check ULP for TCP sockets
  2020-02-28 11:53 ` [PATCH bpf-next v2 1/9] bpf: sockmap: only check ULP for TCP sockets Lorenz Bauer
@ 2020-03-03 17:35   ` John Fastabend
  2020-03-04  8:39     ` Lorenz Bauer
  0 siblings, 1 reply; 20+ messages in thread
From: John Fastabend @ 2020-03-03 17:35 UTC (permalink / raw)
  To: Lorenz Bauer, john.fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov
  Cc: kernel-team, netdev, bpf, linux-kernel

Lorenz Bauer wrote:
> The sock map code checks that a socket does not have an active upper
> layer protocol before inserting it into the map. This requires casting
> via inet_csk, which isn't valid for UDP sockets.
> 
> Guard checks for ULP by checking inet_sk(sk)->is_icsk first.
> 
> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
> ---
>  include/linux/skmsg.h |  8 +++++++-
>  net/core/sock_map.c   | 11 +++++++----
>  2 files changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> index 112765bd146d..54a9a3e36b29 100644
> --- a/include/linux/skmsg.h
> +++ b/include/linux/skmsg.h
> @@ -360,7 +360,13 @@ static inline void sk_psock_restore_proto(struct sock *sk,
>  					  struct sk_psock *psock)
>  {
>  	sk->sk_prot->unhash = psock->saved_unhash;
> -	tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space);
> +	if (inet_sk(sk)->is_icsk) {

use sock_map_sk_has_ulp() here as well and then drop the !icsk->icsk_ulp_ops
case in tcp_update_ulp()?

> +		tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space);
> +	} else {
> +		sk->sk_write_space = psock->saved_write_space;
> +		/* Pairs with lockless read in sk_clone_lock() */
> +		WRITE_ONCE(sk->sk_prot, psock->sk_proto);
> +	}
>  }
>  
>  static inline void sk_psock_set_state(struct sk_psock *psock,
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 2e0f465295c3..695ecacc7afa 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -94,6 +94,11 @@ static void sock_map_sk_release(struct sock *sk)
>  	release_sock(sk);
>  }
>  
> +static bool sock_map_sk_has_ulp(struct sock *sk)
> +{
> +	return inet_sk(sk)->is_icsk && !!inet_csk(sk)->icsk_ulp_ops;
> +}
> +
>  static void sock_map_add_link(struct sk_psock *psock,
>  			      struct sk_psock_link *link,
>  			      struct bpf_map *map, void *link_raw)
> @@ -384,7 +389,6 @@ static int sock_map_update_common(struct bpf_map *map, u32 idx,
>  				  struct sock *sk, u64 flags)
>  {
>  	struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
> -	struct inet_connection_sock *icsk = inet_csk(sk);
>  	struct sk_psock_link *link;
>  	struct sk_psock *psock;
>  	struct sock *osk;
> @@ -395,7 +399,7 @@ static int sock_map_update_common(struct bpf_map *map, u32 idx,
>  		return -EINVAL;
>  	if (unlikely(idx >= map->max_entries))
>  		return -E2BIG;
> -	if (unlikely(rcu_access_pointer(icsk->icsk_ulp_data)))
> +	if (sock_map_sk_has_ulp(sk))
>  		return -EINVAL;
>  
>  	link = sk_psock_init_link();
> @@ -738,7 +742,6 @@ static int sock_hash_update_common(struct bpf_map *map, void *key,
>  				   struct sock *sk, u64 flags)
>  {
>  	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
> -	struct inet_connection_sock *icsk = inet_csk(sk);
>  	u32 key_size = map->key_size, hash;
>  	struct bpf_htab_elem *elem, *elem_new;
>  	struct bpf_htab_bucket *bucket;
> @@ -749,7 +752,7 @@ static int sock_hash_update_common(struct bpf_map *map, void *key,
>  	WARN_ON_ONCE(!rcu_read_lock_held());
>  	if (unlikely(flags > BPF_EXIST))
>  		return -EINVAL;
> -	if (unlikely(icsk->icsk_ulp_data))
> +	if (sock_map_sk_has_ulp(sk))
>  		return -EINVAL;
>  
>  	link = sk_psock_init_link();
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH bpf-next v2 2/9] bpf: tcp: guard declarations with CONFIG_NET_SOCK_MSG
  2020-02-28 11:53 ` [PATCH bpf-next v2 2/9] bpf: tcp: guard declarations with CONFIG_NET_SOCK_MSG Lorenz Bauer
@ 2020-03-03 17:46   ` John Fastabend
  0 siblings, 0 replies; 20+ messages in thread
From: John Fastabend @ 2020-03-03 17:46 UTC (permalink / raw)
  To: Lorenz Bauer, john.fastabend, Eric Dumazet, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann
  Cc: kernel-team, Lorenz Bauer, netdev, linux-kernel, bpf

Lorenz Bauer wrote:
> tcp_bpf.c is only included in the build if CONFIG_NET_SOCK_MSG is
> selected. The declaration should therefore be guarded as such.
> 
> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
> ---

Seems OK, stream parser configs require sock msg now so that
dependency chain is already covered.

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 3/9] bpf: sockmap: move generic sockmap hooks from BPF TCP
  2020-02-28 11:53 ` [PATCH bpf-next v2 3/9] bpf: sockmap: move generic sockmap hooks from BPF TCP Lorenz Bauer
@ 2020-03-03 17:50   ` Martin KaFai Lau
  2020-03-04  8:44     ` Lorenz Bauer
  2020-03-03 17:52   ` John Fastabend
  1 sibling, 1 reply; 20+ messages in thread
From: Martin KaFai Lau @ 2020-03-03 17:50 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: john.fastabend, Alexei Starovoitov, Daniel Borkmann,
	Jakub Sitnicki, Eric Dumazet, David S. Miller, Jakub Kicinski,
	Alexey Kuznetsov, Hideaki YOSHIFUJI, kernel-team, netdev, bpf,
	linux-kernel

On Fri, Feb 28, 2020 at 11:53:38AM +0000, Lorenz Bauer wrote:
> The close, unhash and clone handlers from TCP sockmap are actually generic,
> and can be reused by UDP sockmap. Move the helpers into the sockmap code
Is clone reused in UDP? 

> base and expose them. This requires tcp_bpf_(re)init and tcp_bpf_clone to
> be conditional on BPF_STREAM_PARSER.
> 

[ ... ]

> @@ -707,3 +659,4 @@ void tcp_bpf_clone(const struct sock *sk, struct sock *newsk)
>  	if (prot == &tcp_bpf_prots[family][TCP_BPF_BASE])
>  		newsk->sk_prot = sk->sk_prot_creator;
>  }
> +#endif /* CONFIG_BPF_STREAM_PARSER */
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH bpf-next v2 3/9] bpf: sockmap: move generic sockmap hooks from BPF TCP
  2020-02-28 11:53 ` [PATCH bpf-next v2 3/9] bpf: sockmap: move generic sockmap hooks from BPF TCP Lorenz Bauer
  2020-03-03 17:50   ` Martin KaFai Lau
@ 2020-03-03 17:52   ` John Fastabend
  1 sibling, 0 replies; 20+ messages in thread
From: John Fastabend @ 2020-03-03 17:52 UTC (permalink / raw)
  To: Lorenz Bauer, john.fastabend, Alexei Starovoitov,
	Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer, Eric Dumazet,
	David S. Miller, Jakub Kicinski, Alexey Kuznetsov,
	Hideaki YOSHIFUJI
  Cc: kernel-team, netdev, bpf, linux-kernel

Lorenz Bauer wrote:
> The close, unhash and clone handlers from TCP sockmap are actually generic,
> and can be reused by UDP sockmap. Move the helpers into the sockmap code
> base and expose them. This requires tcp_bpf_(re)init and tcp_bpf_clone to
> be conditional on BPF_STREAM_PARSER.
> 
> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
> ---
>  include/linux/bpf.h   |  4 ++-
>  include/linux/skmsg.h | 28 ----------------
>  include/net/tcp.h     | 15 +++++----
>  net/core/sock_map.c   | 77 +++++++++++++++++++++++++++++++++++++++++--
>  net/ipv4/tcp_bpf.c    | 59 ++++-----------------------------
>  5 files changed, 92 insertions(+), 91 deletions(-)

No changes just moving code around it seems.

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 5/9] bpf: sockmap: allow UDP sockets
  2020-02-28 11:53 ` [PATCH bpf-next v2 5/9] bpf: sockmap: allow UDP sockets Lorenz Bauer
@ 2020-03-03 17:56   ` Martin KaFai Lau
  0 siblings, 0 replies; 20+ messages in thread
From: Martin KaFai Lau @ 2020-03-03 17:56 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: john.fastabend, Daniel Borkmann, Jakub Sitnicki, David S. Miller,
	Jakub Kicinski, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Alexei Starovoitov, kernel-team, linux-kernel, netdev, bpf

On Fri, Feb 28, 2020 at 11:53:40AM +0000, Lorenz Bauer wrote:
> Add basic psock hooks for UDP sockets. This allows adding and
> removing sockets, as well as automatic removal on unhash and close.
> 
> sock_map_sk_state_allowed is called from the syscall path, and
> ensures that only established or listening sockets are added.
> No such check exists for the BPF path: we rely on sockets being
> in particular states when a BPF sock ops hook is executed.
> For the passive open hook this means that sockets are actually in
> TCP_SYN_RECV state (and unhashed) when they are added to the
> sock map.
> 
> UDP sockets are not saddled with this inconsistency, and so the
> checks for both syscall and BPF path should be identical. Rather
> than duplicating the logic into sock_map_sk_state_allowed merge
> it with sock_map_sk_is_suitable.
> 
> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
> ---
>  MAINTAINERS         |  1 +
>  include/linux/udp.h |  4 ++++
>  net/core/sock_map.c | 52 ++++++++++++++++++++++++++------------------
>  net/ipv4/Makefile   |  1 +
>  net/ipv4/udp_bpf.c  | 53 +++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 90 insertions(+), 21 deletions(-)
>  create mode 100644 net/ipv4/udp_bpf.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2af5fa73155e..495ba52038ad 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9358,6 +9358,7 @@ F:	include/linux/skmsg.h
>  F:	net/core/skmsg.c
>  F:	net/core/sock_map.c
>  F:	net/ipv4/tcp_bpf.c
> +F:	net/ipv4/udp_bpf.c
>  
>  LANTIQ / INTEL Ethernet drivers
>  M:	Hauke Mehrtens <hauke@hauke-m.de>
> diff --git a/include/linux/udp.h b/include/linux/udp.h
> index aa84597bdc33..2485a35d113c 100644
> --- a/include/linux/udp.h
> +++ b/include/linux/udp.h
> @@ -143,4 +143,8 @@ static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff *skb)
>  
>  #define IS_UDPLITE(__sk) (__sk->sk_protocol == IPPROTO_UDPLITE)
>  
> +#ifdef CONFIG_BPF_STREAM_PARSER
> +int udp_bpf_init(struct sock *sk);
> +#endif
> +
>  #endif	/* _LINUX_UDP_H */
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index c84cc9fc7f6b..d742e1538ae9 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c

[ ... ]

> @@ -466,15 +479,20 @@ static bool sock_map_op_okay(const struct bpf_sock_ops_kern *ops)
>  	       ops->op == BPF_SOCK_OPS_TCP_LISTEN_CB;
>  }
>  
> -static bool sock_map_sk_is_suitable(const struct sock *sk)
> +static bool sock_map_sk_is_udp(const struct sock *sk)
>  {
> -	return sk->sk_type == SOCK_STREAM &&
> -	       sk->sk_protocol == IPPROTO_TCP;
> +	return sk->sk_type == SOCK_DGRAM && sk->sk_protocol == IPPROTO_UDP;
>  }
>  
> -static bool sock_map_sk_state_allowed(const struct sock *sk)
> +static bool sock_map_sk_is_suitable(const struct sock *sk, bool from_bpf)
"from_bpf" seems unnecessary.

>  {
> -	return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_LISTEN);
> +	int tcp_flags = TCPF_ESTABLISHED | TCPF_LISTEN;
> +
> +	if (from_bpf)
> +		tcp_flags |= TCPF_SYN_RECV;
> +
> +	return (sock_map_sk_is_udp(sk) && sk_hashed(sk)) ||
> +	       (sock_map_sk_is_tcp(sk) && (1 << sk->sk_state) & tcp_flags);
>  }
>  
>  static int sock_map_update_elem(struct bpf_map *map, void *key,

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH bpf-next v2 9/9] bpf, doc: update maintainers for L7 BPF
  2020-02-28 11:53 ` [PATCH bpf-next v2 9/9] bpf, doc: update maintainers for L7 BPF Lorenz Bauer
@ 2020-03-03 19:45   ` John Fastabend
  0 siblings, 0 replies; 20+ messages in thread
From: John Fastabend @ 2020-03-03 19:45 UTC (permalink / raw)
  To: Lorenz Bauer, john.fastabend, Alexei Starovoitov, Daniel Borkmann
  Cc: kernel-team, Lorenz Bauer, linux-kernel, netdev, bpf

Lorenz Bauer wrote:
> Add Jakub and myself as maintainers for sockmap related code.
> 
> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
> ---
>  MAINTAINERS | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 495ba52038ad..8517965adde8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9351,6 +9351,8 @@ F:	include/net/l3mdev.h
>  L7 BPF FRAMEWORK
>  M:	John Fastabend <john.fastabend@gmail.com>
>  M:	Daniel Borkmann <daniel@iogearbox.net>
> +M:	Jakub Sitnicki <jakub@cloudflare.com>
> +M:	Lorenz Bauer <lmb@cloudflare.com>
>  L:	netdev@vger.kernel.org
>  L:	bpf@vger.kernel.org
>  S:	Maintained
> -- 
> 2.20.1
> 

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 1/9] bpf: sockmap: only check ULP for TCP sockets
  2020-03-03 17:35   ` John Fastabend
@ 2020-03-04  8:39     ` Lorenz Bauer
  0 siblings, 0 replies; 20+ messages in thread
From: Lorenz Bauer @ 2020-03-04  8:39 UTC (permalink / raw)
  To: John Fastabend
  Cc: Daniel Borkmann, Jakub Sitnicki, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, kernel-team, Networking, bpf, linux-kernel

On Tue, 3 Mar 2020 at 18:35, John Fastabend <john.fastabend@gmail.com> wrote:
>
> > diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> > index 112765bd146d..54a9a3e36b29 100644
> > --- a/include/linux/skmsg.h
> > +++ b/include/linux/skmsg.h
> > @@ -360,7 +360,13 @@ static inline void sk_psock_restore_proto(struct sock *sk,
> >                                         struct sk_psock *psock)
> >  {
> >       sk->sk_prot->unhash = psock->saved_unhash;
> > -     tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space);
> > +     if (inet_sk(sk)->is_icsk) {
>
> use sock_map_sk_has_ulp() here as well and then drop the !icsk->icsk_ulp_ops
> case in tcp_update_ulp()?

This requires moving sock_map_sk_has_ulp to the header, which seemed like the
incorrect place. How about adding bool inet_csk_has_ulp(sk) to
inet_connection_sock.h?

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 3/9] bpf: sockmap: move generic sockmap hooks from BPF TCP
  2020-03-03 17:50   ` Martin KaFai Lau
@ 2020-03-04  8:44     ` Lorenz Bauer
  0 siblings, 0 replies; 20+ messages in thread
From: Lorenz Bauer @ 2020-03-04  8:44 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: John Fastabend, Alexei Starovoitov, Daniel Borkmann,
	Jakub Sitnicki, Eric Dumazet, David S. Miller, Jakub Kicinski,
	Alexey Kuznetsov, Hideaki YOSHIFUJI, kernel-team, Networking,
	bpf, linux-kernel

On Tue, 3 Mar 2020 at 18:51, Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Fri, Feb 28, 2020 at 11:53:38AM +0000, Lorenz Bauer wrote:
> > The close, unhash and clone handlers from TCP sockmap are actually generic,
> > and can be reused by UDP sockmap. Move the helpers into the sockmap code
> Is clone reused in UDP?

No, my bad. I'll update the commit message.

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-03-04  8:45 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-28 11:53 [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets Lorenz Bauer
2020-02-28 11:53 ` [PATCH bpf-next v2 1/9] bpf: sockmap: only check ULP for TCP sockets Lorenz Bauer
2020-03-03 17:35   ` John Fastabend
2020-03-04  8:39     ` Lorenz Bauer
2020-02-28 11:53 ` [PATCH bpf-next v2 2/9] bpf: tcp: guard declarations with CONFIG_NET_SOCK_MSG Lorenz Bauer
2020-03-03 17:46   ` John Fastabend
2020-02-28 11:53 ` [PATCH bpf-next v2 3/9] bpf: sockmap: move generic sockmap hooks from BPF TCP Lorenz Bauer
2020-03-03 17:50   ` Martin KaFai Lau
2020-03-04  8:44     ` Lorenz Bauer
2020-03-03 17:52   ` John Fastabend
2020-02-28 11:53 ` [PATCH bpf-next v2 4/9] skmsg: introduce sk_psock_hooks Lorenz Bauer
2020-02-28 11:53 ` [PATCH bpf-next v2 5/9] bpf: sockmap: allow UDP sockets Lorenz Bauer
2020-03-03 17:56   ` Martin KaFai Lau
2020-02-28 11:53 ` [PATCH bpf-next v2 6/9] selftests: bpf: don't listen() on " Lorenz Bauer
2020-02-28 11:53 ` [PATCH bpf-next v2 7/9] selftests: bpf: add tests for UDP sockets in sockmap Lorenz Bauer
2020-02-28 11:53 ` [PATCH bpf-next v2 8/9] selftests: bpf: enable UDP sockmap reuseport tests Lorenz Bauer
2020-02-28 11:53 ` [PATCH bpf-next v2 9/9] bpf, doc: update maintainers for L7 BPF Lorenz Bauer
2020-03-03 19:45   ` John Fastabend
2020-02-28 11:57 ` [PATCH bpf-next v2 0/9] bpf: sockmap, sockhash: support storing UDP sockets Lorenz Bauer
2020-03-01 21:26 ` Jakub Sitnicki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.