All of lore.kernel.org
 help / color / mirror / Atom feed
* [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support
@ 2021-02-03  4:16 Cong Wang
  2021-02-03  4:16 ` [Patch bpf-next 01/19] bpf: rename BPF_STREAM_PARSER to BPF_SOCK_MAP Cong Wang
                   ` (19 more replies)
  0 siblings, 20 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev; +Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang

From: Cong Wang <cong.wang@bytedance.com>

Currently sockmap only fully supports TCP, UDP is partially supported
as it is only allowed to add into sockmap. This patch extends sockmap
with: 1) full UDP support; 2) full AF_UNIX dgram support; 3) cross
protocol support. Our goal is to allow socket splice between AF_UNIX
dgram and UDP.

On the high level, ->sendmsg_locked() and ->read_sock() are required
for each protocol to support sockmap redirection, and in order to do
sock proto update, a new ops ->update_proto() is introduced, which is
also required to implement. It is slightly harder for AF_UNIX, as it
does not have a full struct proto implementation and redirection.

In order to support cross protocol, we have to make skb independent
of protocols, which is extremely hard given how creatively UDP uses
dev_scratch. Fortunately, we can pass skmsg instead of skb when
redirecting to ingress, the only thing needs to add is a new
->recvmsg() to retrieve skmsg. On the egress side, a new skb is
allocated behind skb_send_sock_locked(), it comes for free.
Another big barrier is skb CB, which was hard-coded as TCP_CB(),
I switch it to skb ext to solve this problem. Please see patch 3 for
more details.

This patchset passed all tests, the existing ones and the new ones I
add within this patchset.

---

Cong Wang (19):
  bpf: rename BPF_STREAM_PARSER to BPF_SOCK_MAP
  skmsg: get rid of struct sk_psock_parser
  skmsg: use skb ext instead of TCP_SKB_CB
  sock_map: rename skb_parser and skb_verdict
  sock_map: introduce BPF_SK_SKB_VERDICT
  sock: introduce sk_prot->update_proto()
  udp: implement ->sendmsg_locked()
  udp: implement ->read_sock() for sockmap
  udp: add ->read_sock() and ->sendmsg_locked() to ipv6
  af_unix: implement ->sendmsg_locked for dgram socket
  af_unix: implement ->read_sock() for sockmap
  af_unix: implement ->update_proto()
  af_unix: set TCP_ESTABLISHED for datagram sockets too
  skmsg: extract __tcp_bpf_recvmsg() and tcp_bpf_wait_data()
  udp: implement udp_bpf_recvmsg() for sockmap
  af_unix: implement unix_dgram_bpf_recvmsg()
  sock_map: update sock type checks
  selftests/bpf: add test cases for unix and udp sockmap
  selftests/bpf: add test case for redirection between udp and unix

 MAINTAINERS                                   |   1 +
 include/linux/bpf.h                           |   4 +-
 include/linux/bpf_types.h                     |   2 +-
 include/linux/skbuff.h                        |   4 +
 include/linux/skmsg.h                         |  90 +++-
 include/net/af_unix.h                         |  13 +
 include/net/ipv6.h                            |   1 +
 include/net/sock.h                            |   3 +
 include/net/tcp.h                             |  33 +-
 include/net/udp.h                             |   9 +-
 include/uapi/linux/bpf.h                      |   1 +
 kernel/bpf/syscall.c                          |   1 +
 net/Kconfig                                   |  14 +-
 net/core/Makefile                             |   2 +-
 net/core/filter.c                             |   3 +-
 net/core/skbuff.c                             |   7 +
 net/core/skmsg.c                              | 223 +++++---
 net/core/sock_map.c                           | 128 ++---
 net/ipv4/Makefile                             |   2 +-
 net/ipv4/af_inet.c                            |   2 +
 net/ipv4/tcp_bpf.c                            | 130 +----
 net/ipv4/tcp_ipv4.c                           |   3 +
 net/ipv4/udp.c                                |  68 ++-
 net/ipv4/udp_bpf.c                            |  78 ++-
 net/ipv6/af_inet6.c                           |   2 +
 net/ipv6/tcp_ipv6.c                           |   3 +
 net/ipv6/udp.c                                |  30 +-
 net/tls/tls_sw.c                              |   4 +-
 net/unix/Makefile                             |   1 +
 net/unix/af_unix.c                            | 105 +++-
 net/unix/unix_bpf.c                           |  99 ++++
 tools/bpf/bpftool/common.c                    |   1 +
 tools/bpf/bpftool/prog.c                      |   1 +
 tools/include/uapi/linux/bpf.h                |   1 +
 .../selftests/bpf/prog_tests/sockmap_listen.c | 475 +++++++++++++++++-
 .../selftests/bpf/progs/test_sockmap_listen.c |  24 +-
 36 files changed, 1233 insertions(+), 335 deletions(-)
 create mode 100644 net/unix/unix_bpf.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Patch bpf-next 01/19] bpf: rename BPF_STREAM_PARSER to BPF_SOCK_MAP
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-05 10:32   ` Jakub Sitnicki
  2021-02-08  8:21   ` John Fastabend
  2021-02-03  4:16 ` [Patch bpf-next 02/19] skmsg: get rid of struct sk_psock_parser Cong Wang
                   ` (18 subsequent siblings)
  19 siblings, 2 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Before we add non-TCP support, it is necessary to rename
BPF_STREAM_PARSER as it will be no longer specific to TCP,
and it does not have to be a parser either.

This patch renames BPF_STREAM_PARSER to BPF_SOCK_MAP, so
that sock_map.c hopefully would be protocol-independent.

Also, improve its Kconfig description to avoid confusion.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/linux/bpf.h       |  4 ++--
 include/linux/bpf_types.h |  2 +-
 include/net/tcp.h         |  4 ++--
 include/net/udp.h         |  4 ++--
 net/Kconfig               | 13 ++++++-------
 net/core/Makefile         |  2 +-
 net/ipv4/Makefile         |  2 +-
 net/ipv4/tcp_bpf.c        |  4 ++--
 8 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 321966fc35db..b5af6a4e9927 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1771,7 +1771,7 @@ static inline void bpf_map_offload_map_free(struct bpf_map *map)
 }
 #endif /* CONFIG_NET && CONFIG_BPF_SYSCALL */
 
-#if defined(CONFIG_BPF_STREAM_PARSER)
+#if defined(CONFIG_BPF_SOCK_MAP)
 int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
 			 struct bpf_prog *old, u32 which);
 int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog);
@@ -1804,7 +1804,7 @@ static inline int sock_map_update_elem_sys(struct bpf_map *map, void *key, void
 {
 	return -EOPNOTSUPP;
 }
-#endif /* CONFIG_BPF_STREAM_PARSER */
+#endif /* CONFIG_BPF_SOCK_MAP */
 
 #if defined(CONFIG_INET) && defined(CONFIG_BPF_SYSCALL)
 void bpf_sk_reuseport_detach(struct sock *sk);
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 99f7fd657d87..6e27726ae578 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -103,7 +103,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_HASH_OF_MAPS, htab_of_maps_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP, dev_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP_HASH, dev_map_hash_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_SK_STORAGE, sk_storage_map_ops)
-#if defined(CONFIG_BPF_STREAM_PARSER)
+#if defined(CONFIG_BPF_SOCK_MAP)
 BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKHASH, sock_hash_ops)
 #endif
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 4bb42fb19711..be66571ad122 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2207,14 +2207,14 @@ void tcp_update_ulp(struct sock *sk, struct proto *p,
 struct sk_msg;
 struct sk_psock;
 
-#ifdef CONFIG_BPF_STREAM_PARSER
+#ifdef CONFIG_BPF_SOCK_MAP
 struct proto *tcp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
 void tcp_bpf_clone(const struct sock *sk, struct sock *newsk);
 #else
 static inline void tcp_bpf_clone(const struct sock *sk, struct sock *newsk)
 {
 }
-#endif /* CONFIG_BPF_STREAM_PARSER */
+#endif /* CONFIG_BPF_SOCK_MAP */
 
 #ifdef CONFIG_NET_SOCK_MSG
 int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg, u32 bytes,
diff --git a/include/net/udp.h b/include/net/udp.h
index 877832bed471..0ff921e6b866 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -511,9 +511,9 @@ static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
 	return segs;
 }
 
-#ifdef CONFIG_BPF_STREAM_PARSER
+#ifdef CONFIG_BPF_SOCK_MAP
 struct sk_psock;
 struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
-#endif /* BPF_STREAM_PARSER */
+#endif /* CONFIG_BPF_SOCK_MAP */
 
 #endif	/* _UDP_H */
diff --git a/net/Kconfig b/net/Kconfig
index f4c32d982af6..0cc0805a8127 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -305,20 +305,19 @@ config BPF_JIT
 	  /proc/sys/net/core/bpf_jit_harden   (optional)
 	  /proc/sys/net/core/bpf_jit_kallsyms (optional)
 
-config BPF_STREAM_PARSER
-	bool "enable BPF STREAM_PARSER"
+config BPF_SOCK_MAP
+	bool "enable BPF socket maps"
 	depends on INET
 	depends on BPF_SYSCALL
 	depends on CGROUP_BPF
 	select STREAM_PARSER
 	select NET_SOCK_MSG
 	help
-	  Enabling this allows a stream parser to be used with
-	  BPF_MAP_TYPE_SOCKMAP.
+	  Enabling this allows skb parser and verdict to be used with
+	  BPF_MAP_TYPE_SOCKMAP or BPF_MAP_TYPE_SOCKHASH.
 
-	  BPF_MAP_TYPE_SOCKMAP provides a map type to use with network sockets.
-	  It can be used to enforce socket policy, implement socket redirects,
-	  etc.
+	  This provides a BPF map type to use with network sockets. It can
+	  be used to enforce socket policy, implement socket redirects, etc.
 
 config NET_FLOW_LIMIT
 	bool
diff --git a/net/core/Makefile b/net/core/Makefile
index 3e2c378e5f31..e7c1bdaadefd 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -28,7 +28,7 @@ obj-$(CONFIG_CGROUP_NET_PRIO) += netprio_cgroup.o
 obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
 obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
 obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o
-obj-$(CONFIG_BPF_STREAM_PARSER) += sock_map.o
+obj-$(CONFIG_BPF_SOCK_MAP) += sock_map.o
 obj-$(CONFIG_DST_CACHE) += dst_cache.o
 obj-$(CONFIG_HWBM) += hwbm.o
 obj-$(CONFIG_NET_DEVLINK) += devlink.o
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index 5b77a46885b9..f72f84d1b982 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -62,7 +62,7 @@ obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o
 obj-$(CONFIG_TCP_CONG_YEAH) += tcp_yeah.o
 obj-$(CONFIG_TCP_CONG_ILLINOIS) += tcp_illinois.o
 obj-$(CONFIG_NET_SOCK_MSG) += tcp_bpf.o
-obj-$(CONFIG_BPF_STREAM_PARSER) += udp_bpf.o
+obj-$(CONFIG_BPF_SOCK_MAP) += udp_bpf.o
 obj-$(CONFIG_NETLABEL) += cipso_ipv4.o
 
 obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o \
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index bc7d2a586e18..2252f1d90676 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -229,7 +229,7 @@ int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg,
 }
 EXPORT_SYMBOL_GPL(tcp_bpf_sendmsg_redir);
 
-#ifdef CONFIG_BPF_STREAM_PARSER
+#ifdef CONFIG_BPF_SOCK_MAP
 static bool tcp_bpf_stream_read(const struct sock *sk)
 {
 	struct sk_psock *psock;
@@ -629,4 +629,4 @@ void tcp_bpf_clone(const struct sock *sk, struct sock *newsk)
 	if (prot == &tcp_bpf_prots[family][TCP_BPF_BASE])
 		newsk->sk_prot = sk->sk_prot_creator;
 }
-#endif /* CONFIG_BPF_STREAM_PARSER */
+#endif /* CONFIG_BPF_SOCK_MAP */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 02/19] skmsg: get rid of struct sk_psock_parser
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
  2021-02-03  4:16 ` [Patch bpf-next 01/19] bpf: rename BPF_STREAM_PARSER to BPF_SOCK_MAP Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-05 11:25   ` Jakub Sitnicki
  2021-02-03  4:16 ` [Patch bpf-next 03/19] skmsg: use skb ext instead of TCP_SKB_CB Cong Wang
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

struct sk_psock_parser is embedded in sk_psock, it is
unnecessary as skb verdict also uses ->saved_data_ready.
We can simply fold these fields into sk_psock.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/linux/skmsg.h | 16 +++++-------
 net/core/skmsg.c      | 58 ++++++++++++++++---------------------------
 net/core/sock_map.c   |  8 +++---
 3 files changed, 31 insertions(+), 51 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 8edbbf5f2f93..56d641df3b0c 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -70,12 +70,6 @@ struct sk_psock_link {
 	void				*link_raw;
 };
 
-struct sk_psock_parser {
-	struct strparser		strp;
-	bool				enabled;
-	void (*saved_data_ready)(struct sock *sk);
-};
-
 struct sk_psock_work_state {
 	struct sk_buff			*skb;
 	u32				len;
@@ -90,7 +84,8 @@ struct sk_psock {
 	u32				eval;
 	struct sk_msg			*cork;
 	struct sk_psock_progs		progs;
-	struct sk_psock_parser		parser;
+	struct strparser		strp;
+	bool				bpf_running;
 	struct sk_buff_head		ingress_skb;
 	struct list_head		ingress_msg;
 	unsigned long			state;
@@ -100,6 +95,7 @@ struct sk_psock {
 	void (*saved_unhash)(struct sock *sk);
 	void (*saved_close)(struct sock *sk, long timeout);
 	void (*saved_write_space)(struct sock *sk);
+	void (*saved_data_ready)(struct sock *sk);
 	struct proto			*sk_proto;
 	struct sk_psock_work_state	work_state;
 	struct work_struct		work;
@@ -400,8 +396,8 @@ static inline void sk_psock_put(struct sock *sk, struct sk_psock *psock)
 
 static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
 {
-	if (psock->parser.enabled)
-		psock->parser.saved_data_ready(sk);
+	if (psock->bpf_running)
+		psock->saved_data_ready(sk);
 	else
 		sk->sk_data_ready(sk);
 }
@@ -440,6 +436,6 @@ static inline bool sk_psock_strp_enabled(struct sk_psock *psock)
 {
 	if (!psock)
 		return false;
-	return psock->parser.enabled;
+	return psock->bpf_running;
 }
 #endif /* _LINUX_SKMSG_H */
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 1261512d6807..f72fcb03d25c 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -653,7 +653,7 @@ static void sk_psock_destroy_deferred(struct work_struct *gc)
 
 	/* Parser has been stopped */
 	if (psock->progs.skb_parser)
-		strp_done(&psock->parser.strp);
+		strp_done(&psock->strp);
 
 	cancel_work_sync(&psock->work);
 
@@ -750,14 +750,6 @@ static int sk_psock_bpf_run(struct sk_psock *psock, struct bpf_prog *prog,
 	return bpf_prog_run_pin_on_cpu(prog, skb);
 }
 
-static struct sk_psock *sk_psock_from_strp(struct strparser *strp)
-{
-	struct sk_psock_parser *parser;
-
-	parser = container_of(strp, struct sk_psock_parser, strp);
-	return container_of(parser, struct sk_psock, parser);
-}
-
 static void sk_psock_skb_redirect(struct sk_buff *skb)
 {
 	struct sk_psock *psock_other;
@@ -899,7 +891,7 @@ static int sk_psock_strp_read_done(struct strparser *strp, int err)
 
 static int sk_psock_strp_parse(struct strparser *strp, struct sk_buff *skb)
 {
-	struct sk_psock *psock = sk_psock_from_strp(strp);
+	struct sk_psock *psock = container_of(strp, struct sk_psock, strp);
 	struct bpf_prog *prog;
 	int ret = skb->len;
 
@@ -923,10 +915,10 @@ static void sk_psock_strp_data_ready(struct sock *sk)
 	psock = sk_psock(sk);
 	if (likely(psock)) {
 		if (tls_sw_has_ctx_rx(sk)) {
-			psock->parser.saved_data_ready(sk);
+			psock->saved_data_ready(sk);
 		} else {
 			write_lock_bh(&sk->sk_callback_lock);
-			strp_data_ready(&psock->parser.strp);
+			strp_data_ready(&psock->strp);
 			write_unlock_bh(&sk->sk_callback_lock);
 		}
 	}
@@ -1009,57 +1001,49 @@ int sk_psock_init_strp(struct sock *sk, struct sk_psock *psock)
 		.parse_msg	= sk_psock_strp_parse,
 	};
 
-	psock->parser.enabled = false;
-	return strp_init(&psock->parser.strp, sk, &cb);
+	psock->bpf_running = false;
+	return strp_init(&psock->strp, sk, &cb);
 }
 
 void sk_psock_start_verdict(struct sock *sk, struct sk_psock *psock)
 {
-	struct sk_psock_parser *parser = &psock->parser;
-
-	if (parser->enabled)
+	if (psock->bpf_running)
 		return;
 
-	parser->saved_data_ready = sk->sk_data_ready;
+	psock->saved_data_ready = sk->sk_data_ready;
 	sk->sk_data_ready = sk_psock_verdict_data_ready;
 	sk->sk_write_space = sk_psock_write_space;
-	parser->enabled = true;
+	psock->bpf_running = true;
 }
 
 void sk_psock_start_strp(struct sock *sk, struct sk_psock *psock)
 {
-	struct sk_psock_parser *parser = &psock->parser;
-
-	if (parser->enabled)
+	if (psock->bpf_running)
 		return;
 
-	parser->saved_data_ready = sk->sk_data_ready;
+	psock->saved_data_ready = sk->sk_data_ready;
 	sk->sk_data_ready = sk_psock_strp_data_ready;
 	sk->sk_write_space = sk_psock_write_space;
-	parser->enabled = true;
+	psock->bpf_running = true;
 }
 
 void sk_psock_stop_strp(struct sock *sk, struct sk_psock *psock)
 {
-	struct sk_psock_parser *parser = &psock->parser;
-
-	if (!parser->enabled)
+	if (!psock->bpf_running)
 		return;
 
-	sk->sk_data_ready = parser->saved_data_ready;
-	parser->saved_data_ready = NULL;
-	strp_stop(&parser->strp);
-	parser->enabled = false;
+	sk->sk_data_ready = psock->saved_data_ready;
+	psock->saved_data_ready = NULL;
+	strp_stop(&psock->strp);
+	psock->bpf_running = false;
 }
 
 void sk_psock_stop_verdict(struct sock *sk, struct sk_psock *psock)
 {
-	struct sk_psock_parser *parser = &psock->parser;
-
-	if (!parser->enabled)
+	if (!psock->bpf_running)
 		return;
 
-	sk->sk_data_ready = parser->saved_data_ready;
-	parser->saved_data_ready = NULL;
-	parser->enabled = false;
+	sk->sk_data_ready = psock->saved_data_ready;
+	psock->saved_data_ready = NULL;
+	psock->bpf_running = false;
 }
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index d758fb83c884..37ff8e13e4cc 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -148,9 +148,9 @@ static void sock_map_del_link(struct sock *sk,
 			struct bpf_map *map = link->map;
 			struct bpf_stab *stab = container_of(map, struct bpf_stab,
 							     map);
-			if (psock->parser.enabled && stab->progs.skb_parser)
+			if (psock->bpf_running && stab->progs.skb_parser)
 				strp_stop = true;
-			if (psock->parser.enabled && stab->progs.skb_verdict)
+			if (psock->bpf_running && stab->progs.skb_verdict)
 				verdict_stop = true;
 			list_del(&link->list);
 			sk_psock_free_link(link);
@@ -283,14 +283,14 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 		goto out_drop;
 
 	write_lock_bh(&sk->sk_callback_lock);
-	if (skb_parser && skb_verdict && !psock->parser.enabled) {
+	if (skb_parser && skb_verdict && !psock->bpf_running) {
 		ret = sk_psock_init_strp(sk, psock);
 		if (ret)
 			goto out_unlock_drop;
 		psock_set_prog(&psock->progs.skb_verdict, skb_verdict);
 		psock_set_prog(&psock->progs.skb_parser, skb_parser);
 		sk_psock_start_strp(sk, psock);
-	} else if (!skb_parser && skb_verdict && !psock->parser.enabled) {
+	} else if (!skb_parser && skb_verdict && !psock->bpf_running) {
 		psock_set_prog(&psock->progs.skb_verdict, skb_verdict);
 		sk_psock_start_verdict(sk,psock);
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 03/19] skmsg: use skb ext instead of TCP_SKB_CB
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
  2021-02-03  4:16 ` [Patch bpf-next 01/19] bpf: rename BPF_STREAM_PARSER to BPF_SOCK_MAP Cong Wang
  2021-02-03  4:16 ` [Patch bpf-next 02/19] skmsg: get rid of struct sk_psock_parser Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-05 22:09   ` Jakub Sitnicki
  2021-02-03  4:16 ` [Patch bpf-next 04/19] sock_map: rename skb_parser and skb_verdict Cong Wang
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Currently TCP_SKB_CB() is hard-coded in skmsg code, it certainly
won't work for any other non-TCP protocols. We can move them to
skb ext instead of playing with skb cb, which is harder to make
correct.

Of course, except ->data_end, which is used by
sk_skb_convert_ctx_access() to adjust compile-time constant offset.
Fortunately, we can reuse the anonymous union where the field
'tcp_tsorted_anchor' is and save/restore the overwritten part
before/after a brief use.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/linux/skbuff.h |  4 ++++
 include/linux/skmsg.h  | 45 ++++++++++++++++++++++++++++++++++++++++++
 include/net/tcp.h      | 25 -----------------------
 net/Kconfig            |  1 +
 net/core/filter.c      |  3 +--
 net/core/skbuff.c      |  7 +++++++
 net/core/skmsg.c       | 44 ++++++++++++++++++++++++++++-------------
 net/core/sock_map.c    | 12 +++++------
 8 files changed, 94 insertions(+), 47 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 46f901adf1a8..12a28268233a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -755,6 +755,7 @@ struct sk_buff {
 			void		(*destructor)(struct sk_buff *skb);
 		};
 		struct list_head	tcp_tsorted_anchor;
+		void			*data_end;
 	};
 
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
@@ -4166,6 +4167,9 @@ enum skb_ext_id {
 #endif
 #if IS_ENABLED(CONFIG_MPTCP)
 	SKB_EXT_MPTCP,
+#endif
+#if IS_ENABLED(CONFIG_NET_SOCK_MSG)
+	SKB_EXT_BPF,
 #endif
 	SKB_EXT_NUM, /* must be last */
 };
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 56d641df3b0c..e212b0d1ba35 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -438,4 +438,49 @@ static inline bool sk_psock_strp_enabled(struct sk_psock *psock)
 		return false;
 	return psock->bpf_running;
 }
+
+struct skb_bpf_ext {
+	__u32 flags;
+	struct sock *sk_redir;
+};
+
+static inline void bpf_compute_data_end_sk_skb(struct sk_buff *skb)
+{
+	skb->data_end = skb->data + skb_headlen(skb);
+}
+
+#if IS_ENABLED(CONFIG_NET_SOCK_MSG)
+static inline
+bool skb_bpf_ext_ingress(const struct sk_buff *skb)
+{
+	struct skb_bpf_ext *ext = skb_ext_find(skb, SKB_EXT_BPF);
+
+	return ext->flags & BPF_F_INGRESS;
+}
+
+static inline
+void skb_bpf_ext_set_ingress(const struct sk_buff *skb)
+{
+	struct skb_bpf_ext *ext = skb_ext_find(skb, SKB_EXT_BPF);
+
+	ext->flags |= BPF_F_INGRESS;
+}
+
+static inline
+struct sock *skb_bpf_ext_redirect_fetch(struct sk_buff *skb)
+{
+	struct skb_bpf_ext *ext = skb_ext_find(skb, SKB_EXT_BPF);
+
+	return ext->sk_redir;
+}
+
+static inline
+void skb_bpf_ext_redirect_clear(struct sk_buff *skb)
+{
+	struct skb_bpf_ext *ext = skb_ext_find(skb, SKB_EXT_BPF);
+
+	ext->flags = 0;
+	ext->sk_redir = NULL;
+}
+#endif /* CONFIG_NET_SOCK_MSG */
 #endif /* _LINUX_SKMSG_H */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index be66571ad122..f7591768525d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -882,36 +882,11 @@ struct tcp_skb_cb {
 			struct inet6_skb_parm	h6;
 #endif
 		} header;	/* For incoming skbs */
-		struct {
-			__u32 flags;
-			struct sock *sk_redir;
-			void *data_end;
-		} bpf;
 	};
 };
 
 #define TCP_SKB_CB(__skb)	((struct tcp_skb_cb *)&((__skb)->cb[0]))
 
-static inline void bpf_compute_data_end_sk_skb(struct sk_buff *skb)
-{
-	TCP_SKB_CB(skb)->bpf.data_end = skb->data + skb_headlen(skb);
-}
-
-static inline bool tcp_skb_bpf_ingress(const struct sk_buff *skb)
-{
-	return TCP_SKB_CB(skb)->bpf.flags & BPF_F_INGRESS;
-}
-
-static inline struct sock *tcp_skb_bpf_redirect_fetch(struct sk_buff *skb)
-{
-	return TCP_SKB_CB(skb)->bpf.sk_redir;
-}
-
-static inline void tcp_skb_bpf_redirect_clear(struct sk_buff *skb)
-{
-	TCP_SKB_CB(skb)->bpf.sk_redir = NULL;
-}
-
 extern const struct inet_connection_sock_af_ops ipv4_specific;
 
 #if IS_ENABLED(CONFIG_IPV6)
diff --git a/net/Kconfig b/net/Kconfig
index 0cc0805a8127..1e45bcaa23f1 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -422,6 +422,7 @@ config SOCK_VALIDATE_XMIT
 
 config NET_SOCK_MSG
 	bool
+	select SKB_EXTENSIONS
 	default n
 	help
 	  The NET_SOCK_MSG provides a framework for plain sockets (e.g. TCP) or
diff --git a/net/core/filter.c b/net/core/filter.c
index e15d4741719a..c1a19a663630 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9532,8 +9532,7 @@ static u32 sk_skb_convert_ctx_access(enum bpf_access_type type,
 	case offsetof(struct __sk_buff, data_end):
 		off  = si->off;
 		off -= offsetof(struct __sk_buff, data_end);
-		off += offsetof(struct sk_buff, cb);
-		off += offsetof(struct tcp_skb_cb, bpf.data_end);
+		off += offsetof(struct sk_buff, data_end);
 		*insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg,
 				      si->src_reg, off);
 		break;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 145503d3f06b..7695a2b65832 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -60,6 +60,7 @@
 #include <linux/prefetch.h>
 #include <linux/if_vlan.h>
 #include <linux/mpls.h>
+#include <linux/skmsg.h>
 
 #include <net/protocol.h>
 #include <net/dst.h>
@@ -4259,6 +4260,9 @@ static const u8 skb_ext_type_len[] = {
 #if IS_ENABLED(CONFIG_MPTCP)
 	[SKB_EXT_MPTCP] = SKB_EXT_CHUNKSIZEOF(struct mptcp_ext),
 #endif
+#if IS_ENABLED(CONFIG_NET_SOCK_MSG)
+	[SKB_EXT_BPF] = SKB_EXT_CHUNKSIZEOF(struct skb_bpf_ext),
+#endif
 };
 
 static __always_inline unsigned int skb_ext_total_length(void)
@@ -4275,6 +4279,9 @@ static __always_inline unsigned int skb_ext_total_length(void)
 #endif
 #if IS_ENABLED(CONFIG_MPTCP)
 		skb_ext_type_len[SKB_EXT_MPTCP] +
+#endif
+#if IS_ENABLED(CONFIG_NET_SOCK_MSG)
+		skb_ext_type_len[SKB_EXT_BPF] +
 #endif
 		0;
 }
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index f72fcb03d25c..2b5b8f05187a 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -525,7 +525,8 @@ static void sk_psock_backlog(struct work_struct *work)
 		len = skb->len;
 		off = 0;
 start:
-		ingress = tcp_skb_bpf_ingress(skb);
+		ingress = skb_bpf_ext_ingress(skb);
+		skb_ext_del(skb, SKB_EXT_BPF);
 		do {
 			ret = -EIO;
 			if (likely(psock->sk->sk_socket))
@@ -746,8 +747,13 @@ EXPORT_SYMBOL_GPL(sk_psock_msg_verdict);
 static int sk_psock_bpf_run(struct sk_psock *psock, struct bpf_prog *prog,
 			    struct sk_buff *skb)
 {
-	bpf_compute_data_end_sk_skb(skb);
-	return bpf_prog_run_pin_on_cpu(prog, skb);
+	int ret;
+
+	tcp_skb_tsorted_save(skb) {
+		bpf_compute_data_end_sk_skb(skb);
+		ret = bpf_prog_run_pin_on_cpu(prog, skb);
+	} tcp_skb_tsorted_restore(skb);
+	return ret;
 }
 
 static void sk_psock_skb_redirect(struct sk_buff *skb)
@@ -755,7 +761,7 @@ static void sk_psock_skb_redirect(struct sk_buff *skb)
 	struct sk_psock *psock_other;
 	struct sock *sk_other;
 
-	sk_other = tcp_skb_bpf_redirect_fetch(skb);
+	sk_other = skb_bpf_ext_redirect_fetch(skb);
 	/* This error is a buggy BPF program, it returned a redirect
 	 * return code, but then didn't set a redirect interface.
 	 */
@@ -797,6 +803,9 @@ int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb)
 	struct bpf_prog *prog;
 	int ret = __SK_PASS;
 
+	if (!skb_ext_add(skb, SKB_EXT_BPF))
+		return __SK_DROP;
+
 	rcu_read_lock();
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
@@ -805,9 +814,9 @@ int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb)
 		 * TLS context.
 		 */
 		skb->sk = psock->sk;
-		tcp_skb_bpf_redirect_clear(skb);
+		skb_bpf_ext_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
-		ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
+		ret = sk_psock_map_verd(ret, skb_bpf_ext_redirect_fetch(skb));
 		skb->sk = NULL;
 	}
 	sk_psock_tls_verdict_apply(skb, psock->sk, ret);
@@ -819,7 +828,6 @@ EXPORT_SYMBOL_GPL(sk_psock_tls_strp_read);
 static void sk_psock_verdict_apply(struct sk_psock *psock,
 				   struct sk_buff *skb, int verdict)
 {
-	struct tcp_skb_cb *tcp;
 	struct sock *sk_other;
 	int err = -EIO;
 
@@ -831,9 +839,7 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
 			goto out_free;
 		}
 
-		tcp = TCP_SKB_CB(skb);
-		tcp->bpf.flags |= BPF_F_INGRESS;
-
+		skb_bpf_ext_set_ingress(skb);
 		/* If the queue is empty then we can submit directly
 		 * into the msg queue. If its not empty we have to
 		 * queue work otherwise we may get OOO data. Otherwise,
@@ -873,11 +879,15 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
 		goto out;
 	}
 	skb_set_owner_r(skb, sk);
+	if (!skb_ext_add(skb, SKB_EXT_BPF)) {
+		kfree_skb(skb);
+		goto out;
+	}
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
-		tcp_skb_bpf_redirect_clear(skb);
+		skb_bpf_ext_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
-		ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
+		ret = sk_psock_map_verd(ret, skb_bpf_ext_redirect_fetch(skb));
 	}
 	sk_psock_verdict_apply(psock, skb, ret);
 out:
@@ -949,11 +959,17 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb,
 		goto out;
 	}
 	skb_set_owner_r(skb, sk);
+	if (!skb_ext_add(skb, SKB_EXT_BPF)) {
+		len = 0;
+		kfree_skb(skb);
+		goto out;
+	}
+
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
-		tcp_skb_bpf_redirect_clear(skb);
+		skb_bpf_ext_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
-		ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
+		ret = sk_psock_map_verd(ret, skb_bpf_ext_redirect_fetch(skb));
 	}
 	sk_psock_verdict_apply(psock, skb, ret);
 out:
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 37ff8e13e4cc..7b70fa290836 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -657,7 +657,7 @@ const struct bpf_func_proto bpf_sock_map_update_proto = {
 BPF_CALL_4(bpf_sk_redirect_map, struct sk_buff *, skb,
 	   struct bpf_map *, map, u32, key, u64, flags)
 {
-	struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
+	struct skb_bpf_ext *ext = skb_ext_find(skb, SKB_EXT_BPF);
 	struct sock *sk;
 
 	if (unlikely(flags & ~(BPF_F_INGRESS)))
@@ -667,8 +667,8 @@ BPF_CALL_4(bpf_sk_redirect_map, struct sk_buff *, skb,
 	if (unlikely(!sk || !sock_map_redirect_allowed(sk)))
 		return SK_DROP;
 
-	tcb->bpf.flags = flags;
-	tcb->bpf.sk_redir = sk;
+	ext->flags = flags;
+	ext->sk_redir = sk;
 	return SK_PASS;
 }
 
@@ -1250,7 +1250,7 @@ const struct bpf_func_proto bpf_sock_hash_update_proto = {
 BPF_CALL_4(bpf_sk_redirect_hash, struct sk_buff *, skb,
 	   struct bpf_map *, map, void *, key, u64, flags)
 {
-	struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
+	struct skb_bpf_ext *ext = skb_ext_find(skb, SKB_EXT_BPF);
 	struct sock *sk;
 
 	if (unlikely(flags & ~(BPF_F_INGRESS)))
@@ -1260,8 +1260,8 @@ BPF_CALL_4(bpf_sk_redirect_hash, struct sk_buff *, skb,
 	if (unlikely(!sk || !sock_map_redirect_allowed(sk)))
 		return SK_DROP;
 
-	tcb->bpf.flags = flags;
-	tcb->bpf.sk_redir = sk;
+	ext->flags = flags;
+	ext->sk_redir = sk;
 	return SK_PASS;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 04/19] sock_map: rename skb_parser and skb_verdict
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (2 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 03/19] skmsg: use skb ext instead of TCP_SKB_CB Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-08  8:27   ` John Fastabend
  2021-02-03  4:16 ` [Patch bpf-next 05/19] sock_map: introduce BPF_SK_SKB_VERDICT Cong Wang
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

These two ebpf programs are tied to BPF_SK_SKB_STREAM_PARSER
and BPF_SK_SKB_STREAM_VERDICT, rename them to reflect the fact
they are currently used for TCP. And save the generic name
skb_verdict for general use.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/linux/skmsg.h                         |  8 +--
 net/core/skmsg.c                              | 14 ++---
 net/core/sock_map.c                           | 60 +++++++++----------
 .../selftests/bpf/prog_tests/sockmap_listen.c |  8 +--
 .../selftests/bpf/progs/test_sockmap_listen.c |  4 +-
 5 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index e212b0d1ba35..218566ac4fa1 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -56,8 +56,8 @@ struct sk_msg {
 
 struct sk_psock_progs {
 	struct bpf_prog			*msg_parser;
-	struct bpf_prog			*skb_parser;
-	struct bpf_prog			*skb_verdict;
+	struct bpf_prog			*stream_parser;
+	struct bpf_prog			*stream_verdict;
 };
 
 enum sk_psock_state_bits {
@@ -426,8 +426,8 @@ static inline int psock_replace_prog(struct bpf_prog **pprog,
 static inline void psock_progs_drop(struct sk_psock_progs *progs)
 {
 	psock_set_prog(&progs->msg_parser, NULL);
-	psock_set_prog(&progs->skb_parser, NULL);
-	psock_set_prog(&progs->skb_verdict, NULL);
+	psock_set_prog(&progs->stream_parser, NULL);
+	psock_set_prog(&progs->stream_verdict, NULL);
 }
 
 int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb);
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 2b5b8f05187a..51446fe63be5 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -653,7 +653,7 @@ static void sk_psock_destroy_deferred(struct work_struct *gc)
 	/* No sk_callback_lock since already detached. */
 
 	/* Parser has been stopped */
-	if (psock->progs.skb_parser)
+	if (psock->progs.stream_parser)
 		strp_done(&psock->strp);
 
 	cancel_work_sync(&psock->work);
@@ -686,9 +686,9 @@ void sk_psock_drop(struct sock *sk, struct sk_psock *psock)
 	write_lock_bh(&sk->sk_callback_lock);
 	sk_psock_restore_proto(sk, psock);
 	rcu_assign_sk_user_data(sk, NULL);
-	if (psock->progs.skb_parser)
+	if (psock->progs.stream_parser)
 		sk_psock_stop_strp(sk, psock);
-	else if (psock->progs.skb_verdict)
+	else if (psock->progs.stream_verdict)
 		sk_psock_stop_verdict(sk, psock);
 	write_unlock_bh(&sk->sk_callback_lock);
 	sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
@@ -807,7 +807,7 @@ int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb)
 		return __SK_DROP;
 
 	rcu_read_lock();
-	prog = READ_ONCE(psock->progs.skb_verdict);
+	prog = READ_ONCE(psock->progs.stream_verdict);
 	if (likely(prog)) {
 		/* We skip full set_owner_r here because if we do a SK_PASS
 		 * or SK_DROP we can skip skb memory accounting and use the
@@ -883,7 +883,7 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
 		kfree_skb(skb);
 		goto out;
 	}
-	prog = READ_ONCE(psock->progs.skb_verdict);
+	prog = READ_ONCE(psock->progs.stream_verdict);
 	if (likely(prog)) {
 		skb_bpf_ext_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
@@ -906,7 +906,7 @@ static int sk_psock_strp_parse(struct strparser *strp, struct sk_buff *skb)
 	int ret = skb->len;
 
 	rcu_read_lock();
-	prog = READ_ONCE(psock->progs.skb_parser);
+	prog = READ_ONCE(psock->progs.stream_parser);
 	if (likely(prog)) {
 		skb->sk = psock->sk;
 		ret = sk_psock_bpf_run(psock, prog, skb);
@@ -965,7 +965,7 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb,
 		goto out;
 	}
 
-	prog = READ_ONCE(psock->progs.skb_verdict);
+	prog = READ_ONCE(psock->progs.stream_verdict);
 	if (likely(prog)) {
 		skb_bpf_ext_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 7b70fa290836..521663582982 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -148,9 +148,9 @@ static void sock_map_del_link(struct sock *sk,
 			struct bpf_map *map = link->map;
 			struct bpf_stab *stab = container_of(map, struct bpf_stab,
 							     map);
-			if (psock->bpf_running && stab->progs.skb_parser)
+			if (psock->bpf_running && stab->progs.stream_parser)
 				strp_stop = true;
-			if (psock->bpf_running && stab->progs.skb_verdict)
+			if (psock->bpf_running && stab->progs.stream_verdict)
 				verdict_stop = true;
 			list_del(&link->list);
 			sk_psock_free_link(link);
@@ -224,23 +224,23 @@ static struct sk_psock *sock_map_psock_get_checked(struct sock *sk)
 static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 			 struct sock *sk)
 {
-	struct bpf_prog *msg_parser, *skb_parser, *skb_verdict;
+	struct bpf_prog *msg_parser, *stream_parser, *stream_verdict;
 	struct sk_psock *psock;
 	int ret;
 
-	skb_verdict = READ_ONCE(progs->skb_verdict);
-	if (skb_verdict) {
-		skb_verdict = bpf_prog_inc_not_zero(skb_verdict);
-		if (IS_ERR(skb_verdict))
-			return PTR_ERR(skb_verdict);
+	stream_verdict = READ_ONCE(progs->stream_verdict);
+	if (stream_verdict) {
+		stream_verdict = bpf_prog_inc_not_zero(stream_verdict);
+		if (IS_ERR(stream_verdict))
+			return PTR_ERR(stream_verdict);
 	}
 
-	skb_parser = READ_ONCE(progs->skb_parser);
-	if (skb_parser) {
-		skb_parser = bpf_prog_inc_not_zero(skb_parser);
-		if (IS_ERR(skb_parser)) {
-			ret = PTR_ERR(skb_parser);
-			goto out_put_skb_verdict;
+	stream_parser = READ_ONCE(progs->stream_parser);
+	if (stream_parser) {
+		stream_parser = bpf_prog_inc_not_zero(stream_parser);
+		if (IS_ERR(stream_parser)) {
+			ret = PTR_ERR(stream_parser);
+			goto out_put_stream_verdict;
 		}
 	}
 
@@ -249,7 +249,7 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 		msg_parser = bpf_prog_inc_not_zero(msg_parser);
 		if (IS_ERR(msg_parser)) {
 			ret = PTR_ERR(msg_parser);
-			goto out_put_skb_parser;
+			goto out_put_stream_parser;
 		}
 	}
 
@@ -261,8 +261,8 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 
 	if (psock) {
 		if ((msg_parser && READ_ONCE(psock->progs.msg_parser)) ||
-		    (skb_parser  && READ_ONCE(psock->progs.skb_parser)) ||
-		    (skb_verdict && READ_ONCE(psock->progs.skb_verdict))) {
+		    (stream_parser  && READ_ONCE(psock->progs.stream_parser)) ||
+		    (stream_verdict && READ_ONCE(psock->progs.stream_verdict))) {
 			sk_psock_put(sk, psock);
 			ret = -EBUSY;
 			goto out_progs;
@@ -283,15 +283,15 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 		goto out_drop;
 
 	write_lock_bh(&sk->sk_callback_lock);
-	if (skb_parser && skb_verdict && !psock->bpf_running) {
+	if (stream_parser && stream_verdict && !psock->bpf_running) {
 		ret = sk_psock_init_strp(sk, psock);
 		if (ret)
 			goto out_unlock_drop;
-		psock_set_prog(&psock->progs.skb_verdict, skb_verdict);
-		psock_set_prog(&psock->progs.skb_parser, skb_parser);
+		psock_set_prog(&psock->progs.stream_verdict, stream_verdict);
+		psock_set_prog(&psock->progs.stream_parser, stream_parser);
 		sk_psock_start_strp(sk, psock);
-	} else if (!skb_parser && skb_verdict && !psock->bpf_running) {
-		psock_set_prog(&psock->progs.skb_verdict, skb_verdict);
+	} else if (!stream_parser && stream_verdict && !psock->bpf_running) {
+		psock_set_prog(&psock->progs.stream_verdict, stream_verdict);
 		sk_psock_start_verdict(sk,psock);
 	}
 	write_unlock_bh(&sk->sk_callback_lock);
@@ -303,12 +303,12 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 out_progs:
 	if (msg_parser)
 		bpf_prog_put(msg_parser);
-out_put_skb_parser:
-	if (skb_parser)
-		bpf_prog_put(skb_parser);
-out_put_skb_verdict:
-	if (skb_verdict)
-		bpf_prog_put(skb_verdict);
+out_put_stream_parser:
+	if (stream_parser)
+		bpf_prog_put(stream_parser);
+out_put_stream_verdict:
+	if (stream_verdict)
+		bpf_prog_put(stream_verdict);
 	return ret;
 }
 
@@ -1462,10 +1462,10 @@ int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
 		pprog = &progs->msg_parser;
 		break;
 	case BPF_SK_SKB_STREAM_PARSER:
-		pprog = &progs->skb_parser;
+		pprog = &progs->stream_parser;
 		break;
 	case BPF_SK_SKB_STREAM_VERDICT:
-		pprog = &progs->skb_verdict;
+		pprog = &progs->stream_verdict;
 		break;
 	default:
 		return -EOPNOTSUPP;
diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index d7d65a700799..c26e6bf05e49 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -1014,8 +1014,8 @@ static void test_skb_redir_to_connected(struct test_sockmap_listen *skel,
 					struct bpf_map *inner_map, int family,
 					int sotype)
 {
-	int verdict = bpf_program__fd(skel->progs.prog_skb_verdict);
-	int parser = bpf_program__fd(skel->progs.prog_skb_parser);
+	int verdict = bpf_program__fd(skel->progs.prog_stream_verdict);
+	int parser = bpf_program__fd(skel->progs.prog_stream_parser);
 	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
 	int sock_map = bpf_map__fd(inner_map);
 	int err;
@@ -1125,8 +1125,8 @@ static void test_skb_redir_to_listening(struct test_sockmap_listen *skel,
 					struct bpf_map *inner_map, int family,
 					int sotype)
 {
-	int verdict = bpf_program__fd(skel->progs.prog_skb_verdict);
-	int parser = bpf_program__fd(skel->progs.prog_skb_parser);
+	int verdict = bpf_program__fd(skel->progs.prog_stream_verdict);
+	int parser = bpf_program__fd(skel->progs.prog_stream_parser);
 	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
 	int sock_map = bpf_map__fd(inner_map);
 	int err;
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
index a3a366c57ce1..fa221141e9c1 100644
--- a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
@@ -31,13 +31,13 @@ struct {
 static volatile bool test_sockmap; /* toggled by user-space */
 
 SEC("sk_skb/stream_parser")
-int prog_skb_parser(struct __sk_buff *skb)
+int prog_stream_parser(struct __sk_buff *skb)
 {
 	return skb->len;
 }
 
 SEC("sk_skb/stream_verdict")
-int prog_skb_verdict(struct __sk_buff *skb)
+int prog_stream_verdict(struct __sk_buff *skb)
 {
 	unsigned int *count;
 	__u32 zero = 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 05/19] sock_map: introduce BPF_SK_SKB_VERDICT
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (3 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 04/19] sock_map: rename skb_parser and skb_verdict Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-08  8:31   ` John Fastabend
  2021-02-03  4:16 ` [Patch bpf-next 06/19] sock: introduce sk_prot->update_proto() Cong Wang
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

I was planning to reuse BPF_SK_SKB_STREAM_VERDICT but its name is
confusing and more importantly it seems kTLS relies on it to deliver
sk_msg too. To avoid messing up kTLS, we can just reuse the stream
verdict code but introduce a new type of eBPF program, skb_verdict.
Users are not allowed to set stream_verdict and skb_verdict at the
same time.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/linux/skmsg.h          |  3 +++
 include/uapi/linux/bpf.h       |  1 +
 kernel/bpf/syscall.c           |  1 +
 net/core/skmsg.c               |  4 +++-
 net/core/sock_map.c            | 23 ++++++++++++++++++++++-
 tools/bpf/bpftool/common.c     |  1 +
 tools/bpf/bpftool/prog.c       |  1 +
 tools/include/uapi/linux/bpf.h |  1 +
 8 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 218566ac4fa1..cb79b1afa556 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -58,6 +58,7 @@ struct sk_psock_progs {
 	struct bpf_prog			*msg_parser;
 	struct bpf_prog			*stream_parser;
 	struct bpf_prog			*stream_verdict;
+	struct bpf_prog			*skb_verdict;
 };
 
 enum sk_psock_state_bits {
@@ -428,6 +429,7 @@ static inline void psock_progs_drop(struct sk_psock_progs *progs)
 	psock_set_prog(&progs->msg_parser, NULL);
 	psock_set_prog(&progs->stream_parser, NULL);
 	psock_set_prog(&progs->stream_verdict, NULL);
+	psock_set_prog(&progs->skb_verdict, NULL);
 }
 
 int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb);
@@ -482,5 +484,6 @@ void skb_bpf_ext_redirect_clear(struct sk_buff *skb)
 	ext->flags = 0;
 	ext->sk_redir = NULL;
 }
+
 #endif /* CONFIG_NET_SOCK_MSG */
 #endif /* _LINUX_SKMSG_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c001766adcbc..c1a412ebfb08 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -247,6 +247,7 @@ enum bpf_attach_type {
 	BPF_XDP_CPUMAP,
 	BPF_SK_LOOKUP,
 	BPF_XDP,
+	BPF_SK_SKB_VERDICT,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e5999d86c76e..a56549fc2825 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2936,6 +2936,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
 		return BPF_PROG_TYPE_SK_MSG;
 	case BPF_SK_SKB_STREAM_PARSER:
 	case BPF_SK_SKB_STREAM_VERDICT:
+	case BPF_SK_SKB_VERDICT:
 		return BPF_PROG_TYPE_SK_SKB;
 	case BPF_LIRC_MODE2:
 		return BPF_PROG_TYPE_LIRC_MODE2;
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 51446fe63be5..ecbd6f0d49a5 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -688,7 +688,7 @@ void sk_psock_drop(struct sock *sk, struct sk_psock *psock)
 	rcu_assign_sk_user_data(sk, NULL);
 	if (psock->progs.stream_parser)
 		sk_psock_stop_strp(sk, psock);
-	else if (psock->progs.stream_verdict)
+	else if (psock->progs.stream_verdict || psock->progs.skb_verdict)
 		sk_psock_stop_verdict(sk, psock);
 	write_unlock_bh(&sk->sk_callback_lock);
 	sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
@@ -966,6 +966,8 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb,
 	}
 
 	prog = READ_ONCE(psock->progs.stream_verdict);
+	if (!prog)
+		prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
 		skb_bpf_ext_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 521663582982..f827f1ecefcc 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -152,6 +152,8 @@ static void sock_map_del_link(struct sock *sk,
 				strp_stop = true;
 			if (psock->bpf_running && stab->progs.stream_verdict)
 				verdict_stop = true;
+			if (psock->bpf_running && stab->progs.skb_verdict)
+				verdict_stop = true;
 			list_del(&link->list);
 			sk_psock_free_link(link);
 		}
@@ -224,7 +226,7 @@ static struct sk_psock *sock_map_psock_get_checked(struct sock *sk)
 static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 			 struct sock *sk)
 {
-	struct bpf_prog *msg_parser, *stream_parser, *stream_verdict;
+	struct bpf_prog *msg_parser, *stream_parser, *stream_verdict, *skb_verdict;
 	struct sk_psock *psock;
 	int ret;
 
@@ -253,6 +255,15 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 		}
 	}
 
+	skb_verdict = READ_ONCE(progs->skb_verdict);
+	if (skb_verdict) {
+		skb_verdict = bpf_prog_inc_not_zero(skb_verdict);
+		if (IS_ERR(skb_verdict)) {
+			ret = PTR_ERR(skb_verdict);
+			goto out_put_msg_parser;
+		}
+	}
+
 	psock = sock_map_psock_get_checked(sk);
 	if (IS_ERR(psock)) {
 		ret = PTR_ERR(psock);
@@ -262,6 +273,7 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 	if (psock) {
 		if ((msg_parser && READ_ONCE(psock->progs.msg_parser)) ||
 		    (stream_parser  && READ_ONCE(psock->progs.stream_parser)) ||
+		    (skb_verdict && READ_ONCE(psock->progs.skb_verdict)) ||
 		    (stream_verdict && READ_ONCE(psock->progs.stream_verdict))) {
 			sk_psock_put(sk, psock);
 			ret = -EBUSY;
@@ -293,6 +305,9 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 	} else if (!stream_parser && stream_verdict && !psock->bpf_running) {
 		psock_set_prog(&psock->progs.stream_verdict, stream_verdict);
 		sk_psock_start_verdict(sk,psock);
+	} else if (!stream_verdict && skb_verdict && !psock->bpf_running) {
+		psock_set_prog(&psock->progs.skb_verdict, skb_verdict);
+		sk_psock_start_verdict(sk, psock);
 	}
 	write_unlock_bh(&sk->sk_callback_lock);
 	return 0;
@@ -301,6 +316,9 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 out_drop:
 	sk_psock_put(sk, psock);
 out_progs:
+	if (skb_verdict)
+		bpf_prog_put(skb_verdict);
+out_put_msg_parser:
 	if (msg_parser)
 		bpf_prog_put(msg_parser);
 out_put_stream_parser:
@@ -1467,6 +1485,9 @@ int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
 	case BPF_SK_SKB_STREAM_VERDICT:
 		pprog = &progs->stream_verdict;
 		break;
+	case BPF_SK_SKB_VERDICT:
+		pprog = &progs->skb_verdict;
+		break;
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 65303664417e..1828bba19020 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -57,6 +57,7 @@ const char * const attach_type_name[__MAX_BPF_ATTACH_TYPE] = {
 
 	[BPF_SK_SKB_STREAM_PARSER]	= "sk_skb_stream_parser",
 	[BPF_SK_SKB_STREAM_VERDICT]	= "sk_skb_stream_verdict",
+	[BPF_SK_SKB_VERDICT]		= "sk_skb_verdict",
 	[BPF_SK_MSG_VERDICT]		= "sk_msg_verdict",
 	[BPF_LIRC_MODE2]		= "lirc_mode2",
 	[BPF_FLOW_DISSECTOR]		= "flow_dissector",
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 1fe3ba255bad..a78d8c03b7ea 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -76,6 +76,7 @@ enum dump_mode {
 static const char * const attach_type_strings[] = {
 	[BPF_SK_SKB_STREAM_PARSER] = "stream_parser",
 	[BPF_SK_SKB_STREAM_VERDICT] = "stream_verdict",
+	[BPF_SK_SKB_VERDICT] = "skb_verdict",
 	[BPF_SK_MSG_VERDICT] = "msg_verdict",
 	[BPF_FLOW_DISSECTOR] = "flow_dissector",
 	[__MAX_BPF_ATTACH_TYPE] = NULL,
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c001766adcbc..c1a412ebfb08 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -247,6 +247,7 @@ enum bpf_attach_type {
 	BPF_XDP_CPUMAP,
 	BPF_SK_LOOKUP,
 	BPF_XDP,
+	BPF_SK_SKB_VERDICT,
 	__MAX_BPF_ATTACH_TYPE
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 06/19] sock: introduce sk_prot->update_proto()
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (4 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 05/19] sock_map: introduce BPF_SK_SKB_VERDICT Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-03  4:16 ` [Patch bpf-next 07/19] udp: implement ->sendmsg_locked() Cong Wang
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Currently sockmap calls into each protocol to update the struct
proto and replace it. This certainly won't work when the protocol
is implemented as a module, for example, AF_UNIX.

Introduce a new ops sk->sk_prot->update_proto(), so each protocol
can implement its own way to replace the struct proto.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/linux/skmsg.h | 18 +++---------------
 include/net/sock.h    |  3 +++
 include/net/tcp.h     |  2 +-
 include/net/udp.h     |  2 +-
 net/core/sock_map.c   | 22 +++-------------------
 net/ipv4/tcp_bpf.c    | 20 +++++++++++++++++---
 net/ipv4/tcp_ipv4.c   |  3 +++
 net/ipv4/udp.c        |  3 +++
 net/ipv4/udp_bpf.c    | 14 ++++++++++++--
 net/ipv6/tcp_ipv6.c   |  3 +++
 net/ipv6/udp.c        |  3 +++
 11 files changed, 52 insertions(+), 41 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index cb79b1afa556..cb94d0f89c08 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -97,6 +97,7 @@ struct sk_psock {
 	void (*saved_close)(struct sock *sk, long timeout);
 	void (*saved_write_space)(struct sock *sk);
 	void (*saved_data_ready)(struct sock *sk);
+	int  (*saved_update_proto)(struct sock *sk, bool restore);
 	struct proto			*sk_proto;
 	struct sk_psock_work_state	work_state;
 	struct work_struct		work;
@@ -335,25 +336,12 @@ static inline void sk_psock_cork_free(struct sk_psock *psock)
 	}
 }
 
-static inline void sk_psock_update_proto(struct sock *sk,
-					 struct sk_psock *psock,
-					 struct proto *ops)
-{
-	/* Pairs with lockless read in sk_clone_lock() */
-	WRITE_ONCE(sk->sk_prot, ops);
-}
-
 static inline void sk_psock_restore_proto(struct sock *sk,
 					  struct sk_psock *psock)
 {
 	sk->sk_prot->unhash = psock->saved_unhash;
-	if (inet_csk_has_ulp(sk)) {
-		tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space);
-	} else {
-		sk->sk_write_space = psock->saved_write_space;
-		/* Pairs with lockless read in sk_clone_lock() */
-		WRITE_ONCE(sk->sk_prot, psock->sk_proto);
-	}
+	if (psock->saved_update_proto)
+		psock->saved_update_proto(sk, true);
 }
 
 static inline void sk_psock_set_state(struct sk_psock *psock,
diff --git a/include/net/sock.h b/include/net/sock.h
index 7644ea64a376..e474a9202be8 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1184,6 +1184,9 @@ struct proto {
 	void			(*unhash)(struct sock *sk);
 	void			(*rehash)(struct sock *sk);
 	int			(*get_port)(struct sock *sk, unsigned short snum);
+#ifdef CONFIG_BPF_SOCK_MAP
+	int			(*update_proto)(struct sock *sk, bool restore);
+#endif
 
 	/* Keeping track of sockets in use */
 #ifdef CONFIG_PROC_FS
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f7591768525d..c2fff35859b6 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2183,7 +2183,7 @@ struct sk_msg;
 struct sk_psock;
 
 #ifdef CONFIG_BPF_SOCK_MAP
-struct proto *tcp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
+int tcp_bpf_update_proto(struct sock *sk, bool restore);
 void tcp_bpf_clone(const struct sock *sk, struct sock *newsk);
 #else
 static inline void tcp_bpf_clone(const struct sock *sk, struct sock *newsk)
diff --git a/include/net/udp.h b/include/net/udp.h
index 0ff921e6b866..e3e5dfc8e0f0 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -513,7 +513,7 @@ static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
 
 #ifdef CONFIG_BPF_SOCK_MAP
 struct sk_psock;
-struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
+int udp_bpf_update_proto(struct sock *sk, bool restore);
 #endif /* CONFIG_BPF_SOCK_MAP */
 
 #endif	/* _UDP_H */
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index f827f1ecefcc..255067e5c73a 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -181,26 +181,10 @@ static void sock_map_unref(struct sock *sk, void *link_raw)
 
 static int sock_map_init_proto(struct sock *sk, struct sk_psock *psock)
 {
-	struct proto *prot;
-
-	switch (sk->sk_type) {
-	case SOCK_STREAM:
-		prot = tcp_bpf_get_proto(sk, psock);
-		break;
-
-	case SOCK_DGRAM:
-		prot = udp_bpf_get_proto(sk, psock);
-		break;
-
-	default:
+	if (!sk->sk_prot->update_proto)
 		return -EINVAL;
-	}
-
-	if (IS_ERR(prot))
-		return PTR_ERR(prot);
-
-	sk_psock_update_proto(sk, psock, prot);
-	return 0;
+	psock->saved_update_proto = sk->sk_prot->update_proto;
+	return sk->sk_prot->update_proto(sk, false);
 }
 
 static struct sk_psock *sock_map_psock_get_checked(struct sock *sk)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 2252f1d90676..16e00802ccba 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -601,19 +601,33 @@ static int tcp_bpf_assert_proto_ops(struct proto *ops)
 	       ops->sendpage == tcp_sendpage ? 0 : -ENOTSUPP;
 }
 
-struct proto *tcp_bpf_get_proto(struct sock *sk, struct sk_psock *psock)
+int tcp_bpf_update_proto(struct sock *sk, bool restore)
 {
+	struct sk_psock *psock = sk_psock(sk);
 	int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4;
 	int config = psock->progs.msg_parser   ? TCP_BPF_TX   : TCP_BPF_BASE;
 
+	if (restore) {
+		if (inet_csk_has_ulp(sk)) {
+			tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space);
+		} else {
+			sk->sk_write_space = psock->saved_write_space;
+			/* Pairs with lockless read in sk_clone_lock() */
+			WRITE_ONCE(sk->sk_prot, psock->sk_proto);
+		}
+		return 0;
+	}
+
 	if (sk->sk_family == AF_INET6) {
 		if (tcp_bpf_assert_proto_ops(psock->sk_proto))
-			return ERR_PTR(-EINVAL);
+			return -EINVAL;
 
 		tcp_bpf_check_v6_needs_rebuild(psock->sk_proto);
 	}
 
-	return &tcp_bpf_prots[family][config];
+	/* Pairs with lockless read in sk_clone_lock() */
+	WRITE_ONCE(sk->sk_prot, &tcp_bpf_prots[family][config]);
+	return 0;
 }
 
 /* If a child got cloned from a listening socket that had tcp_bpf
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 62b6fd385a47..d7c30b762cc3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2803,6 +2803,9 @@ struct proto tcp_prot = {
 	.hash			= inet_hash,
 	.unhash			= inet_unhash,
 	.get_port		= inet_csk_get_port,
+#ifdef CONFIG_BPF_SOCK_MAP
+	.update_proto		= tcp_bpf_update_proto,
+#endif
 	.enter_memory_pressure	= tcp_enter_memory_pressure,
 	.leave_memory_pressure	= tcp_leave_memory_pressure,
 	.stream_memory_free	= tcp_stream_memory_free,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c67e483fce41..84ab4f2e874a 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2843,6 +2843,9 @@ struct proto udp_prot = {
 	.unhash			= udp_lib_unhash,
 	.rehash			= udp_v4_rehash,
 	.get_port		= udp_v4_get_port,
+#ifdef CONFIG_BPF_SOCK_MAP
+	.update_proto		= udp_bpf_update_proto,
+#endif
 	.memory_allocated	= &udp_memory_allocated,
 	.sysctl_mem		= sysctl_udp_mem,
 	.sysctl_wmem_offset	= offsetof(struct net, ipv4.sysctl_udp_wmem_min),
diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
index 7a94791efc1a..595836088e85 100644
--- a/net/ipv4/udp_bpf.c
+++ b/net/ipv4/udp_bpf.c
@@ -41,12 +41,22 @@ static int __init udp_bpf_v4_build_proto(void)
 }
 core_initcall(udp_bpf_v4_build_proto);
 
-struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock)
+int udp_bpf_update_proto(struct sock *sk, bool restore)
 {
 	int family = sk->sk_family == AF_INET ? UDP_BPF_IPV4 : UDP_BPF_IPV6;
+	struct sk_psock *psock = sk_psock(sk);
+
+	if (restore) {
+		sk->sk_write_space = psock->saved_write_space;
+		/* Pairs with lockless read in sk_clone_lock() */
+		WRITE_ONCE(sk->sk_prot, psock->sk_proto);
+		return 0;
+	}
 
 	if (sk->sk_family == AF_INET6)
 		udp_bpf_check_v6_needs_rebuild(psock->sk_proto);
 
-	return &udp_bpf_prots[family];
+	/* Pairs with lockless read in sk_clone_lock() */
+	WRITE_ONCE(sk->sk_prot, &udp_bpf_prots[family]);
+	return 0;
 }
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 8539715ff035..77b11799a3fe 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -2131,6 +2131,9 @@ struct proto tcpv6_prot = {
 	.hash			= inet6_hash,
 	.unhash			= inet_unhash,
 	.get_port		= inet_csk_get_port,
+#ifdef CONFIG_BPF_SOCK_MAP
+	.update_proto		= tcp_bpf_update_proto,
+#endif
 	.enter_memory_pressure	= tcp_enter_memory_pressure,
 	.leave_memory_pressure	= tcp_leave_memory_pressure,
 	.stream_memory_free	= tcp_stream_memory_free,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index a02ac875a923..66ebdfc83c95 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1711,6 +1711,9 @@ struct proto udpv6_prot = {
 	.unhash			= udp_lib_unhash,
 	.rehash			= udp_v6_rehash,
 	.get_port		= udp_v6_get_port,
+#ifdef CONFIG_BPF_SOCK_MAP
+	.update_proto		= udp_bpf_update_proto,
+#endif
 	.memory_allocated	= &udp_memory_allocated,
 	.sysctl_mem		= sysctl_udp_mem,
 	.sysctl_wmem_offset     = offsetof(struct net, ipv4.sysctl_udp_wmem_min),
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 07/19] udp: implement ->sendmsg_locked()
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (5 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 06/19] sock: introduce sk_prot->update_proto() Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-03  4:16 ` [Patch bpf-next 08/19] udp: implement ->read_sock() for sockmap Cong Wang
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

UDP already has udp_sendmsg() which takes lock_sock() inside.
We have to build ->sendmsg_locked() on top of it, by adding
a new parameter for whether the sock has been locked.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/net/udp.h  |  1 +
 net/ipv4/af_inet.c |  1 +
 net/ipv4/udp.c     | 30 +++++++++++++++++++++++-------
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index e3e5dfc8e0f0..13f9354dbd3e 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -289,6 +289,7 @@ int udp_get_port(struct sock *sk, unsigned short snum,
 int udp_err(struct sk_buff *, u32);
 int udp_abort(struct sock *sk, int err);
 int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len);
+int udp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len);
 int udp_push_pending_frames(struct sock *sk);
 void udp_flush_pending_frames(struct sock *sk);
 int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index aaa94bea19c3..d184d9379a92 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1071,6 +1071,7 @@ const struct proto_ops inet_dgram_ops = {
 	.setsockopt	   = sock_common_setsockopt,
 	.getsockopt	   = sock_common_getsockopt,
 	.sendmsg	   = inet_sendmsg,
+	.sendmsg_locked    = udp_sendmsg_locked,
 	.recvmsg	   = inet_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = inet_sendpage,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 84ab4f2e874a..635e1e8b2968 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1018,7 +1018,7 @@ int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size)
 }
 EXPORT_SYMBOL_GPL(udp_cmsg_send);
 
-int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+static int __udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len, bool locked)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct udp_sock *up = udp_sk(sk);
@@ -1057,15 +1057,18 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		 * There are pending frames.
 		 * The socket lock must be held while it's corked.
 		 */
-		lock_sock(sk);
+		if (!locked)
+			lock_sock(sk);
 		if (likely(up->pending)) {
 			if (unlikely(up->pending != AF_INET)) {
-				release_sock(sk);
+				if (!locked)
+					release_sock(sk);
 				return -EINVAL;
 			}
 			goto do_append_data;
 		}
-		release_sock(sk);
+		if (!locked)
+			release_sock(sk);
 	}
 	ulen += sizeof(struct udphdr);
 
@@ -1235,11 +1238,13 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		goto out;
 	}
 
-	lock_sock(sk);
+	if (!locked)
+		lock_sock(sk);
 	if (unlikely(up->pending)) {
 		/* The socket is already corked while preparing it. */
 		/* ... which is an evident application bug. --ANK */
-		release_sock(sk);
+		if (!locked)
+			release_sock(sk);
 
 		net_dbg_ratelimited("socket already corked\n");
 		err = -EINVAL;
@@ -1266,7 +1271,8 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		err = udp_push_pending_frames(sk);
 	else if (unlikely(skb_queue_empty(&sk->sk_write_queue)))
 		up->pending = 0;
-	release_sock(sk);
+	if (!locked)
+		release_sock(sk);
 
 out:
 	ip_rt_put(rt);
@@ -1296,8 +1302,18 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	err = 0;
 	goto out;
 }
+
+int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	return __udp_sendmsg(sk, msg, len, false);
+}
 EXPORT_SYMBOL(udp_sendmsg);
 
+int udp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	return __udp_sendmsg(sk, msg, len, true);
+}
+
 int udp_sendpage(struct sock *sk, struct page *page, int offset,
 		 size_t size, int flags)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 08/19] udp: implement ->read_sock() for sockmap
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (6 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 07/19] udp: implement ->sendmsg_locked() Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-08  9:48   ` Lorenz Bauer
  2021-02-03  4:16 ` [Patch bpf-next 09/19] udp: add ->read_sock() and ->sendmsg_locked() to ipv6 Cong Wang
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/net/udp.h  |  2 ++
 net/ipv4/af_inet.c |  1 +
 net/ipv4/udp.c     | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/include/net/udp.h b/include/net/udp.h
index 13f9354dbd3e..b6b75cabf4e4 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -327,6 +327,8 @@ struct sock *__udp6_lib_lookup(struct net *net,
 			       struct sk_buff *skb);
 struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
 				 __be16 sport, __be16 dport);
+int udp_read_sock(struct sock *sk, read_descriptor_t *desc,
+		  sk_read_actor_t recv_actor);
 
 /* UDP uses skb->dev_scratch to cache as much information as possible and avoid
  * possibly multiple cache miss on dequeue()
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index d184d9379a92..4a4c6d3d2786 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1072,6 +1072,7 @@ const struct proto_ops inet_dgram_ops = {
 	.getsockopt	   = sock_common_getsockopt,
 	.sendmsg	   = inet_sendmsg,
 	.sendmsg_locked    = udp_sendmsg_locked,
+	.read_sock	   = udp_read_sock,
 	.recvmsg	   = inet_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = inet_sendpage,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 635e1e8b2968..6dffbcec0b51 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1792,6 +1792,40 @@ struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags,
 }
 EXPORT_SYMBOL(__skb_recv_udp);
 
+int udp_read_sock(struct sock *sk, read_descriptor_t *desc,
+		  sk_read_actor_t recv_actor)
+{
+	struct sk_buff *skb;
+	int copied = 0, err;
+
+	while (1) {
+		int offset = 0;
+
+		skb = __skb_recv_udp(sk, 0, 1, &offset, &err);
+		if (!skb)
+			break;
+		if (offset < skb->len) {
+			int used;
+			size_t len;
+
+			len = skb->len - offset;
+			used = recv_actor(desc, skb, offset, len);
+			if (used <= 0) {
+				if (!copied)
+					copied = used;
+				break;
+			} else if (used <= len) {
+				copied += used;
+				offset += used;
+			}
+		}
+		if (!desc->count)
+			break;
+	}
+
+	return copied;
+}
+
 /*
  * 	This should be easy, if there is something there we
  * 	return it, otherwise we block.
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 09/19] udp: add ->read_sock() and ->sendmsg_locked() to ipv6
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (7 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 08/19] udp: implement ->read_sock() for sockmap Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-03  4:16 ` [Patch bpf-next 10/19] af_unix: implement ->sendmsg_locked for dgram socket Cong Wang
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Similarly, udpv6_sendmsg() takes lock_sock() inside too,
we have to build ->sendmsg_locked() on top of it.

For ->read_sock(), we can just use udp_read_sock().

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/net/ipv6.h  |  1 +
 net/ipv4/udp.c      |  1 +
 net/ipv6/af_inet6.c |  2 ++
 net/ipv6/udp.c      | 27 +++++++++++++++++++++------
 4 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index bd1f396cc9c7..48b6850dae85 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1119,6 +1119,7 @@ int inet6_hash_connect(struct inet_timewait_death_row *death_row,
 int inet6_sendmsg(struct socket *sock, struct msghdr *msg, size_t size);
 int inet6_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 		  int flags);
+int udpv6_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len);
 
 /*
  * reassembly.c
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6dffbcec0b51..3acb1be73131 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1825,6 +1825,7 @@ int udp_read_sock(struct sock *sk, read_descriptor_t *desc,
 
 	return copied;
 }
+EXPORT_SYMBOL(udp_read_sock);
 
 /*
  * 	This should be easy, if there is something there we
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index f091fe9b4da5..63c2d024f572 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -714,7 +714,9 @@ const struct proto_ops inet6_dgram_ops = {
 	.setsockopt	   = sock_common_setsockopt,	/* ok		*/
 	.getsockopt	   = sock_common_getsockopt,	/* ok		*/
 	.sendmsg	   = inet6_sendmsg,		/* retpoline's sake */
+	.sendmsg_locked	   = udpv6_sendmsg_locked,
 	.recvmsg	   = inet6_recvmsg,		/* retpoline's sake */
+	.read_sock	   = udp_read_sock,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = sock_no_sendpage,
 	.set_peek_off	   = sk_set_peek_off,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 66ebdfc83c95..c52ea171060d 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1272,7 +1272,7 @@ static int udp_v6_push_pending_frames(struct sock *sk)
 	return err;
 }
 
-int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+static int __udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len, bool locked)
 {
 	struct ipv6_txoptions opt_space;
 	struct udp_sock *up = udp_sk(sk);
@@ -1361,7 +1361,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		 * There are pending frames.
 		 * The socket lock must be held while it's corked.
 		 */
-		lock_sock(sk);
+		if (!locked)
+			lock_sock(sk);
 		if (likely(up->pending)) {
 			if (unlikely(up->pending != AF_INET6)) {
 				release_sock(sk);
@@ -1370,7 +1371,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 			dst = NULL;
 			goto do_append_data;
 		}
-		release_sock(sk);
+		if (!locked)
+			release_sock(sk);
 	}
 	ulen += sizeof(struct udphdr);
 
@@ -1533,11 +1535,13 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		goto out;
 	}
 
-	lock_sock(sk);
+	if (!locked)
+		lock_sock(sk);
 	if (unlikely(up->pending)) {
 		/* The socket is already corked while preparing it. */
 		/* ... which is an evident application bug. --ANK */
-		release_sock(sk);
+		if (!locked)
+			release_sock(sk);
 
 		net_dbg_ratelimited("udp cork app bug 2\n");
 		err = -EINVAL;
@@ -1562,7 +1566,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	if (err > 0)
 		err = np->recverr ? net_xmit_errno(err) : 0;
-	release_sock(sk);
+	if (!locked)
+		release_sock(sk);
 
 out:
 	dst_release(dst);
@@ -1593,6 +1598,16 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	goto out;
 }
 
+int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	return __udpv6_sendmsg(sk, msg, len, false);
+}
+
+int udpv6_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	return __udpv6_sendmsg(sk, msg, len, true);
+}
+
 void udpv6_destroy_sock(struct sock *sk)
 {
 	struct udp_sock *up = udp_sk(sk);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 10/19] af_unix: implement ->sendmsg_locked for dgram socket
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (8 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 09/19] udp: add ->read_sock() and ->sendmsg_locked() to ipv6 Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-03  4:16 ` [Patch bpf-next 11/19] af_unix: implement ->read_sock() for sockmap Cong Wang
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

We already have unix_dgram_sendmsg(), we can just build
its ->sendmsg_locked() on top of it.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/unix/af_unix.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 41c3303c3357..4e1fa4ecbcfb 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -659,6 +659,7 @@ static ssize_t unix_stream_sendpage(struct socket *, struct page *, int offset,
 static ssize_t unix_stream_splice_read(struct socket *,  loff_t *ppos,
 				       struct pipe_inode_info *, size_t size,
 				       unsigned int flags);
+static int __unix_dgram_sendmsg(struct sock*, struct msghdr *, size_t);
 static int unix_dgram_sendmsg(struct socket *, struct msghdr *, size_t);
 static int unix_dgram_recvmsg(struct socket *, struct msghdr *, size_t, int);
 static int unix_dgram_connect(struct socket *, struct sockaddr *,
@@ -738,6 +739,7 @@ static const struct proto_ops unix_dgram_ops = {
 	.listen =	sock_no_listen,
 	.shutdown =	unix_shutdown,
 	.sendmsg =	unix_dgram_sendmsg,
+	.sendmsg_locked = __unix_dgram_sendmsg,
 	.recvmsg =	unix_dgram_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
@@ -1611,10 +1613,10 @@ static void scm_stat_del(struct sock *sk, struct sk_buff *skb)
  *	Send AF_UNIX data.
  */
 
-static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
-			      size_t len)
+static int __unix_dgram_sendmsg(struct sock *sk, struct msghdr *msg,
+				size_t len)
 {
-	struct sock *sk = sock->sk;
+	struct socket *sock = sk->sk_socket;
 	struct net *net = sock_net(sk);
 	struct unix_sock *u = unix_sk(sk);
 	DECLARE_SOCKADDR(struct sockaddr_un *, sunaddr, msg->msg_name);
@@ -1814,6 +1816,12 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	return err;
 }
 
+static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
+			      size_t len)
+{
+	return __unix_dgram_sendmsg(sock->sk, msg, len);
+}
+
 /* We use paged skbs for stream sockets, and limit occupancy to 32768
  * bytes, and a minimum of a full page.
  */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 11/19] af_unix: implement ->read_sock() for sockmap
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (9 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 10/19] af_unix: implement ->sendmsg_locked for dgram socket Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-03  4:16 ` [Patch bpf-next 12/19] af_unix: implement ->update_proto() Cong Wang
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/unix/af_unix.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 4e1fa4ecbcfb..9315c4f4c27a 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -662,6 +662,7 @@ static ssize_t unix_stream_splice_read(struct socket *,  loff_t *ppos,
 static int __unix_dgram_sendmsg(struct sock*, struct msghdr *, size_t);
 static int unix_dgram_sendmsg(struct socket *, struct msghdr *, size_t);
 static int unix_dgram_recvmsg(struct socket *, struct msghdr *, size_t, int);
+int unix_read_sock(struct sock *sk, read_descriptor_t *desc, sk_read_actor_t recv_actor);
 static int unix_dgram_connect(struct socket *, struct sockaddr *,
 			      int, int);
 static int unix_seqpacket_sendmsg(struct socket *, struct msghdr *, size_t);
@@ -739,6 +740,7 @@ static const struct proto_ops unix_dgram_ops = {
 	.listen =	sock_no_listen,
 	.shutdown =	unix_shutdown,
 	.sendmsg =	unix_dgram_sendmsg,
+	.read_sock =	unix_read_sock,
 	.sendmsg_locked = __unix_dgram_sendmsg,
 	.recvmsg =	unix_dgram_recvmsg,
 	.mmap =		sock_no_mmap,
@@ -2190,6 +2192,50 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
 	return err;
 }
 
+int unix_read_sock(struct sock *sk, read_descriptor_t *desc,
+		   sk_read_actor_t recv_actor)
+{
+	unsigned int flags = MSG_DONTWAIT;
+	struct unix_sock *u = unix_sk(sk);
+	struct sk_buff *skb;
+	int copied = 0;
+
+	while (1) {
+		int offset, err;
+
+		mutex_lock(&u->iolock);
+		skb = __skb_recv_datagram(sk, &sk->sk_receive_queue, flags,
+					  &offset, &err);
+		if (!skb) {
+			mutex_unlock(&u->iolock);
+			break;
+		}
+
+		if (offset < skb->len) {
+			int used;
+			size_t len;
+
+			len = skb->len - offset;
+			used = recv_actor(desc, skb, offset, len);
+			if (used <= 0) {
+				if (!copied)
+					copied = used;
+				mutex_unlock(&u->iolock);
+				break;
+			} else if (used <= len) {
+				copied += used;
+				offset += used;
+			}
+		}
+		mutex_unlock(&u->iolock);
+
+		if (!desc->count)
+			break;
+	}
+
+	return copied;
+}
+
 /*
  *	Sleep until more data has arrived. But check for races..
  */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 12/19] af_unix: implement ->update_proto()
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (10 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 11/19] af_unix: implement ->read_sock() for sockmap Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-03  4:16 ` [Patch bpf-next 13/19] af_unix: set TCP_ESTABLISHED for datagram sockets too Cong Wang
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

unix_proto is special, it is very different from INET proto,
which even does not have a ->close(). We have to add a dummy
one to satisfy sockmap.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 MAINTAINERS           |  1 +
 include/net/af_unix.h | 10 +++++++++
 net/unix/Makefile     |  1 +
 net/unix/af_unix.c    | 12 ++++++++++-
 net/unix/unix_bpf.c   | 50 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 73 insertions(+), 1 deletion(-)
 create mode 100644 net/unix/unix_bpf.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 1df56a32d2df..1fa3971c45b0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9950,6 +9950,7 @@ F:	net/core/skmsg.c
 F:	net/core/sock_map.c
 F:	net/ipv4/tcp_bpf.c
 F:	net/ipv4/udp_bpf.c
+F:	net/unix/unix_bpf.c
 
 LANTIQ / INTEL Ethernet drivers
 M:	Hauke Mehrtens <hauke@hauke-m.de>
diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index f42fdddecd41..fa75f899e88a 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -89,4 +89,14 @@ void unix_sysctl_unregister(struct net *net);
 static inline int unix_sysctl_register(struct net *net) { return 0; }
 static inline void unix_sysctl_unregister(struct net *net) {}
 #endif
+
+extern struct proto unix_proto;
+
+#ifdef CONFIG_BPF_SOCK_MAP
+int unix_bpf_update_proto(struct sock *sk, bool restore);
+void __init unix_bpf_build_proto(void);
+#else
+static inline void __init unix_bpf_build_proto(void)
+{}
+#endif
 #endif
diff --git a/net/unix/Makefile b/net/unix/Makefile
index 54e58cc4f945..7d2c70c575b6 100644
--- a/net/unix/Makefile
+++ b/net/unix/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_UNIX)	+= unix.o
 
 unix-y			:= af_unix.o garbage.o
 unix-$(CONFIG_SYSCTL)	+= sysctl_net_unix.o
+unix-$(CONFIG_BPF_SOCK_MAP) += unix_bpf.o
 
 obj-$(CONFIG_UNIX_DIAG)	+= unix_diag.o
 unix_diag-y		:= diag.o
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 9315c4f4c27a..4ce12d3c369e 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -773,10 +773,18 @@ static const struct proto_ops unix_seqpacket_ops = {
 	.show_fdinfo =	unix_show_fdinfo,
 };
 
-static struct proto unix_proto = {
+static void unix_close(struct sock *sk, long timeout)
+{
+}
+
+struct proto unix_proto = {
 	.name			= "UNIX",
 	.owner			= THIS_MODULE,
 	.obj_size		= sizeof(struct unix_sock),
+	.close			= unix_close,
+#ifdef CONFIG_BPF_SOCK_MAP
+	.update_proto		= unix_bpf_update_proto,
+#endif
 };
 
 static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
@@ -861,6 +869,7 @@ static int unix_release(struct socket *sock)
 		return 0;
 
 	unix_release_sock(sk, 0);
+	sk->sk_prot->close(sk, 0);
 	sock->sk = NULL;
 
 	return 0;
@@ -2973,6 +2982,7 @@ static int __init af_unix_init(void)
 
 	sock_register(&unix_family_ops);
 	register_pernet_subsys(&unix_net_ops);
+	unix_bpf_build_proto();
 out:
 	return rc;
 }
diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
new file mode 100644
index 000000000000..2e6a26ec4958
--- /dev/null
+++ b/net/unix/unix_bpf.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Cong Wang <cong.wang@bytedance.com> */
+
+#include <linux/skmsg.h>
+#include <net/sock.h>
+#include <net/af_unix.h>
+
+static struct proto *unix_prot_saved __read_mostly;
+static DEFINE_SPINLOCK(unix_prot_lock);
+static struct proto unix_bpf_prot;
+
+static void unix_bpf_rebuild_protos(struct proto *prot, const struct proto *base)
+{
+	*prot        = *base;
+	prot->close  = sock_map_close;
+}
+
+static void unix_bpf_check_needs_rebuild(struct proto *ops)
+{
+	if (unlikely(ops != smp_load_acquire(&unix_prot_saved))) {
+		spin_lock_bh(&unix_prot_lock);
+		if (likely(ops != unix_prot_saved)) {
+			unix_bpf_rebuild_protos(&unix_bpf_prot, ops);
+			smp_store_release(&unix_prot_saved, ops);
+		}
+		spin_unlock_bh(&unix_prot_lock);
+	}
+}
+
+int unix_bpf_update_proto(struct sock *sk, bool restore)
+{
+	struct sk_psock *psock = sk_psock(sk);
+
+	if (restore) {
+		sk->sk_write_space = psock->saved_write_space;
+		/* Pairs with lockless read in sk_clone_lock() */
+		WRITE_ONCE(sk->sk_prot, psock->sk_proto);
+		return 0;
+	}
+
+	unix_bpf_check_needs_rebuild(psock->sk_proto);
+	/* Pairs with lockless read in sk_clone_lock() */
+	WRITE_ONCE(sk->sk_prot, &unix_bpf_prot);
+	return 0;
+}
+
+void __init unix_bpf_build_proto(void)
+{
+	unix_bpf_rebuild_protos(&unix_bpf_prot, &unix_proto);
+}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 13/19] af_unix: set TCP_ESTABLISHED for datagram sockets too
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (11 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 12/19] af_unix: implement ->update_proto() Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-03  4:16 ` [Patch bpf-next 14/19] skmsg: extract __tcp_bpf_recvmsg() and tcp_bpf_wait_data() Cong Wang
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Currently only unix stream socket sets TCP_ESTABLISHED,
datagram socket can set this too when they connect to its
peer socket. At least __ip4_datagram_connect() does the same.

This will be used by the next patch to determine whether an
AF_UNIX datagram socket can be redirected in sockmap.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/unix/af_unix.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 4ce12d3c369e..21c4406f879b 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1206,6 +1206,8 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
 		unix_peer(sk) = other;
 		unix_state_double_unlock(sk, other);
 	}
+
+	sk->sk_state = other->sk_state = TCP_ESTABLISHED;
 	return 0;
 
 out_unlock:
@@ -1438,12 +1440,10 @@ static int unix_socketpair(struct socket *socka, struct socket *sockb)
 	init_peercred(ska);
 	init_peercred(skb);
 
-	if (ska->sk_type != SOCK_DGRAM) {
-		ska->sk_state = TCP_ESTABLISHED;
-		skb->sk_state = TCP_ESTABLISHED;
-		socka->state  = SS_CONNECTED;
-		sockb->state  = SS_CONNECTED;
-	}
+	ska->sk_state = TCP_ESTABLISHED;
+	skb->sk_state = TCP_ESTABLISHED;
+	socka->state  = SS_CONNECTED;
+	sockb->state  = SS_CONNECTED;
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 14/19] skmsg: extract __tcp_bpf_recvmsg() and tcp_bpf_wait_data()
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (12 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 13/19] af_unix: set TCP_ESTABLISHED for datagram sockets too Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-03  4:16 ` [Patch bpf-next 15/19] udp: implement udp_bpf_recvmsg() for sockmap Cong Wang
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Although these two functions are only used by TCP, they are not
specific to TCP at all, both operate on skmsg and ingress_msg,
so fit in net/core/skmsg.c very well.

And we will need them for non-TCP, so rename and move them to
skmsg.c and export them to modules.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/linux/skmsg.h |   4 ++
 include/net/tcp.h     |   2 -
 net/core/skmsg.c      | 104 +++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp_bpf.c    | 106 +-----------------------------------------
 net/tls/tls_sw.c      |   4 +-
 5 files changed, 112 insertions(+), 108 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index cb94d0f89c08..0e52fc5521a0 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -125,6 +125,10 @@ int sk_msg_zerocopy_from_iter(struct sock *sk, struct iov_iter *from,
 			      struct sk_msg *msg, u32 bytes);
 int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
 			     struct sk_msg *msg, u32 bytes);
+int sk_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags,
+		     long timeo, int *err);
+int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
+		   int len, int flags);
 
 static inline void sk_msg_check_to_free(struct sk_msg *msg, u32 i, u32 bytes)
 {
diff --git a/include/net/tcp.h b/include/net/tcp.h
index c2fff35859b6..b314aee5800d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2194,8 +2194,6 @@ static inline void tcp_bpf_clone(const struct sock *sk, struct sock *newsk)
 #ifdef CONFIG_NET_SOCK_MSG
 int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg, u32 bytes,
 			  int flags);
-int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
-		      struct msghdr *msg, int len, int flags);
 #endif /* CONFIG_NET_SOCK_MSG */
 
 #ifdef CONFIG_CGROUP_BPF
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index ecbd6f0d49a5..8e3edbdf4c7c 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -399,6 +399,110 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
 }
 EXPORT_SYMBOL_GPL(sk_msg_memcopy_from_iter);
 
+int sk_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags,
+		     long timeo, int *err)
+{
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
+	int ret = 0;
+
+	if (sk->sk_shutdown & RCV_SHUTDOWN)
+		return 1;
+
+	if (!timeo)
+		return ret;
+
+	add_wait_queue(sk_sleep(sk), &wait);
+	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+	ret = sk_wait_event(sk, &timeo,
+			    !list_empty(&psock->ingress_msg) ||
+			    !skb_queue_empty(&sk->sk_receive_queue), &wait);
+	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+	remove_wait_queue(sk_sleep(sk), &wait);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sk_msg_wait_data);
+
+/* Receive sk_msg from psock->ingress_msg to @msg. */
+int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
+		   int len, int flags)
+{
+	struct iov_iter *iter = &msg->msg_iter;
+	int peek = flags & MSG_PEEK;
+	struct sk_msg *msg_rx;
+	int i, copied = 0;
+
+	msg_rx = list_first_entry_or_null(&psock->ingress_msg,
+					  struct sk_msg, list);
+
+	while (copied != len) {
+		struct scatterlist *sge;
+
+		if (unlikely(!msg_rx))
+			break;
+
+		i = msg_rx->sg.start;
+		do {
+			struct page *page;
+			int copy;
+
+			sge = sk_msg_elem(msg_rx, i);
+			copy = sge->length;
+			page = sg_page(sge);
+			if (copied + copy > len)
+				copy = len - copied;
+			copy = copy_page_to_iter(page, sge->offset, copy, iter);
+			if (!copy)
+				return copied ? copied : -EFAULT;
+
+			copied += copy;
+			if (likely(!peek)) {
+				sge->offset += copy;
+				sge->length -= copy;
+				if (!msg_rx->skb)
+					sk_mem_uncharge(sk, copy);
+				msg_rx->sg.size -= copy;
+
+				if (!sge->length) {
+					sk_msg_iter_var_next(i);
+					if (!msg_rx->skb)
+						put_page(page);
+				}
+			} else {
+				/* Lets not optimize peek case if copy_page_to_iter
+				 * didn't copy the entire length lets just break.
+				 */
+				if (copy != sge->length)
+					return copied;
+				sk_msg_iter_var_next(i);
+			}
+
+			if (copied == len)
+				break;
+		} while (i != msg_rx->sg.end);
+
+		if (unlikely(peek)) {
+			if (msg_rx == list_last_entry(&psock->ingress_msg,
+						      struct sk_msg, list))
+				break;
+			msg_rx = list_next_entry(msg_rx, list);
+			continue;
+		}
+
+		msg_rx->sg.start = i;
+		if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) {
+			list_del(&msg_rx->list);
+			if (msg_rx->skb)
+				consume_skb(msg_rx->skb);
+			kfree(msg_rx);
+		}
+		msg_rx = list_first_entry_or_null(&psock->ingress_msg,
+						  struct sk_msg, list);
+	}
+
+	return copied;
+}
+EXPORT_SYMBOL_GPL(sk_msg_recvmsg);
+
 static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk,
 						  struct sk_buff *skb)
 {
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 16e00802ccba..3c0206a4f0e0 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -10,86 +10,6 @@
 #include <net/inet_common.h>
 #include <net/tls.h>
 
-int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
-		      struct msghdr *msg, int len, int flags)
-{
-	struct iov_iter *iter = &msg->msg_iter;
-	int peek = flags & MSG_PEEK;
-	struct sk_msg *msg_rx;
-	int i, copied = 0;
-
-	msg_rx = list_first_entry_or_null(&psock->ingress_msg,
-					  struct sk_msg, list);
-
-	while (copied != len) {
-		struct scatterlist *sge;
-
-		if (unlikely(!msg_rx))
-			break;
-
-		i = msg_rx->sg.start;
-		do {
-			struct page *page;
-			int copy;
-
-			sge = sk_msg_elem(msg_rx, i);
-			copy = sge->length;
-			page = sg_page(sge);
-			if (copied + copy > len)
-				copy = len - copied;
-			copy = copy_page_to_iter(page, sge->offset, copy, iter);
-			if (!copy)
-				return copied ? copied : -EFAULT;
-
-			copied += copy;
-			if (likely(!peek)) {
-				sge->offset += copy;
-				sge->length -= copy;
-				if (!msg_rx->skb)
-					sk_mem_uncharge(sk, copy);
-				msg_rx->sg.size -= copy;
-
-				if (!sge->length) {
-					sk_msg_iter_var_next(i);
-					if (!msg_rx->skb)
-						put_page(page);
-				}
-			} else {
-				/* Lets not optimize peek case if copy_page_to_iter
-				 * didn't copy the entire length lets just break.
-				 */
-				if (copy != sge->length)
-					return copied;
-				sk_msg_iter_var_next(i);
-			}
-
-			if (copied == len)
-				break;
-		} while (i != msg_rx->sg.end);
-
-		if (unlikely(peek)) {
-			if (msg_rx == list_last_entry(&psock->ingress_msg,
-						      struct sk_msg, list))
-				break;
-			msg_rx = list_next_entry(msg_rx, list);
-			continue;
-		}
-
-		msg_rx->sg.start = i;
-		if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) {
-			list_del(&msg_rx->list);
-			if (msg_rx->skb)
-				consume_skb(msg_rx->skb);
-			kfree(msg_rx);
-		}
-		msg_rx = list_first_entry_or_null(&psock->ingress_msg,
-						  struct sk_msg, list);
-	}
-
-	return copied;
-}
-EXPORT_SYMBOL_GPL(__tcp_bpf_recvmsg);
-
 static int bpf_tcp_ingress(struct sock *sk, struct sk_psock *psock,
 			   struct sk_msg *msg, u32 apply_bytes, int flags)
 {
@@ -243,28 +163,6 @@ static bool tcp_bpf_stream_read(const struct sock *sk)
 	return !empty;
 }
 
-static int tcp_bpf_wait_data(struct sock *sk, struct sk_psock *psock,
-			     int flags, long timeo, int *err)
-{
-	DEFINE_WAIT_FUNC(wait, woken_wake_function);
-	int ret = 0;
-
-	if (sk->sk_shutdown & RCV_SHUTDOWN)
-		return 1;
-
-	if (!timeo)
-		return ret;
-
-	add_wait_queue(sk_sleep(sk), &wait);
-	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
-	ret = sk_wait_event(sk, &timeo,
-			    !list_empty(&psock->ingress_msg) ||
-			    !skb_queue_empty(&sk->sk_receive_queue), &wait);
-	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
-	remove_wait_queue(sk_sleep(sk), &wait);
-	return ret;
-}
-
 static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		    int nonblock, int flags, int *addr_len)
 {
@@ -284,13 +182,13 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	}
 	lock_sock(sk);
 msg_bytes_ready:
-	copied = __tcp_bpf_recvmsg(sk, psock, msg, len, flags);
+	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
 	if (!copied) {
 		int data, err = 0;
 		long timeo;
 
 		timeo = sock_rcvtimeo(sk, nonblock);
-		data = tcp_bpf_wait_data(sk, psock, flags, timeo, &err);
+		data = sk_msg_wait_data(sk, psock, flags, timeo, &err);
 		if (data) {
 			if (!sk_psock_queue_empty(psock))
 				goto msg_bytes_ready;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 01d933ae5f16..1dcb34dfd56b 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1789,8 +1789,8 @@ int tls_sw_recvmsg(struct sock *sk,
 		skb = tls_wait_data(sk, psock, flags, timeo, &err);
 		if (!skb) {
 			if (psock) {
-				int ret = __tcp_bpf_recvmsg(sk, psock,
-							    msg, len, flags);
+				int ret = sk_msg_recvmsg(sk, psock, msg, len,
+							 flags);
 
 				if (ret > 0) {
 					decrypted += ret;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 15/19] udp: implement udp_bpf_recvmsg() for sockmap
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (13 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 14/19] skmsg: extract __tcp_bpf_recvmsg() and tcp_bpf_wait_data() Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-03  4:16 ` [Patch bpf-next 16/19] af_unix: implement unix_dgram_bpf_recvmsg() Cong Wang
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

We have to implement udp_bpf_recvmsg() to replace the ->recvmsg()
to retrieve skmsg from ingress_msg.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/ipv4/udp_bpf.c | 64 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 63 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
index 595836088e85..9a37ba056575 100644
--- a/net/ipv4/udp_bpf.c
+++ b/net/ipv4/udp_bpf.c
@@ -4,6 +4,68 @@
 #include <linux/skmsg.h>
 #include <net/sock.h>
 #include <net/udp.h>
+#include <net/inet_common.h>
+
+#include "udp_impl.h"
+
+static struct proto *udpv6_prot_saved __read_mostly;
+
+static int sk_udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
+			  int noblock, int flags, int *addr_len)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	if (sk->sk_family == AF_INET6)
+		return udpv6_prot_saved->recvmsg(sk, msg, len, noblock, flags,
+						 addr_len);
+#endif
+	return udp_prot.recvmsg(sk, msg, len, noblock, flags, addr_len);
+}
+
+static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
+			   int nonblock, int flags, int *addr_len)
+{
+	struct sk_psock *psock;
+	int copied, ret;
+
+	if (unlikely(flags & MSG_ERRQUEUE))
+		return inet_recv_error(sk, msg, len, addr_len);
+
+	psock = sk_psock_get(sk);
+	if (unlikely(!psock))
+		return sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
+
+	lock_sock(sk);
+	if (sk_psock_queue_empty(psock)) {
+		ret = sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
+		goto out;
+	}
+
+msg_bytes_ready:
+	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
+	if (!copied) {
+		int data, err = 0;
+		long timeo;
+
+		timeo = sock_rcvtimeo(sk, nonblock);
+		data = sk_msg_wait_data(sk, psock, flags, timeo, &err);
+		if (data) {
+			if (!sk_psock_queue_empty(psock))
+				goto msg_bytes_ready;
+			ret = sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
+			goto out;
+		}
+		if (err) {
+			ret = err;
+			goto out;
+		}
+		copied = -EAGAIN;
+	}
+	ret = copied;
+out:
+	release_sock(sk);
+	sk_psock_put(sk, psock);
+	return ret;
+}
 
 enum {
 	UDP_BPF_IPV4,
@@ -11,7 +73,6 @@ enum {
 	UDP_BPF_NUM_PROTS,
 };
 
-static struct proto *udpv6_prot_saved __read_mostly;
 static DEFINE_SPINLOCK(udpv6_prot_lock);
 static struct proto udp_bpf_prots[UDP_BPF_NUM_PROTS];
 
@@ -20,6 +81,7 @@ static void udp_bpf_rebuild_protos(struct proto *prot, const struct proto *base)
 	*prot        = *base;
 	prot->unhash = sock_map_unhash;
 	prot->close  = sock_map_close;
+	prot->recvmsg = udp_bpf_recvmsg;
 }
 
 static void udp_bpf_check_v6_needs_rebuild(struct proto *ops)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 16/19] af_unix: implement unix_dgram_bpf_recvmsg()
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (14 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 15/19] udp: implement udp_bpf_recvmsg() for sockmap Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-03  4:16 ` [Patch bpf-next 17/19] sock_map: update sock type checks Cong Wang
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

We have to implement unix_dgram_bpf_recvmsg() to replace the
original ->recvmsg() to retrieve skmsg from ingress_msg.

AF_UNIX is again special here because the lack of
sk_prot->recvmsg(). I simply add a special case inside
unix_dgram_recvmsg() to call sk->sk_prot->recvmsg() directly.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/net/af_unix.h |  3 +++
 net/unix/af_unix.c    | 21 ++++++++++++++++---
 net/unix/unix_bpf.c   | 49 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 70 insertions(+), 3 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index fa75f899e88a..f6c43667e995 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -82,6 +82,9 @@ static inline struct unix_sock *unix_sk(const struct sock *sk)
 long unix_inq_len(struct sock *sk);
 long unix_outq_len(struct sock *sk);
 
+int __unix_dgram_recvmsg(struct sock *sk, struct msghdr *msg, size_t size,
+			 int nonblock, int flags, int *addr_len);
+
 #ifdef CONFIG_SYSCTL
 int unix_sysctl_register(struct net *net);
 void unix_sysctl_unregister(struct net *net);
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 21c4406f879b..eebcd6f7ef88 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2094,11 +2094,11 @@ static void unix_copy_addr(struct msghdr *msg, struct sock *sk)
 	}
 }
 
-static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
-			      size_t size, int flags)
+int __unix_dgram_recvmsg(struct sock *sk, struct msghdr *msg, size_t size,
+			 int nonblock, int flags, int *addr_len)
 {
 	struct scm_cookie scm;
-	struct sock *sk = sock->sk;
+	struct socket *sock = sk->sk_socket;
 	struct unix_sock *u = unix_sk(sk);
 	struct sk_buff *skb, *last;
 	long timeo;
@@ -2201,6 +2201,21 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
 	return err;
 }
 
+static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
+			      int flags)
+{
+	struct sock *sk = sock->sk;
+	int addr_len = 0;
+
+#ifdef CONFIG_BPF_SOCK_MAP
+	if (sk->sk_prot != &unix_proto)
+		return sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
+					    flags & ~MSG_DONTWAIT, &addr_len);
+#endif
+	return __unix_dgram_recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
+				    flags, &addr_len);
+}
+
 int unix_read_sock(struct sock *sk, read_descriptor_t *desc,
 		   sk_read_actor_t recv_actor)
 {
diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
index 2e6a26ec4958..570261fd18cd 100644
--- a/net/unix/unix_bpf.c
+++ b/net/unix/unix_bpf.c
@@ -5,6 +5,54 @@
 #include <net/sock.h>
 #include <net/af_unix.h>
 
+static int unix_dgram_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
+				  size_t len, int nonblock, int flags,
+				  int *addr_len)
+{
+	struct sk_psock *psock;
+	int copied, ret;
+
+	psock = sk_psock_get(sk);
+	if (unlikely(!psock))
+		return __unix_dgram_recvmsg(sk, msg, len, nonblock, flags,
+					    addr_len);
+
+	lock_sock(sk);
+	if (!skb_queue_empty(&sk->sk_receive_queue) &&
+	    sk_psock_queue_empty(psock)) {
+		ret = __unix_dgram_recvmsg(sk, msg, len, nonblock, flags,
+					   addr_len);
+		goto out;
+	}
+
+msg_bytes_ready:
+	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
+	if (!copied) {
+		int data, err = 0;
+		long timeo;
+
+		timeo = sock_rcvtimeo(sk, nonblock);
+		data = sk_msg_wait_data(sk, psock, flags, timeo, &err);
+		if (data) {
+			if (!sk_psock_queue_empty(psock))
+				goto msg_bytes_ready;
+			ret = __unix_dgram_recvmsg(sk, msg, len, nonblock,
+						   flags, addr_len);
+			goto out;
+		}
+		if (err) {
+			ret = err;
+			goto out;
+		}
+		copied = -EAGAIN;
+	}
+	ret = copied;
+out:
+	release_sock(sk);
+	sk_psock_put(sk, psock);
+	return ret;
+}
+
 static struct proto *unix_prot_saved __read_mostly;
 static DEFINE_SPINLOCK(unix_prot_lock);
 static struct proto unix_bpf_prot;
@@ -13,6 +61,7 @@ static void unix_bpf_rebuild_protos(struct proto *prot, const struct proto *base
 {
 	*prot        = *base;
 	prot->close  = sock_map_close;
+	prot->recvmsg = unix_dgram_bpf_recvmsg;
 }
 
 static void unix_bpf_check_needs_rebuild(struct proto *ops)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 17/19] sock_map: update sock type checks
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (15 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 16/19] af_unix: implement unix_dgram_bpf_recvmsg() Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-03  4:16 ` [Patch bpf-next 18/19] selftests/bpf: add test cases for unix and udp sockmap Cong Wang
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Now both AF_UNIX and UDP support sockmap and redirection,
we can safely update the sock type checks for them accordingly.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/core/skmsg.c    |  3 ++-
 net/core/sock_map.c | 15 ++++++++++++---
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 8e3edbdf4c7c..a502137f7bc2 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -667,7 +667,8 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node)
 
 	write_lock_bh(&sk->sk_callback_lock);
 
-	if (inet_csk_has_ulp(sk)) {
+	if ((sk->sk_family == AF_INET || sk->sk_family == AF_INET6) &&
+	    inet_csk_has_ulp(sk)) {
 		psock = ERR_PTR(-EINVAL);
 		goto out;
 	}
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 255067e5c73a..7e56a3ec7a57 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -544,14 +544,22 @@ static bool sk_is_udp(const struct sock *sk)
 	       sk->sk_protocol == IPPROTO_UDP;
 }
 
+static bool sk_is_unix(const struct sock *sk)
+{
+	return sk->sk_type == SOCK_DGRAM && sk->sk_family == AF_UNIX;
+}
+
 static bool sock_map_redirect_allowed(const struct sock *sk)
 {
-	return sk_is_tcp(sk) && sk->sk_state != TCP_LISTEN;
+	if (sk_is_tcp(sk))
+		return sk->sk_state != TCP_LISTEN;
+	else
+		return sk->sk_state == TCP_ESTABLISHED;
 }
 
 static bool sock_map_sk_is_suitable(const struct sock *sk)
 {
-	return sk_is_tcp(sk) || sk_is_udp(sk);
+	return !!sk->sk_prot->update_proto;
 }
 
 static bool sock_map_sk_state_allowed(const struct sock *sk)
@@ -560,7 +568,8 @@ static bool sock_map_sk_state_allowed(const struct sock *sk)
 		return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_LISTEN);
 	else if (sk_is_udp(sk))
 		return sk_hashed(sk);
-
+	else if (sk_is_unix(sk))
+		return sk->sk_state == TCP_ESTABLISHED;
 	return false;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 18/19] selftests/bpf: add test cases for unix and udp sockmap
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (16 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 17/19] sock_map: update sock type checks Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-05 10:53   ` Jakub Sitnicki
  2021-02-03  4:16 ` [Patch bpf-next 19/19] selftests/bpf: add test case for redirection between udp and unix Cong Wang
  2021-02-03 17:48 ` [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Alexei Starovoitov
  19 siblings, 1 reply; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Add two test cases to ensure redirection between two
AF_UNIX sockets or two UDP sockets work.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 .../selftests/bpf/prog_tests/sockmap_listen.c | 241 ++++++++++++++++++
 .../selftests/bpf/progs/test_sockmap_listen.c |  20 ++
 2 files changed, 261 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index c26e6bf05e49..8f52302165a6 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -1441,6 +1441,8 @@ static const char *family_str(sa_family_t family)
 		return "IPv4";
 	case AF_INET6:
 		return "IPv6";
+	case AF_UNIX:
+		return "Unix";
 	default:
 		return "unknown";
 	}
@@ -1563,6 +1565,239 @@ static void test_redir(struct test_sockmap_listen *skel, struct bpf_map *map,
 	}
 }
 
+static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
+				   int verd_mapfd, enum redir_mode mode)
+{
+	const char *log_prefix = redir_mode_str(mode);
+	struct sockaddr_storage addr;
+	int c0, c1, p0, p1;
+	unsigned int pass;
+	socklen_t len;
+	int err, n;
+	u64 value;
+	u32 key;
+	char b;
+
+	zero_verdict_count(verd_mapfd);
+
+	p0 = socket_loopback(family, sotype | SOCK_NONBLOCK);
+	if (p0 < 0)
+		return;
+	len = sizeof(addr);
+	err = xgetsockname(p0, sockaddr(&addr), &len);
+	if (err)
+		goto close_peer0;
+
+	c0 = xsocket(family, sotype | SOCK_NONBLOCK, 0);
+	if (c0 < 0)
+		goto close_peer0;
+	err = xconnect(c0, sockaddr(&addr), len);
+	if (err)
+		goto close_cli0;
+	err = xgetsockname(c0, sockaddr(&addr), &len);
+	if (err)
+		goto close_cli0;
+	err = xconnect(p0, sockaddr(&addr), len);
+	if (err)
+		goto close_cli0;
+
+	p1 = socket_loopback(family, sotype | SOCK_NONBLOCK);
+	if (p1 < 0)
+		goto close_cli0;
+	err = xgetsockname(p1, sockaddr(&addr), &len);
+	if (err)
+		goto close_cli0;
+
+	c1 = xsocket(family, sotype | SOCK_NONBLOCK, 0);
+	if (c1 < 0)
+		goto close_peer1;
+	err = xconnect(c1, sockaddr(&addr), len);
+	if (err)
+		goto close_cli1;
+	err = xgetsockname(c1, sockaddr(&addr), &len);
+	if (err)
+		goto close_cli1;
+	err = xconnect(p1, sockaddr(&addr), len);
+	if (err)
+		goto close_cli1;
+
+	key = 0;
+	value = p0;
+	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close_cli1;
+
+	key = 1;
+	value = p1;
+	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close_cli1;
+
+	n = write(c1, "a", 1);
+	if (n < 0)
+		FAIL_ERRNO("%s: write", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete write", log_prefix);
+	if (n < 1)
+		goto close_cli1;
+
+	key = SK_PASS;
+	err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass);
+	if (err)
+		goto close_cli1;
+	if (pass != 1)
+		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
+
+	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
+	if (n < 0)
+		FAIL_ERRNO("%s: read", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete read", log_prefix);
+
+close_cli1:
+	xclose(c1);
+close_peer1:
+	xclose(p1);
+close_cli0:
+	xclose(c0);
+close_peer0:
+	xclose(p0);
+}
+
+static void udp_skb_redir_to_connected(struct test_sockmap_listen *skel,
+					   struct bpf_map *inner_map, int family,
+					   int sotype)
+{
+	int verdict = bpf_program__fd(skel->progs.prog_skb_verdict);
+	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
+	int sock_map = bpf_map__fd(inner_map);
+	int err;
+
+	err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0);
+	if (err)
+		return;
+
+	skel->bss->test_ingress = false;
+	udp_redir_to_connected(family, sotype, sock_map, verdict_map,
+			       REDIR_EGRESS);
+	skel->bss->test_ingress = true;
+	udp_redir_to_connected(family, sotype, sock_map, verdict_map,
+			       REDIR_INGRESS);
+
+	xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT);
+}
+
+static void test_udp_redir(struct test_sockmap_listen *skel, struct bpf_map *map,
+			   int family)
+{
+	const char *family_name, *map_name;
+	char s[MAX_TEST_NAME];
+
+	family_name = family_str(family);
+	map_name = map_type_str(map);
+	snprintf(s, sizeof(s), "%s %s %s", map_name, family_name, __func__);
+	if (!test__start_subtest(s))
+		return;
+	udp_skb_redir_to_connected(skel, map, family, SOCK_DGRAM);
+}
+
+static void unix_redir_to_connected(int sotype, int sock_mapfd,
+			       int verd_mapfd, enum redir_mode mode)
+{
+	const char *log_prefix = redir_mode_str(mode);
+	int c0, c1, p0, p1;
+	unsigned int pass;
+	int err, n;
+	int sfd[2];
+	u64 value;
+	u32 key;
+	char b;
+
+	zero_verdict_count(verd_mapfd);
+
+	if (socketpair(AF_UNIX, sotype | SOCK_NONBLOCK, 0, sfd))
+		return;
+	c0 = sfd[0], p0 = sfd[1];
+
+	if (socketpair(AF_UNIX, sotype | SOCK_NONBLOCK, 0, sfd))
+		goto close0;
+	c1 = sfd[0], p1 = sfd[1];
+
+	key = 0;
+	value = p0;
+	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close;
+
+	key = 1;
+	value = p1;
+	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close;
+
+	n = write(c1, "a", 1);
+	if (n < 0)
+		FAIL_ERRNO("%s: write", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete write", log_prefix);
+	if (n < 1)
+		goto close;
+
+	key = SK_PASS;
+	err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass);
+	if (err)
+		goto close;
+	if (pass != 1)
+		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
+
+	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
+	if (n < 0)
+		FAIL_ERRNO("%s: read", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete read", log_prefix);
+
+close:
+	xclose(c1);
+	xclose(p1);
+close0:
+	xclose(c0);
+	xclose(p0);
+}
+
+static void unix_skb_redir_to_connected(struct test_sockmap_listen *skel,
+					struct bpf_map *inner_map, int sotype)
+{
+	int verdict = bpf_program__fd(skel->progs.prog_skb_verdict);
+	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
+	int sock_map = bpf_map__fd(inner_map);
+	int err;
+
+	err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0);
+	if (err)
+		return;
+
+	skel->bss->test_ingress = false;
+	unix_redir_to_connected(sotype, sock_map, verdict_map, REDIR_EGRESS);
+	skel->bss->test_ingress = true;
+	unix_redir_to_connected(sotype, sock_map, verdict_map, REDIR_INGRESS);
+
+	xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT);
+}
+
+static void test_unix_redir(struct test_sockmap_listen *skel, struct bpf_map *map,
+			    int sotype)
+{
+	const char *family_name, *map_name;
+	char s[MAX_TEST_NAME];
+
+	family_name = family_str(AF_UNIX);
+	map_name = map_type_str(map);
+	snprintf(s, sizeof(s), "%s %s %s", map_name, family_name, __func__);
+	if (!test__start_subtest(s))
+		return;
+	unix_skb_redir_to_connected(skel, map, sotype);
+}
+
 static void test_reuseport(struct test_sockmap_listen *skel,
 			   struct bpf_map *map, int family, int sotype)
 {
@@ -1626,10 +1861,16 @@ void test_sockmap_listen(void)
 	skel->bss->test_sockmap = true;
 	run_tests(skel, skel->maps.sock_map, AF_INET);
 	run_tests(skel, skel->maps.sock_map, AF_INET6);
+	test_udp_redir(skel, skel->maps.sock_map, AF_INET);
+	test_udp_redir(skel, skel->maps.sock_map, AF_INET6);
+	test_unix_redir(skel, skel->maps.sock_map, SOCK_DGRAM);
 
 	skel->bss->test_sockmap = false;
 	run_tests(skel, skel->maps.sock_hash, AF_INET);
 	run_tests(skel, skel->maps.sock_hash, AF_INET6);
+	test_udp_redir(skel, skel->maps.sock_hash, AF_INET);
+	test_udp_redir(skel, skel->maps.sock_hash, AF_INET6);
+	test_unix_redir(skel, skel->maps.sock_hash, SOCK_DGRAM);
 
 	test_sockmap_listen__destroy(skel);
 }
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
index fa221141e9c1..49537c78e34a 100644
--- a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
@@ -29,6 +29,7 @@ struct {
 } verdict_map SEC(".maps");
 
 static volatile bool test_sockmap; /* toggled by user-space */
+static volatile bool test_ingress; /* toggled by user-space */
 
 SEC("sk_skb/stream_parser")
 int prog_stream_parser(struct __sk_buff *skb)
@@ -55,6 +56,25 @@ int prog_stream_verdict(struct __sk_buff *skb)
 	return verdict;
 }
 
+SEC("sk_skb/skb_verdict")
+int prog_skb_verdict(struct __sk_buff *skb)
+{
+	unsigned int *count;
+	__u32 zero = 0;
+	int verdict;
+
+	if (test_sockmap)
+		verdict = bpf_sk_redirect_map(skb, &sock_map, zero, test_ingress ? BPF_F_INGRESS : 0);
+	else
+		verdict = bpf_sk_redirect_hash(skb, &sock_hash, &zero, test_ingress ? BPF_F_INGRESS : 0);
+
+	count = bpf_map_lookup_elem(&verdict_map, &verdict);
+	if (count)
+		(*count)++;
+
+	return verdict;
+}
+
 SEC("sk_msg")
 int prog_msg_verdict(struct sk_msg_md *msg)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch bpf-next 19/19] selftests/bpf: add test case for redirection between udp and unix
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (17 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 18/19] selftests/bpf: add test cases for unix and udp sockmap Cong Wang
@ 2021-02-03  4:16 ` Cong Wang
  2021-02-03 17:48 ` [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Alexei Starovoitov
  19 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-03  4:16 UTC (permalink / raw)
  To: netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 .../selftests/bpf/prog_tests/sockmap_listen.c | 226 ++++++++++++++++++
 1 file changed, 226 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index 8f52302165a6..e0c2a0a4f501 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -1798,6 +1798,228 @@ static void test_unix_redir(struct test_sockmap_listen *skel, struct bpf_map *ma
 	unix_skb_redir_to_connected(skel, map, sotype);
 }
 
+static void udp_unix_redir_to_connected(int family, int sock_mapfd,
+					int verd_mapfd, enum redir_mode mode)
+{
+	const char *log_prefix = redir_mode_str(mode);
+	struct sockaddr_storage addr;
+	int c0, c1, p0, p1;
+	unsigned int pass;
+	socklen_t len;
+	int err, n;
+	int sfd[2];
+	u64 value;
+	u32 key;
+	char b;
+
+	zero_verdict_count(verd_mapfd);
+
+	if (socketpair(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0, sfd))
+		return;
+	c0 = sfd[0], p0 = sfd[1];
+
+	p1 = socket_loopback(family, SOCK_DGRAM | SOCK_NONBLOCK);
+	if (p0 < 0)
+		goto close;
+	len = sizeof(addr);
+	err = xgetsockname(p1, sockaddr(&addr), &len);
+	if (err)
+		goto close_peer1;
+
+	c1 = xsocket(family, SOCK_DGRAM | SOCK_NONBLOCK, 0);
+	if (c1 < 0)
+		goto close_peer1;
+	err = xconnect(c1, sockaddr(&addr), len);
+	if (err)
+		goto close_cli1;
+	err = xgetsockname(c1, sockaddr(&addr), &len);
+	if (err)
+		goto close_cli1;
+	err = xconnect(p1, sockaddr(&addr), len);
+	if (err)
+		goto close_cli1;
+
+	key = 0;
+	value = p0;
+	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close_cli1;
+
+	key = 1;
+	value = p1;
+	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close_cli1;
+
+	n = write(c1, "a", 1);
+	if (n < 0)
+		FAIL_ERRNO("%s: write", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete write", log_prefix);
+	if (n < 1)
+		goto close_cli1;
+
+	key = SK_PASS;
+	err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass);
+	if (err)
+		goto close_cli1;
+	if (pass != 1)
+		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
+
+	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
+	if (n < 0)
+		FAIL_ERRNO("%s: read", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete read", log_prefix);
+
+close_cli1:
+	xclose(c1);
+close_peer1:
+	xclose(p1);
+close:
+	xclose(c0);
+	xclose(p0);
+}
+
+static void udp_unix_skb_redir_to_connected(struct test_sockmap_listen *skel,
+					    struct bpf_map *inner_map, int family)
+{
+	int verdict = bpf_program__fd(skel->progs.prog_skb_verdict);
+	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
+	int sock_map = bpf_map__fd(inner_map);
+	int err;
+
+	err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0);
+	if (err)
+		return;
+
+	skel->bss->test_ingress = false;
+	udp_unix_redir_to_connected(family, sock_map, verdict_map, REDIR_EGRESS);
+	skel->bss->test_ingress = true;
+	udp_unix_redir_to_connected(family, sock_map, verdict_map, REDIR_INGRESS);
+
+	xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT);
+}
+
+static void unix_udp_redir_to_connected(int family, int sock_mapfd,
+					int verd_mapfd, enum redir_mode mode)
+{
+	const char *log_prefix = redir_mode_str(mode);
+	struct sockaddr_storage addr;
+	int c0, c1, p0, p1;
+	unsigned int pass;
+	socklen_t len;
+	int err, n;
+	int sfd[2];
+	u64 value;
+	u32 key;
+	char b;
+
+	zero_verdict_count(verd_mapfd);
+
+	p0 = socket_loopback(family, SOCK_DGRAM | SOCK_NONBLOCK);
+	if (p0 < 0)
+		return;
+	len = sizeof(addr);
+	err = xgetsockname(p0, sockaddr(&addr), &len);
+	if (err)
+		goto close_peer0;
+
+	c0 = xsocket(family, SOCK_DGRAM | SOCK_NONBLOCK, 0);
+	if (c0 < 0)
+		goto close_peer0;
+	err = xconnect(c0, sockaddr(&addr), len);
+	if (err)
+		goto close_cli0;
+	err = xgetsockname(c0, sockaddr(&addr), &len);
+	if (err)
+		goto close_cli0;
+	err = xconnect(p0, sockaddr(&addr), len);
+	if (err)
+		goto close_cli0;
+
+	if (socketpair(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0, sfd))
+		goto close_cli0;
+	c1 = sfd[0], p1 = sfd[1];
+
+	key = 0;
+	value = p0;
+	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close;
+
+	key = 1;
+	value = p1;
+	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close;
+
+	n = write(c1, "a", 1);
+	if (n < 0)
+		FAIL_ERRNO("%s: write", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete write", log_prefix);
+	if (n < 1)
+		goto close;
+
+	key = SK_PASS;
+	err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass);
+	if (err)
+		goto close;
+	if (pass != 1)
+		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
+
+	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
+	if (n < 0)
+		FAIL_ERRNO("%s: read", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete read", log_prefix);
+
+close:
+	xclose(c1);
+	xclose(p1);
+close_cli0:
+	xclose(c0);
+close_peer0:
+	xclose(p0);
+
+}
+
+static void unix_udp_skb_redir_to_connected(struct test_sockmap_listen *skel,
+					    struct bpf_map *inner_map, int family)
+{
+	int verdict = bpf_program__fd(skel->progs.prog_skb_verdict);
+	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
+	int sock_map = bpf_map__fd(inner_map);
+	int err;
+
+	err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0);
+	if (err)
+		return;
+
+	skel->bss->test_ingress = false;
+	unix_udp_redir_to_connected(family, sock_map, verdict_map, REDIR_EGRESS);
+	skel->bss->test_ingress = true;
+	unix_udp_redir_to_connected(family, sock_map, verdict_map, REDIR_INGRESS);
+
+	xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT);
+}
+
+static void test_udp_unix_redir(struct test_sockmap_listen *skel, struct bpf_map *map,
+				int family)
+{
+	const char *family_name, *map_name;
+	char s[MAX_TEST_NAME];
+
+	family_name = family_str(family);
+	map_name = map_type_str(map);
+	snprintf(s, sizeof(s), "%s %s %s", map_name, family_name, __func__);
+	if (!test__start_subtest(s))
+		return;
+	udp_unix_skb_redir_to_connected(skel, map, family);
+	unix_udp_skb_redir_to_connected(skel, map, family);
+}
+
 static void test_reuseport(struct test_sockmap_listen *skel,
 			   struct bpf_map *map, int family, int sotype)
 {
@@ -1864,6 +2086,8 @@ void test_sockmap_listen(void)
 	test_udp_redir(skel, skel->maps.sock_map, AF_INET);
 	test_udp_redir(skel, skel->maps.sock_map, AF_INET6);
 	test_unix_redir(skel, skel->maps.sock_map, SOCK_DGRAM);
+	test_udp_unix_redir(skel, skel->maps.sock_map, AF_INET);
+	test_udp_unix_redir(skel, skel->maps.sock_map, AF_INET6);
 
 	skel->bss->test_sockmap = false;
 	run_tests(skel, skel->maps.sock_hash, AF_INET);
@@ -1871,6 +2095,8 @@ void test_sockmap_listen(void)
 	test_udp_redir(skel, skel->maps.sock_hash, AF_INET);
 	test_udp_redir(skel, skel->maps.sock_hash, AF_INET6);
 	test_unix_redir(skel, skel->maps.sock_hash, SOCK_DGRAM);
+	test_udp_unix_redir(skel, skel->maps.sock_hash, AF_INET);
+	test_udp_unix_redir(skel, skel->maps.sock_hash, AF_INET6);
 
 	test_sockmap_listen__destroy(skel);
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support
  2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
                   ` (18 preceding siblings ...)
  2021-02-03  4:16 ` [Patch bpf-next 19/19] selftests/bpf: add test case for redirection between udp and unix Cong Wang
@ 2021-02-03 17:48 ` Alexei Starovoitov
  2021-02-03 19:22   ` Cong Wang
  19 siblings, 1 reply; 40+ messages in thread
From: Alexei Starovoitov @ 2021-02-03 17:48 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang

On Tue, Feb 02, 2021 at 08:16:17PM -0800, Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> Currently sockmap only fully supports TCP, UDP is partially supported
> as it is only allowed to add into sockmap. This patch extends sockmap
> with: 1) full UDP support; 2) full AF_UNIX dgram support; 3) cross
> protocol support. Our goal is to allow socket splice between AF_UNIX
> dgram and UDP.

Please expand on the use case. The 'splice between af_unix and udp'
doesn't tell me much. The selftest doesn't help to understand the scope either.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support
  2021-02-03 17:48 ` [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Alexei Starovoitov
@ 2021-02-03 19:22   ` Cong Wang
  2021-02-03 20:29     ` John Fastabend
  0 siblings, 1 reply; 40+ messages in thread
From: Cong Wang @ 2021-02-03 19:22 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Linux Kernel Network Developers, bpf, duanxiongchun,
	Dongdong Wang, jiang.wang, Cong Wang

On Wed, Feb 3, 2021 at 9:48 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Feb 02, 2021 at 08:16:17PM -0800, Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > Currently sockmap only fully supports TCP, UDP is partially supported
> > as it is only allowed to add into sockmap. This patch extends sockmap
> > with: 1) full UDP support; 2) full AF_UNIX dgram support; 3) cross
> > protocol support. Our goal is to allow socket splice between AF_UNIX
> > dgram and UDP.
>
> Please expand on the use case. The 'splice between af_unix and udp'
> doesn't tell me much. The selftest doesn't help to understand the scope either.

Sure. We have thousands of services connected to a daemon on every host
with UNIX dgram sockets, after they are moved into VM, we have to add a proxy
to forward these communications from VM to host, because rewriting thousands
of them is not practical. This proxy uses a UNIX socket connected to services
and uses a UDP socket to connect to the host. It is inefficient because data is
copied between kernel space and user space twice, and we can not use
splice() which only supports TCP. Therefore, we want to use sockmap to do
the splicing without even going to user-space at all (after the initial setup).

My colleague Jiang (already Cc'ed) is working on the sockmap support for
vsock so that we can move from UDP to vsock for host-VM communications.

If this is useful, I can add it in this cover letter in the next update.

Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support
  2021-02-03 19:22   ` Cong Wang
@ 2021-02-03 20:29     ` John Fastabend
  0 siblings, 0 replies; 40+ messages in thread
From: John Fastabend @ 2021-02-03 20:29 UTC (permalink / raw)
  To: Cong Wang, Alexei Starovoitov
  Cc: Linux Kernel Network Developers, bpf, duanxiongchun,
	Dongdong Wang, jiang.wang, Cong Wang

Cong Wang wrote:
> On Wed, Feb 3, 2021 at 9:48 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Feb 02, 2021 at 08:16:17PM -0800, Cong Wang wrote:
> > > From: Cong Wang <cong.wang@bytedance.com>
> > >
> > > Currently sockmap only fully supports TCP, UDP is partially supported
> > > as it is only allowed to add into sockmap. This patch extends sockmap
> > > with: 1) full UDP support; 2) full AF_UNIX dgram support; 3) cross
> > > protocol support. Our goal is to allow socket splice between AF_UNIX
> > > dgram and UDP.
> >
> > Please expand on the use case. The 'splice between af_unix and udp'
> > doesn't tell me much. The selftest doesn't help to understand the scope either.
> 
> Sure. We have thousands of services connected to a daemon on every host
> with UNIX dgram sockets, after they are moved into VM, we have to add a proxy
> to forward these communications from VM to host, because rewriting thousands
> of them is not practical. This proxy uses a UNIX socket connected to services
> and uses a UDP socket to connect to the host. It is inefficient because data is
> copied between kernel space and user space twice, and we can not use
> splice() which only supports TCP. Therefore, we want to use sockmap to do
> the splicing without even going to user-space at all (after the initial setup).

Thanks for the details. We also have a use-case similar to TCP sockets
to apply policy/redirect to UDP sockets so will want similar semantics to
how TCP skmsg programs work on egress.

> 
> My colleague Jiang (already Cc'ed) is working on the sockmap support for
> vsock so that we can move from UDP to vsock for host-VM communications.

Great. The host-VM channel came up a few times in the initial sockmap work,
but I never got around to starting.

> 
> If this is useful, I can add it in this cover letter in the next update.
> 

Please add to the cover letter. I'll review the series today or
tomorrow, I have a couple things on the TODO list for today that
I need to get done first.

> Thanks.

Thanks for doing this work.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 01/19] bpf: rename BPF_STREAM_PARSER to BPF_SOCK_MAP
  2021-02-03  4:16 ` [Patch bpf-next 01/19] bpf: rename BPF_STREAM_PARSER to BPF_SOCK_MAP Cong Wang
@ 2021-02-05 10:32   ` Jakub Sitnicki
  2021-02-09  1:40     ` Cong Wang
  2021-02-08  8:21   ` John Fastabend
  1 sibling, 1 reply; 40+ messages in thread
From: Jakub Sitnicki @ 2021-02-05 10:32 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, bpf, duanxiongchun, wangdongdong.6, jiang.wang,
	Cong Wang, John Fastabend, Daniel Borkmann, Lorenz Bauer

On Wed, Feb 03, 2021 at 05:16 AM CET, Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
>
> Before we add non-TCP support, it is necessary to rename
> BPF_STREAM_PARSER as it will be no longer specific to TCP,
> and it does not have to be a parser either.
>
> This patch renames BPF_STREAM_PARSER to BPF_SOCK_MAP, so
> that sock_map.c hopefully would be protocol-independent.
>
> Also, improve its Kconfig description to avoid confusion.
>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---
>  include/linux/bpf.h       |  4 ++--
>  include/linux/bpf_types.h |  2 +-
>  include/net/tcp.h         |  4 ++--
>  include/net/udp.h         |  4 ++--
>  net/Kconfig               | 13 ++++++-------
>  net/core/Makefile         |  2 +-
>  net/ipv4/Makefile         |  2 +-
>  net/ipv4/tcp_bpf.c        |  4 ++--
>  8 files changed, 17 insertions(+), 18 deletions(-)

We also have a couple of references to CONFIG_BPF_STREAM_PARSER in
tools/tests:

$ git grep -i bpf_stream_parser
...
tools/bpf/bpftool/feature.c:            { "CONFIG_BPF_STREAM_PARSER", },
tools/testing/selftests/bpf/config:CONFIG_BPF_STREAM_PARSER=y

[...]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 18/19] selftests/bpf: add test cases for unix and udp sockmap
  2021-02-03  4:16 ` [Patch bpf-next 18/19] selftests/bpf: add test cases for unix and udp sockmap Cong Wang
@ 2021-02-05 10:53   ` Jakub Sitnicki
  2021-02-08 18:43     ` Cong Wang
  0 siblings, 1 reply; 40+ messages in thread
From: Jakub Sitnicki @ 2021-02-05 10:53 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, bpf, duanxiongchun, wangdongdong.6, jiang.wang,
	Cong Wang, John Fastabend, Daniel Borkmann, Lorenz Bauer

On Wed, Feb 03, 2021 at 05:16 AM CET, Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
>
> Add two test cases to ensure redirection between two
> AF_UNIX sockets or two UDP sockets work.
>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---

If you extract a helper for creating a pair of connected sockets that:

 1) delegates to socketpair() for AF_UNIX,
 2) emulates socketpair() for INET/DGRAM,

then tests for INET and UNIX datagram sockets could share code.

[...]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 02/19] skmsg: get rid of struct sk_psock_parser
  2021-02-03  4:16 ` [Patch bpf-next 02/19] skmsg: get rid of struct sk_psock_parser Cong Wang
@ 2021-02-05 11:25   ` Jakub Sitnicki
  2021-02-08  8:39     ` John Fastabend
  0 siblings, 1 reply; 40+ messages in thread
From: Jakub Sitnicki @ 2021-02-05 11:25 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, bpf, duanxiongchun, wangdongdong.6, jiang.wang,
	Cong Wang, John Fastabend, Daniel Borkmann, Lorenz Bauer

On Wed, Feb 03, 2021 at 05:16 AM CET, Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
>
> struct sk_psock_parser is embedded in sk_psock, it is
> unnecessary as skb verdict also uses ->saved_data_ready.
> We can simply fold these fields into sk_psock.
>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---

This one looks like a candidate for splitting out of the series, as it
stands on its own, to make the itself series smaller.

Also, it seems that we always have:

  parser.enabled/bpf_running == (saved_data_ready != NULL)

Maybe parser.enabled can be turned into a predicate function.

[...]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 03/19] skmsg: use skb ext instead of TCP_SKB_CB
  2021-02-03  4:16 ` [Patch bpf-next 03/19] skmsg: use skb ext instead of TCP_SKB_CB Cong Wang
@ 2021-02-05 22:09   ` Jakub Sitnicki
  2021-02-08 18:56     ` Cong Wang
  0 siblings, 1 reply; 40+ messages in thread
From: Jakub Sitnicki @ 2021-02-05 22:09 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, bpf, duanxiongchun, wangdongdong.6, jiang.wang,
	Cong Wang, John Fastabend, Daniel Borkmann, Lorenz Bauer

On Wed, Feb 03, 2021 at 05:16 AM CET, Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
>
> Currently TCP_SKB_CB() is hard-coded in skmsg code, it certainly
> won't work for any other non-TCP protocols. We can move them to
> skb ext instead of playing with skb cb, which is harder to make
> correct.
>
> Of course, except ->data_end, which is used by
> sk_skb_convert_ctx_access() to adjust compile-time constant offset.
> Fortunately, we can reuse the anonymous union where the field
> 'tcp_tsorted_anchor' is and save/restore the overwritten part
> before/after a brief use.
>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---
>  include/linux/skbuff.h |  4 ++++
>  include/linux/skmsg.h  | 45 ++++++++++++++++++++++++++++++++++++++++++
>  include/net/tcp.h      | 25 -----------------------
>  net/Kconfig            |  1 +
>  net/core/filter.c      |  3 +--
>  net/core/skbuff.c      |  7 +++++++
>  net/core/skmsg.c       | 44 ++++++++++++++++++++++++++++-------------
>  net/core/sock_map.c    | 12 +++++------
>  8 files changed, 94 insertions(+), 47 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 46f901adf1a8..12a28268233a 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -755,6 +755,7 @@ struct sk_buff {
>  			void		(*destructor)(struct sk_buff *skb);
>  		};
>  		struct list_head	tcp_tsorted_anchor;
> +		void			*data_end;
>  	};

I think we can avoid `data_end` by computing it in BPF with the help of
a scratch register. Similar to how we compute skb_shinfo(skb) in
bpf_convert_shinfo_access(). Something like:

static struct bpf_insn *bpf_convert_data_end_access(const struct bpf_insn *si,
						    struct bpf_insn *insn)
{
	/* dst_reg = skb->data */
	*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, data),
			      si->dst_reg, si->src_reg,
			      offsetof(struct sk_buff, data));
	/* AX = skb->len */
	*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, len),
			      BPF_REG_AX, si->src_reg,
			      offsetof(struct sk_buff, len));
	/* dst_reg = skb->data + skb->len */
	*insn++ = BPF_ALU64_REG(BPF_ADD, si->dst_reg, BPF_REG_AX);
	/* AX = skb->data_len */
	*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, data_len),
			      BPF_REG_AX, si->src_reg,
			      offsetof(struct sk_buff, data_len));
	/* dst_reg = skb->data + skb->len - skb->data_len */
	*insn++ = BPF_ALU64_REG(BPF_SUB, si->dst_reg, BPF_REG_AX);

	return insn;
}

[...]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [Patch bpf-next 01/19] bpf: rename BPF_STREAM_PARSER to BPF_SOCK_MAP
  2021-02-03  4:16 ` [Patch bpf-next 01/19] bpf: rename BPF_STREAM_PARSER to BPF_SOCK_MAP Cong Wang
  2021-02-05 10:32   ` Jakub Sitnicki
@ 2021-02-08  8:21   ` John Fastabend
  2021-02-08  9:50     ` Lorenz Bauer
  2021-02-09  1:45     ` Cong Wang
  1 sibling, 2 replies; 40+ messages in thread
From: John Fastabend @ 2021-02-08  8:21 UTC (permalink / raw)
  To: Cong Wang, netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> Before we add non-TCP support, it is necessary to rename
> BPF_STREAM_PARSER as it will be no longer specific to TCP,
> and it does not have to be a parser either.
> 
> This patch renames BPF_STREAM_PARSER to BPF_SOCK_MAP, so
> that sock_map.c hopefully would be protocol-independent.
> 
> Also, improve its Kconfig description to avoid confusion.
> 
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---

The BPF_STREAM_PARSER config was originally added because we need
the STREAM_PARSER define and wanted a way to get the 'depends on'
lines in Kconfig correct.

Rather than rename this, lets reduce its scope to just the set
of actions that need the STREAM_PARSER, this should be just the
stream parser programs. We probably should have done this sooner,
but doing it now will be fine.

I can probably draft a quick patch tomorrow if above is not clear.
It can go into bpf-next outside this series as well to reduce
the 19 patches a bit.

Thanks,
John

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [Patch bpf-next 04/19] sock_map: rename skb_parser and skb_verdict
  2021-02-03  4:16 ` [Patch bpf-next 04/19] sock_map: rename skb_parser and skb_verdict Cong Wang
@ 2021-02-08  8:27   ` John Fastabend
  0 siblings, 0 replies; 40+ messages in thread
From: John Fastabend @ 2021-02-08  8:27 UTC (permalink / raw)
  To: Cong Wang, netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> These two ebpf programs are tied to BPF_SK_SKB_STREAM_PARSER
> and BPF_SK_SKB_STREAM_VERDICT, rename them to reflect the fact
> they are currently used for TCP. And save the generic name
> skb_verdict for general use.
> 
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>

We've recently added support for running a verdict without a stream
parser. But the rename should also be OK, then it will match the
prog names which is nice.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [Patch bpf-next 05/19] sock_map: introduce BPF_SK_SKB_VERDICT
  2021-02-03  4:16 ` [Patch bpf-next 05/19] sock_map: introduce BPF_SK_SKB_VERDICT Cong Wang
@ 2021-02-08  8:31   ` John Fastabend
  0 siblings, 0 replies; 40+ messages in thread
From: John Fastabend @ 2021-02-08  8:31 UTC (permalink / raw)
  To: Cong Wang, netdev
  Cc: bpf, duanxiongchun, wangdongdong.6, jiang.wang, Cong Wang,
	John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> I was planning to reuse BPF_SK_SKB_STREAM_VERDICT but its name is
> confusing and more importantly it seems kTLS relies on it to deliver
> sk_msg too. To avoid messing up kTLS, we can just reuse the stream
> verdict code but introduce a new type of eBPF program, skb_verdict.
> Users are not allowed to set stream_verdict and skb_verdict at the
> same time.
> 
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>

I think it will be better if we can keep the same name. Does it break
kTLS somehow? I'm not seeing it with a quick scan.

We can always alias a better name with the same value for readability.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 02/19] skmsg: get rid of struct sk_psock_parser
  2021-02-05 11:25   ` Jakub Sitnicki
@ 2021-02-08  8:39     ` John Fastabend
  2021-02-09  0:19       ` Cong Wang
  0 siblings, 1 reply; 40+ messages in thread
From: John Fastabend @ 2021-02-08  8:39 UTC (permalink / raw)
  To: Jakub Sitnicki, Cong Wang
  Cc: netdev, bpf, duanxiongchun, wangdongdong.6, jiang.wang,
	Cong Wang, John Fastabend, Daniel Borkmann, Lorenz Bauer

Jakub Sitnicki wrote:
> On Wed, Feb 03, 2021 at 05:16 AM CET, Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > struct sk_psock_parser is embedded in sk_psock, it is
> > unnecessary as skb verdict also uses ->saved_data_ready.
> > We can simply fold these fields into sk_psock.
> >
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > ---
> 
> This one looks like a candidate for splitting out of the series, as it
> stands on its own, to make the itself series smaller.
> 
> Also, it seems that we always have:
> 
>   parser.enabled/bpf_running == (saved_data_ready != NULL)
> 
> Maybe parser.enabled can be turned into a predicate function.
> 
> [...]

Agree. To speed this along consider breaking it into three
series.

 - couple cleanup things: this patch, config option, etc.

 - udp changes

 - af_unix changes.

Although if you really think udp changes and af_unix need to go
together that is fine imo. I think the basic rule is to try and avoid
getting patch counts above 10 or so if at all possible.

At least this patch, the renaming patch, and the config patch
can get pulled out into their own series so we can get those
merged and out of the way.

Thanks,
John

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 08/19] udp: implement ->read_sock() for sockmap
  2021-02-03  4:16 ` [Patch bpf-next 08/19] udp: implement ->read_sock() for sockmap Cong Wang
@ 2021-02-08  9:48   ` Lorenz Bauer
  2021-02-09  1:35     ` Cong Wang
  0 siblings, 1 reply; 40+ messages in thread
From: Lorenz Bauer @ 2021-02-08  9:48 UTC (permalink / raw)
  To: Cong Wang
  Cc: Networking, bpf, duanxiongchun, wangdongdong.6, jiang.wang,
	Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki

On Wed, 3 Feb 2021 at 04:17, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> From: Cong Wang <cong.wang@bytedance.com>
>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---
>  include/net/udp.h  |  2 ++
>  net/ipv4/af_inet.c |  1 +
>  net/ipv4/udp.c     | 34 ++++++++++++++++++++++++++++++++++
>  3 files changed, 37 insertions(+)
>
> diff --git a/include/net/udp.h b/include/net/udp.h
> index 13f9354dbd3e..b6b75cabf4e4 100644
> --- a/include/net/udp.h
> +++ b/include/net/udp.h
> @@ -327,6 +327,8 @@ struct sock *__udp6_lib_lookup(struct net *net,
>                                struct sk_buff *skb);
>  struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
>                                  __be16 sport, __be16 dport);
> +int udp_read_sock(struct sock *sk, read_descriptor_t *desc,
> +                 sk_read_actor_t recv_actor);
>
>  /* UDP uses skb->dev_scratch to cache as much information as possible and avoid
>   * possibly multiple cache miss on dequeue()
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index d184d9379a92..4a4c6d3d2786 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1072,6 +1072,7 @@ const struct proto_ops inet_dgram_ops = {
>         .getsockopt        = sock_common_getsockopt,
>         .sendmsg           = inet_sendmsg,
>         .sendmsg_locked    = udp_sendmsg_locked,
> +       .read_sock         = udp_read_sock,
>         .recvmsg           = inet_recvmsg,
>         .mmap              = sock_no_mmap,
>         .sendpage          = inet_sendpage,
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 635e1e8b2968..6dffbcec0b51 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1792,6 +1792,40 @@ struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags,
>  }
>  EXPORT_SYMBOL(__skb_recv_udp);
>
> +int udp_read_sock(struct sock *sk, read_descriptor_t *desc,
> +                 sk_read_actor_t recv_actor)
> +{
> +       struct sk_buff *skb;
> +       int copied = 0, err;
> +
> +       while (1) {
> +               int offset = 0;
> +
> +               skb = __skb_recv_udp(sk, 0, 1, &offset, &err);

Seems like err isn't used outside of the loop, is that on purpose? If
yes, how about moving the declaration of err to be with offset. Maybe
rename to ignored?

> +               if (!skb)
> +                       break;
> +               if (offset < skb->len) {
> +                       int used;
> +                       size_t len;
> +
> +                       len = skb->len - offset;
> +                       used = recv_actor(desc, skb, offset, len);
> +                       if (used <= 0) {
> +                               if (!copied)
> +                                       copied = used;
> +                               break;
> +                       } else if (used <= len) {

In which case can used be > len?


> +                               copied += used;
> +                               offset += used;
> +                       }
> +               }
> +               if (!desc->count)
> +                       break;
> +       }
> +
> +       return copied;
> +}
> +
>  /*
>   *     This should be easy, if there is something there we
>   *     return it, otherwise we block.
> --
> 2.25.1
>


--
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 01/19] bpf: rename BPF_STREAM_PARSER to BPF_SOCK_MAP
  2021-02-08  8:21   ` John Fastabend
@ 2021-02-08  9:50     ` Lorenz Bauer
  2021-02-09  1:45     ` Cong Wang
  1 sibling, 0 replies; 40+ messages in thread
From: Lorenz Bauer @ 2021-02-08  9:50 UTC (permalink / raw)
  To: John Fastabend
  Cc: Cong Wang, Networking, bpf, duanxiongchun, wangdongdong.6,
	jiang.wang, Cong Wang, Daniel Borkmann, Jakub Sitnicki

On Mon, 8 Feb 2021 at 08:21, John Fastabend <john.fastabend@gmail.com> wrote:
...
>
> Rather than rename this, lets reduce its scope to just the set
> of actions that need the STREAM_PARSER, this should be just the
> stream parser programs. We probably should have done this sooner,
> but doing it now will be fine.

So sockmap would not be hidden behind a CONFIG anymore? That
would be great.


-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 18/19] selftests/bpf: add test cases for unix and udp sockmap
  2021-02-05 10:53   ` Jakub Sitnicki
@ 2021-02-08 18:43     ` Cong Wang
  0 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-08 18:43 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Linux Kernel Network Developers, bpf, duanxiongchun,
	Dongdong Wang, jiang.wang, Cong Wang, John Fastabend,
	Daniel Borkmann, Lorenz Bauer

On Fri, Feb 5, 2021 at 2:53 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Wed, Feb 03, 2021 at 05:16 AM CET, Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > Add two test cases to ensure redirection between two
> > AF_UNIX sockets or two UDP sockets work.
> >
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > ---
>
> If you extract a helper for creating a pair of connected sockets that:
>
>  1) delegates to socketpair() for AF_UNIX,
>  2) emulates socketpair() for INET/DGRAM,
>
> then tests for INET and UNIX datagram sockets could share code.

Good idea. Will do it in the next update.

Thanks!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 03/19] skmsg: use skb ext instead of TCP_SKB_CB
  2021-02-05 22:09   ` Jakub Sitnicki
@ 2021-02-08 18:56     ` Cong Wang
  0 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-08 18:56 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Linux Kernel Network Developers, bpf, duanxiongchun,
	Dongdong Wang, jiang.wang, Cong Wang, John Fastabend,
	Daniel Borkmann, Lorenz Bauer

On Fri, Feb 5, 2021 at 2:09 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Wed, Feb 03, 2021 at 05:16 AM CET, Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > Currently TCP_SKB_CB() is hard-coded in skmsg code, it certainly
> > won't work for any other non-TCP protocols. We can move them to
> > skb ext instead of playing with skb cb, which is harder to make
> > correct.
> >
> > Of course, except ->data_end, which is used by
> > sk_skb_convert_ctx_access() to adjust compile-time constant offset.
> > Fortunately, we can reuse the anonymous union where the field
> > 'tcp_tsorted_anchor' is and save/restore the overwritten part
> > before/after a brief use.
> >
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > ---
> >  include/linux/skbuff.h |  4 ++++
> >  include/linux/skmsg.h  | 45 ++++++++++++++++++++++++++++++++++++++++++
> >  include/net/tcp.h      | 25 -----------------------
> >  net/Kconfig            |  1 +
> >  net/core/filter.c      |  3 +--
> >  net/core/skbuff.c      |  7 +++++++
> >  net/core/skmsg.c       | 44 ++++++++++++++++++++++++++++-------------
> >  net/core/sock_map.c    | 12 +++++------
> >  8 files changed, 94 insertions(+), 47 deletions(-)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 46f901adf1a8..12a28268233a 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -755,6 +755,7 @@ struct sk_buff {
> >                       void            (*destructor)(struct sk_buff *skb);
> >               };
> >               struct list_head        tcp_tsorted_anchor;
> > +             void                    *data_end;
> >       };
>
> I think we can avoid `data_end` by computing it in BPF with the help of
> a scratch register. Similar to how we compute skb_shinfo(skb) in
> bpf_convert_shinfo_access(). Something like:

Sounds like an excellent idea! It is certainly much better if we can just
compute it at run-time. I will give this a try.

Thanks!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 02/19] skmsg: get rid of struct sk_psock_parser
  2021-02-08  8:39     ` John Fastabend
@ 2021-02-09  0:19       ` Cong Wang
  0 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-09  0:19 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jakub Sitnicki, Linux Kernel Network Developers, bpf,
	duanxiongchun, Dongdong Wang, jiang.wang, Cong Wang,
	Daniel Borkmann, Lorenz Bauer

On Mon, Feb 8, 2021 at 12:39 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Jakub Sitnicki wrote:
> > On Wed, Feb 03, 2021 at 05:16 AM CET, Cong Wang wrote:
> > > From: Cong Wang <cong.wang@bytedance.com>
> > >
> > > struct sk_psock_parser is embedded in sk_psock, it is
> > > unnecessary as skb verdict also uses ->saved_data_ready.
> > > We can simply fold these fields into sk_psock.
> > >
> > > Cc: John Fastabend <john.fastabend@gmail.com>
> > > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > > ---
> >
> > This one looks like a candidate for splitting out of the series, as it
> > stands on its own, to make the itself series smaller.
> >
> > Also, it seems that we always have:
> >
> >   parser.enabled/bpf_running == (saved_data_ready != NULL)
> >
> > Maybe parser.enabled can be turned into a predicate function.

Yeah, this looks cleaner.

> >
> > [...]
>
> Agree. To speed this along consider breaking it into three
> series.
>
>  - couple cleanup things: this patch, config option, etc.
>
>  - udp changes
>
>  - af_unix changes.
>
> Although if you really think udp changes and af_unix need to go
> together that is fine imo. I think the basic rule is to try and avoid
> getting patch counts above 10 or so if at all possible.
>
> At least this patch, the renaming patch, and the config patch
> can get pulled out into their own series so we can get those
> merged and out of the way.

Sounds good. Since each patch is already separated, there is no
extra burden on my side to split the whole patchset as suggested
above, except adjusting the cover letters.

Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 08/19] udp: implement ->read_sock() for sockmap
  2021-02-08  9:48   ` Lorenz Bauer
@ 2021-02-09  1:35     ` Cong Wang
  0 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-09  1:35 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Networking, bpf, duanxiongchun, Dongdong Wang, jiang.wang,
	Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki

On Mon, Feb 8, 2021 at 1:48 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> On Wed, 3 Feb 2021 at 04:17, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > ---
> >  include/net/udp.h  |  2 ++
> >  net/ipv4/af_inet.c |  1 +
> >  net/ipv4/udp.c     | 34 ++++++++++++++++++++++++++++++++++
> >  3 files changed, 37 insertions(+)
> >
> > diff --git a/include/net/udp.h b/include/net/udp.h
> > index 13f9354dbd3e..b6b75cabf4e4 100644
> > --- a/include/net/udp.h
> > +++ b/include/net/udp.h
> > @@ -327,6 +327,8 @@ struct sock *__udp6_lib_lookup(struct net *net,
> >                                struct sk_buff *skb);
> >  struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
> >                                  __be16 sport, __be16 dport);
> > +int udp_read_sock(struct sock *sk, read_descriptor_t *desc,
> > +                 sk_read_actor_t recv_actor);
> >
> >  /* UDP uses skb->dev_scratch to cache as much information as possible and avoid
> >   * possibly multiple cache miss on dequeue()
> > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> > index d184d9379a92..4a4c6d3d2786 100644
> > --- a/net/ipv4/af_inet.c
> > +++ b/net/ipv4/af_inet.c
> > @@ -1072,6 +1072,7 @@ const struct proto_ops inet_dgram_ops = {
> >         .getsockopt        = sock_common_getsockopt,
> >         .sendmsg           = inet_sendmsg,
> >         .sendmsg_locked    = udp_sendmsg_locked,
> > +       .read_sock         = udp_read_sock,
> >         .recvmsg           = inet_recvmsg,
> >         .mmap              = sock_no_mmap,
> >         .sendpage          = inet_sendpage,
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 635e1e8b2968..6dffbcec0b51 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -1792,6 +1792,40 @@ struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags,
> >  }
> >  EXPORT_SYMBOL(__skb_recv_udp);
> >
> > +int udp_read_sock(struct sock *sk, read_descriptor_t *desc,
> > +                 sk_read_actor_t recv_actor)
> > +{
> > +       struct sk_buff *skb;
> > +       int copied = 0, err;
> > +
> > +       while (1) {
> > +               int offset = 0;
> > +
> > +               skb = __skb_recv_udp(sk, 0, 1, &offset, &err);
>
> Seems like err isn't used outside of the loop, is that on purpose? If
> yes, how about moving the declaration of err to be with offset. Maybe
> rename to ignored?

It should be moved inside the loop.

>
> > +               if (!skb)
> > +                       break;
> > +               if (offset < skb->len) {
> > +                       int used;
> > +                       size_t len;
> > +
> > +                       len = skb->len - offset;
> > +                       used = recv_actor(desc, skb, offset, len);
> > +                       if (used <= 0) {
> > +                               if (!copied)
> > +                                       copied = used;
> > +                               break;
> > +                       } else if (used <= len) {
>
> In which case can used be > len?

I think in splice() case it could return a larger value than 'len', but
UDP does not support splice() even after this patchset. I can change
it to 'else', or just leave it as it is, in case we will add splice() support in
the future.

Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 01/19] bpf: rename BPF_STREAM_PARSER to BPF_SOCK_MAP
  2021-02-05 10:32   ` Jakub Sitnicki
@ 2021-02-09  1:40     ` Cong Wang
  0 siblings, 0 replies; 40+ messages in thread
From: Cong Wang @ 2021-02-09  1:40 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Linux Kernel Network Developers, bpf, duanxiongchun,
	Dongdong Wang, jiang.wang, Cong Wang, John Fastabend,
	Daniel Borkmann, Lorenz Bauer

On Fri, Feb 5, 2021 at 2:32 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Wed, Feb 03, 2021 at 05:16 AM CET, Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > Before we add non-TCP support, it is necessary to rename
> > BPF_STREAM_PARSER as it will be no longer specific to TCP,
> > and it does not have to be a parser either.
> >
> > This patch renames BPF_STREAM_PARSER to BPF_SOCK_MAP, so
> > that sock_map.c hopefully would be protocol-independent.
> >
> > Also, improve its Kconfig description to avoid confusion.
> >
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > ---
> >  include/linux/bpf.h       |  4 ++--
> >  include/linux/bpf_types.h |  2 +-
> >  include/net/tcp.h         |  4 ++--
> >  include/net/udp.h         |  4 ++--
> >  net/Kconfig               | 13 ++++++-------
> >  net/core/Makefile         |  2 +-
> >  net/ipv4/Makefile         |  2 +-
> >  net/ipv4/tcp_bpf.c        |  4 ++--
> >  8 files changed, 17 insertions(+), 18 deletions(-)
>
> We also have a couple of references to CONFIG_BPF_STREAM_PARSER in
> tools/tests:
>
> $ git grep -i bpf_stream_parser
> ...
> tools/bpf/bpftool/feature.c:            { "CONFIG_BPF_STREAM_PARSER", },
> tools/testing/selftests/bpf/config:CONFIG_BPF_STREAM_PARSER=y

I think I did a grep in the whole tree, but still missed these two.

Thanks for catching it!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 01/19] bpf: rename BPF_STREAM_PARSER to BPF_SOCK_MAP
  2021-02-08  8:21   ` John Fastabend
  2021-02-08  9:50     ` Lorenz Bauer
@ 2021-02-09  1:45     ` Cong Wang
  2021-02-09  6:48       ` John Fastabend
  1 sibling, 1 reply; 40+ messages in thread
From: Cong Wang @ 2021-02-09  1:45 UTC (permalink / raw)
  To: John Fastabend
  Cc: Linux Kernel Network Developers, bpf, duanxiongchun,
	Dongdong Wang, jiang.wang, Cong Wang, Daniel Borkmann,
	Jakub Sitnicki, Lorenz Bauer

On Mon, Feb 8, 2021 at 12:21 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > Before we add non-TCP support, it is necessary to rename
> > BPF_STREAM_PARSER as it will be no longer specific to TCP,
> > and it does not have to be a parser either.
> >
> > This patch renames BPF_STREAM_PARSER to BPF_SOCK_MAP, so
> > that sock_map.c hopefully would be protocol-independent.
> >
> > Also, improve its Kconfig description to avoid confusion.
> >
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > ---
>
> The BPF_STREAM_PARSER config was originally added because we need
> the STREAM_PARSER define and wanted a way to get the 'depends on'
> lines in Kconfig correct.
>
> Rather than rename this, lets reduce its scope to just the set
> of actions that need the STREAM_PARSER, this should be just the
> stream parser programs. We probably should have done this sooner,
> but doing it now will be fine.

This makes sense, but we still need a Kconfig for the rest sockmap
code, right? At least for the dependency on NET_SOCK_MSG?

>
> I can probably draft a quick patch tomorrow if above is not clear.
> It can go into bpf-next outside this series as well to reduce
> the 19 patches a bit.

I can handle it in my next update too, like all other feedbacks.

Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch bpf-next 01/19] bpf: rename BPF_STREAM_PARSER to BPF_SOCK_MAP
  2021-02-09  1:45     ` Cong Wang
@ 2021-02-09  6:48       ` John Fastabend
  0 siblings, 0 replies; 40+ messages in thread
From: John Fastabend @ 2021-02-09  6:48 UTC (permalink / raw)
  To: Cong Wang, John Fastabend
  Cc: Linux Kernel Network Developers, bpf, duanxiongchun,
	Dongdong Wang, jiang.wang, Cong Wang, Daniel Borkmann,
	Jakub Sitnicki, Lorenz Bauer

Cong Wang wrote:
> On Mon, Feb 8, 2021 at 12:21 AM John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Cong Wang wrote:
> > > From: Cong Wang <cong.wang@bytedance.com>
> > >
> > > Before we add non-TCP support, it is necessary to rename
> > > BPF_STREAM_PARSER as it will be no longer specific to TCP,
> > > and it does not have to be a parser either.
> > >
> > > This patch renames BPF_STREAM_PARSER to BPF_SOCK_MAP, so
> > > that sock_map.c hopefully would be protocol-independent.
> > >
> > > Also, improve its Kconfig description to avoid confusion.
> > >
> > > Cc: John Fastabend <john.fastabend@gmail.com>
> > > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > > ---
> >
> > The BPF_STREAM_PARSER config was originally added because we need
> > the STREAM_PARSER define and wanted a way to get the 'depends on'
> > lines in Kconfig correct.
> >
> > Rather than rename this, lets reduce its scope to just the set
> > of actions that need the STREAM_PARSER, this should be just the
> > stream parser programs. We probably should have done this sooner,
> > but doing it now will be fine.
> 
> This makes sense, but we still need a Kconfig for the rest sockmap
> code, right? At least for the dependency on NET_SOCK_MSG?

Lets just enable NET_SOCK_MSG when CONFIG_BPF_SYSCALL is enabled. We
never put any of the other maps, devmap, cpumap, etc. behind an
explicit flag like this.

> 
> >
> > I can probably draft a quick patch tomorrow if above is not clear.
> > It can go into bpf-next outside this series as well to reduce
> > the 19 patches a bit.
> 
> I can handle it in my next update too, like all other feedbacks.

Great thanks. Especially because I haven't got to it yet today.

> 
> Thanks.



^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2021-02-09  6:49 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-03  4:16 [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Cong Wang
2021-02-03  4:16 ` [Patch bpf-next 01/19] bpf: rename BPF_STREAM_PARSER to BPF_SOCK_MAP Cong Wang
2021-02-05 10:32   ` Jakub Sitnicki
2021-02-09  1:40     ` Cong Wang
2021-02-08  8:21   ` John Fastabend
2021-02-08  9:50     ` Lorenz Bauer
2021-02-09  1:45     ` Cong Wang
2021-02-09  6:48       ` John Fastabend
2021-02-03  4:16 ` [Patch bpf-next 02/19] skmsg: get rid of struct sk_psock_parser Cong Wang
2021-02-05 11:25   ` Jakub Sitnicki
2021-02-08  8:39     ` John Fastabend
2021-02-09  0:19       ` Cong Wang
2021-02-03  4:16 ` [Patch bpf-next 03/19] skmsg: use skb ext instead of TCP_SKB_CB Cong Wang
2021-02-05 22:09   ` Jakub Sitnicki
2021-02-08 18:56     ` Cong Wang
2021-02-03  4:16 ` [Patch bpf-next 04/19] sock_map: rename skb_parser and skb_verdict Cong Wang
2021-02-08  8:27   ` John Fastabend
2021-02-03  4:16 ` [Patch bpf-next 05/19] sock_map: introduce BPF_SK_SKB_VERDICT Cong Wang
2021-02-08  8:31   ` John Fastabend
2021-02-03  4:16 ` [Patch bpf-next 06/19] sock: introduce sk_prot->update_proto() Cong Wang
2021-02-03  4:16 ` [Patch bpf-next 07/19] udp: implement ->sendmsg_locked() Cong Wang
2021-02-03  4:16 ` [Patch bpf-next 08/19] udp: implement ->read_sock() for sockmap Cong Wang
2021-02-08  9:48   ` Lorenz Bauer
2021-02-09  1:35     ` Cong Wang
2021-02-03  4:16 ` [Patch bpf-next 09/19] udp: add ->read_sock() and ->sendmsg_locked() to ipv6 Cong Wang
2021-02-03  4:16 ` [Patch bpf-next 10/19] af_unix: implement ->sendmsg_locked for dgram socket Cong Wang
2021-02-03  4:16 ` [Patch bpf-next 11/19] af_unix: implement ->read_sock() for sockmap Cong Wang
2021-02-03  4:16 ` [Patch bpf-next 12/19] af_unix: implement ->update_proto() Cong Wang
2021-02-03  4:16 ` [Patch bpf-next 13/19] af_unix: set TCP_ESTABLISHED for datagram sockets too Cong Wang
2021-02-03  4:16 ` [Patch bpf-next 14/19] skmsg: extract __tcp_bpf_recvmsg() and tcp_bpf_wait_data() Cong Wang
2021-02-03  4:16 ` [Patch bpf-next 15/19] udp: implement udp_bpf_recvmsg() for sockmap Cong Wang
2021-02-03  4:16 ` [Patch bpf-next 16/19] af_unix: implement unix_dgram_bpf_recvmsg() Cong Wang
2021-02-03  4:16 ` [Patch bpf-next 17/19] sock_map: update sock type checks Cong Wang
2021-02-03  4:16 ` [Patch bpf-next 18/19] selftests/bpf: add test cases for unix and udp sockmap Cong Wang
2021-02-05 10:53   ` Jakub Sitnicki
2021-02-08 18:43     ` Cong Wang
2021-02-03  4:16 ` [Patch bpf-next 19/19] selftests/bpf: add test case for redirection between udp and unix Cong Wang
2021-02-03 17:48 ` [Patch bpf-next 00/19] sock_map: add non-TCP and cross-protocol support Alexei Starovoitov
2021-02-03 19:22   ` Cong Wang
2021-02-03 20:29     ` John Fastabend

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.