bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket
@ 2021-07-04 19:02 Cong Wang
  2021-07-04 19:02 ` [PATCH bpf-next v5 01/11] sock_map: relax config dependency to CONFIG_NET Cong Wang
                   ` (11 more replies)
  0 siblings, 12 replies; 19+ messages in thread
From: Cong Wang @ 2021-07-04 19:02 UTC (permalink / raw)
  To: netdev; +Cc: bpf, Cong Wang

From: Cong Wang <cong.wang@bytedance.com>

This is the last patchset of the original large patchset. In the
previous patchset, a new BPF sockmap program BPF_SK_SKB_VERDICT
was introduced and UDP began to support it too. In this patchset,
we add BPF_SK_SKB_VERDICT support to Unix datagram socket, so that
we can finally splice Unix datagram socket and UDP socket. Please
check each patch description for more details.

To see the big picture, the previous patchsets are available here:
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=1e0ab70778bd86a90de438cc5e1535c115a7c396
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=89d69c5d0fbcabd8656459bc8b1a476d6f1efee4

and this patchset is available here:
https://github.com/congwang/linux/tree/sockmap3

---
v5: lift socket state check for dgram
    remove ->unhash() case
    add retries for EAGAIN in all test cases
    remove an unused parameter of __unix_dgram_recvmsg()
    rebase on the latest bpf-next

v4: fix af_unix disconnect case
    add unix_unhash()
    split out two small patches
    reduce u->iolock critical section
    remove an unused parameter of __unix_dgram_recvmsg()

v3: fix Kconfig dependency
    make unix_read_sock() static
    fix a UAF in unix_release()
    add a missing header unix_bpf.c

v2: separate out from the original large patchset
    rebase to the latest bpf-next
    clean up unix_read_sock()
    export sock_map_close()
    factor out some helpers in selftests for code reuse

Cong Wang (11):
  sock_map: relax config dependency to CONFIG_NET
  sock_map: lift socket state restriction for datagram sockets
  af_unix: implement ->read_sock() for sockmap
  af_unix: set TCP_ESTABLISHED for datagram sockets too
  af_unix: add a dummy ->close() for sockmap
  af_unix: implement ->psock_update_sk_prot()
  af_unix: implement unix_dgram_bpf_recvmsg()
  selftests/bpf: factor out udp_socketpair()
  selftests/bpf: factor out add_to_sockmap()
  selftests/bpf: add a test case for unix sockmap
  selftests/bpf: add test cases for redirection between udp and unix

 MAINTAINERS                                   |   1 +
 include/linux/bpf.h                           |  38 +-
 include/net/af_unix.h                         |  12 +
 kernel/bpf/Kconfig                            |   2 +-
 net/core/Makefile                             |   2 -
 net/core/sock_map.c                           |  22 +-
 net/ipv4/udp_bpf.c                            |   1 -
 net/unix/Makefile                             |   1 +
 net/unix/af_unix.c                            |  85 +++-
 net/unix/unix_bpf.c                           | 122 ++++++
 .../selftests/bpf/prog_tests/sockmap_listen.c | 406 ++++++++++++++----
 11 files changed, 564 insertions(+), 128 deletions(-)
 create mode 100644 net/unix/unix_bpf.c

-- 
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v5 01/11] sock_map: relax config dependency to CONFIG_NET
  2021-07-04 19:02 [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket Cong Wang
@ 2021-07-04 19:02 ` Cong Wang
  2021-07-04 19:02 ` [PATCH bpf-next v5 02/11] sock_map: lift socket state restriction for datagram sockets Cong Wang
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2021-07-04 19:02 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Currently sock_map still has Kconfig dependency on CONFIG_INET,
but there is no actual functional dependency on it after we
introduce ->psock_update_sk_prot().

We have to extend it to CONFIG_NET now as we are going to
support AF_UNIX.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/linux/bpf.h | 38 ++++++++++++++++++++------------------
 kernel/bpf/Kconfig  |  2 +-
 net/core/Makefile   |  2 --
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f309fc1509f2..401a6908ed3f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1851,6 +1851,12 @@ void bpf_map_offload_map_free(struct bpf_map *map);
 int bpf_prog_test_run_syscall(struct bpf_prog *prog,
 			      const union bpf_attr *kattr,
 			      union bpf_attr __user *uattr);
+
+int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog);
+int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype);
+int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value, u64 flags);
+void sock_map_unhash(struct sock *sk);
+void sock_map_close(struct sock *sk, long timeout);
 #else
 static inline int bpf_prog_offload_init(struct bpf_prog *prog,
 					union bpf_attr *attr)
@@ -1883,24 +1889,6 @@ static inline int bpf_prog_test_run_syscall(struct bpf_prog *prog,
 {
 	return -ENOTSUPP;
 }
-#endif /* CONFIG_NET && CONFIG_BPF_SYSCALL */
-
-#if defined(CONFIG_INET) && defined(CONFIG_BPF_SYSCALL)
-int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog);
-int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype);
-int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value, u64 flags);
-void sock_map_unhash(struct sock *sk);
-void sock_map_close(struct sock *sk, long timeout);
-
-void bpf_sk_reuseport_detach(struct sock *sk);
-int bpf_fd_reuseport_array_lookup_elem(struct bpf_map *map, void *key,
-				       void *value);
-int bpf_fd_reuseport_array_update_elem(struct bpf_map *map, void *key,
-				       void *value, u64 map_flags);
-#else
-static inline void bpf_sk_reuseport_detach(struct sock *sk)
-{
-}
 
 #ifdef CONFIG_BPF_SYSCALL
 static inline int sock_map_get_from_fd(const union bpf_attr *attr,
@@ -1920,7 +1908,21 @@ static inline int sock_map_update_elem_sys(struct bpf_map *map, void *key, void
 {
 	return -EOPNOTSUPP;
 }
+#endif /* CONFIG_BPF_SYSCALL */
+#endif /* CONFIG_NET && CONFIG_BPF_SYSCALL */
 
+#if defined(CONFIG_INET) && defined(CONFIG_BPF_SYSCALL)
+void bpf_sk_reuseport_detach(struct sock *sk);
+int bpf_fd_reuseport_array_lookup_elem(struct bpf_map *map, void *key,
+				       void *value);
+int bpf_fd_reuseport_array_update_elem(struct bpf_map *map, void *key,
+				       void *value, u64 map_flags);
+#else
+static inline void bpf_sk_reuseport_detach(struct sock *sk)
+{
+}
+
+#ifdef CONFIG_BPF_SYSCALL
 static inline int bpf_fd_reuseport_array_lookup_elem(struct bpf_map *map,
 						     void *key, void *value)
 {
diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index bd04f4a44c01..a82d6de86522 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -29,7 +29,7 @@ config BPF_SYSCALL
 	select IRQ_WORK
 	select TASKS_TRACE_RCU
 	select BINARY_PRINTF
-	select NET_SOCK_MSG if INET
+	select NET_SOCK_MSG if NET
 	default n
 	help
 	  Enable the bpf() system call that allows to manipulate BPF programs
diff --git a/net/core/Makefile b/net/core/Makefile
index f7f16650fe9e..35ced6201814 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -33,8 +33,6 @@ obj-$(CONFIG_HWBM) += hwbm.o
 obj-$(CONFIG_NET_DEVLINK) += devlink.o
 obj-$(CONFIG_GRO_CELLS) += gro_cells.o
 obj-$(CONFIG_FAILOVER) += failover.o
-ifeq ($(CONFIG_INET),y)
 obj-$(CONFIG_NET_SOCK_MSG) += skmsg.o
 obj-$(CONFIG_BPF_SYSCALL) += sock_map.o
-endif
 obj-$(CONFIG_BPF_SYSCALL) += bpf_sk_storage.o
-- 
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v5 02/11] sock_map: lift socket state restriction for datagram sockets
  2021-07-04 19:02 [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket Cong Wang
  2021-07-04 19:02 ` [PATCH bpf-next v5 01/11] sock_map: relax config dependency to CONFIG_NET Cong Wang
@ 2021-07-04 19:02 ` Cong Wang
  2021-07-04 19:02 ` [PATCH bpf-next v5 03/11] af_unix: implement ->read_sock() for sockmap Cong Wang
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2021-07-04 19:02 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

TCP and other connection oriented sockets have accept()
for each incoming connection on the server side, hence
they can just insert those fd's from accept() to sockmap,
which are of course established.

Now with datagram sockets begin to support sockmap and
redirection, the restriction is no longer applicable to
them, as they have no accept(). So we have to lift this
restriction for them. This is fine, because inside
bpf_sk_redirect_map() we still have another socket status
check, sock_map_redirect_allowed(), as a guard.

This also means they do not have to be removed from
sockmap when disconnecting.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/core/sock_map.c                           | 21 +------------------
 net/ipv4/udp_bpf.c                            |  1 -
 .../selftests/bpf/prog_tests/sockmap_listen.c |  8 ++++---
 3 files changed, 6 insertions(+), 24 deletions(-)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 60decd6420ca..3c427e7e6df9 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -211,8 +211,6 @@ static struct sk_psock *sock_map_psock_get_checked(struct sock *sk)
 	return psock;
 }
 
-static bool sock_map_redirect_allowed(const struct sock *sk);
-
 static int sock_map_link(struct bpf_map *map, struct sock *sk)
 {
 	struct sk_psock_progs *progs = sock_map_progs(map);
@@ -223,13 +221,6 @@ static int sock_map_link(struct bpf_map *map, struct sock *sk)
 	struct sk_psock *psock;
 	int ret;
 
-	/* Only sockets we can redirect into/from in BPF need to hold
-	 * refs to parser/verdict progs and have their sk_data_ready
-	 * and sk_write_space callbacks overridden.
-	 */
-	if (!sock_map_redirect_allowed(sk))
-		goto no_progs;
-
 	stream_verdict = READ_ONCE(progs->stream_verdict);
 	if (stream_verdict) {
 		stream_verdict = bpf_prog_inc_not_zero(stream_verdict);
@@ -264,7 +255,6 @@ static int sock_map_link(struct bpf_map *map, struct sock *sk)
 		}
 	}
 
-no_progs:
 	psock = sock_map_psock_get_checked(sk);
 	if (IS_ERR(psock)) {
 		ret = PTR_ERR(psock);
@@ -527,12 +517,6 @@ static bool sk_is_tcp(const struct sock *sk)
 	       sk->sk_protocol == IPPROTO_TCP;
 }
 
-static bool sk_is_udp(const struct sock *sk)
-{
-	return sk->sk_type == SOCK_DGRAM &&
-	       sk->sk_protocol == IPPROTO_UDP;
-}
-
 static bool sock_map_redirect_allowed(const struct sock *sk)
 {
 	if (sk_is_tcp(sk))
@@ -550,10 +534,7 @@ static bool sock_map_sk_state_allowed(const struct sock *sk)
 {
 	if (sk_is_tcp(sk))
 		return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_LISTEN);
-	else if (sk_is_udp(sk))
-		return sk_hashed(sk);
-
-	return false;
+	return true;
 }
 
 static int sock_hash_update_common(struct bpf_map *map, void *key,
diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
index 45b8782aec0c..cb1d113ce6fd 100644
--- a/net/ipv4/udp_bpf.c
+++ b/net/ipv4/udp_bpf.c
@@ -112,7 +112,6 @@ static struct proto udp_bpf_prots[UDP_BPF_NUM_PROTS];
 static void udp_bpf_rebuild_protos(struct proto *prot, const struct proto *base)
 {
 	*prot        = *base;
-	prot->unhash = sock_map_unhash;
 	prot->close  = sock_map_close;
 	prot->recvmsg = udp_bpf_recvmsg;
 }
diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index 515229f24a93..b8934ae694e5 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -351,9 +351,11 @@ static void test_insert_opened(int family, int sotype, int mapfd)
 	errno = 0;
 	value = s;
 	err = bpf_map_update_elem(mapfd, &key, &value, BPF_NOEXIST);
-	if (!err || errno != EOPNOTSUPP)
-		FAIL_ERRNO("map_update: expected EOPNOTSUPP");
-
+	if (sotype == SOCK_STREAM) {
+		if (!err || errno != EOPNOTSUPP)
+			FAIL_ERRNO("map_update: expected EOPNOTSUPP");
+	} else if (err)
+		FAIL_ERRNO("map_update: expected success");
 	xclose(s);
 }
 
-- 
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v5 03/11] af_unix: implement ->read_sock() for sockmap
  2021-07-04 19:02 [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket Cong Wang
  2021-07-04 19:02 ` [PATCH bpf-next v5 01/11] sock_map: relax config dependency to CONFIG_NET Cong Wang
  2021-07-04 19:02 ` [PATCH bpf-next v5 02/11] sock_map: lift socket state restriction for datagram sockets Cong Wang
@ 2021-07-04 19:02 ` Cong Wang
  2021-07-12 17:04   ` John Fastabend
  2021-07-04 19:02 ` [PATCH bpf-next v5 04/11] af_unix: set TCP_ESTABLISHED for datagram sockets too Cong Wang
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Cong Wang @ 2021-07-04 19:02 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Implement ->read_sock() for AF_UNIX datagram socket, it is
pretty much similar to udp_read_sock().

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/unix/af_unix.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 23c92ad15c61..38863468768a 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -669,6 +669,8 @@ static ssize_t unix_stream_splice_read(struct socket *,  loff_t *ppos,
 				       unsigned int flags);
 static int unix_dgram_sendmsg(struct socket *, struct msghdr *, size_t);
 static int unix_dgram_recvmsg(struct socket *, struct msghdr *, size_t, int);
+static int unix_read_sock(struct sock *sk, read_descriptor_t *desc,
+			  sk_read_actor_t recv_actor);
 static int unix_dgram_connect(struct socket *, struct sockaddr *,
 			      int, int);
 static int unix_seqpacket_sendmsg(struct socket *, struct msghdr *, size_t);
@@ -746,6 +748,7 @@ static const struct proto_ops unix_dgram_ops = {
 	.listen =	sock_no_listen,
 	.shutdown =	unix_shutdown,
 	.sendmsg =	unix_dgram_sendmsg,
+	.read_sock =	unix_read_sock,
 	.recvmsg =	unix_dgram_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
@@ -2188,6 +2191,40 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
 	return err;
 }
 
+static int unix_read_sock(struct sock *sk, read_descriptor_t *desc,
+			  sk_read_actor_t recv_actor)
+{
+	int copied = 0;
+
+	while (1) {
+		struct unix_sock *u = unix_sk(sk);
+		struct sk_buff *skb;
+		int used, err;
+
+		mutex_lock(&u->iolock);
+		skb = skb_recv_datagram(sk, 0, 1, &err);
+		mutex_unlock(&u->iolock);
+		if (!skb)
+			return err;
+
+		used = recv_actor(desc, skb, 0, skb->len);
+		if (used <= 0) {
+			if (!copied)
+				copied = used;
+			kfree_skb(skb);
+			break;
+		} else if (used <= skb->len) {
+			copied += used;
+		}
+
+		kfree_skb(skb);
+		if (!desc->count)
+			break;
+	}
+
+	return copied;
+}
+
 /*
  *	Sleep until more data has arrived. But check for races..
  */
-- 
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v5 04/11] af_unix: set TCP_ESTABLISHED for datagram sockets too
  2021-07-04 19:02 [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket Cong Wang
                   ` (2 preceding siblings ...)
  2021-07-04 19:02 ` [PATCH bpf-next v5 03/11] af_unix: implement ->read_sock() for sockmap Cong Wang
@ 2021-07-04 19:02 ` Cong Wang
  2021-07-04 19:02 ` [PATCH bpf-next v5 05/11] af_unix: add a dummy ->close() for sockmap Cong Wang
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2021-07-04 19:02 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Currently only unix stream socket sets TCP_ESTABLISHED,
datagram socket can set this too when they connect to its
peer socket. At least __ip4_datagram_connect() does the same.

This will be used to determine whether an AF_UNIX datagram
socket can be redirected to in sockmap.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/unix/af_unix.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 38863468768a..77fb3910e1c3 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -494,6 +494,7 @@ static void unix_dgram_disconnected(struct sock *sk, struct sock *other)
 			sk_error_report(other);
 		}
 	}
+	sk->sk_state = other->sk_state = TCP_CLOSE;
 }
 
 static void unix_sock_destructor(struct sock *sk)
@@ -1202,6 +1203,9 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
 		unix_peer(sk) = other;
 		unix_state_double_unlock(sk, other);
 	}
+
+	if (unix_peer(sk))
+		sk->sk_state = other->sk_state = TCP_ESTABLISHED;
 	return 0;
 
 out_unlock:
@@ -1434,12 +1438,10 @@ static int unix_socketpair(struct socket *socka, struct socket *sockb)
 	init_peercred(ska);
 	init_peercred(skb);
 
-	if (ska->sk_type != SOCK_DGRAM) {
-		ska->sk_state = TCP_ESTABLISHED;
-		skb->sk_state = TCP_ESTABLISHED;
-		socka->state  = SS_CONNECTED;
-		sockb->state  = SS_CONNECTED;
-	}
+	ska->sk_state = TCP_ESTABLISHED;
+	skb->sk_state = TCP_ESTABLISHED;
+	socka->state  = SS_CONNECTED;
+	sockb->state  = SS_CONNECTED;
 	return 0;
 }
 
-- 
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v5 05/11] af_unix: add a dummy ->close() for sockmap
  2021-07-04 19:02 [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket Cong Wang
                   ` (3 preceding siblings ...)
  2021-07-04 19:02 ` [PATCH bpf-next v5 04/11] af_unix: set TCP_ESTABLISHED for datagram sockets too Cong Wang
@ 2021-07-04 19:02 ` Cong Wang
  2021-07-04 19:02 ` [PATCH bpf-next v5 06/11] af_unix: implement ->psock_update_sk_prot() Cong Wang
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2021-07-04 19:02 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Unlike af_inet, unix_proto is very different, it does not even
have a ->close(). We have to add a dummy implementation to
satisfy sockmap. Normally it is just a nop, it is introduced only
for sockmap to replace it.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/unix/af_unix.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 77fb3910e1c3..875eeaaddc07 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -781,10 +781,18 @@ static const struct proto_ops unix_seqpacket_ops = {
 	.show_fdinfo =	unix_show_fdinfo,
 };
 
+static void unix_close(struct sock *sk, long timeout)
+{
+	/* Nothing to do here, unix socket does not need a ->close().
+	 * This is merely for sockmap.
+	 */
+}
+
 static struct proto unix_proto = {
 	.name			= "UNIX",
 	.owner			= THIS_MODULE,
 	.obj_size		= sizeof(struct unix_sock),
+	.close			= unix_close,
 };
 
 static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
@@ -868,6 +876,7 @@ static int unix_release(struct socket *sock)
 	if (!sk)
 		return 0;
 
+	sk->sk_prot->close(sk, 0);
 	unix_release_sock(sk, 0);
 	sock->sk = NULL;
 
-- 
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v5 06/11] af_unix: implement ->psock_update_sk_prot()
  2021-07-04 19:02 [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket Cong Wang
                   ` (4 preceding siblings ...)
  2021-07-04 19:02 ` [PATCH bpf-next v5 05/11] af_unix: add a dummy ->close() for sockmap Cong Wang
@ 2021-07-04 19:02 ` Cong Wang
  2021-07-04 19:02 ` [PATCH bpf-next v5 07/11] af_unix: implement unix_dgram_bpf_recvmsg() Cong Wang
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2021-07-04 19:02 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Now we can implement unix_bpf_update_proto() to update
sk_prot, especially prot->close().

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 MAINTAINERS           |  1 +
 include/net/af_unix.h | 10 +++++++++
 net/core/sock_map.c   |  1 +
 net/unix/Makefile     |  1 +
 net/unix/af_unix.c    |  6 +++++-
 net/unix/unix_bpf.c   | 47 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 65 insertions(+), 1 deletion(-)
 create mode 100644 net/unix/unix_bpf.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 88449b7a4c95..2c793df1d873 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10277,6 +10277,7 @@ F:	net/core/skmsg.c
 F:	net/core/sock_map.c
 F:	net/ipv4/tcp_bpf.c
 F:	net/ipv4/udp_bpf.c
+F:	net/unix/unix_bpf.c
 
 LANDLOCK SECURITY MODULE
 M:	Mickaël Salaün <mic@digikod.net>
diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index f42fdddecd41..cca645846af1 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -89,4 +89,14 @@ void unix_sysctl_unregister(struct net *net);
 static inline int unix_sysctl_register(struct net *net) { return 0; }
 static inline void unix_sysctl_unregister(struct net *net) {}
 #endif
+
+#ifdef CONFIG_BPF_SYSCALL
+extern struct proto unix_proto;
+
+int unix_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore);
+void __init unix_bpf_build_proto(void);
+#else
+static inline void __init unix_bpf_build_proto(void)
+{}
+#endif
 #endif
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 3c427e7e6df9..ae5fa4338d9c 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -1517,6 +1517,7 @@ void sock_map_close(struct sock *sk, long timeout)
 	release_sock(sk);
 	saved_close(sk, timeout);
 }
+EXPORT_SYMBOL_GPL(sock_map_close);
 
 static int sock_map_iter_attach_target(struct bpf_prog *prog,
 				       union bpf_iter_link_info *linfo,
diff --git a/net/unix/Makefile b/net/unix/Makefile
index 54e58cc4f945..20491825b4d0 100644
--- a/net/unix/Makefile
+++ b/net/unix/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_UNIX)	+= unix.o
 
 unix-y			:= af_unix.o garbage.o
 unix-$(CONFIG_SYSCTL)	+= sysctl_net_unix.o
+unix-$(CONFIG_BPF_SYSCALL) += unix_bpf.o
 
 obj-$(CONFIG_UNIX_DIAG)	+= unix_diag.o
 unix_diag-y		:= diag.o
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 875eeaaddc07..573253c5b5c2 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -788,11 +788,14 @@ static void unix_close(struct sock *sk, long timeout)
 	 */
 }
 
-static struct proto unix_proto = {
+struct proto unix_proto = {
 	.name			= "UNIX",
 	.owner			= THIS_MODULE,
 	.obj_size		= sizeof(struct unix_sock),
 	.close			= unix_close,
+#ifdef CONFIG_BPF_SYSCALL
+	.psock_update_sk_prot	= unix_bpf_update_proto,
+#endif
 };
 
 static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
@@ -2973,6 +2976,7 @@ static int __init af_unix_init(void)
 
 	sock_register(&unix_family_ops);
 	register_pernet_subsys(&unix_net_ops);
+	unix_bpf_build_proto();
 out:
 	return rc;
 }
diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
new file mode 100644
index 000000000000..b1582a659427
--- /dev/null
+++ b/net/unix/unix_bpf.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Cong Wang <cong.wang@bytedance.com> */
+
+#include <linux/skmsg.h>
+#include <linux/bpf.h>
+#include <net/sock.h>
+#include <net/af_unix.h>
+
+static struct proto *unix_prot_saved __read_mostly;
+static DEFINE_SPINLOCK(unix_prot_lock);
+static struct proto unix_bpf_prot;
+
+static void unix_bpf_rebuild_protos(struct proto *prot, const struct proto *base)
+{
+	*prot        = *base;
+	prot->close  = sock_map_close;
+}
+
+static void unix_bpf_check_needs_rebuild(struct proto *ops)
+{
+	if (unlikely(ops != smp_load_acquire(&unix_prot_saved))) {
+		spin_lock_bh(&unix_prot_lock);
+		if (likely(ops != unix_prot_saved)) {
+			unix_bpf_rebuild_protos(&unix_bpf_prot, ops);
+			smp_store_release(&unix_prot_saved, ops);
+		}
+		spin_unlock_bh(&unix_prot_lock);
+	}
+}
+
+int unix_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
+{
+	if (restore) {
+		sk->sk_write_space = psock->saved_write_space;
+		WRITE_ONCE(sk->sk_prot, psock->sk_proto);
+		return 0;
+	}
+
+	unix_bpf_check_needs_rebuild(psock->sk_proto);
+	WRITE_ONCE(sk->sk_prot, &unix_bpf_prot);
+	return 0;
+}
+
+void __init unix_bpf_build_proto(void)
+{
+	unix_bpf_rebuild_protos(&unix_bpf_prot, &unix_proto);
+}
-- 
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v5 07/11] af_unix: implement unix_dgram_bpf_recvmsg()
  2021-07-04 19:02 [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket Cong Wang
                   ` (5 preceding siblings ...)
  2021-07-04 19:02 ` [PATCH bpf-next v5 06/11] af_unix: implement ->psock_update_sk_prot() Cong Wang
@ 2021-07-04 19:02 ` Cong Wang
  2021-07-18 17:49   ` Eric Dumazet
  2021-07-04 19:02 ` [PATCH bpf-next v5 08/11] selftests/bpf: factor out udp_socketpair() Cong Wang
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Cong Wang @ 2021-07-04 19:02 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

We have to implement unix_dgram_bpf_recvmsg() to replace the
original ->recvmsg() to retrieve skmsg from ingress_msg.

AF_UNIX is again special here because the lack of
sk_prot->recvmsg(). I simply add a special case inside
unix_dgram_recvmsg() to call sk->sk_prot->recvmsg() directly.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/net/af_unix.h |  2 ++
 net/unix/af_unix.c    | 19 +++++++++--
 net/unix/unix_bpf.c   | 75 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index cca645846af1..435a2c3d5a6f 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -82,6 +82,8 @@ static inline struct unix_sock *unix_sk(const struct sock *sk)
 long unix_inq_len(struct sock *sk);
 long unix_outq_len(struct sock *sk);
 
+int __unix_dgram_recvmsg(struct sock *sk, struct msghdr *msg, size_t size,
+			 int flags);
 #ifdef CONFIG_SYSCTL
 int unix_sysctl_register(struct net *net);
 void unix_sysctl_unregister(struct net *net);
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 573253c5b5c2..89927678c0dc 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2098,11 +2098,11 @@ static void unix_copy_addr(struct msghdr *msg, struct sock *sk)
 	}
 }
 
-static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
-			      size_t size, int flags)
+int __unix_dgram_recvmsg(struct sock *sk, struct msghdr *msg, size_t size,
+			 int flags)
 {
 	struct scm_cookie scm;
-	struct sock *sk = sock->sk;
+	struct socket *sock = sk->sk_socket;
 	struct unix_sock *u = unix_sk(sk);
 	struct sk_buff *skb, *last;
 	long timeo;
@@ -2205,6 +2205,19 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
 	return err;
 }
 
+static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
+			      int flags)
+{
+	struct sock *sk = sock->sk;
+
+#ifdef CONFIG_BPF_SYSCALL
+	if (sk->sk_prot != &unix_proto)
+		return sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
+					    flags & ~MSG_DONTWAIT, NULL);
+#endif
+	return __unix_dgram_recvmsg(sk, msg, size, flags);
+}
+
 static int unix_read_sock(struct sock *sk, read_descriptor_t *desc,
 			  sk_read_actor_t recv_actor)
 {
diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
index b1582a659427..db0cda29fb2f 100644
--- a/net/unix/unix_bpf.c
+++ b/net/unix/unix_bpf.c
@@ -6,6 +6,80 @@
 #include <net/sock.h>
 #include <net/af_unix.h>
 
+#define unix_sk_has_data(__sk, __psock)					\
+		({	!skb_queue_empty(&__sk->sk_receive_queue) ||	\
+			!skb_queue_empty(&__psock->ingress_skb) ||	\
+			!list_empty(&__psock->ingress_msg);		\
+		})
+
+static int unix_msg_wait_data(struct sock *sk, struct sk_psock *psock,
+			      long timeo)
+{
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
+	struct unix_sock *u = unix_sk(sk);
+	int ret = 0;
+
+	if (sk->sk_shutdown & RCV_SHUTDOWN)
+		return 1;
+
+	if (!timeo)
+		return ret;
+
+	add_wait_queue(sk_sleep(sk), &wait);
+	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+	if (!unix_sk_has_data(sk, psock)) {
+		mutex_unlock(&u->iolock);
+		wait_woken(&wait, TASK_INTERRUPTIBLE, timeo);
+		mutex_lock(&u->iolock);
+		ret = unix_sk_has_data(sk, psock);
+	}
+	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+	remove_wait_queue(sk_sleep(sk), &wait);
+	return ret;
+}
+
+static int unix_dgram_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
+				  size_t len, int nonblock, int flags,
+				  int *addr_len)
+{
+	struct unix_sock *u = unix_sk(sk);
+	struct sk_psock *psock;
+	int copied, ret;
+
+	psock = sk_psock_get(sk);
+	if (unlikely(!psock))
+		return __unix_dgram_recvmsg(sk, msg, len, flags);
+
+	mutex_lock(&u->iolock);
+	if (!skb_queue_empty(&sk->sk_receive_queue) &&
+	    sk_psock_queue_empty(psock)) {
+		ret = __unix_dgram_recvmsg(sk, msg, len, flags);
+		goto out;
+	}
+
+msg_bytes_ready:
+	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
+	if (!copied) {
+		long timeo;
+		int data;
+
+		timeo = sock_rcvtimeo(sk, nonblock);
+		data = unix_msg_wait_data(sk, psock, timeo);
+		if (data) {
+			if (!sk_psock_queue_empty(psock))
+				goto msg_bytes_ready;
+			ret = __unix_dgram_recvmsg(sk, msg, len, flags);
+			goto out;
+		}
+		copied = -EAGAIN;
+	}
+	ret = copied;
+out:
+	mutex_unlock(&u->iolock);
+	sk_psock_put(sk, psock);
+	return ret;
+}
+
 static struct proto *unix_prot_saved __read_mostly;
 static DEFINE_SPINLOCK(unix_prot_lock);
 static struct proto unix_bpf_prot;
@@ -14,6 +88,7 @@ static void unix_bpf_rebuild_protos(struct proto *prot, const struct proto *base
 {
 	*prot        = *base;
 	prot->close  = sock_map_close;
+	prot->recvmsg = unix_dgram_bpf_recvmsg;
 }
 
 static void unix_bpf_check_needs_rebuild(struct proto *ops)
-- 
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v5 08/11] selftests/bpf: factor out udp_socketpair()
  2021-07-04 19:02 [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket Cong Wang
                   ` (6 preceding siblings ...)
  2021-07-04 19:02 ` [PATCH bpf-next v5 07/11] af_unix: implement unix_dgram_bpf_recvmsg() Cong Wang
@ 2021-07-04 19:02 ` Cong Wang
  2021-07-04 19:02 ` [PATCH bpf-next v5 09/11] selftests/bpf: factor out add_to_sockmap() Cong Wang
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2021-07-04 19:02 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Factor out a common helper udp_socketpair() which creates
a pair of connected UDP sockets.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 .../selftests/bpf/prog_tests/sockmap_listen.c | 78 ++++++++++---------
 1 file changed, 40 insertions(+), 38 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index b8934ae694e5..52d11959e05b 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -1605,33 +1605,27 @@ static void test_reuseport(struct test_sockmap_listen *skel,
 	}
 }
 
-static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
-				   int verd_mapfd, enum redir_mode mode)
+static int udp_socketpair(int family, int *s, int *c)
 {
-	const char *log_prefix = redir_mode_str(mode);
 	struct sockaddr_storage addr;
-	int c0, c1, p0, p1;
-	unsigned int pass;
-	int retries = 100;
 	socklen_t len;
-	int err, n;
-	u64 value;
-	u32 key;
-	char b;
-
-	zero_verdict_count(verd_mapfd);
+	int p0, c0;
+	int err;
 
-	p0 = socket_loopback(family, sotype | SOCK_NONBLOCK);
+	p0 = socket_loopback(family, SOCK_DGRAM | SOCK_NONBLOCK);
 	if (p0 < 0)
-		return;
+		return p0;
+
 	len = sizeof(addr);
 	err = xgetsockname(p0, sockaddr(&addr), &len);
 	if (err)
 		goto close_peer0;
 
-	c0 = xsocket(family, sotype | SOCK_NONBLOCK, 0);
-	if (c0 < 0)
+	c0 = xsocket(family, SOCK_DGRAM | SOCK_NONBLOCK, 0);
+	if (c0 < 0) {
+		err = c0;
 		goto close_peer0;
+	}
 	err = xconnect(c0, sockaddr(&addr), len);
 	if (err)
 		goto close_cli0;
@@ -1642,25 +1636,37 @@ static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
 	if (err)
 		goto close_cli0;
 
-	p1 = socket_loopback(family, sotype | SOCK_NONBLOCK);
-	if (p1 < 0)
-		goto close_cli0;
-	err = xgetsockname(p1, sockaddr(&addr), &len);
-	if (err)
-		goto close_cli0;
+	*s = p0;
+	*c = c0;
+	return 0;
 
-	c1 = xsocket(family, sotype | SOCK_NONBLOCK, 0);
-	if (c1 < 0)
-		goto close_peer1;
-	err = xconnect(c1, sockaddr(&addr), len);
-	if (err)
-		goto close_cli1;
-	err = xgetsockname(c1, sockaddr(&addr), &len);
+close_cli0:
+	xclose(c0);
+close_peer0:
+	xclose(p0);
+	return err;
+}
+
+static void udp_redir_to_connected(int family, int sock_mapfd, int verd_mapfd,
+				   enum redir_mode mode)
+{
+	const char *log_prefix = redir_mode_str(mode);
+	int c0, c1, p0, p1;
+	unsigned int pass;
+	int retries = 100;
+	int err, n;
+	u64 value;
+	u32 key;
+	char b;
+
+	zero_verdict_count(verd_mapfd);
+
+	err = udp_socketpair(family, &p0, &c0);
 	if (err)
-		goto close_cli1;
-	err = xconnect(p1, sockaddr(&addr), len);
+		return;
+	err = udp_socketpair(family, &p1, &c1);
 	if (err)
-		goto close_cli1;
+		goto close_cli0;
 
 	key = 0;
 	value = p0;
@@ -1701,11 +1707,9 @@ static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
 
 close_cli1:
 	xclose(c1);
-close_peer1:
 	xclose(p1);
 close_cli0:
 	xclose(c0);
-close_peer0:
 	xclose(p0);
 }
 
@@ -1722,11 +1726,9 @@ static void udp_skb_redir_to_connected(struct test_sockmap_listen *skel,
 		return;
 
 	skel->bss->test_ingress = false;
-	udp_redir_to_connected(family, SOCK_DGRAM, sock_map, verdict_map,
-			       REDIR_EGRESS);
+	udp_redir_to_connected(family, sock_map, verdict_map, REDIR_EGRESS);
 	skel->bss->test_ingress = true;
-	udp_redir_to_connected(family, SOCK_DGRAM, sock_map, verdict_map,
-			       REDIR_INGRESS);
+	udp_redir_to_connected(family, sock_map, verdict_map, REDIR_INGRESS);
 
 	xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT);
 }
-- 
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v5 09/11] selftests/bpf: factor out add_to_sockmap()
  2021-07-04 19:02 [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket Cong Wang
                   ` (7 preceding siblings ...)
  2021-07-04 19:02 ` [PATCH bpf-next v5 08/11] selftests/bpf: factor out udp_socketpair() Cong Wang
@ 2021-07-04 19:02 ` Cong Wang
  2021-07-04 19:02 ` [PATCH bpf-next v5 10/11] selftests/bpf: add a test case for unix sockmap Cong Wang
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2021-07-04 19:02 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Factor out a common helper add_to_sockmap() which adds two
sockets into a sockmap.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 .../selftests/bpf/prog_tests/sockmap_listen.c | 59 +++++++------------
 1 file changed, 21 insertions(+), 38 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index 52d11959e05b..a023a824af78 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -921,6 +921,23 @@ static const char *redir_mode_str(enum redir_mode mode)
 	}
 }
 
+static int add_to_sockmap(int sock_mapfd, int fd1, int fd2)
+{
+	u64 value;
+	u32 key;
+	int err;
+
+	key = 0;
+	value = fd1;
+	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	if (err)
+		return err;
+
+	key = 1;
+	value = fd2;
+	return xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+}
+
 static void redir_to_connected(int family, int sotype, int sock_mapfd,
 			       int verd_mapfd, enum redir_mode mode)
 {
@@ -930,7 +947,6 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd,
 	unsigned int pass;
 	socklen_t len;
 	int err, n;
-	u64 value;
 	u32 key;
 	char b;
 
@@ -967,15 +983,7 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd,
 	if (p1 < 0)
 		goto close_cli1;
 
-	key = 0;
-	value = p0;
-	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
-	if (err)
-		goto close_peer1;
-
-	key = 1;
-	value = p1;
-	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	err = add_to_sockmap(sock_mapfd, p0, p1);
 	if (err)
 		goto close_peer1;
 
@@ -1063,7 +1071,6 @@ static void redir_to_listening(int family, int sotype, int sock_mapfd,
 	int s, c, p, err, n;
 	unsigned int drop;
 	socklen_t len;
-	u64 value;
 	u32 key;
 
 	zero_verdict_count(verd_mapfd);
@@ -1088,15 +1095,7 @@ static void redir_to_listening(int family, int sotype, int sock_mapfd,
 	if (p < 0)
 		goto close_cli;
 
-	key = 0;
-	value = s;
-	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
-	if (err)
-		goto close_peer;
-
-	key = 1;
-	value = p;
-	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	err = add_to_sockmap(sock_mapfd, s, p);
 	if (err)
 		goto close_peer;
 
@@ -1348,7 +1347,6 @@ static void test_reuseport_mixed_groups(int family, int sotype, int sock_map,
 	int s1, s2, c, err;
 	unsigned int drop;
 	socklen_t len;
-	u64 value;
 	u32 key;
 
 	zero_verdict_count(verd_map);
@@ -1362,16 +1360,10 @@ static void test_reuseport_mixed_groups(int family, int sotype, int sock_map,
 	if (s2 < 0)
 		goto close_srv1;
 
-	key = 0;
-	value = s1;
-	err = xbpf_map_update_elem(sock_map, &key, &value, BPF_NOEXIST);
+	err = add_to_sockmap(sock_map, s1, s2);
 	if (err)
 		goto close_srv2;
 
-	key = 1;
-	value = s2;
-	err = xbpf_map_update_elem(sock_map, &key, &value, BPF_NOEXIST);
-
 	/* Connect to s2, reuseport BPF selects s1 via sock_map[0] */
 	len = sizeof(addr);
 	err = xgetsockname(s2, sockaddr(&addr), &len);
@@ -1655,7 +1647,6 @@ static void udp_redir_to_connected(int family, int sock_mapfd, int verd_mapfd,
 	unsigned int pass;
 	int retries = 100;
 	int err, n;
-	u64 value;
 	u32 key;
 	char b;
 
@@ -1668,15 +1659,7 @@ static void udp_redir_to_connected(int family, int sock_mapfd, int verd_mapfd,
 	if (err)
 		goto close_cli0;
 
-	key = 0;
-	value = p0;
-	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
-	if (err)
-		goto close_cli1;
-
-	key = 1;
-	value = p1;
-	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	err = add_to_sockmap(sock_mapfd, p0, p1);
 	if (err)
 		goto close_cli1;
 
-- 
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v5 10/11] selftests/bpf: add a test case for unix sockmap
  2021-07-04 19:02 [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket Cong Wang
                   ` (8 preceding siblings ...)
  2021-07-04 19:02 ` [PATCH bpf-next v5 09/11] selftests/bpf: factor out add_to_sockmap() Cong Wang
@ 2021-07-04 19:02 ` Cong Wang
  2021-07-04 19:02 ` [PATCH bpf-next v5 11/11] selftests/bpf: add test cases for redirection between udp and unix Cong Wang
  2021-07-12 17:02 ` [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket John Fastabend
  11 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2021-07-04 19:02 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Add a test case to ensure redirection between two AF_UNIX
datagram sockets work.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 .../selftests/bpf/prog_tests/sockmap_listen.c | 97 +++++++++++++++++++
 1 file changed, 97 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index a023a824af78..b6464be89f1a 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -1435,6 +1435,8 @@ static const char *family_str(sa_family_t family)
 		return "IPv4";
 	case AF_INET6:
 		return "IPv6";
+	case AF_UNIX:
+		return "Unix";
 	default:
 		return "unknown";
 	}
@@ -1557,6 +1559,99 @@ static void test_redir(struct test_sockmap_listen *skel, struct bpf_map *map,
 	}
 }
 
+static void unix_redir_to_connected(int sotype, int sock_mapfd,
+			       int verd_mapfd, enum redir_mode mode)
+{
+	const char *log_prefix = redir_mode_str(mode);
+	int c0, c1, p0, p1;
+	unsigned int pass;
+	int retries = 100;
+	int err, n;
+	int sfd[2];
+	u32 key;
+	char b;
+
+	zero_verdict_count(verd_mapfd);
+
+	if (socketpair(AF_UNIX, sotype | SOCK_NONBLOCK, 0, sfd))
+		return;
+	c0 = sfd[0], p0 = sfd[1];
+
+	if (socketpair(AF_UNIX, sotype | SOCK_NONBLOCK, 0, sfd))
+		goto close0;
+	c1 = sfd[0], p1 = sfd[1];
+
+	err = add_to_sockmap(sock_mapfd, p0, p1);
+	if (err)
+		goto close;
+
+	n = write(c1, "a", 1);
+	if (n < 0)
+		FAIL_ERRNO("%s: write", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete write", log_prefix);
+	if (n < 1)
+		goto close;
+
+	key = SK_PASS;
+	err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass);
+	if (err)
+		goto close;
+	if (pass != 1)
+		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
+
+again:
+	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
+	if (n < 0) {
+		if (errno == EAGAIN && retries--)
+			goto again;
+		FAIL_ERRNO("%s: read", log_prefix);
+	}
+	if (n == 0)
+		FAIL("%s: incomplete read", log_prefix);
+
+close:
+	xclose(c1);
+	xclose(p1);
+close0:
+	xclose(c0);
+	xclose(p0);
+}
+
+static void unix_skb_redir_to_connected(struct test_sockmap_listen *skel,
+					struct bpf_map *inner_map, int sotype)
+{
+	int verdict = bpf_program__fd(skel->progs.prog_skb_verdict);
+	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
+	int sock_map = bpf_map__fd(inner_map);
+	int err;
+
+	err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0);
+	if (err)
+		return;
+
+	skel->bss->test_ingress = false;
+	unix_redir_to_connected(sotype, sock_map, verdict_map, REDIR_EGRESS);
+	skel->bss->test_ingress = true;
+	unix_redir_to_connected(sotype, sock_map, verdict_map, REDIR_INGRESS);
+
+	xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT);
+}
+
+static void test_unix_redir(struct test_sockmap_listen *skel, struct bpf_map *map,
+			    int sotype)
+{
+	const char *family_name, *map_name;
+	char s[MAX_TEST_NAME];
+
+	family_name = family_str(AF_UNIX);
+	map_name = map_type_str(map);
+	snprintf(s, sizeof(s), "%s %s %s", map_name, family_name, __func__);
+	if (!test__start_subtest(s))
+		return;
+	unix_skb_redir_to_connected(skel, map, sotype);
+}
+
 static void test_reuseport(struct test_sockmap_listen *skel,
 			   struct bpf_map *map, int family, int sotype)
 {
@@ -1754,10 +1849,12 @@ void test_sockmap_listen(void)
 	skel->bss->test_sockmap = true;
 	run_tests(skel, skel->maps.sock_map, AF_INET);
 	run_tests(skel, skel->maps.sock_map, AF_INET6);
+	test_unix_redir(skel, skel->maps.sock_map, SOCK_DGRAM);
 
 	skel->bss->test_sockmap = false;
 	run_tests(skel, skel->maps.sock_hash, AF_INET);
 	run_tests(skel, skel->maps.sock_hash, AF_INET6);
+	test_unix_redir(skel, skel->maps.sock_hash, SOCK_DGRAM);
 
 	test_sockmap_listen__destroy(skel);
 }
-- 
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v5 11/11] selftests/bpf: add test cases for redirection between udp and unix
  2021-07-04 19:02 [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket Cong Wang
                   ` (9 preceding siblings ...)
  2021-07-04 19:02 ` [PATCH bpf-next v5 10/11] selftests/bpf: add a test case for unix sockmap Cong Wang
@ 2021-07-04 19:02 ` Cong Wang
  2021-08-05 22:43   ` Andrii Nakryiko
  2021-07-12 17:02 ` [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket John Fastabend
  11 siblings, 1 reply; 19+ messages in thread
From: Cong Wang @ 2021-07-04 19:02 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Add two test cases to ensure redirection between udp and unix
work bidirectionally.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 .../selftests/bpf/prog_tests/sockmap_listen.c | 170 ++++++++++++++++++
 1 file changed, 170 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index b6464be89f1a..a9f1bf9d5dff 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -1825,6 +1825,175 @@ static void test_udp_redir(struct test_sockmap_listen *skel, struct bpf_map *map
 	udp_skb_redir_to_connected(skel, map, family);
 }
 
+static void udp_unix_redir_to_connected(int family, int sock_mapfd,
+					int verd_mapfd, enum redir_mode mode)
+{
+	const char *log_prefix = redir_mode_str(mode);
+	int c0, c1, p0, p1;
+	unsigned int pass;
+	int retries = 100;
+	int err, n;
+	int sfd[2];
+	u32 key;
+	char b;
+
+	zero_verdict_count(verd_mapfd);
+
+	if (socketpair(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0, sfd))
+		return;
+	c0 = sfd[0], p0 = sfd[1];
+
+	err = udp_socketpair(family, &p1, &c1);
+	if (err)
+		goto close;
+
+	err = add_to_sockmap(sock_mapfd, p0, p1);
+	if (err)
+		goto close_cli1;
+
+	n = write(c1, "a", 1);
+	if (n < 0)
+		FAIL_ERRNO("%s: write", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete write", log_prefix);
+	if (n < 1)
+		goto close_cli1;
+
+	key = SK_PASS;
+	err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass);
+	if (err)
+		goto close_cli1;
+	if (pass != 1)
+		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
+
+again:
+	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
+	if (n < 0) {
+		if (errno == EAGAIN && retries--)
+			goto again;
+		FAIL_ERRNO("%s: read", log_prefix);
+	}
+	if (n == 0)
+		FAIL("%s: incomplete read", log_prefix);
+
+close_cli1:
+	xclose(c1);
+	xclose(p1);
+close:
+	xclose(c0);
+	xclose(p0);
+}
+
+static void udp_unix_skb_redir_to_connected(struct test_sockmap_listen *skel,
+					    struct bpf_map *inner_map, int family)
+{
+	int verdict = bpf_program__fd(skel->progs.prog_skb_verdict);
+	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
+	int sock_map = bpf_map__fd(inner_map);
+	int err;
+
+	err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0);
+	if (err)
+		return;
+
+	skel->bss->test_ingress = false;
+	udp_unix_redir_to_connected(family, sock_map, verdict_map, REDIR_EGRESS);
+	skel->bss->test_ingress = true;
+	udp_unix_redir_to_connected(family, sock_map, verdict_map, REDIR_INGRESS);
+
+	xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT);
+}
+
+static void unix_udp_redir_to_connected(int family, int sock_mapfd,
+					int verd_mapfd, enum redir_mode mode)
+{
+	const char *log_prefix = redir_mode_str(mode);
+	int c0, c1, p0, p1;
+	unsigned int pass;
+	int err, n;
+	int sfd[2];
+	u32 key;
+	char b;
+
+	zero_verdict_count(verd_mapfd);
+
+	err = udp_socketpair(family, &p0, &c0);
+	if (err)
+		return;
+
+	if (socketpair(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0, sfd))
+		goto close_cli0;
+	c1 = sfd[0], p1 = sfd[1];
+
+	err = add_to_sockmap(sock_mapfd, p0, p1);
+	if (err)
+		goto close;
+
+	n = write(c1, "a", 1);
+	if (n < 0)
+		FAIL_ERRNO("%s: write", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete write", log_prefix);
+	if (n < 1)
+		goto close;
+
+	key = SK_PASS;
+	err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass);
+	if (err)
+		goto close;
+	if (pass != 1)
+		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
+
+	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
+	if (n < 0)
+		FAIL_ERRNO("%s: read", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete read", log_prefix);
+
+close:
+	xclose(c1);
+	xclose(p1);
+close_cli0:
+	xclose(c0);
+	xclose(p0);
+
+}
+
+static void unix_udp_skb_redir_to_connected(struct test_sockmap_listen *skel,
+					    struct bpf_map *inner_map, int family)
+{
+	int verdict = bpf_program__fd(skel->progs.prog_skb_verdict);
+	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
+	int sock_map = bpf_map__fd(inner_map);
+	int err;
+
+	err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0);
+	if (err)
+		return;
+
+	skel->bss->test_ingress = false;
+	unix_udp_redir_to_connected(family, sock_map, verdict_map, REDIR_EGRESS);
+	skel->bss->test_ingress = true;
+	unix_udp_redir_to_connected(family, sock_map, verdict_map, REDIR_INGRESS);
+
+	xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT);
+}
+
+static void test_udp_unix_redir(struct test_sockmap_listen *skel, struct bpf_map *map,
+				int family)
+{
+	const char *family_name, *map_name;
+	char s[MAX_TEST_NAME];
+
+	family_name = family_str(family);
+	map_name = map_type_str(map);
+	snprintf(s, sizeof(s), "%s %s %s", map_name, family_name, __func__);
+	if (!test__start_subtest(s))
+		return;
+	udp_unix_skb_redir_to_connected(skel, map, family);
+	unix_udp_skb_redir_to_connected(skel, map, family);
+}
+
 static void run_tests(struct test_sockmap_listen *skel, struct bpf_map *map,
 		      int family)
 {
@@ -1834,6 +2003,7 @@ static void run_tests(struct test_sockmap_listen *skel, struct bpf_map *map,
 	test_reuseport(skel, map, family, SOCK_STREAM);
 	test_reuseport(skel, map, family, SOCK_DGRAM);
 	test_udp_redir(skel, map, family);
+	test_udp_unix_redir(skel, map, family);
 }
 
 void test_sockmap_listen(void)
-- 
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket
  2021-07-04 19:02 [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket Cong Wang
                   ` (10 preceding siblings ...)
  2021-07-04 19:02 ` [PATCH bpf-next v5 11/11] selftests/bpf: add test cases for redirection between udp and unix Cong Wang
@ 2021-07-12 17:02 ` John Fastabend
  2021-07-16  1:31   ` Alexei Starovoitov
  11 siblings, 1 reply; 19+ messages in thread
From: John Fastabend @ 2021-07-12 17:02 UTC (permalink / raw)
  To: Cong Wang, netdev; +Cc: bpf, Cong Wang

Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> This is the last patchset of the original large patchset. In the
> previous patchset, a new BPF sockmap program BPF_SK_SKB_VERDICT
> was introduced and UDP began to support it too. In this patchset,
> we add BPF_SK_SKB_VERDICT support to Unix datagram socket, so that
> we can finally splice Unix datagram socket and UDP socket. Please
> check each patch description for more details.
> 
> To see the big picture, the previous patchsets are available here:
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=1e0ab70778bd86a90de438cc5e1535c115a7c396
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=89d69c5d0fbcabd8656459bc8b1a476d6f1efee4
> 
> and this patchset is available here:
> https://github.com/congwang/linux/tree/sockmap3

LGTM Thanks. One nit around kfree of packets but its not specific
to this series and I have a proposed fix coming shortly so no
reason to hold this up.

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH bpf-next v5 03/11] af_unix: implement ->read_sock() for sockmap
  2021-07-04 19:02 ` [PATCH bpf-next v5 03/11] af_unix: implement ->read_sock() for sockmap Cong Wang
@ 2021-07-12 17:04   ` John Fastabend
  0 siblings, 0 replies; 19+ messages in thread
From: John Fastabend @ 2021-07-12 17:04 UTC (permalink / raw)
  To: Cong Wang, netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> Implement ->read_sock() for AF_UNIX datagram socket, it is
> pretty much similar to udp_read_sock().
> 
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---

[...]

> +static int unix_read_sock(struct sock *sk, read_descriptor_t *desc,
> +			  sk_read_actor_t recv_actor)
> +{
> +	int copied = 0;
> +
> +	while (1) {
> +		struct unix_sock *u = unix_sk(sk);
> +		struct sk_buff *skb;
> +		int used, err;
> +
> +		mutex_lock(&u->iolock);
> +		skb = skb_recv_datagram(sk, 0, 1, &err);
> +		mutex_unlock(&u->iolock);
> +		if (!skb)
> +			return err;
> +
> +		used = recv_actor(desc, skb, 0, skb->len);
> +		if (used <= 0) {
> +			if (!copied)
> +				copied = used;
> +			kfree_skb(skb);

Is it OK to drop a unix dgram? I think the sockets likely wouldn't
expect this?

Anyways I'll have a proposed fix for TCP side shortly. And we can
extend it here as well if needed.

> +			break;
> +		} else if (used <= skb->len) {
> +			copied += used;
> +		}
> +
> +		kfree_skb(skb);
> +		if (!desc->count)
> +			break;
> +	}
> +
> +	return copied;
> +}
> +
>  /*
>   *	Sleep until more data has arrived. But check for races..
>   */
> -- 
> 2.27.0
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket
  2021-07-12 17:02 ` [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket John Fastabend
@ 2021-07-16  1:31   ` Alexei Starovoitov
  0 siblings, 0 replies; 19+ messages in thread
From: Alexei Starovoitov @ 2021-07-16  1:31 UTC (permalink / raw)
  To: John Fastabend; +Cc: Cong Wang, Network Development, bpf, Cong Wang

On Mon, Jul 12, 2021 at 10:03 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > This is the last patchset of the original large patchset. In the
> > previous patchset, a new BPF sockmap program BPF_SK_SKB_VERDICT
> > was introduced and UDP began to support it too. In this patchset,
> > we add BPF_SK_SKB_VERDICT support to Unix datagram socket, so that
> > we can finally splice Unix datagram socket and UDP socket. Please
> > check each patch description for more details.
> >
> > To see the big picture, the previous patchsets are available here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=1e0ab70778bd86a90de438cc5e1535c115a7c396
> > https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=89d69c5d0fbcabd8656459bc8b1a476d6f1efee4
> >
> > and this patchset is available here:
> > https://github.com/congwang/linux/tree/sockmap3
>
> LGTM Thanks. One nit around kfree of packets but its not specific
> to this series and I have a proposed fix coming shortly so no
> reason to hold this up.
>
> Acked-by: John Fastabend <john.fastabend@gmail.com>

Applied. Thanks

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v5 07/11] af_unix: implement unix_dgram_bpf_recvmsg()
  2021-07-04 19:02 ` [PATCH bpf-next v5 07/11] af_unix: implement unix_dgram_bpf_recvmsg() Cong Wang
@ 2021-07-18 17:49   ` Eric Dumazet
  2021-07-20  0:03     ` Cong Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2021-07-18 17:49 UTC (permalink / raw)
  To: Cong Wang, netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer



On 7/4/21 9:02 PM, Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> We have to implement unix_dgram_bpf_recvmsg() to replace the
> original ->recvmsg() to retrieve skmsg from ingress_msg.
> 
> AF_UNIX is again special here because the lack of
> sk_prot->recvmsg(). I simply add a special case inside
> unix_dgram_recvmsg() to call sk->sk_prot->recvmsg() directly.
> 
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---
>  include/net/af_unix.h |  2 ++
>  net/unix/af_unix.c    | 19 +++++++++--
>  net/unix/unix_bpf.c   | 75 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 93 insertions(+), 3 deletions(-)
> 
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index cca645846af1..435a2c3d5a6f 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -82,6 +82,8 @@ static inline struct unix_sock *unix_sk(const struct sock *sk)
>  long unix_inq_len(struct sock *sk);
>  long unix_outq_len(struct sock *sk);
>  
> +int __unix_dgram_recvmsg(struct sock *sk, struct msghdr *msg, size_t size,
> +			 int flags);
>  #ifdef CONFIG_SYSCTL
>  int unix_sysctl_register(struct net *net);
>  void unix_sysctl_unregister(struct net *net);
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 573253c5b5c2..89927678c0dc 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -2098,11 +2098,11 @@ static void unix_copy_addr(struct msghdr *msg, struct sock *sk)
>  	}
>  }
>  
> -static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> -			      size_t size, int flags)
> +int __unix_dgram_recvmsg(struct sock *sk, struct msghdr *msg, size_t size,
> +			 int flags)
>  {
>  	struct scm_cookie scm;
> -	struct sock *sk = sock->sk;
> +	struct socket *sock = sk->sk_socket;
>  	struct unix_sock *u = unix_sk(sk);
>  	struct sk_buff *skb, *last;
>  	long timeo;
> @@ -2205,6 +2205,19 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
>  	return err;
>  }
>  
> +static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
> +			      int flags)
> +{
> +	struct sock *sk = sock->sk;
> +
> +#ifdef CONFIG_BPF_SYSCALL
> +	if (sk->sk_prot != &unix_proto)
> +		return sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
> +					    flags & ~MSG_DONTWAIT, NULL);
> +#endif
> +	return __unix_dgram_recvmsg(sk, msg, size, flags);
> +}
> +
>  static int unix_read_sock(struct sock *sk, read_descriptor_t *desc,
>  			  sk_read_actor_t recv_actor)
>  {
> diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
> index b1582a659427..db0cda29fb2f 100644
> --- a/net/unix/unix_bpf.c
> +++ b/net/unix/unix_bpf.c
> @@ -6,6 +6,80 @@
>  #include <net/sock.h>
>  #include <net/af_unix.h>
>  
> +#define unix_sk_has_data(__sk, __psock)					\
> +		({	!skb_queue_empty(&__sk->sk_receive_queue) ||	\
> +			!skb_queue_empty(&__psock->ingress_skb) ||	\
> +			!list_empty(&__psock->ingress_msg);		\
> +		})
> +
> +static int unix_msg_wait_data(struct sock *sk, struct sk_psock *psock,
> +			      long timeo)
> +{
> +	DEFINE_WAIT_FUNC(wait, woken_wake_function);
> +	struct unix_sock *u = unix_sk(sk);
> +	int ret = 0;
> +
> +	if (sk->sk_shutdown & RCV_SHUTDOWN)
> +		return 1;
> +
> +	if (!timeo)
> +		return ret;
> +
> +	add_wait_queue(sk_sleep(sk), &wait);
> +	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
> +	if (!unix_sk_has_data(sk, psock)) {
> +		mutex_unlock(&u->iolock);
> +		wait_woken(&wait, TASK_INTERRUPTIBLE, timeo);
> +		mutex_lock(&u->iolock);
> +		ret = unix_sk_has_data(sk, psock);
> +	}
> +	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
> +	remove_wait_queue(sk_sleep(sk), &wait);
> +	return ret;
> +}
> +
> +static int unix_dgram_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
> +				  size_t len, int nonblock, int flags,
> +				  int *addr_len)
> +{
> +	struct unix_sock *u = unix_sk(sk);
> +	struct sk_psock *psock;
> +	int copied, ret;
> +
> +	psock = sk_psock_get(sk);
> +	if (unlikely(!psock))
> +		return __unix_dgram_recvmsg(sk, msg, len, flags);
> +
> +	mutex_lock(&u->iolock);

u->iolock mutex is owned here.

> +	if (!skb_queue_empty(&sk->sk_receive_queue) &&
> +	    sk_psock_queue_empty(psock)) {
> +		ret = __unix_dgram_recvmsg(sk, msg, len, flags);

But __unix_dgram_recvmsg() will also try to grab this mutex ?

> +		goto out;
> +	}
> +
> +msg_bytes_ready:
> +	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
> +	if (!copied) {
> +		long timeo;
> +		int data;
> +
> +		timeo = sock_rcvtimeo(sk, nonblock);
> +		data = unix_msg_wait_data(sk, psock, timeo);
> +		if (data) {
> +			if (!sk_psock_queue_empty(psock))
> +				goto msg_bytes_ready;
> +			ret = __unix_dgram_recvmsg(sk, msg, len, flags);
> +			goto out;
> +		}
> +		copied = -EAGAIN;
> +	}
> +	ret = copied;
> +out:
> +	mutex_unlock(&u->iolock);
> +	sk_psock_put(sk, psock);
> +	return ret;
> +}
> +
>  static struct proto *unix_prot_saved __read_mostly;
>  static DEFINE_SPINLOCK(unix_prot_lock);
>  static struct proto unix_bpf_prot;
> @@ -14,6 +88,7 @@ static void unix_bpf_rebuild_protos(struct proto *prot, const struct proto *base
>  {
>  	*prot        = *base;
>  	prot->close  = sock_map_close;
> +	prot->recvmsg = unix_dgram_bpf_recvmsg;
>  }
>  
>  static void unix_bpf_check_needs_rebuild(struct proto *ops)
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v5 07/11] af_unix: implement unix_dgram_bpf_recvmsg()
  2021-07-18 17:49   ` Eric Dumazet
@ 2021-07-20  0:03     ` Cong Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2021-07-20  0:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Linux Kernel Network Developers, bpf, Cong Wang, John Fastabend,
	Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

On Sun, Jul 18, 2021 at 10:49 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 7/4/21 9:02 PM, Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> > +     mutex_lock(&u->iolock);
>
> u->iolock mutex is owned here.
>
> > +     if (!skb_queue_empty(&sk->sk_receive_queue) &&
> > +         sk_psock_queue_empty(psock)) {
> > +             ret = __unix_dgram_recvmsg(sk, msg, len, flags);
>
> But __unix_dgram_recvmsg() will also try to grab this mutex ?

Good catch. I should release the lock before calling it. I will send
a patch.

Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v5 11/11] selftests/bpf: add test cases for redirection between udp and unix
  2021-07-04 19:02 ` [PATCH bpf-next v5 11/11] selftests/bpf: add test cases for redirection between udp and unix Cong Wang
@ 2021-08-05 22:43   ` Andrii Nakryiko
  2021-08-06  2:34     ` Cong Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Andrii Nakryiko @ 2021-08-05 22:43 UTC (permalink / raw)
  To: Cong Wang
  Cc: Networking, bpf, Cong Wang, John Fastabend, Daniel Borkmann,
	Jakub Sitnicki, Lorenz Bauer

On Sun, Jul 4, 2021 at 12:05 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> From: Cong Wang <cong.wang@bytedance.com>
>
> Add two test cases to ensure redirection between udp and unix
> work bidirectionally.
>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---
>  .../selftests/bpf/prog_tests/sockmap_listen.c | 170 ++++++++++++++++++
>  1 file changed, 170 insertions(+)
>

[...]

> +       n = write(c1, "a", 1);
> +       if (n < 0)
> +               FAIL_ERRNO("%s: write", log_prefix);
> +       if (n == 0)
> +               FAIL("%s: incomplete write", log_prefix);
> +       if (n < 1)
> +               goto close;
> +
> +       key = SK_PASS;
> +       err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass);
> +       if (err)
> +               goto close;
> +       if (pass != 1)
> +               FAIL("%s: want pass count 1, have %d", log_prefix, pass);
> +
> +       n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
> +       if (n < 0)
> +               FAIL_ERRNO("%s: read", log_prefix);

Hey Cong,

This test is pretty flaky and quite frequently fails in our CIs (e.g., [0]):

./test_progs-no_alu32:unix_udp_redir_to_connected:1949: egress: read:
Resource temporarily unavailable
  unix_udp_redir_to_connected:FAIL:1949

Please send a fix to make it more reliable. Thanks!


  [0] https://github.com/anakryiko/libbpf/runs/3249152533?check_suite_focus=true


> +       if (n == 0)
> +               FAIL("%s: incomplete read", log_prefix);
> +
> +close:
> +       xclose(c1);
> +       xclose(p1);
> +close_cli0:
> +       xclose(c0);
> +       xclose(p0);
> +
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v5 11/11] selftests/bpf: add test cases for redirection between udp and unix
  2021-08-05 22:43   ` Andrii Nakryiko
@ 2021-08-06  2:34     ` Cong Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2021-08-06  2:34 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Networking, bpf, Cong Wang, John Fastabend, Daniel Borkmann,
	Jakub Sitnicki, Lorenz Bauer

On Thu, Aug 5, 2021 at 3:43 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Sun, Jul 4, 2021 at 12:05 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > Add two test cases to ensure redirection between udp and unix
> > work bidirectionally.
> >
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > ---
> >  .../selftests/bpf/prog_tests/sockmap_listen.c | 170 ++++++++++++++++++
> >  1 file changed, 170 insertions(+)
> >
>
> [...]
>
> > +       n = write(c1, "a", 1);
> > +       if (n < 0)
> > +               FAIL_ERRNO("%s: write", log_prefix);
> > +       if (n == 0)
> > +               FAIL("%s: incomplete write", log_prefix);
> > +       if (n < 1)
> > +               goto close;
> > +
> > +       key = SK_PASS;
> > +       err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass);
> > +       if (err)
> > +               goto close;
> > +       if (pass != 1)
> > +               FAIL("%s: want pass count 1, have %d", log_prefix, pass);
> > +
> > +       n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
> > +       if (n < 0)
> > +               FAIL_ERRNO("%s: read", log_prefix);
>
> Hey Cong,
>
> This test is pretty flaky and quite frequently fails in our CIs (e.g., [0]):
>
> ./test_progs-no_alu32:unix_udp_redir_to_connected:1949: egress: read:
> Resource temporarily unavailable
>   unix_udp_redir_to_connected:FAIL:1949
>
> Please send a fix to make it more reliable. Thanks!

Using an unbound number of retries makes this reliable, but you
dislike it. ;) But, there must be some reason for the delay of
packet delivery, I guess it might be related to workqueue scheduling,
let me check it tomorrow.

Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-08-06  2:34 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-04 19:02 [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket Cong Wang
2021-07-04 19:02 ` [PATCH bpf-next v5 01/11] sock_map: relax config dependency to CONFIG_NET Cong Wang
2021-07-04 19:02 ` [PATCH bpf-next v5 02/11] sock_map: lift socket state restriction for datagram sockets Cong Wang
2021-07-04 19:02 ` [PATCH bpf-next v5 03/11] af_unix: implement ->read_sock() for sockmap Cong Wang
2021-07-12 17:04   ` John Fastabend
2021-07-04 19:02 ` [PATCH bpf-next v5 04/11] af_unix: set TCP_ESTABLISHED for datagram sockets too Cong Wang
2021-07-04 19:02 ` [PATCH bpf-next v5 05/11] af_unix: add a dummy ->close() for sockmap Cong Wang
2021-07-04 19:02 ` [PATCH bpf-next v5 06/11] af_unix: implement ->psock_update_sk_prot() Cong Wang
2021-07-04 19:02 ` [PATCH bpf-next v5 07/11] af_unix: implement unix_dgram_bpf_recvmsg() Cong Wang
2021-07-18 17:49   ` Eric Dumazet
2021-07-20  0:03     ` Cong Wang
2021-07-04 19:02 ` [PATCH bpf-next v5 08/11] selftests/bpf: factor out udp_socketpair() Cong Wang
2021-07-04 19:02 ` [PATCH bpf-next v5 09/11] selftests/bpf: factor out add_to_sockmap() Cong Wang
2021-07-04 19:02 ` [PATCH bpf-next v5 10/11] selftests/bpf: add a test case for unix sockmap Cong Wang
2021-07-04 19:02 ` [PATCH bpf-next v5 11/11] selftests/bpf: add test cases for redirection between udp and unix Cong Wang
2021-08-05 22:43   ` Andrii Nakryiko
2021-08-06  2:34     ` Cong Wang
2021-07-12 17:02 ` [PATCH bpf-next v5 00/11] sockmap: add sockmap support for unix datagram socket John Fastabend
2021-07-16  1:31   ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).