bpf.vger.kernel.org archive mirror
* [Patch bpf v2 0/4] sock_map: fix ->poll() and update selftests
@ 2021-09-28  0:22 Cong Wang
  2021-09-28  0:22 ` [Patch bpf v2 1/4] skmsg: introduce sk_psock_get_checked() Cong Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Cong Wang @ 2021-09-28  0:22 UTC
  To: netdev; +Cc: bpf, Cong Wang

From: Cong Wang <cong.wang@bytedance.com>

This patchset fixes ->poll() on sockmap sockets and updates the
selftests accordingly to use select(). Please check each patch
for more details.
---
v2: rename and reuse ->stream_memory_read()
    fix a compile error in sk_psock_get_checked()

Cong Wang (3):
  skmsg: introduce sk_psock_get_checked()
  net: rename ->stream_memory_read to ->sock_is_readable
  net: implement ->sock_is_readable for UDP and AF_UNIX

Yucong Sun (1):
  selftests/bpf: use recv_timeout() instead of retries

 include/linux/skmsg.h                         | 21 ++++++
 include/net/sock.h                            |  8 +-
 include/net/tls.h                             |  2 +-
 net/core/skmsg.c                              | 14 ++++
 net/core/sock_map.c                           | 22 +-----
 net/ipv4/tcp.c                                |  5 +-
 net/ipv4/tcp_bpf.c                            |  4 +-
 net/ipv4/udp.c                                |  2 +
 net/ipv4/udp_bpf.c                            |  1 +
 net/tls/tls_main.c                            |  4 +-
 net/tls/tls_sw.c                              |  2 +-
 net/unix/af_unix.c                            |  4 +
 net/unix/unix_bpf.c                           |  2 +
 .../selftests/bpf/prog_tests/sockmap_listen.c | 75 +++++--------------
 14 files changed, 79 insertions(+), 87 deletions(-)

-- 
2.30.2



* [Patch bpf v2 1/4] skmsg: introduce sk_psock_get_checked()
  2021-09-28  0:22 [Patch bpf v2 0/4] sock_map: fix ->poll() and update selftests Cong Wang
@ 2021-09-28  0:22 ` Cong Wang
  2021-09-28  0:22 ` [Patch bpf v2 2/4] net: rename ->stream_memory_read to ->sock_is_readable Cong Wang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Cong Wang @ 2021-09-28  0:22 UTC
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Although we have sk_psock_get(), it assumes the psock
retrieved from sk_user_data is for sockmap. This is not
sufficient if we call it outside of sockmap, where
sk_user_data may belong to another user, for example
reuseport_array.

Fortunately sock_map_psock_get_checked() is more strict
and checks for sock_map_close before using psock. So we can
refactor it and rename it to sk_psock_get_checked(), which
can be safely called outside of sockmap.
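
For readers unfamiliar with the three possible outcomes of the
checked getter (NULL, an error pointer, or a referenced psock),
here is a rough userspace model. It is only a sketch: every
model_* name is hypothetical, the error-pointer helpers merely
imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(), and the RCU
locking and atomic refcounting of the real code are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace imitation of the kernel error-pointer helpers (assumption:
 * the top 4095 addresses are reserved for encoded errno values). */
#define MODEL_MAX_ERRNO 4095

static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline bool model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static inline long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Hypothetical stand-ins for struct sk_psock and struct sock. */
struct model_psock {
	int refcnt;
};

struct model_sock {
	struct model_psock *user_data;        /* models sk_user_data */
	void (*close)(struct model_sock *sk); /* models sk->sk_prot->close */
};

/* Models sock_map_close: the marker that sockmap owns this socket. */
static void model_sock_map_close(struct model_sock *sk) { (void)sk; }

/* Models the close handler of some other sk_user_data user. */
static void model_other_close(struct model_sock *sk) { (void)sk; }

/* Single-threaded model of the checked getter: no RCU, no atomics. */
static struct model_psock *model_psock_get_checked(struct model_sock *sk)
{
	struct model_psock *psock = sk->user_data;

	if (!psock)
		return NULL;                  /* nothing attached */
	if (sk->close != model_sock_map_close)
		return model_err_ptr(-EBUSY); /* user_data is not ours */
	if (psock->refcnt == 0)
		return model_err_ptr(-EBUSY); /* psock is being torn down */
	psock->refcnt++;                      /* caller must drop this ref */
	return psock;
}
```

A caller has to tell the three outcomes apart, which is why
sock_map_link() tests the result with IS_ERR() before using it.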

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/linux/skmsg.h | 20 ++++++++++++++++++++
 net/core/sock_map.c   | 22 +---------------------
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 14ab0c0bc924..8f577739fc36 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -452,6 +452,26 @@ static inline struct sk_psock *sk_psock_get(struct sock *sk)
 	return psock;
 }
 
+static inline struct sk_psock *sk_psock_get_checked(struct sock *sk)
+{
+	struct sk_psock *psock;
+
+	rcu_read_lock();
+	psock = sk_psock(sk);
+	if (psock) {
+#if defined(CONFIG_BPF_SYSCALL)
+		if (sk->sk_prot->close != sock_map_close) {
+			rcu_read_unlock();
+			return ERR_PTR(-EBUSY);
+		}
+#endif
+		if (!refcount_inc_not_zero(&psock->refcnt))
+			psock = ERR_PTR(-EBUSY);
+	}
+	rcu_read_unlock();
+	return psock;
+}
+
 void sk_psock_drop(struct sock *sk, struct sk_psock *psock);
 
 static inline void sk_psock_put(struct sock *sk, struct sk_psock *psock)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index e252b8ec2b85..6612bb0b95b5 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -191,26 +191,6 @@ static int sock_map_init_proto(struct sock *sk, struct sk_psock *psock)
 	return sk->sk_prot->psock_update_sk_prot(sk, psock, false);
 }
 
-static struct sk_psock *sock_map_psock_get_checked(struct sock *sk)
-{
-	struct sk_psock *psock;
-
-	rcu_read_lock();
-	psock = sk_psock(sk);
-	if (psock) {
-		if (sk->sk_prot->close != sock_map_close) {
-			psock = ERR_PTR(-EBUSY);
-			goto out;
-		}
-
-		if (!refcount_inc_not_zero(&psock->refcnt))
-			psock = ERR_PTR(-EBUSY);
-	}
-out:
-	rcu_read_unlock();
-	return psock;
-}
-
 static int sock_map_link(struct bpf_map *map, struct sock *sk)
 {
 	struct sk_psock_progs *progs = sock_map_progs(map);
@@ -255,7 +235,7 @@ static int sock_map_link(struct bpf_map *map, struct sock *sk)
 		}
 	}
 
-	psock = sock_map_psock_get_checked(sk);
+	psock = sk_psock_get_checked(sk);
 	if (IS_ERR(psock)) {
 		ret = PTR_ERR(psock);
 		goto out_progs;
-- 
2.30.2



* [Patch bpf v2 2/4] net: rename ->stream_memory_read to ->sock_is_readable
  2021-09-28  0:22 [Patch bpf v2 0/4] sock_map: fix ->poll() and update selftests Cong Wang
  2021-09-28  0:22 ` [Patch bpf v2 1/4] skmsg: introduce sk_psock_get_checked() Cong Wang
@ 2021-09-28  0:22 ` Cong Wang
  2021-09-28  0:22 ` [Patch bpf v2 3/4] net: implement ->sock_is_readable for UDP and AF_UNIX Cong Wang
  2021-09-28  0:22 ` [Patch bpf v2 4/4] selftests/bpf: use recv_timeout() instead of retries Cong Wang
  3 siblings, 0 replies; 7+ messages in thread
From: Cong Wang @ 2021-09-28  0:22 UTC
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

The proto ops ->stream_memory_read() is currently only used
by TCP to check whether the psock queue is empty or not. We
need to rename it before reusing it for non-TCP protocols,
and adjust the existing TCP functions accordingly.
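
The dispatch pattern behind the new sk_is_readable() helper
can be modelled in userspace roughly as follows. This is a
sketch under assumptions: the demo_* types and the
psock_queued counter are hypothetical stand-ins for struct
proto, struct sock and the psock ingress queue.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_sock;

/* Hypothetical stand-in for struct proto: the hook is optional. */
struct demo_proto {
	bool (*sock_is_readable)(struct demo_sock *sk); /* may be NULL */
};

/* Hypothetical stand-in for struct sock. */
struct demo_sock {
	const struct demo_proto *prot;
	int psock_queued; /* models messages waiting on the psock queue */
};

/* Models a protocol-specific hook such as tcp_bpf_sock_is_readable(). */
static bool demo_psock_is_readable(struct demo_sock *sk)
{
	return sk->psock_queued > 0;
}

/* Models sk_is_readable(): call the hook if present, else report false. */
static bool demo_sk_is_readable(struct demo_sock *sk)
{
	if (sk->prot->sock_is_readable)
		return sk->prot->sock_is_readable(sk);
	return false;
}
```

Sockets whose proto does not install the hook keep their
previous behaviour, because the helper falls back to false.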

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/net/sock.h | 8 +++++++-
 include/net/tls.h  | 2 +-
 net/ipv4/tcp.c     | 5 +----
 net/ipv4/tcp_bpf.c | 4 ++--
 net/tls/tls_main.c | 4 ++--
 net/tls/tls_sw.c   | 2 +-
 6 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 66a9a90f9558..5c1dcc4a2284 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1205,7 +1205,7 @@ struct proto {
 #endif
 
 	bool			(*stream_memory_free)(const struct sock *sk, int wake);
-	bool			(*stream_memory_read)(const struct sock *sk);
+	bool			(*sock_is_readable)(struct sock *sk);
 	/* Memory pressure */
 	void			(*enter_memory_pressure)(struct sock *sk);
 	void			(*leave_memory_pressure)(struct sock *sk);
@@ -2787,4 +2787,10 @@ void sock_set_sndtimeo(struct sock *sk, s64 secs);
 
 int sock_bind_add(struct sock *sk, struct sockaddr *addr, int addr_len);
 
+static inline bool sk_is_readable(struct sock *sk)
+{
+	if (sk->sk_prot->sock_is_readable)
+		return sk->sk_prot->sock_is_readable(sk);
+	return false;
+}
 #endif	/* _SOCK_H */
diff --git a/include/net/tls.h b/include/net/tls.h
index be4b3e1cac46..01d2e3744393 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -375,7 +375,7 @@ void tls_sw_release_resources_rx(struct sock *sk);
 void tls_sw_free_ctx_rx(struct tls_context *tls_ctx);
 int tls_sw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		   int nonblock, int flags, int *addr_len);
-bool tls_sw_stream_read(const struct sock *sk);
+bool tls_sw_sock_is_readable(struct sock *sk);
 ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
 			   struct pipe_inode_info *pipe,
 			   size_t len, unsigned int flags);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e8b48df73c85..f5c336f8b0c8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -486,10 +486,7 @@ static bool tcp_stream_is_readable(struct sock *sk, int target)
 {
 	if (tcp_epollin_ready(sk, target))
 		return true;
-
-	if (sk->sk_prot->stream_memory_read)
-		return sk->sk_prot->stream_memory_read(sk);
-	return false;
+	return sk_is_readable(sk);
 }
 
 /*
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index d3e9386b493e..0175dbcb7722 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -150,7 +150,7 @@ int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg,
 EXPORT_SYMBOL_GPL(tcp_bpf_sendmsg_redir);
 
 #ifdef CONFIG_BPF_SYSCALL
-static bool tcp_bpf_stream_read(const struct sock *sk)
+static bool tcp_bpf_sock_is_readable(struct sock *sk)
 {
 	struct sk_psock *psock;
 	bool empty = true;
@@ -479,7 +479,7 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
 	prot[TCP_BPF_BASE].unhash		= sock_map_unhash;
 	prot[TCP_BPF_BASE].close		= sock_map_close;
 	prot[TCP_BPF_BASE].recvmsg		= tcp_bpf_recvmsg;
-	prot[TCP_BPF_BASE].stream_memory_read	= tcp_bpf_stream_read;
+	prot[TCP_BPF_BASE].sock_is_readable	= tcp_bpf_sock_is_readable;
 
 	prot[TCP_BPF_TX]			= prot[TCP_BPF_BASE];
 	prot[TCP_BPF_TX].sendmsg		= tcp_bpf_sendmsg;
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index fde56ff49163..9ab81db8a654 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -681,12 +681,12 @@ static void build_protos(struct proto prot[TLS_NUM_CONFIG][TLS_NUM_CONFIG],
 
 	prot[TLS_BASE][TLS_SW] = prot[TLS_BASE][TLS_BASE];
 	prot[TLS_BASE][TLS_SW].recvmsg		  = tls_sw_recvmsg;
-	prot[TLS_BASE][TLS_SW].stream_memory_read = tls_sw_stream_read;
+	prot[TLS_BASE][TLS_SW].sock_is_readable   = tls_sw_sock_is_readable;
 	prot[TLS_BASE][TLS_SW].close		  = tls_sk_proto_close;
 
 	prot[TLS_SW][TLS_SW] = prot[TLS_SW][TLS_BASE];
 	prot[TLS_SW][TLS_SW].recvmsg		= tls_sw_recvmsg;
-	prot[TLS_SW][TLS_SW].stream_memory_read	= tls_sw_stream_read;
+	prot[TLS_SW][TLS_SW].sock_is_readable   = tls_sw_sock_is_readable;
 	prot[TLS_SW][TLS_SW].close		= tls_sk_proto_close;
 
 #ifdef CONFIG_TLS_DEVICE
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 4feb95e34b64..d5d09bd817b7 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2026,7 +2026,7 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
 	return copied ? : err;
 }
 
-bool tls_sw_stream_read(const struct sock *sk)
+bool tls_sw_sock_is_readable(struct sock *sk)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
-- 
2.30.2



* [Patch bpf v2 3/4] net: implement ->sock_is_readable for UDP and AF_UNIX
  2021-09-28  0:22 [Patch bpf v2 0/4] sock_map: fix ->poll() and update selftests Cong Wang
  2021-09-28  0:22 ` [Patch bpf v2 1/4] skmsg: introduce sk_psock_get_checked() Cong Wang
  2021-09-28  0:22 ` [Patch bpf v2 2/4] net: rename ->stream_memory_read to ->sock_is_readable Cong Wang
@ 2021-09-28  0:22 ` Cong Wang
  2021-09-30 21:44   ` John Fastabend
  2021-09-28  0:22 ` [Patch bpf v2 4/4] selftests/bpf: use recv_timeout() instead of retries Cong Wang
  3 siblings, 1 reply; 7+ messages in thread
From: Cong Wang @ 2021-09-28  0:22 UTC
  To: netdev
  Cc: bpf, Cong Wang, Yucong Sun, John Fastabend, Daniel Borkmann,
	Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Yucong noticed we can't poll() sockets in sockmap even when
they are the destination sockets of redirections. This is
because we never poll any psock queues in ->poll(), except
for TCP. Now we can overwrite ->sock_is_readable(),
implement it for UDP and AF_UNIX sockets, and invoke it from
their ->poll() handlers.

Reported-by: Yucong Sun <sunyucong@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/linux/skmsg.h |  1 +
 net/core/skmsg.c      | 14 ++++++++++++++
 net/ipv4/udp.c        |  2 ++
 net/ipv4/udp_bpf.c    |  1 +
 net/unix/af_unix.c    |  4 ++++
 net/unix/unix_bpf.c   |  2 ++
 6 files changed, 24 insertions(+)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 8f577739fc36..a25434207dca 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -128,6 +128,7 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
 			     struct sk_msg *msg, u32 bytes);
 int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 		   int len, int flags);
+bool sk_msg_is_readable(struct sock *sk);
 
 static inline void sk_msg_check_to_free(struct sk_msg *msg, u32 i, u32 bytes)
 {
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 2d6249b28928..93ae48581ad2 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -474,6 +474,20 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 }
 EXPORT_SYMBOL_GPL(sk_msg_recvmsg);
 
+bool sk_msg_is_readable(struct sock *sk)
+{
+	struct sk_psock *psock;
+	bool empty = true;
+
+	psock = sk_psock_get_checked(sk);
+	if (IS_ERR_OR_NULL(psock))
+		return false;
+	empty = sk_psock_queue_empty(psock);
+	sk_psock_put(sk, psock);
+	return !empty;
+}
+EXPORT_SYMBOL_GPL(sk_msg_is_readable);
+
 static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk,
 						  struct sk_buff *skb)
 {
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8851c9463b4b..9f49c0967504 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2866,6 +2866,8 @@ __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	    !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1)
 		mask &= ~(EPOLLIN | EPOLLRDNORM);
 
+	if (sk_is_readable(sk))
+		mask |= EPOLLIN | EPOLLRDNORM;
 	return mask;
 
 }
diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
index 7a1d5f473878..bbe6569c9ad3 100644
--- a/net/ipv4/udp_bpf.c
+++ b/net/ipv4/udp_bpf.c
@@ -114,6 +114,7 @@ static void udp_bpf_rebuild_protos(struct proto *prot, const struct proto *base)
 	*prot        = *base;
 	prot->close  = sock_map_close;
 	prot->recvmsg = udp_bpf_recvmsg;
+	prot->sock_is_readable = sk_msg_is_readable;
 }
 
 static void udp_bpf_check_v6_needs_rebuild(struct proto *ops)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 92345c9bb60c..f1cbaa0ccf6b 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -3014,6 +3014,8 @@ static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wa
 	/* readable? */
 	if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
+	if (sk_is_readable(sk))
+		mask |= EPOLLIN | EPOLLRDNORM;
 
 	/* Connection-based need to check for termination and startup */
 	if ((sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) &&
@@ -3053,6 +3055,8 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
 	/* readable? */
 	if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
+	if (sk_is_readable(sk))
+		mask |= EPOLLIN | EPOLLRDNORM;
 
 	/* Connection-based need to check for termination and startup */
 	if (sk->sk_type == SOCK_SEQPACKET) {
diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
index b927e2baae50..452376c6f419 100644
--- a/net/unix/unix_bpf.c
+++ b/net/unix/unix_bpf.c
@@ -102,6 +102,7 @@ static void unix_dgram_bpf_rebuild_protos(struct proto *prot, const struct proto
 	*prot        = *base;
 	prot->close  = sock_map_close;
 	prot->recvmsg = unix_bpf_recvmsg;
+	prot->sock_is_readable = sk_msg_is_readable;
 }
 
 static void unix_stream_bpf_rebuild_protos(struct proto *prot,
@@ -110,6 +111,7 @@ static void unix_stream_bpf_rebuild_protos(struct proto *prot,
 	*prot        = *base;
 	prot->close  = sock_map_close;
 	prot->recvmsg = unix_bpf_recvmsg;
+	prot->sock_is_readable = sk_msg_is_readable;
 	prot->unhash  = sock_map_unhash;
 }
 
-- 
2.30.2



* [Patch bpf v2 4/4] selftests/bpf: use recv_timeout() instead of retries
  2021-09-28  0:22 [Patch bpf v2 0/4] sock_map: fix ->poll() and update selftests Cong Wang
                   ` (2 preceding siblings ...)
  2021-09-28  0:22 ` [Patch bpf v2 3/4] net: implement ->sock_is_readable for UDP and AF_UNIX Cong Wang
@ 2021-09-28  0:22 ` Cong Wang
  3 siblings, 0 replies; 7+ messages in thread
From: Cong Wang @ 2021-09-28  0:22 UTC
  To: netdev
  Cc: bpf, Yucong Sun, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer, Cong Wang

From: Yucong Sun <sunyucong@gmail.com>

We use non-blocking sockets in those tests. Retrying on
EAGAIN is ugly because there is no upper bound for the packet
arrival time, at least in theory. Now that poll() is fixed on
sockmap sockets, we can switch to select()+recv().

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Yucong Sun <sunyucong@gmail.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 .../selftests/bpf/prog_tests/sockmap_listen.c | 75 +++++--------------
 1 file changed, 20 insertions(+), 55 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index 5c5979046523..d88bb65b74cc 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -949,7 +949,6 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd,
 	int err, n;
 	u32 key;
 	char b;
-	int retries = 100;
 
 	zero_verdict_count(verd_mapfd);
 
@@ -1002,17 +1001,11 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd,
 		goto close_peer1;
 	if (pass != 1)
 		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
-again:
-	n = read(c0, &b, 1);
-	if (n < 0) {
-		if (errno == EAGAIN && retries--) {
-			usleep(1000);
-			goto again;
-		}
-		FAIL_ERRNO("%s: read", log_prefix);
-	}
+	n = recv_timeout(c0, &b, 1, 0, IO_TIMEOUT_SEC);
+	if (n < 0)
+		FAIL_ERRNO("%s: recv_timeout", log_prefix);
 	if (n == 0)
-		FAIL("%s: incomplete read", log_prefix);
+		FAIL("%s: incomplete recv", log_prefix);
 
 close_peer1:
 	xclose(p1);
@@ -1571,7 +1564,6 @@ static void unix_redir_to_connected(int sotype, int sock_mapfd,
 	const char *log_prefix = redir_mode_str(mode);
 	int c0, c1, p0, p1;
 	unsigned int pass;
-	int retries = 100;
 	int err, n;
 	int sfd[2];
 	u32 key;
@@ -1606,17 +1598,11 @@ static void unix_redir_to_connected(int sotype, int sock_mapfd,
 	if (pass != 1)
 		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
 
-again:
-	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
-	if (n < 0) {
-		if (errno == EAGAIN && retries--) {
-			usleep(1000);
-			goto again;
-		}
-		FAIL_ERRNO("%s: read", log_prefix);
-	}
+	n = recv_timeout(mode == REDIR_INGRESS ? p0 : c0, &b, 1, 0, IO_TIMEOUT_SEC);
+	if (n < 0)
+		FAIL_ERRNO("%s: recv_timeout", log_prefix);
 	if (n == 0)
-		FAIL("%s: incomplete read", log_prefix);
+		FAIL("%s: incomplete recv", log_prefix);
 
 close:
 	xclose(c1);
@@ -1748,7 +1734,6 @@ static void udp_redir_to_connected(int family, int sock_mapfd, int verd_mapfd,
 	const char *log_prefix = redir_mode_str(mode);
 	int c0, c1, p0, p1;
 	unsigned int pass;
-	int retries = 100;
 	int err, n;
 	u32 key;
 	char b;
@@ -1781,17 +1766,11 @@ static void udp_redir_to_connected(int family, int sock_mapfd, int verd_mapfd,
 	if (pass != 1)
 		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
 
-again:
-	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
-	if (n < 0) {
-		if (errno == EAGAIN && retries--) {
-			usleep(1000);
-			goto again;
-		}
-		FAIL_ERRNO("%s: read", log_prefix);
-	}
+	n = recv_timeout(mode == REDIR_INGRESS ? p0 : c0, &b, 1, 0, IO_TIMEOUT_SEC);
+	if (n < 0)
+		FAIL_ERRNO("%s: recv_timeout", log_prefix);
 	if (n == 0)
-		FAIL("%s: incomplete read", log_prefix);
+		FAIL("%s: incomplete recv", log_prefix);
 
 close_cli1:
 	xclose(c1);
@@ -1841,7 +1820,6 @@ static void inet_unix_redir_to_connected(int family, int type, int sock_mapfd,
 	const char *log_prefix = redir_mode_str(mode);
 	int c0, c1, p0, p1;
 	unsigned int pass;
-	int retries = 100;
 	int err, n;
 	int sfd[2];
 	u32 key;
@@ -1876,17 +1854,11 @@ static void inet_unix_redir_to_connected(int family, int type, int sock_mapfd,
 	if (pass != 1)
 		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
 
-again:
-	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
-	if (n < 0) {
-		if (errno == EAGAIN && retries--) {
-			usleep(1000);
-			goto again;
-		}
-		FAIL_ERRNO("%s: read", log_prefix);
-	}
+	n = recv_timeout(mode == REDIR_INGRESS ? p0 : c0, &b, 1, 0, IO_TIMEOUT_SEC);
+	if (n < 0)
+		FAIL_ERRNO("%s: recv_timeout", log_prefix);
 	if (n == 0)
-		FAIL("%s: incomplete read", log_prefix);
+		FAIL("%s: incomplete recv", log_prefix);
 
 close_cli1:
 	xclose(c1);
@@ -1932,7 +1904,6 @@ static void unix_inet_redir_to_connected(int family, int type, int sock_mapfd,
 	int sfd[2];
 	u32 key;
 	char b;
-	int retries = 100;
 
 	zero_verdict_count(verd_mapfd);
 
@@ -1963,17 +1934,11 @@ static void unix_inet_redir_to_connected(int family, int type, int sock_mapfd,
 	if (pass != 1)
 		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
 
-again:
-	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
-	if (n < 0) {
-		if (errno == EAGAIN && retries--) {
-			usleep(1000);
-			goto again;
-		}
-		FAIL_ERRNO("%s: read", log_prefix);
-	}
+	n = recv_timeout(mode == REDIR_INGRESS ? p0 : c0, &b, 1, 0, IO_TIMEOUT_SEC);
+	if (n < 0)
+		FAIL_ERRNO("%s: recv_timeout", log_prefix);
 	if (n == 0)
-		FAIL("%s: incomplete read", log_prefix);
+		FAIL("%s: incomplete recv", log_prefix);
 
 close:
 	xclose(c1);
-- 
2.30.2



* RE: [Patch bpf v2 3/4] net: implement ->sock_is_readable for UDP and AF_UNIX
  2021-09-28  0:22 ` [Patch bpf v2 3/4] net: implement ->sock_is_readable for UDP and AF_UNIX Cong Wang
@ 2021-09-30 21:44   ` John Fastabend
  2021-10-02  0:00     ` Cong Wang
  0 siblings, 1 reply; 7+ messages in thread
From: John Fastabend @ 2021-09-30 21:44 UTC
  To: Cong Wang, netdev
  Cc: bpf, Cong Wang, Yucong Sun, John Fastabend, Daniel Borkmann,
	Jakub Sitnicki, Lorenz Bauer

Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> Yucong noticed we can't poll() sockets in sockmap even when
> they are the destination sockets of redirections. This is
> because we never poll any psock queues in ->poll(), except
> for TCP. Now we can overwrite ->sock_is_readable() and
> implement and invoke it for UDP and AF_UNIX sockets.

nit: instead of 'because we never poll any psock queue...' how about
'because we do not poll the psock queues in ->poll(), except
for TCP.'
> 
> Reported-by: Yucong Sun <sunyucong@gmail.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---

[...]
  
>  static inline void sk_msg_check_to_free(struct sk_msg *msg, u32 i, u32 bytes)
>  {
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 2d6249b28928..93ae48581ad2 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -474,6 +474,20 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
>  }
>  EXPORT_SYMBOL_GPL(sk_msg_recvmsg);
>  
> +bool sk_msg_is_readable(struct sock *sk)
> +{
> +	struct sk_psock *psock;
> +	bool empty = true;
> +
> +	psock = sk_psock_get_checked(sk);

We shouldn't need the checked version here, right? We only get here because
we hooked the sk with the callbacks from *_bpf_rebuild_protos. Then we
can just use sk_psock() and save a few extra insns/branch.

> +	if (IS_ERR_OR_NULL(psock))
> +		return false;
> +	empty = sk_psock_queue_empty(psock);
> +	sk_psock_put(sk, psock);
> +	return !empty;
> +}
> +EXPORT_SYMBOL_GPL(sk_msg_is_readable);

[...]


* Re: [Patch bpf v2 3/4] net: implement ->sock_is_readable for UDP and AF_UNIX
  2021-09-30 21:44   ` John Fastabend
@ 2021-10-02  0:00     ` Cong Wang
  0 siblings, 0 replies; 7+ messages in thread
From: Cong Wang @ 2021-10-02  0:00 UTC
  To: John Fastabend
  Cc: Linux Kernel Network Developers, bpf, Cong Wang, Yucong Sun,
	Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

On Thu, Sep 30, 2021 at 2:44 PM John Fastabend <john.fastabend@gmail.com> wrote:
> > +bool sk_msg_is_readable(struct sock *sk)
> > +{
> > +     struct sk_psock *psock;
> > +     bool empty = true;
> > +
> > +     psock = sk_psock_get_checked(sk);
>
> We shouldn't need the checked version here, right? We only get here because
> we hooked the sk with the callbacks from *_bpf_rebuild_protos. Then we
> can just use sk_psock() and save a few extra insns/branch.

Good catch! Indeed only sockmap overwrites that hook.

I will send V3 shortly after all tests are done.

Thanks

