All of lore.kernel.org
 help / color / mirror / Atom feed
* [Patch bpf v2 0/7] sock_map: some bug fixes and improvements
@ 2021-05-22 19:14 Cong Wang
  2021-05-22 19:14 ` [Patch bpf v2 1/7] skmsg: improve udp_bpf_recvmsg() accuracy Cong Wang
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Cong Wang @ 2021-05-22 19:14 UTC (permalink / raw)
  To: netdev; +Cc: bpf, Cong Wang

From: Cong Wang <cong.wang@bytedance.com>

This patchset contains a few bug fixes and improvements for sock_map.

Patch 1 improves recvmsg() accuracy for UDP, patch 2 improves UDP
non-blocking read() by retrying on EAGAIN. With both of them, the
failure rate of the UDP test case goes down from 10% to 0%.

Patch 3 is memory leak fix I posted, no change since v1. The rest
patches address similar memory leaks or improve error handling,
including one increases sk_drops counter in error cases. Please
check each patch description for more details.

---
Cong Wang (7):
  skmsg: improve udp_bpf_recvmsg() accuracy
  selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
  udp: fix a memory leak in udp_read_sock()
  skmsg: fix a memory leak in sk_psock_verdict_apply()
  skmsg: teach sk_psock_verdict_apply() to return errors
  skmsg: pass source psock to sk_psock_skb_redirect()
  skmsg: increase sk->sk_drops when dropping packets

 include/linux/skmsg.h                         |  2 -
 net/core/skmsg.c                              | 80 +++++++++----------
 net/ipv4/tcp_bpf.c                            | 24 +++++-
 net/ipv4/udp.c                                |  2 +
 net/ipv4/udp_bpf.c                            | 47 +++++++++--
 .../selftests/bpf/prog_tests/sockmap_listen.c |  7 +-
 6 files changed, 109 insertions(+), 53 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Patch bpf v2 1/7] skmsg: improve udp_bpf_recvmsg() accuracy
  2021-05-22 19:14 [Patch bpf v2 0/7] sock_map: some bug fixes and improvements Cong Wang
@ 2021-05-22 19:14 ` Cong Wang
  2021-05-22 19:14 ` [Patch bpf v2 2/7] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected() Cong Wang
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Cong Wang @ 2021-05-22 19:14 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

I tried to reuse sk_msg_wait_data() for different protocols,
but it turns out it can not be simply reused. For example,
UDP actually uses two queues to receive skb:
udp_sk(sk)->reader_queue and sk->sk_receive_queue. So we have
to check both of them to know whether we have received any
packet.

Also, UDP does not lock the sock during BH Rx path, it makes
no sense for its ->recvmsg() to lock the sock. It is always
possible for ->recvmsg() to be called before packets actually
arrive in the receive queue, we just use best effort to make
it accurate here.

Fixes: 1f5be6b3b063 ("udp: Implement udp_bpf_recvmsg() for sockmap")
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/linux/skmsg.h |  2 --
 net/core/skmsg.c      | 23 ---------------------
 net/ipv4/tcp_bpf.c    | 24 +++++++++++++++++++++-
 net/ipv4/udp_bpf.c    | 47 ++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 65 insertions(+), 31 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index aba0f0f429be..e3d080c299f6 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -126,8 +126,6 @@ int sk_msg_zerocopy_from_iter(struct sock *sk, struct iov_iter *from,
 			      struct sk_msg *msg, u32 bytes);
 int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
 			     struct sk_msg *msg, u32 bytes);
-int sk_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags,
-		     long timeo, int *err);
 int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 		   int len, int flags);
 
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 43ce17a6a585..f9a81b314e4c 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -399,29 +399,6 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
 }
 EXPORT_SYMBOL_GPL(sk_msg_memcopy_from_iter);
 
-int sk_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags,
-		     long timeo, int *err)
-{
-	DEFINE_WAIT_FUNC(wait, woken_wake_function);
-	int ret = 0;
-
-	if (sk->sk_shutdown & RCV_SHUTDOWN)
-		return 1;
-
-	if (!timeo)
-		return ret;
-
-	add_wait_queue(sk_sleep(sk), &wait);
-	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
-	ret = sk_wait_event(sk, &timeo,
-			    !list_empty(&psock->ingress_msg) ||
-			    !skb_queue_empty(&sk->sk_receive_queue), &wait);
-	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
-	remove_wait_queue(sk_sleep(sk), &wait);
-	return ret;
-}
-EXPORT_SYMBOL_GPL(sk_msg_wait_data);
-
 /* Receive sk_msg from psock->ingress_msg to @msg. */
 int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 		   int len, int flags)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index ad9d17923fc5..bb49b52d7be8 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -163,6 +163,28 @@ static bool tcp_bpf_stream_read(const struct sock *sk)
 	return !empty;
 }
 
+static int tcp_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags,
+			     long timeo, int *err)
+{
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
+	int ret = 0;
+
+	if (sk->sk_shutdown & RCV_SHUTDOWN)
+		return 1;
+
+	if (!timeo)
+		return ret;
+
+	add_wait_queue(sk_sleep(sk), &wait);
+	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+	ret = sk_wait_event(sk, &timeo,
+			    !list_empty(&psock->ingress_msg) ||
+			    !skb_queue_empty(&sk->sk_receive_queue), &wait);
+	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+	remove_wait_queue(sk_sleep(sk), &wait);
+	return ret;
+}
+
 static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		    int nonblock, int flags, int *addr_len)
 {
@@ -188,7 +210,7 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		long timeo;
 
 		timeo = sock_rcvtimeo(sk, nonblock);
-		data = sk_msg_wait_data(sk, psock, flags, timeo, &err);
+		data = tcp_msg_wait_data(sk, psock, flags, timeo, &err);
 		if (data) {
 			if (!sk_psock_queue_empty(psock))
 				goto msg_bytes_ready;
diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
index 954c4591a6fd..565a70040c57 100644
--- a/net/ipv4/udp_bpf.c
+++ b/net/ipv4/udp_bpf.c
@@ -21,6 +21,45 @@ static int sk_udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	return udp_prot.recvmsg(sk, msg, len, noblock, flags, addr_len);
 }
 
+static bool udp_sk_has_data(struct sock *sk)
+{
+	return !skb_queue_empty(&udp_sk(sk)->reader_queue) ||
+	       !skb_queue_empty(&sk->sk_receive_queue);
+}
+
+static bool psock_has_data(struct sk_psock *psock)
+{
+	return !skb_queue_empty(&psock->ingress_skb) ||
+	       !sk_psock_queue_empty(psock);
+}
+
+#define udp_msg_has_data(__sk, __psock)	\
+		({ udp_sk_has_data(__sk) || psock_has_data(__psock); })
+
+static int udp_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags,
+			     long timeo, int *err)
+{
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
+	int ret = 0;
+
+	if (sk->sk_shutdown & RCV_SHUTDOWN)
+		return 1;
+
+	if (!timeo)
+		return ret;
+
+	add_wait_queue(sk_sleep(sk), &wait);
+	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+	ret = udp_msg_has_data(sk, psock);
+	if (!ret) {
+		wait_woken(&wait, TASK_INTERRUPTIBLE, timeo);
+		ret = udp_msg_has_data(sk, psock);
+	}
+	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+	remove_wait_queue(sk_sleep(sk), &wait);
+	return ret;
+}
+
 static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 			   int nonblock, int flags, int *addr_len)
 {
@@ -34,8 +73,7 @@ static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	if (unlikely(!psock))
 		return sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
 
-	lock_sock(sk);
-	if (sk_psock_queue_empty(psock)) {
+	if (!psock_has_data(psock)) {
 		ret = sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
 		goto out;
 	}
@@ -47,9 +85,9 @@ static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		long timeo;
 
 		timeo = sock_rcvtimeo(sk, nonblock);
-		data = sk_msg_wait_data(sk, psock, flags, timeo, &err);
+		data = udp_msg_wait_data(sk, psock, flags, timeo, &err);
 		if (data) {
-			if (!sk_psock_queue_empty(psock))
+			if (psock_has_data(psock))
 				goto msg_bytes_ready;
 			ret = sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
 			goto out;
@@ -62,7 +100,6 @@ static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	}
 	ret = copied;
 out:
-	release_sock(sk);
 	sk_psock_put(sk, psock);
 	return ret;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Patch bpf v2 2/7] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
  2021-05-22 19:14 [Patch bpf v2 0/7] sock_map: some bug fixes and improvements Cong Wang
  2021-05-22 19:14 ` [Patch bpf v2 1/7] skmsg: improve udp_bpf_recvmsg() accuracy Cong Wang
@ 2021-05-22 19:14 ` Cong Wang
  2021-05-22 19:14 ` [Patch bpf v2 3/7] udp: fix a memory leak in udp_read_sock() Cong Wang
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Cong Wang @ 2021-05-22 19:14 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, Jiang Wang, John Fastabend, Daniel Borkmann,
	Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

We use non-blocking sockets for testing sockmap redirections,
and got some random EAGAIN errors from UDP tests.

There is no guarantee the packet would be immediately available
to receive as soon as it is sent out, even on the local host.
For UDP, this is especially true because it does not lock the
sock during BH (unlike the TCP path). This is probably why we
only saw this error in UDP cases.

No matter how hard we try to make the queue empty check accurate,
it is always possible for recvmsg() to beat ->sk_data_ready().
Therefore, we should just retry in case of EAGAIN.

Fixes: d6378af615275 ("selftests/bpf: Add a test case for udp sockmap")
Reported-by: Jiang Wang <jiang.wang@bytedance.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index 648d9ae898d2..01ab11259809 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -1610,6 +1610,7 @@ static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
 	struct sockaddr_storage addr;
 	int c0, c1, p0, p1;
 	unsigned int pass;
+	int retries = 100;
 	socklen_t len;
 	int err, n;
 	u64 value;
@@ -1686,9 +1687,13 @@ static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
 	if (pass != 1)
 		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
 
+again:
 	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
-	if (n < 0)
+	if (n < 0) {
+		if (errno == EAGAIN && retries--)
+			goto again;
 		FAIL_ERRNO("%s: read", log_prefix);
+	}
 	if (n == 0)
 		FAIL("%s: incomplete read", log_prefix);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Patch bpf v2 3/7] udp: fix a memory leak in udp_read_sock()
  2021-05-22 19:14 [Patch bpf v2 0/7] sock_map: some bug fixes and improvements Cong Wang
  2021-05-22 19:14 ` [Patch bpf v2 1/7] skmsg: improve udp_bpf_recvmsg() accuracy Cong Wang
  2021-05-22 19:14 ` [Patch bpf v2 2/7] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected() Cong Wang
@ 2021-05-22 19:14 ` Cong Wang
  2021-05-22 19:14 ` [Patch bpf v2 4/7] skmsg: fix a memory leak in sk_psock_verdict_apply() Cong Wang
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Cong Wang @ 2021-05-22 19:14 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

sk_psock_verdict_recv() clones the skb and uses the clone
afterward, so udp_read_sock() should free the skb after using
it, regardless of error or not.

This fixes a real kmemleak.

Fixes: d7f571188ecf ("udp: Implement ->read_sock() for sockmap")
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/ipv4/udp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 15f5504adf5b..e31d67fd5183 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1798,11 +1798,13 @@ int udp_read_sock(struct sock *sk, read_descriptor_t *desc,
 		if (used <= 0) {
 			if (!copied)
 				copied = used;
+			kfree_skb(skb);
 			break;
 		} else if (used <= skb->len) {
 			copied += used;
 		}
 
+		kfree_skb(skb);
 		if (!desc->count)
 			break;
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Patch bpf v2 4/7] skmsg: fix a memory leak in sk_psock_verdict_apply()
  2021-05-22 19:14 [Patch bpf v2 0/7] sock_map: some bug fixes and improvements Cong Wang
                   ` (2 preceding siblings ...)
  2021-05-22 19:14 ` [Patch bpf v2 3/7] udp: fix a memory leak in udp_read_sock() Cong Wang
@ 2021-05-22 19:14 ` Cong Wang
  2021-05-22 19:14 ` [Patch bpf v2 5/7] skmsg: teach sk_psock_verdict_apply() to return errors Cong Wang
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Cong Wang @ 2021-05-22 19:14 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

If the dest psock does not set SK_PSOCK_TX_ENABLED,
then the skb can't be queued anywhere so hould be
dropped.

This one is found during code review.

Fixes: 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/core/skmsg.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index f9a81b314e4c..de68a3cd33f1 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -922,8 +922,11 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
 			if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
 				skb_queue_tail(&psock->ingress_skb, skb);
 				schedule_work(&psock->work);
+				err = 0;
 			}
 			spin_unlock_bh(&psock->ingress_lock);
+			if (err < 0)
+				goto out_free;
 		}
 		break;
 	case __SK_REDIRECT:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Patch bpf v2 5/7] skmsg: teach sk_psock_verdict_apply() to return errors
  2021-05-22 19:14 [Patch bpf v2 0/7] sock_map: some bug fixes and improvements Cong Wang
                   ` (3 preceding siblings ...)
  2021-05-22 19:14 ` [Patch bpf v2 4/7] skmsg: fix a memory leak in sk_psock_verdict_apply() Cong Wang
@ 2021-05-22 19:14 ` Cong Wang
  2021-05-22 19:14 ` [Patch bpf v2 6/7] skmsg: pass source psock to sk_psock_skb_redirect() Cong Wang
  2021-05-22 19:14 ` [Patch bpf v2 7/7] skmsg: increase sk->sk_drops when dropping packets Cong Wang
  6 siblings, 0 replies; 9+ messages in thread
From: Cong Wang @ 2021-05-22 19:14 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Currently sk_psock_verdict_apply() is void, but it handles some
error conditions too. Its caller is impossible to learn whether
it succeeds or fails, especially sk_psock_verdict_recv().

Make it return int to indicate error cases and propagate errors
to callers properly.

Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser program")
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/core/skmsg.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index de68a3cd33f1..335fc60f5d22 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -824,7 +824,7 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
 }
 EXPORT_SYMBOL_GPL(sk_psock_msg_verdict);
 
-static void sk_psock_skb_redirect(struct sk_buff *skb)
+static int sk_psock_skb_redirect(struct sk_buff *skb)
 {
 	struct sk_psock *psock_other;
 	struct sock *sk_other;
@@ -835,7 +835,7 @@ static void sk_psock_skb_redirect(struct sk_buff *skb)
 	 */
 	if (unlikely(!sk_other)) {
 		kfree_skb(skb);
-		return;
+		return -EIO;
 	}
 	psock_other = sk_psock(sk_other);
 	/* This error indicates the socket is being torn down or had another
@@ -844,18 +844,19 @@ static void sk_psock_skb_redirect(struct sk_buff *skb)
 	 */
 	if (!psock_other || sock_flag(sk_other, SOCK_DEAD)) {
 		kfree_skb(skb);
-		return;
+		return -EIO;
 	}
 	spin_lock_bh(&psock_other->ingress_lock);
 	if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
 		spin_unlock_bh(&psock_other->ingress_lock);
 		kfree_skb(skb);
-		return;
+		return -EIO;
 	}
 
 	skb_queue_tail(&psock_other->ingress_skb, skb);
 	schedule_work(&psock_other->work);
 	spin_unlock_bh(&psock_other->ingress_lock);
+	return 0;
 }
 
 static void sk_psock_tls_verdict_apply(struct sk_buff *skb, struct sock *sk, int verdict)
@@ -892,14 +893,15 @@ int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(sk_psock_tls_strp_read);
 
-static void sk_psock_verdict_apply(struct sk_psock *psock,
-				   struct sk_buff *skb, int verdict)
+static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
+				  int verdict)
 {
 	struct sock *sk_other;
-	int err = -EIO;
+	int err = 0;
 
 	switch (verdict) {
 	case __SK_PASS:
+		err = -EIO;
 		sk_other = psock->sk;
 		if (sock_flag(sk_other, SOCK_DEAD) ||
 		    !sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
@@ -930,13 +932,15 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
 		}
 		break;
 	case __SK_REDIRECT:
-		sk_psock_skb_redirect(skb);
+		err = sk_psock_skb_redirect(skb);
 		break;
 	case __SK_DROP:
 	default:
 out_free:
 		kfree_skb(skb);
 	}
+
+	return err;
 }
 
 static void sk_psock_write_space(struct sock *sk)
@@ -1103,7 +1107,8 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb,
 		ret = sk_psock_map_verd(ret, skb_bpf_redirect_fetch(skb));
 		skb->sk = NULL;
 	}
-	sk_psock_verdict_apply(psock, skb, ret);
+	if (sk_psock_verdict_apply(psock, skb, ret) < 0)
+		len = 0;
 out:
 	rcu_read_unlock();
 	return len;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Patch bpf v2 6/7] skmsg: pass source psock to sk_psock_skb_redirect()
  2021-05-22 19:14 [Patch bpf v2 0/7] sock_map: some bug fixes and improvements Cong Wang
                   ` (4 preceding siblings ...)
  2021-05-22 19:14 ` [Patch bpf v2 5/7] skmsg: teach sk_psock_verdict_apply() to return errors Cong Wang
@ 2021-05-22 19:14 ` Cong Wang
  2021-05-22 19:14 ` [Patch bpf v2 7/7] skmsg: increase sk->sk_drops when dropping packets Cong Wang
  6 siblings, 0 replies; 9+ messages in thread
From: Cong Wang @ 2021-05-22 19:14 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

sk_psock_skb_redirect() only takes skb as a parameter, we
will need to know where this skb is from, so just pass
the source psock to this function as a new parameter.
This patch prepares for the next one.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/core/skmsg.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 335fc60f5d22..7b2c25038a48 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -824,7 +824,7 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
 }
 EXPORT_SYMBOL_GPL(sk_psock_msg_verdict);
 
-static int sk_psock_skb_redirect(struct sk_buff *skb)
+static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb)
 {
 	struct sk_psock *psock_other;
 	struct sock *sk_other;
@@ -859,11 +859,12 @@ static int sk_psock_skb_redirect(struct sk_buff *skb)
 	return 0;
 }
 
-static void sk_psock_tls_verdict_apply(struct sk_buff *skb, struct sock *sk, int verdict)
+static void sk_psock_tls_verdict_apply(struct sk_buff *skb,
+				       struct sk_psock *from, int verdict)
 {
 	switch (verdict) {
 	case __SK_REDIRECT:
-		sk_psock_skb_redirect(skb);
+		sk_psock_skb_redirect(from, skb);
 		break;
 	case __SK_PASS:
 	case __SK_DROP:
@@ -887,7 +888,7 @@ int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb)
 		ret = sk_psock_map_verd(ret, skb_bpf_redirect_fetch(skb));
 		skb->sk = NULL;
 	}
-	sk_psock_tls_verdict_apply(skb, psock->sk, ret);
+	sk_psock_tls_verdict_apply(skb, psock, ret);
 	rcu_read_unlock();
 	return ret;
 }
@@ -932,7 +933,7 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
 		}
 		break;
 	case __SK_REDIRECT:
-		err = sk_psock_skb_redirect(skb);
+		err = sk_psock_skb_redirect(psock, skb);
 		break;
 	case __SK_DROP:
 	default:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Patch bpf v2 7/7] skmsg: increase sk->sk_drops when dropping packets
  2021-05-22 19:14 [Patch bpf v2 0/7] sock_map: some bug fixes and improvements Cong Wang
                   ` (5 preceding siblings ...)
  2021-05-22 19:14 ` [Patch bpf v2 6/7] skmsg: pass source psock to sk_psock_skb_redirect() Cong Wang
@ 2021-05-22 19:14 ` Cong Wang
  2021-05-26  4:22   ` John Fastabend
  6 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2021-05-22 19:14 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

It is hard to observe packet drops without increase relevant
drop counters, here we should increase sk->sk_drops which is
a protocol-independent counter. Fortunately psock is always
assocaited with a struct sock, we can just use psock->sk.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/core/skmsg.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 7b2c25038a48..de3af8152d07 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -578,6 +578,12 @@ static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb,
 	return sk_psock_skb_ingress(psock, skb);
 }
 
+static void sock_drop(struct sock *sk, struct sk_buff *skb)
+{
+	sk_drops_add(sk, skb);
+	kfree_skb(skb);
+}
+
 static void sk_psock_backlog(struct work_struct *work)
 {
 	struct sk_psock *psock = container_of(work, struct sk_psock, work);
@@ -617,7 +623,7 @@ static void sk_psock_backlog(struct work_struct *work)
 				/* Hard errors break pipe and stop xmit. */
 				sk_psock_report_error(psock, ret ? -ret : EPIPE);
 				sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
-				kfree_skb(skb);
+				sock_drop(psock->sk, skb);
 				goto end;
 			}
 			off += ret;
@@ -625,7 +631,7 @@ static void sk_psock_backlog(struct work_struct *work)
 		} while (len);
 
 		if (!ingress)
-			kfree_skb(skb);
+			sock_drop(psock->sk, skb);
 	}
 end:
 	mutex_unlock(&psock->work_mutex);
@@ -708,7 +714,7 @@ static void __sk_psock_zap_ingress(struct sk_psock *psock)
 
 	while ((skb = skb_dequeue(&psock->ingress_skb)) != NULL) {
 		skb_bpf_redirect_clear(skb);
-		kfree_skb(skb);
+		sock_drop(psock->sk, skb);
 	}
 	__sk_psock_purge_ingress_msg(psock);
 }
@@ -834,7 +840,7 @@ static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb)
 	 * return code, but then didn't set a redirect interface.
 	 */
 	if (unlikely(!sk_other)) {
-		kfree_skb(skb);
+		sock_drop(from->sk, skb);
 		return -EIO;
 	}
 	psock_other = sk_psock(sk_other);
@@ -843,13 +849,13 @@ static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb)
 	 * a socket that is in this state so we drop the skb.
 	 */
 	if (!psock_other || sock_flag(sk_other, SOCK_DEAD)) {
-		kfree_skb(skb);
+		sock_drop(from->sk, skb);
 		return -EIO;
 	}
 	spin_lock_bh(&psock_other->ingress_lock);
 	if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
 		spin_unlock_bh(&psock_other->ingress_lock);
-		kfree_skb(skb);
+		sock_drop(from->sk, skb);
 		return -EIO;
 	}
 
@@ -938,7 +944,7 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
 	case __SK_DROP:
 	default:
 out_free:
-		kfree_skb(skb);
+		sock_drop(psock->sk, skb);
 	}
 
 	return err;
@@ -973,7 +979,7 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
 	sk = strp->sk;
 	psock = sk_psock(sk);
 	if (unlikely(!psock)) {
-		kfree_skb(skb);
+		sock_drop(sk, skb);
 		goto out;
 	}
 	prog = READ_ONCE(psock->progs.stream_verdict);
@@ -1094,7 +1100,7 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb,
 	psock = sk_psock(sk);
 	if (unlikely(!psock)) {
 		len = 0;
-		kfree_skb(skb);
+		sock_drop(sk, skb);
 		goto out;
 	}
 	prog = READ_ONCE(psock->progs.stream_verdict);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* RE: [Patch bpf v2 7/7] skmsg: increase sk->sk_drops when dropping packets
  2021-05-22 19:14 ` [Patch bpf v2 7/7] skmsg: increase sk->sk_drops when dropping packets Cong Wang
@ 2021-05-26  4:22   ` John Fastabend
  0 siblings, 0 replies; 9+ messages in thread
From: John Fastabend @ 2021-05-26  4:22 UTC (permalink / raw)
  To: Cong Wang, netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> It is hard to observe packet drops without increase relevant
> drop counters, here we should increase sk->sk_drops which is
> a protocol-independent counter. Fortunately psock is always
> assocaited with a struct sock, we can just use psock->sk.
> 
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>

[...]

>  static void sk_psock_backlog(struct work_struct *work)
>  {
>  	struct sk_psock *psock = container_of(work, struct sk_psock, work);
> @@ -617,7 +623,7 @@ static void sk_psock_backlog(struct work_struct *work)
>  				/* Hard errors break pipe and stop xmit. */
>  				sk_psock_report_error(psock, ret ? -ret : EPIPE);
>  				sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
> -				kfree_skb(skb);
> +				sock_drop(psock->sk, skb);
>  				goto end;
>  			}
>  			off += ret;
> @@ -625,7 +631,7 @@ static void sk_psock_backlog(struct work_struct *work)
>  		} while (len);
>  
>  		if (!ingress)
> -			kfree_skb(skb);
> +			sock_drop(psock->sk, skb);

This is not a dropped skb this was sent via skb_send_sock().

The rest LGTM thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-05-26  4:22 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-22 19:14 [Patch bpf v2 0/7] sock_map: some bug fixes and improvements Cong Wang
2021-05-22 19:14 ` [Patch bpf v2 1/7] skmsg: improve udp_bpf_recvmsg() accuracy Cong Wang
2021-05-22 19:14 ` [Patch bpf v2 2/7] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected() Cong Wang
2021-05-22 19:14 ` [Patch bpf v2 3/7] udp: fix a memory leak in udp_read_sock() Cong Wang
2021-05-22 19:14 ` [Patch bpf v2 4/7] skmsg: fix a memory leak in sk_psock_verdict_apply() Cong Wang
2021-05-22 19:14 ` [Patch bpf v2 5/7] skmsg: teach sk_psock_verdict_apply() to return errors Cong Wang
2021-05-22 19:14 ` [Patch bpf v2 6/7] skmsg: pass source psock to sk_psock_skb_redirect() Cong Wang
2021-05-22 19:14 ` [Patch bpf v2 7/7] skmsg: increase sk->sk_drops when dropping packets Cong Wang
2021-05-26  4:22   ` John Fastabend

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.