All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RESEND bpf 0/4] bpf, sockmap: Fix some issues with using apply_bytes
@ 2022-11-22  1:58 Pengcheng Yang
  2022-11-22  1:58 ` [PATCH RESEND bpf 1/4] bpf, sockmap: Fix repeated calls to sock_put() when msg has more_data Pengcheng Yang
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Pengcheng Yang @ 2022-11-22  1:58 UTC (permalink / raw)
  To: bpf, netdev, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer
  Cc: Pengcheng Yang

Patch 0001~0003 fixes three issues with using apply_bytes when redirecting.
Patch 0004 adds ingress tests for txmsg with apply_bytes in selftests.

Pengcheng Yang (4):
  bpf, sockmap: Fix repeated calls to sock_put() when msg has more_data
  bpf, sockmap: Fix missing BPF_F_INGRESS flag when using apply_bytes
  bpf, sockmap: Fix data loss caused by using apply_bytes on ingress
    redirect
  selftests/bpf: Add ingress tests for txmsg with apply_bytes

 include/linux/skmsg.h                      |  1 +
 net/core/skmsg.c                           |  1 +
 net/ipv4/tcp_bpf.c                         |  9 +++++++--
 net/tls/tls_sw.c                           |  1 +
 tools/testing/selftests/bpf/test_sockmap.c | 18 ++++++++++++++++++
 5 files changed, 28 insertions(+), 2 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RESEND bpf 1/4] bpf, sockmap: Fix repeated calls to sock_put() when msg has more_data
  2022-11-22  1:58 [PATCH RESEND bpf 0/4] bpf, sockmap: Fix some issues with using apply_bytes Pengcheng Yang
@ 2022-11-22  1:58 ` Pengcheng Yang
  2022-11-23  2:50   ` John Fastabend
  2022-11-22  1:58 ` [PATCH RESEND bpf 2/4] bpf, sockmap: Fix missing BPF_F_INGRESS flag when using apply_bytes Pengcheng Yang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Pengcheng Yang @ 2022-11-22  1:58 UTC (permalink / raw)
  To: bpf, netdev, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer
  Cc: Pengcheng Yang

In tcp_bpf_send_verdict() redirection, the eval variable is assigned to
__SK_REDIRECT after the apply_bytes data is sent, if msg has more_data,
sock_put() will be called multiple times.
We should reset the eval variable to __SK_NONE every time more_data
starts.

This causes:

IPv4: Attempt to release TCP socket in state 1 00000000b4c925d7
------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 5 PID: 4482 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0x110
Modules linked in:
CPU: 5 PID: 4482 Comm: sockhash_bypass Kdump: loaded Not tainted 6.0.0 #1
Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
 <TASK>
 __tcp_transmit_skb+0xa1b/0xb90
 ? __alloc_skb+0x8c/0x1a0
 ? __kmalloc_node_track_caller+0x184/0x320
 tcp_write_xmit+0x22a/0x1110
 __tcp_push_pending_frames+0x32/0xf0
 do_tcp_sendpages+0x62d/0x640
 tcp_bpf_push+0xae/0x2c0
 tcp_bpf_sendmsg_redir+0x260/0x410
 ? preempt_count_add+0x70/0xa0
 tcp_bpf_send_verdict+0x386/0x4b0
 tcp_bpf_sendmsg+0x21b/0x3b0
 sock_sendmsg+0x58/0x70
 __sys_sendto+0xfa/0x170
 ? xfd_validate_state+0x1d/0x80
 ? switch_fpu_return+0x59/0xe0
 __x64_sys_sendto+0x24/0x30
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: cd9733f5d75c ("tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function")
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
---
 net/ipv4/tcp_bpf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index f8b12b9..ef5de4f 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -279,7 +279,7 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 	bool cork = false, enospc = sk_msg_full(msg);
 	struct sock *sk_redir;
 	u32 tosend, origsize, sent, delta = 0;
-	u32 eval = __SK_NONE;
+	u32 eval;
 	int ret;
 
 more_data:
@@ -310,6 +310,7 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 	tosend = msg->sg.size;
 	if (psock->apply_bytes && psock->apply_bytes < tosend)
 		tosend = psock->apply_bytes;
+	eval = __SK_NONE;
 
 	switch (psock->eval) {
 	case __SK_PASS:
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RESEND bpf 2/4] bpf, sockmap: Fix missing BPF_F_INGRESS flag when using apply_bytes
  2022-11-22  1:58 [PATCH RESEND bpf 0/4] bpf, sockmap: Fix some issues with using apply_bytes Pengcheng Yang
  2022-11-22  1:58 ` [PATCH RESEND bpf 1/4] bpf, sockmap: Fix repeated calls to sock_put() when msg has more_data Pengcheng Yang
@ 2022-11-22  1:58 ` Pengcheng Yang
  2022-11-23  3:02   ` John Fastabend
  2022-11-22  1:58 ` [PATCH RESEND bpf 3/4] bpf, sockmap: Fix data loss caused by using apply_bytes on ingress redirect Pengcheng Yang
  2022-11-22  1:58 ` [PATCH RESEND bpf 4/4] selftests/bpf: Add ingress tests for txmsg with apply_bytes Pengcheng Yang
  3 siblings, 1 reply; 13+ messages in thread
From: Pengcheng Yang @ 2022-11-22  1:58 UTC (permalink / raw)
  To: bpf, netdev, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer
  Cc: Pengcheng Yang

When redirecting, we use sk_msg_to_ingress() to get the BPF_F_INGRESS
flag from the msg->flags. If apply_bytes is used and it is larger than
the current data being processed, sk_psock_msg_verdict() will not be
called when sendmsg() is called again. At this time, the msg->flags is 0,
and we lost the BPF_F_INGRESS flag.

So we need to save the BPF_F_INGRESS flag in sk_psock and assign it to
msg->flags before redirection.

Fixes: 8934ce2fd081 ("bpf: sockmap redirect ingress support")
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
---
 include/linux/skmsg.h | 1 +
 net/core/skmsg.c      | 1 +
 net/ipv4/tcp_bpf.c    | 1 +
 net/tls/tls_sw.c      | 1 +
 4 files changed, 4 insertions(+)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 48f4b64..e1d463f 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -82,6 +82,7 @@ struct sk_psock {
 	u32				apply_bytes;
 	u32				cork_bytes;
 	u32				eval;
+	u32				flags;
 	struct sk_msg			*cork;
 	struct sk_psock_progs		progs;
 #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 188f855..ab2f8f3 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -888,6 +888,7 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
 		if (psock->sk_redir)
 			sock_put(psock->sk_redir);
 		psock->sk_redir = msg->sk_redir;
+		psock->flags = msg->flags;
 		if (!psock->sk_redir) {
 			ret = __SK_DROP;
 			goto out;
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index ef5de4f..1390d72 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -323,6 +323,7 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 		break;
 	case __SK_REDIRECT:
 		sk_redir = psock->sk_redir;
+		msg->flags = psock->flags;
 		sk_msg_apply_bytes(psock, tosend);
 		if (!psock->apply_bytes) {
 			/* Clean up before releasing the sock lock. */
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index fe27241..49e424d 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -838,6 +838,7 @@ static int bpf_exec_tx_verdict(struct sk_msg *msg, struct sock *sk,
 		break;
 	case __SK_REDIRECT:
 		sk_redir = psock->sk_redir;
+		msg->flags = psock->flags;
 		memcpy(&msg_redir, msg, sizeof(*msg));
 		if (msg->apply_bytes < send)
 			msg->apply_bytes = 0;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RESEND bpf 3/4] bpf, sockmap: Fix data loss caused by using apply_bytes on ingress redirect
  2022-11-22  1:58 [PATCH RESEND bpf 0/4] bpf, sockmap: Fix some issues with using apply_bytes Pengcheng Yang
  2022-11-22  1:58 ` [PATCH RESEND bpf 1/4] bpf, sockmap: Fix repeated calls to sock_put() when msg has more_data Pengcheng Yang
  2022-11-22  1:58 ` [PATCH RESEND bpf 2/4] bpf, sockmap: Fix missing BPF_F_INGRESS flag when using apply_bytes Pengcheng Yang
@ 2022-11-22  1:58 ` Pengcheng Yang
  2022-11-22  1:58 ` [PATCH RESEND bpf 4/4] selftests/bpf: Add ingress tests for txmsg with apply_bytes Pengcheng Yang
  3 siblings, 0 replies; 13+ messages in thread
From: Pengcheng Yang @ 2022-11-22  1:58 UTC (permalink / raw)
  To: bpf, netdev, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer
  Cc: Pengcheng Yang

Use apply_bytes on ingress redirect, when apply_bytes is less than
the length of msg data, some data may be skipped and lost in
bpf_tcp_ingress().
If there is still data in the scatterlist that has not been consumed,
we cannot move the msg iter.

Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
---
 net/ipv4/tcp_bpf.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 1390d72..3cc0346 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -45,8 +45,11 @@ static int bpf_tcp_ingress(struct sock *sk, struct sk_psock *psock,
 		tmp->sg.end = i;
 		if (apply) {
 			apply_bytes -= size;
-			if (!apply_bytes)
+			if (!apply_bytes) {
+				if (sge->length)
+					sk_msg_iter_var_prev(i);
 				break;
+			}
 		}
 	} while (i != msg->sg.end);
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RESEND bpf 4/4] selftests/bpf: Add ingress tests for txmsg with apply_bytes
  2022-11-22  1:58 [PATCH RESEND bpf 0/4] bpf, sockmap: Fix some issues with using apply_bytes Pengcheng Yang
                   ` (2 preceding siblings ...)
  2022-11-22  1:58 ` [PATCH RESEND bpf 3/4] bpf, sockmap: Fix data loss caused by using apply_bytes on ingress redirect Pengcheng Yang
@ 2022-11-22  1:58 ` Pengcheng Yang
  2022-11-23  3:05   ` John Fastabend
  3 siblings, 1 reply; 13+ messages in thread
From: Pengcheng Yang @ 2022-11-22  1:58 UTC (permalink / raw)
  To: bpf, netdev, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer
  Cc: Pengcheng Yang

Currently, the ingress redirect is not covered in "txmsg test apply".

Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
---
 tools/testing/selftests/bpf/test_sockmap.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 0fbaccd..9bc0cb4 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1649,24 +1649,42 @@ static void test_txmsg_apply(int cgrp, struct sockmap_options *opt)
 {
 	txmsg_pass = 1;
 	txmsg_redir = 0;
+	txmsg_ingress = 0;
 	txmsg_apply = 1;
 	txmsg_cork = 0;
 	test_send_one(opt, cgrp);
 
 	txmsg_pass = 0;
 	txmsg_redir = 1;
+	txmsg_ingress = 0;
+	txmsg_apply = 1;
+	txmsg_cork = 0;
+	test_send_one(opt, cgrp);
+
+	txmsg_pass = 0;
+	txmsg_redir = 1;
+	txmsg_ingress = 1;
 	txmsg_apply = 1;
 	txmsg_cork = 0;
 	test_send_one(opt, cgrp);
 
 	txmsg_pass = 1;
 	txmsg_redir = 0;
+	txmsg_ingress = 0;
+	txmsg_apply = 1024;
+	txmsg_cork = 0;
+	test_send_large(opt, cgrp);
+
+	txmsg_pass = 0;
+	txmsg_redir = 1;
+	txmsg_ingress = 0;
 	txmsg_apply = 1024;
 	txmsg_cork = 0;
 	test_send_large(opt, cgrp);
 
 	txmsg_pass = 0;
 	txmsg_redir = 1;
+	txmsg_ingress = 1;
 	txmsg_apply = 1024;
 	txmsg_cork = 0;
 	test_send_large(opt, cgrp);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* RE: [PATCH RESEND bpf 1/4] bpf, sockmap: Fix repeated calls to sock_put() when msg has more_data
  2022-11-22  1:58 ` [PATCH RESEND bpf 1/4] bpf, sockmap: Fix repeated calls to sock_put() when msg has more_data Pengcheng Yang
@ 2022-11-23  2:50   ` John Fastabend
  0 siblings, 0 replies; 13+ messages in thread
From: John Fastabend @ 2022-11-23  2:50 UTC (permalink / raw)
  To: Pengcheng Yang, bpf, netdev, John Fastabend, Daniel Borkmann,
	Jakub Sitnicki, Lorenz Bauer
  Cc: Pengcheng Yang

Pengcheng Yang wrote:
> In tcp_bpf_send_verdict() redirection, the eval variable is assigned to
> __SK_REDIRECT after the apply_bytes data is sent, if msg has more_data,
> sock_put() will be called multiple times.
> We should reset the eval variable to __SK_NONE every time more_data
> starts.
> 
> This causes:
> 
> IPv4: Attempt to release TCP socket in state 1 00000000b4c925d7
> ------------[ cut here ]------------
> refcount_t: addition on 0; use-after-free.
> WARNING: CPU: 5 PID: 4482 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0x110
> Modules linked in:
> CPU: 5 PID: 4482 Comm: sockhash_bypass Kdump: loaded Not tainted 6.0.0 #1
> Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
> Call Trace:
>  <TASK>
>  __tcp_transmit_skb+0xa1b/0xb90
>  ? __alloc_skb+0x8c/0x1a0
>  ? __kmalloc_node_track_caller+0x184/0x320
>  tcp_write_xmit+0x22a/0x1110
>  __tcp_push_pending_frames+0x32/0xf0
>  do_tcp_sendpages+0x62d/0x640
>  tcp_bpf_push+0xae/0x2c0
>  tcp_bpf_sendmsg_redir+0x260/0x410
>  ? preempt_count_add+0x70/0xa0
>  tcp_bpf_send_verdict+0x386/0x4b0
>  tcp_bpf_sendmsg+0x21b/0x3b0
>  sock_sendmsg+0x58/0x70
>  __sys_sendto+0xfa/0x170
>  ? xfd_validate_state+0x1d/0x80
>  ? switch_fpu_return+0x59/0xe0
>  __x64_sys_sendto+0x24/0x30
>  do_syscall_64+0x37/0x90
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> Fixes: cd9733f5d75c ("tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function")
> Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>

Thanks.

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH RESEND bpf 2/4] bpf, sockmap: Fix missing BPF_F_INGRESS flag when using apply_bytes
  2022-11-22  1:58 ` [PATCH RESEND bpf 2/4] bpf, sockmap: Fix missing BPF_F_INGRESS flag when using apply_bytes Pengcheng Yang
@ 2022-11-23  3:02   ` John Fastabend
  2022-11-23  6:01     ` Pengcheng Yang
  0 siblings, 1 reply; 13+ messages in thread
From: John Fastabend @ 2022-11-23  3:02 UTC (permalink / raw)
  To: Pengcheng Yang, bpf, netdev, John Fastabend, Daniel Borkmann,
	Jakub Sitnicki, Lorenz Bauer
  Cc: Pengcheng Yang

Pengcheng Yang wrote:
> When redirecting, we use sk_msg_to_ingress() to get the BPF_F_INGRESS
> flag from the msg->flags. If apply_bytes is used and it is larger than
> the current data being processed, sk_psock_msg_verdict() will not be
> called when sendmsg() is called again. At this time, the msg->flags is 0,
> and we lost the BPF_F_INGRESS flag.
> 
> So we need to save the BPF_F_INGRESS flag in sk_psock and assign it to
> msg->flags before redirection.
> 
> Fixes: 8934ce2fd081 ("bpf: sockmap redirect ingress support")
> Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
> ---
>  include/linux/skmsg.h | 1 +
>  net/core/skmsg.c      | 1 +
>  net/ipv4/tcp_bpf.c    | 1 +
>  net/tls/tls_sw.c      | 1 +
>  4 files changed, 4 insertions(+)
> 
> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> index 48f4b64..e1d463f 100644
> --- a/include/linux/skmsg.h
> +++ b/include/linux/skmsg.h
> @@ -82,6 +82,7 @@ struct sk_psock {
>  	u32				apply_bytes;
>  	u32				cork_bytes;
>  	u32				eval;
> +	u32				flags;
>  	struct sk_msg			*cork;
>  	struct sk_psock_progs		progs;
>  #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 188f855..ab2f8f3 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -888,6 +888,7 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
>  		if (psock->sk_redir)
>  			sock_put(psock->sk_redir);
>  		psock->sk_redir = msg->sk_redir;
> +		psock->flags = msg->flags;
>  		if (!psock->sk_redir) {
>  			ret = __SK_DROP;
>  			goto out;
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index ef5de4f..1390d72 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -323,6 +323,7 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>  		break;
>  	case __SK_REDIRECT:
>  		sk_redir = psock->sk_redir;
> +		msg->flags = psock->flags;
>  		sk_msg_apply_bytes(psock, tosend);
>  		if (!psock->apply_bytes) {
>  			/* Clean up before releasing the sock lock. */
                 ^^^^^^^^^^^^^^^
In this block reposted here with the rest of the block


		if (!psock->apply_bytes) {
			/* Clean up before releasing the sock lock. */
			eval = psock->eval;
			psock->eval = __SK_NONE;
			psock->sk_redir = NULL;
		}

Now that we have a psock->flags we should clera that as
well right?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH RESEND bpf 4/4] selftests/bpf: Add ingress tests for txmsg with apply_bytes
  2022-11-22  1:58 ` [PATCH RESEND bpf 4/4] selftests/bpf: Add ingress tests for txmsg with apply_bytes Pengcheng Yang
@ 2022-11-23  3:05   ` John Fastabend
  0 siblings, 0 replies; 13+ messages in thread
From: John Fastabend @ 2022-11-23  3:05 UTC (permalink / raw)
  To: Pengcheng Yang, bpf, netdev, John Fastabend, Daniel Borkmann,
	Jakub Sitnicki, Lorenz Bauer
  Cc: Pengcheng Yang

Pengcheng Yang wrote:
> Currently, the ingress redirect is not covered in "txmsg test apply".
> 
> Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
> ---

Thanks for adding extra test.

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND bpf 2/4] bpf, sockmap: Fix missing BPF_F_INGRESS flag when using apply_bytes
  2022-11-23  3:02   ` John Fastabend
@ 2022-11-23  6:01     ` Pengcheng Yang
  2022-11-28 11:22       ` Jakub Sitnicki
  0 siblings, 1 reply; 13+ messages in thread
From: Pengcheng Yang @ 2022-11-23  6:01 UTC (permalink / raw)
  To: 'John Fastabend', bpf, netdev, 'Daniel Borkmann',
	'Jakub Sitnicki', 'Lorenz Bauer'

John Fastabend <john.fastabend@gmail.com> wrote:
> 
> Pengcheng Yang wrote:
> > When redirecting, we use sk_msg_to_ingress() to get the BPF_F_INGRESS
> > flag from the msg->flags. If apply_bytes is used and it is larger than
> > the current data being processed, sk_psock_msg_verdict() will not be
> > called when sendmsg() is called again. At this time, the msg->flags is 0,
> > and we lost the BPF_F_INGRESS flag.
> >
> > So we need to save the BPF_F_INGRESS flag in sk_psock and assign it to
> > msg->flags before redirection.
> >
> > Fixes: 8934ce2fd081 ("bpf: sockmap redirect ingress support")
> > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
> > ---
> >  include/linux/skmsg.h | 1 +
> >  net/core/skmsg.c      | 1 +
> >  net/ipv4/tcp_bpf.c    | 1 +
> >  net/tls/tls_sw.c      | 1 +
> >  4 files changed, 4 insertions(+)
> >
> > diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> > index 48f4b64..e1d463f 100644
> > --- a/include/linux/skmsg.h
> > +++ b/include/linux/skmsg.h
> > @@ -82,6 +82,7 @@ struct sk_psock {
> >  	u32				apply_bytes;
> >  	u32				cork_bytes;
> >  	u32				eval;
> > +	u32				flags;
> >  	struct sk_msg			*cork;
> >  	struct sk_psock_progs		progs;
> >  #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index 188f855..ab2f8f3 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -888,6 +888,7 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
> >  		if (psock->sk_redir)
> >  			sock_put(psock->sk_redir);
> >  		psock->sk_redir = msg->sk_redir;
> > +		psock->flags = msg->flags;
> >  		if (!psock->sk_redir) {
> >  			ret = __SK_DROP;
> >  			goto out;
> > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> > index ef5de4f..1390d72 100644
> > --- a/net/ipv4/tcp_bpf.c
> > +++ b/net/ipv4/tcp_bpf.c
> > @@ -323,6 +323,7 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
> >  		break;
> >  	case __SK_REDIRECT:
> >  		sk_redir = psock->sk_redir;
> > +		msg->flags = psock->flags;
> >  		sk_msg_apply_bytes(psock, tosend);
> >  		if (!psock->apply_bytes) {
> >  			/* Clean up before releasing the sock lock. */
>                  ^^^^^^^^^^^^^^^
> In this block reposted here with the rest of the block
> 
> 
> 		if (!psock->apply_bytes) {
> 			/* Clean up before releasing the sock lock. */
> 			eval = psock->eval;
> 			psock->eval = __SK_NONE;
> 			psock->sk_redir = NULL;
> 		}
> 
> Now that we have a psock->flags we should clera that as
> well right?

According to my understanding, it is not necessary (but can) to clear
psock->flags here, because psock->flags will be overwritten by msg->flags
at the beginning of each redirection (in sk_psock_msg_verdict()).


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND bpf 2/4] bpf, sockmap: Fix missing BPF_F_INGRESS flag when using apply_bytes
  2022-11-23  6:01     ` Pengcheng Yang
@ 2022-11-28 11:22       ` Jakub Sitnicki
  2022-11-28 18:18         ` John Fastabend
  2022-11-29  8:02         ` Pengcheng Yang
  0 siblings, 2 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2022-11-28 11:22 UTC (permalink / raw)
  To: Pengcheng Yang
  Cc: 'John Fastabend', bpf, netdev, 'Daniel Borkmann',
	'Lorenz Bauer'

On Wed, Nov 23, 2022 at 02:01 PM +08, Pengcheng Yang wrote:
> John Fastabend <john.fastabend@gmail.com> wrote:
>> 
>> Pengcheng Yang wrote:
>> > When redirecting, we use sk_msg_to_ingress() to get the BPF_F_INGRESS
>> > flag from the msg->flags. If apply_bytes is used and it is larger than
>> > the current data being processed, sk_psock_msg_verdict() will not be
>> > called when sendmsg() is called again. At this time, the msg->flags is 0,
>> > and we lost the BPF_F_INGRESS flag.
>> >
>> > So we need to save the BPF_F_INGRESS flag in sk_psock and assign it to
>> > msg->flags before redirection.
>> >
>> > Fixes: 8934ce2fd081 ("bpf: sockmap redirect ingress support")
>> > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
>> > ---
>> >  include/linux/skmsg.h | 1 +
>> >  net/core/skmsg.c      | 1 +
>> >  net/ipv4/tcp_bpf.c    | 1 +
>> >  net/tls/tls_sw.c      | 1 +
>> >  4 files changed, 4 insertions(+)
>> >
>> > diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
>> > index 48f4b64..e1d463f 100644
>> > --- a/include/linux/skmsg.h
>> > +++ b/include/linux/skmsg.h
>> > @@ -82,6 +82,7 @@ struct sk_psock {
>> >  	u32				apply_bytes;
>> >  	u32				cork_bytes;
>> >  	u32				eval;
>> > +	u32				flags;
>> >  	struct sk_msg			*cork;
>> >  	struct sk_psock_progs		progs;
>> >  #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
>> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
>> > index 188f855..ab2f8f3 100644
>> > --- a/net/core/skmsg.c
>> > +++ b/net/core/skmsg.c
>> > @@ -888,6 +888,7 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
>> >  		if (psock->sk_redir)
>> >  			sock_put(psock->sk_redir);
>> >  		psock->sk_redir = msg->sk_redir;
>> > +		psock->flags = msg->flags;
>> >  		if (!psock->sk_redir) {
>> >  			ret = __SK_DROP;
>> >  			goto out;
>> > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
>> > index ef5de4f..1390d72 100644
>> > --- a/net/ipv4/tcp_bpf.c
>> > +++ b/net/ipv4/tcp_bpf.c
>> > @@ -323,6 +323,7 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>> >  		break;
>> >  	case __SK_REDIRECT:
>> >  		sk_redir = psock->sk_redir;
>> > +		msg->flags = psock->flags;
>> >  		sk_msg_apply_bytes(psock, tosend);
>> >  		if (!psock->apply_bytes) {
>> >  			/* Clean up before releasing the sock lock. */
>>                  ^^^^^^^^^^^^^^^
>> In this block reposted here with the rest of the block
>> 
>> 
>> 		if (!psock->apply_bytes) {
>> 			/* Clean up before releasing the sock lock. */
>> 			eval = psock->eval;
>> 			psock->eval = __SK_NONE;
>> 			psock->sk_redir = NULL;
>> 		}
>> 
>> Now that we have a psock->flags we should clera that as
>> well right?
>
> According to my understanding, it is not necessary (but can) to clear
> psock->flags here, because psock->flags will be overwritten by msg->flags
> at the beginning of each redirection (in sk_psock_msg_verdict()).

1. We should at least document that psock->flags value can be garbage
   (undefined) if psock->sk_redir is null.

2. 'flags' is amiguous (flags for what?). I'd suggest to rename to
   something like redir_flags.

   Also, since we don't care about all flags, but just the ingress bit,
   we should store just that. It's not about size. Less state passed
   around is easier to reason about. See patch below.

3. Alternatively, I'd turn psock->sk_redir into a tagged pointer, like
   skb->_sk_redir is. This way all state (pointer & flags) is bundled
   and managed together. It would be a bigger change. Would have to be
   split out from this patch set.

--8<--

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 70d6cb94e580..84f787416a54 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -82,6 +82,7 @@ struct sk_psock {
 	u32				apply_bytes;
 	u32				cork_bytes;
 	u32				eval;
+	bool				redir_ingress; /* undefined if sk_redir is null */
 	struct sk_msg			*cork;
 	struct sk_psock_progs		progs;
 #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 14d45661a84d..5b70b241ce71 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2291,8 +2291,8 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore);
 void tcp_bpf_clone(const struct sock *sk, struct sock *newsk);
 #endif /* CONFIG_BPF_SYSCALL */
 
-int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg, u32 bytes,
-			  int flags);
+int tcp_bpf_sendmsg_redir(struct sock *sk, bool ingress,
+			  struct sk_msg *msg, u32 bytes, int flags);
 #endif /* CONFIG_NET_SOCK_MSG */
 
 #if !defined(CONFIG_BPF_SYSCALL) || !defined(CONFIG_NET_SOCK_MSG)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index e6b9ced3eda8..53d0251788aa 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -886,13 +886,16 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
 	ret = sk_psock_map_verd(ret, msg->sk_redir);
 	psock->apply_bytes = msg->apply_bytes;
 	if (ret == __SK_REDIRECT) {
-		if (psock->sk_redir)
+		if (psock->sk_redir) {
 			sock_put(psock->sk_redir);
-		psock->sk_redir = msg->sk_redir;
-		if (!psock->sk_redir) {
+			psock->sk_redir = NULL;
+		}
+		if (!msg->sk_redir) {
 			ret = __SK_DROP;
 			goto out;
 		}
+		psock->redir_ingress = sk_msg_to_ingress(msg);
+		psock->sk_redir = msg->sk_redir;
 		sock_hold(psock->sk_redir);
 	}
 out:
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index cf9c3e8f7ccb..490b359dc814 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -131,10 +131,9 @@ static int tcp_bpf_push_locked(struct sock *sk, struct sk_msg *msg,
 	return ret;
 }
 
-int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg,
-			  u32 bytes, int flags)
+int tcp_bpf_sendmsg_redir(struct sock *sk, bool ingress,
+			  struct sk_msg *msg, u32 bytes, int flags)
 {
-	bool ingress = sk_msg_to_ingress(msg);
 	struct sk_psock *psock = sk_psock_get(sk);
 	int ret;
 
@@ -337,7 +336,8 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 		release_sock(sk);
 
 		origsize = msg->sg.size;
-		ret = tcp_bpf_sendmsg_redir(sk_redir, msg, tosend, flags);
+		ret = tcp_bpf_sendmsg_redir(sk_redir, psock->redir_ingress,
+					    msg, tosend, flags);
 		sent = origsize - msg->sg.size;
 
 		if (eval == __SK_REDIRECT)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 264cf367e265..b22d97610b9a 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -846,7 +846,8 @@ static int bpf_exec_tx_verdict(struct sk_msg *msg, struct sock *sk,
 		sk_msg_return_zero(sk, msg, send);
 		msg->sg.size -= send;
 		release_sock(sk);
-		err = tcp_bpf_sendmsg_redir(sk_redir, &msg_redir, send, flags);
+		err = tcp_bpf_sendmsg_redir(sk_redir, psock->redir_ingress,
+					    &msg_redir, send, flags);
 		lock_sock(sk);
 		if (err < 0) {
 			*copied -= sk_msg_free_nocharge(sk, &msg_redir);

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND bpf 2/4] bpf, sockmap: Fix missing BPF_F_INGRESS flag when using apply_bytes
  2022-11-28 11:22       ` Jakub Sitnicki
@ 2022-11-28 18:18         ` John Fastabend
  2022-11-29 19:16           ` Jakub Sitnicki
  2022-11-29  8:02         ` Pengcheng Yang
  1 sibling, 1 reply; 13+ messages in thread
From: John Fastabend @ 2022-11-28 18:18 UTC (permalink / raw)
  To: Jakub Sitnicki, Pengcheng Yang
  Cc: 'John Fastabend', bpf, netdev, 'Daniel Borkmann',
	'Lorenz Bauer'

Jakub Sitnicki wrote:
> On Wed, Nov 23, 2022 at 02:01 PM +08, Pengcheng Yang wrote:
> > John Fastabend <john.fastabend@gmail.com> wrote:
> >> 
> >> Pengcheng Yang wrote:
> >> > When redirecting, we use sk_msg_to_ingress() to get the BPF_F_INGRESS
> >> > flag from the msg->flags. If apply_bytes is used and it is larger than
> >> > the current data being processed, sk_psock_msg_verdict() will not be
> >> > called when sendmsg() is called again. At this time, the msg->flags is 0,
> >> > and we lost the BPF_F_INGRESS flag.
> >> >
> >> > So we need to save the BPF_F_INGRESS flag in sk_psock and assign it to
> >> > msg->flags before redirection.
> >> >
> >> > Fixes: 8934ce2fd081 ("bpf: sockmap redirect ingress support")
> >> > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
> >> > ---
> >> >  include/linux/skmsg.h | 1 +
> >> >  net/core/skmsg.c      | 1 +
> >> >  net/ipv4/tcp_bpf.c    | 1 +
> >> >  net/tls/tls_sw.c      | 1 +
> >> >  4 files changed, 4 insertions(+)
> >> >
> >> > diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> >> > index 48f4b64..e1d463f 100644
> >> > --- a/include/linux/skmsg.h
> >> > +++ b/include/linux/skmsg.h
> >> > @@ -82,6 +82,7 @@ struct sk_psock {
> >> >  	u32				apply_bytes;
> >> >  	u32				cork_bytes;
> >> >  	u32				eval;
> >> > +	u32				flags;
> >> >  	struct sk_msg			*cork;
> >> >  	struct sk_psock_progs		progs;
> >> >  #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
> >> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> >> > index 188f855..ab2f8f3 100644
> >> > --- a/net/core/skmsg.c
> >> > +++ b/net/core/skmsg.c
> >> > @@ -888,6 +888,7 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
> >> >  		if (psock->sk_redir)
> >> >  			sock_put(psock->sk_redir);
> >> >  		psock->sk_redir = msg->sk_redir;
> >> > +		psock->flags = msg->flags;
> >> >  		if (!psock->sk_redir) {
> >> >  			ret = __SK_DROP;
> >> >  			goto out;
> >> > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> >> > index ef5de4f..1390d72 100644
> >> > --- a/net/ipv4/tcp_bpf.c
> >> > +++ b/net/ipv4/tcp_bpf.c
> >> > @@ -323,6 +323,7 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
> >> >  		break;
> >> >  	case __SK_REDIRECT:
> >> >  		sk_redir = psock->sk_redir;
> >> > +		msg->flags = psock->flags;
> >> >  		sk_msg_apply_bytes(psock, tosend);
> >> >  		if (!psock->apply_bytes) {
> >> >  			/* Clean up before releasing the sock lock. */
> >>                  ^^^^^^^^^^^^^^^
> >> In this block reposted here with the rest of the block
> >> 
> >> 
> >> 		if (!psock->apply_bytes) {
> >> 			/* Clean up before releasing the sock lock. */
> >> 			eval = psock->eval;
> >> 			psock->eval = __SK_NONE;
> >> 			psock->sk_redir = NULL;
> >> 		}
> >> 
> >> Now that we have a psock->flags we should clera that as
> >> well right?
> >
> > According to my understanding, it is not necessary (but can) to clear
> > psock->flags here, because psock->flags will be overwritten by msg->flags
> > at the beginning of each redirection (in sk_psock_msg_verdict()).
> 
> 1. We should at least document that psock->flags value can be garbage
>    (undefined) if psock->sk_redir is null.

Per v2 I think we should not have garbage flags. Just zero the flags
field no point in saving a single insn here IMO.

> 
> 2. 'flags' is amiguous (flags for what?). I'd suggest to rename to
>    something like redir_flags.
> 
>    Also, since we don't care about all flags, but just the ingress bit,
>    we should store just that. It's not about size. Less state passed
>    around is easier to reason about. See patch below.

rename makes sense to me.

> 
> 3. Alternatively, I'd turn psock->sk_redir into a tagged pointer, like
>    skb->_sk_redir is. This way all state (pointer & flags) is bundled
>    and managed together. It would be a bigger change. Would have to be
>    split out from this patch set.
> 
> --8<--
> 
> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> index 70d6cb94e580..84f787416a54 100644
> --- a/include/linux/skmsg.h
> +++ b/include/linux/skmsg.h
> @@ -82,6 +82,7 @@ struct sk_psock {
>  	u32				apply_bytes;
>  	u32				cork_bytes;
>  	u32				eval;
> +	bool				redir_ingress; /* undefined if sk_redir is null */
>  	struct sk_msg			*cork;
>  	struct sk_psock_progs		progs;
>  #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 14d45661a84d..5b70b241ce71 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -2291,8 +2291,8 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore);
>  void tcp_bpf_clone(const struct sock *sk, struct sock *newsk);
>  #endif /* CONFIG_BPF_SYSCALL */
>  
> -int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg, u32 bytes,
> -			  int flags);
> +int tcp_bpf_sendmsg_redir(struct sock *sk, bool ingress,
> +			  struct sk_msg *msg, u32 bytes, int flags);
>  #endif /* CONFIG_NET_SOCK_MSG */
>  
>  #if !defined(CONFIG_BPF_SYSCALL) || !defined(CONFIG_NET_SOCK_MSG)
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index e6b9ced3eda8..53d0251788aa 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -886,13 +886,16 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
>  	ret = sk_psock_map_verd(ret, msg->sk_redir);
>  	psock->apply_bytes = msg->apply_bytes;
>  	if (ret == __SK_REDIRECT) {
> -		if (psock->sk_redir)
> +		if (psock->sk_redir) {
>  			sock_put(psock->sk_redir);
> -		psock->sk_redir = msg->sk_redir;
> -		if (!psock->sk_redir) {
> +			psock->sk_redir = NULL;
> +		}
> +		if (!msg->sk_redir) {
>  			ret = __SK_DROP;
>  			goto out;
>  		}
> +		psock->redir_ingress = sk_msg_to_ingress(msg);
> +		psock->sk_redir = msg->sk_redir;
>  		sock_hold(psock->sk_redir);
>  	}
>  out:
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index cf9c3e8f7ccb..490b359dc814 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -131,10 +131,9 @@ static int tcp_bpf_push_locked(struct sock *sk, struct sk_msg *msg,
>  	return ret;
>  }
>  
> -int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg,
> -			  u32 bytes, int flags)
> +int tcp_bpf_sendmsg_redir(struct sock *sk, bool ingress,
> +			  struct sk_msg *msg, u32 bytes, int flags)
>  {
> -	bool ingress = sk_msg_to_ingress(msg);
>  	struct sk_psock *psock = sk_psock_get(sk);
>  	int ret;
>  
> @@ -337,7 +336,8 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>  		release_sock(sk);
>  
>  		origsize = msg->sg.size;
> -		ret = tcp_bpf_sendmsg_redir(sk_redir, msg, tosend, flags);
> +		ret = tcp_bpf_sendmsg_redir(sk_redir, psock->redir_ingress,
> +					    msg, tosend, flags);
>  		sent = origsize - msg->sg.size;
>  
>  		if (eval == __SK_REDIRECT)
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 264cf367e265..b22d97610b9a 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -846,7 +846,8 @@ static int bpf_exec_tx_verdict(struct sk_msg *msg, struct sock *sk,
>  		sk_msg_return_zero(sk, msg, send);
>  		msg->sg.size -= send;
>  		release_sock(sk);
> -		err = tcp_bpf_sendmsg_redir(sk_redir, &msg_redir, send, flags);
> +		err = tcp_bpf_sendmsg_redir(sk_redir, psock->redir_ingress,
> +					    &msg_redir, send, flags);
>  		lock_sock(sk);
>  		if (err < 0) {
>  			*copied -= sk_msg_free_nocharge(sk, &msg_redir);



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND bpf 2/4] bpf, sockmap: Fix missing BPF_F_INGRESS flag when using apply_bytes
  2022-11-28 11:22       ` Jakub Sitnicki
  2022-11-28 18:18         ` John Fastabend
@ 2022-11-29  8:02         ` Pengcheng Yang
  1 sibling, 0 replies; 13+ messages in thread
From: Pengcheng Yang @ 2022-11-29  8:02 UTC (permalink / raw)
  To: 'Jakub Sitnicki'
  Cc: 'John Fastabend', bpf, netdev, 'Daniel Borkmann',
	'Lorenz Bauer'

Jakub Sitnicki <jakub@cloudflare.com> wrote:
> On Wed, Nov 23, 2022 at 02:01 PM +08, Pengcheng Yang wrote:
> > John Fastabend <john.fastabend@gmail.com> wrote:
> >>
> >> 		if (!psock->apply_bytes) {
> >> 			/* Clean up before releasing the sock lock. */
> >> 			eval = psock->eval;
> >> 			psock->eval = __SK_NONE;
> >> 			psock->sk_redir = NULL;
> >> 		}
> >>
> >> Now that we have a psock->flags we should clera that as
> >> well right?
> >
> > According to my understanding, it is not necessary (but can) to clear
> > psock->flags here, because psock->flags will be overwritten by msg->flags
> > at the beginning of each redirection (in sk_psock_msg_verdict()).
> 
> 1. We should at least document that psock->flags value can be garbage
>    (undefined) if psock->sk_redir is null.
> 
> 2. 'flags' is amiguous (flags for what?). I'd suggest to rename to
>    something like redir_flags.
> 
>    Also, since we don't care about all flags, but just the ingress bit,
>    we should store just that. It's not about size. Less state passed
>    around is easier to reason about. See patch below.
> 
> 3. Alternatively, I'd turn psock->sk_redir into a tagged pointer, like
>    skb->_sk_redir is. This way all state (pointer & flags) is bundled
>    and managed together. It would be a bigger change. Would have to be
>    split out from this patch set.
> 
> --8<--
> 
> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> index 70d6cb94e580..84f787416a54 100644
> --- a/include/linux/skmsg.h
> +++ b/include/linux/skmsg.h
> @@ -82,6 +82,7 @@ struct sk_psock {
>  	u32				apply_bytes;
>  	u32				cork_bytes;
>  	u32				eval;
> +	bool				redir_ingress; /* undefined if sk_redir is null */
>  	struct sk_msg			*cork;
>  	struct sk_psock_progs		progs;
>  #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 14d45661a84d..5b70b241ce71 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -2291,8 +2291,8 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore);
>  void tcp_bpf_clone(const struct sock *sk, struct sock *newsk);
>  #endif /* CONFIG_BPF_SYSCALL */
> 
> -int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg, u32 bytes,
> -			  int flags);
> +int tcp_bpf_sendmsg_redir(struct sock *sk, bool ingress,
> +			  struct sk_msg *msg, u32 bytes, int flags);
>  #endif /* CONFIG_NET_SOCK_MSG */
> 
>  #if !defined(CONFIG_BPF_SYSCALL) || !defined(CONFIG_NET_SOCK_MSG)
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index e6b9ced3eda8..53d0251788aa 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -886,13 +886,16 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
>  	ret = sk_psock_map_verd(ret, msg->sk_redir);
>  	psock->apply_bytes = msg->apply_bytes;
>  	if (ret == __SK_REDIRECT) {
> -		if (psock->sk_redir)
> +		if (psock->sk_redir) {
>  			sock_put(psock->sk_redir);
> -		psock->sk_redir = msg->sk_redir;
> -		if (!psock->sk_redir) {
> +			psock->sk_redir = NULL;
> +		}
> +		if (!msg->sk_redir) {
>  			ret = __SK_DROP;
>  			goto out;
>  		}
> +		psock->redir_ingress = sk_msg_to_ingress(msg);
> +		psock->sk_redir = msg->sk_redir;
>  		sock_hold(psock->sk_redir);
>  	}
>  out:
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index cf9c3e8f7ccb..490b359dc814 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -131,10 +131,9 @@ static int tcp_bpf_push_locked(struct sock *sk, struct sk_msg *msg,
>  	return ret;
>  }
> 
> -int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg,
> -			  u32 bytes, int flags)
> +int tcp_bpf_sendmsg_redir(struct sock *sk, bool ingress,
> +			  struct sk_msg *msg, u32 bytes, int flags)
>  {
> -	bool ingress = sk_msg_to_ingress(msg);
>  	struct sk_psock *psock = sk_psock_get(sk);
>  	int ret;
> 
> @@ -337,7 +336,8 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>  		release_sock(sk);
> 
>  		origsize = msg->sg.size;
> -		ret = tcp_bpf_sendmsg_redir(sk_redir, msg, tosend, flags);
> +		ret = tcp_bpf_sendmsg_redir(sk_redir, psock->redir_ingress,
                                       ^^^^^^^
Thanks for such detailed advice.
Here it looks like we need to pre-save the redir_ingress before
setting psock->sk_redir to NULL and release_sock.

> +					    msg, tosend, flags);
>  		sent = origsize - msg->sg.size;
> 
>  		if (eval == __SK_REDIRECT)
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 264cf367e265..b22d97610b9a 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -846,7 +846,8 @@ static int bpf_exec_tx_verdict(struct sk_msg *msg, struct sock *sk,
>  		sk_msg_return_zero(sk, msg, send);
>  		msg->sg.size -= send;
>  		release_sock(sk);
> -		err = tcp_bpf_sendmsg_redir(sk_redir, &msg_redir, send, flags);
> +		err = tcp_bpf_sendmsg_redir(sk_redir, psock->redir_ingress,
> +					    &msg_redir, send, flags);
>  		lock_sock(sk);
>  		if (err < 0) {
>  			*copied -= sk_msg_free_nocharge(sk, &msg_redir);


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND bpf 2/4] bpf, sockmap: Fix missing BPF_F_INGRESS flag when using apply_bytes
  2022-11-28 18:18         ` John Fastabend
@ 2022-11-29 19:16           ` Jakub Sitnicki
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2022-11-29 19:16 UTC (permalink / raw)
  To: John Fastabend
  Cc: Pengcheng Yang, bpf, netdev, 'Daniel Borkmann',
	'Lorenz Bauer'


On Mon, Nov 28, 2022 at 10:18 AM -08, John Fastabend wrote:
> Jakub Sitnicki wrote:
>> On Wed, Nov 23, 2022 at 02:01 PM +08, Pengcheng Yang wrote:
>> > John Fastabend <john.fastabend@gmail.com> wrote:

[...]

>> >> Now that we have a psock->flags we should clera that as
>> >> well right?
>> >
>> > According to my understanding, it is not necessary (but can) to clear
>> > psock->flags here, because psock->flags will be overwritten by msg->flags
>> > at the beginning of each redirection (in sk_psock_msg_verdict()).
>> 
>> 1. We should at least document that psock->flags value can be garbage
>>    (undefined) if psock->sk_redir is null.
>
> Per v2 I think we should not have garbage flags. Just zero the flags
> field no point in saving a single insn here IMO.

It would make sense to me if zero was not a valid value here. But since
it signifies "redirect to egress", we won't be able to tell if it has
been reset anyway.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-11-29 19:25 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-22  1:58 [PATCH RESEND bpf 0/4] bpf, sockmap: Fix some issues with using apply_bytes Pengcheng Yang
2022-11-22  1:58 ` [PATCH RESEND bpf 1/4] bpf, sockmap: Fix repeated calls to sock_put() when msg has more_data Pengcheng Yang
2022-11-23  2:50   ` John Fastabend
2022-11-22  1:58 ` [PATCH RESEND bpf 2/4] bpf, sockmap: Fix missing BPF_F_INGRESS flag when using apply_bytes Pengcheng Yang
2022-11-23  3:02   ` John Fastabend
2022-11-23  6:01     ` Pengcheng Yang
2022-11-28 11:22       ` Jakub Sitnicki
2022-11-28 18:18         ` John Fastabend
2022-11-29 19:16           ` Jakub Sitnicki
2022-11-29  8:02         ` Pengcheng Yang
2022-11-22  1:58 ` [PATCH RESEND bpf 3/4] bpf, sockmap: Fix data loss caused by using apply_bytes on ingress redirect Pengcheng Yang
2022-11-22  1:58 ` [PATCH RESEND bpf 4/4] selftests/bpf: Add ingress tests for txmsg with apply_bytes Pengcheng Yang
2022-11-23  3:05   ` John Fastabend

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.