netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [bpf-next PATCH 0/2] Add skb_adjust_room() for SK_SKB
@ 2020-09-26  4:26 John Fastabend
  2020-09-26  4:27 ` [bpf-next PATCH 1/2] bpf, sockmap: add skb_adjust_room to pop bytes off ingress payload John Fastabend
  2020-09-26  4:27 ` [bpf-next PATCH 2/2] bpf, sockmap: update selftests to use skb_adjust_room John Fastabend
  0 siblings, 2 replies; 7+ messages in thread
From: John Fastabend @ 2020-09-26  4:26 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, john.fastabend, jakub, lmb

This implements the helper skb_adjust_room() for BPF_SKS_BK_STREAM_VERDICT
programs so we can push/pop headers from the data on recieve. The
obvious use case is to pop TLS headers of kTLS packets.

The first patch implements the helper and the second updates test_sockmap
to use it removing some case handling we had to do earlier to account for
the TLS headers in the kTLS case.

I have a couple more series to flush off my stack then I'll work on
modernizing the test_sockmap tests themselves. It was created before
global data and a few other nice things so its a bit more verbose
than necessary.

Thanks,
John

---

John Fastabend (2):
      bpf, sockmap: add skb_adjust_room to pop bytes off ingress payload
      bpf, sockmap: update selftests to use skb_adjust_room


 net/core/filter.c                                  |   51 ++++++++++++++++++++
 .../selftests/bpf/progs/test_sockmap_kern.h        |   34 ++++++++++---
 tools/testing/selftests/bpf/test_sockmap.c         |   27 ++---------
 3 files changed, 82 insertions(+), 30 deletions(-)

--
Signature

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [bpf-next PATCH 1/2] bpf, sockmap: add skb_adjust_room to pop bytes off ingress payload
  2020-09-26  4:26 [bpf-next PATCH 0/2] Add skb_adjust_room() for SK_SKB John Fastabend
@ 2020-09-26  4:27 ` John Fastabend
  2020-09-29 14:59   ` Daniel Borkmann
  2020-09-30  9:55   ` Jakub Sitnicki
  2020-09-26  4:27 ` [bpf-next PATCH 2/2] bpf, sockmap: update selftests to use skb_adjust_room John Fastabend
  1 sibling, 2 replies; 7+ messages in thread
From: John Fastabend @ 2020-09-26  4:27 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, john.fastabend, jakub, lmb

This implements a new helper skb_adjust_room() so users can push/pop
extra bytes from a BPF_SK_SKB_STREAM_VERDICT program.

Some protocols may include headers and other information that we may
not want to include when doing a redirect from a BPF_SK_SKB_STREAM_VERDICT
program. One use case is to redirect TLS packets into a receive socket
that doesn't expect TLS data. In TLS case the first 13B or so contain the
protocol header. With KTLS the payload is decrypted so we should be able
to redirect this to a receiving socket, but the receiving socket may not
be expecting to receive a TLS header and discard the data. Using the
above helper we can pop the header off and put an appropriate header on
the payload. This allows for creating a proxy between protocols without
extra hops through the stack or userspace.

So in order to fix this case add skb_adjust_room() so users can strip the
header. After this the user can strip the header and an unmodified receiver
thread will work correctly when data is redirected into the ingress path
of a sock.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/core/filter.c |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 4d8dc7a31a78..d232358f1dcd 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -76,6 +76,7 @@
 #include <net/bpf_sk_storage.h>
 #include <net/transp_v6.h>
 #include <linux/btf_ids.h>
+#include <net/tls.h>
 
 static const struct bpf_func_proto *
 bpf_sk_base_func_proto(enum bpf_func_id func_id);
@@ -3218,6 +3219,53 @@ static u32 __bpf_skb_max_len(const struct sk_buff *skb)
 			  SKB_MAX_ALLOC;
 }
 
+BPF_CALL_4(sk_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
+	   u32, mode, u64, flags)
+{
+	unsigned int len_diff_abs = abs(len_diff);
+	bool shrink = len_diff < 0;
+	int ret = 0;
+
+	if (unlikely(flags))
+		return -EINVAL;
+	if (unlikely(len_diff_abs > 0xfffU))
+		return -EFAULT;
+
+	if (!shrink) {
+		unsigned int grow = len_diff;
+
+		ret = skb_cow(skb, grow);
+		if (likely(!ret)) {
+			__skb_push(skb, len_diff_abs);
+			memset(skb->data, 0, len_diff_abs);
+		}
+	} else {
+		/* skb_ensure_writable() is not needed here, as we're
+		 * already working on an uncloned skb.
+		 */
+		if (unlikely(!pskb_may_pull(skb, len_diff_abs)))
+			return -ENOMEM;
+		__skb_pull(skb, len_diff_abs);
+	}
+	bpf_compute_data_end_sk_skb(skb);
+	if (tls_sw_has_ctx_rx(skb->sk)) {
+		struct strp_msg *rxm = strp_msg(skb);
+
+		rxm->full_len += len_diff;
+	}
+	return ret;
+}
+
+static const struct bpf_func_proto sk_skb_adjust_room_proto = {
+	.func		= sk_skb_adjust_room,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_ANYTHING,
+};
+
 BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
 	   u32, mode, u64, flags)
 {
@@ -6483,6 +6531,7 @@ bool bpf_helper_changes_pkt_data(void *func)
 	    func == bpf_skb_change_tail ||
 	    func == sk_skb_change_tail ||
 	    func == bpf_skb_adjust_room ||
+	    func == sk_skb_adjust_room ||
 	    func == bpf_skb_pull_data ||
 	    func == sk_skb_pull_data ||
 	    func == bpf_clone_redirect ||
@@ -6950,6 +6999,8 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &sk_skb_change_tail_proto;
 	case BPF_FUNC_skb_change_head:
 		return &sk_skb_change_head_proto;
+	case BPF_FUNC_skb_adjust_room:
+		return &sk_skb_adjust_room_proto;
 	case BPF_FUNC_get_socket_cookie:
 		return &bpf_get_socket_cookie_proto;
 	case BPF_FUNC_get_socket_uid:


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [bpf-next PATCH 2/2] bpf, sockmap: update selftests to use skb_adjust_room
  2020-09-26  4:26 [bpf-next PATCH 0/2] Add skb_adjust_room() for SK_SKB John Fastabend
  2020-09-26  4:27 ` [bpf-next PATCH 1/2] bpf, sockmap: add skb_adjust_room to pop bytes off ingress payload John Fastabend
@ 2020-09-26  4:27 ` John Fastabend
  1 sibling, 0 replies; 7+ messages in thread
From: John Fastabend @ 2020-09-26  4:27 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, john.fastabend, jakub, lmb

Instead of working around TLS headers in sockmap selftests use the
new skb_adjust_room helper. This allows us to avoid special casing
the receive side to skip headers.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 .../selftests/bpf/progs/test_sockmap_kern.h        |   34 +++++++++++++++-----
 tools/testing/selftests/bpf/test_sockmap.c         |   27 ++++------------
 2 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_kern.h b/tools/testing/selftests/bpf/progs/test_sockmap_kern.h
index 3dca4c2e2418..1858435de7aa 100644
--- a/tools/testing/selftests/bpf/progs/test_sockmap_kern.h
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_kern.h
@@ -131,39 +131,55 @@ int bpf_prog2(struct __sk_buff *skb)
 
 }
 
-SEC("sk_skb3")
-int bpf_prog3(struct __sk_buff *skb)
+static inline void bpf_write_pass(struct __sk_buff *skb, int offset)
 {
-	const int one = 1;
-	int err, *f, ret = SK_PASS;
+	int err = bpf_skb_pull_data(skb, 6 + offset);
 	void *data_end;
 	char *c;
 
-	err = bpf_skb_pull_data(skb, 19);
 	if (err)
-		goto tls_out;
+		return;
 
 	c = (char *)(long)skb->data;
 	data_end = (void *)(long)skb->data_end;
 
-	if (c + 18 < data_end)
-		memcpy(&c[13], "PASS", 4);
+	if (c + 5 + offset < data_end)
+		memcpy(c + offset, "PASS", 4);
+}
+
+SEC("sk_skb3")
+int bpf_prog3(struct __sk_buff *skb)
+{
+	int err, *f, ret = SK_PASS;
+	const int one = 1;
+
 	f = bpf_map_lookup_elem(&sock_skb_opts, &one);
 	if (f && *f) {
 		__u64 flags = 0;
 
 		ret = 0;
 		flags = *f;
+
+		err = bpf_skb_adjust_room(skb, -13, 0, 0);
+		if (err)
+			return SK_DROP;
+		err = bpf_skb_adjust_room(skb, 4, 0, 0);
+		if (err)
+			return SK_DROP;
+		bpf_write_pass(skb, 0);
 #ifdef SOCKMAP
 		return bpf_sk_redirect_map(skb, &tls_sock_map, ret, flags);
 #else
 		return bpf_sk_redirect_hash(skb, &tls_sock_map, &ret, flags);
 #endif
 	}
-
 	f = bpf_map_lookup_elem(&sock_skb_opts, &one);
 	if (f && *f)
 		ret = SK_DROP;
+	err = bpf_skb_adjust_room(skb, 4, 0, 0);
+	if (err)
+		return SK_DROP;
+	bpf_write_pass(skb, 13);
 tls_out:
 	return ret;
 }
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 9b6fb00dc7a0..5cf45455de42 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -518,28 +518,13 @@ static int msg_verify_data(struct msghdr *msg, int size, int chunk_sz)
 		if (i == 0 && txmsg_ktls_skb) {
 			if (msg->msg_iov[i].iov_len < 4)
 				return -EIO;
-			if (txmsg_ktls_skb_redir) {
-				if (memcmp(&d[13], "PASS", 4) != 0) {
-					fprintf(stderr,
-						"detected redirect ktls_skb data error with skb ingress update @iov[%i]:%i \"%02x %02x %02x %02x\" != \"PASS\"\n", i, 0, d[13], d[14], d[15], d[16]);
-					return -EIO;
-				}
-				d[13] = 0;
-				d[14] = 1;
-				d[15] = 2;
-				d[16] = 3;
-				j = 13;
-			} else if (txmsg_ktls_skb) {
-				if (memcmp(d, "PASS", 4) != 0) {
-					fprintf(stderr,
-						"detected ktls_skb data error with skb ingress update @iov[%i]:%i \"%02x %02x %02x %02x\" != \"PASS\"\n", i, 0, d[0], d[1], d[2], d[3]);
-					return -EIO;
-				}
-				d[0] = 0;
-				d[1] = 1;
-				d[2] = 2;
-				d[3] = 3;
+			if (memcmp(d, "PASS", 4) != 0) {
+				fprintf(stderr,
+					"detected skb data error with skb ingress update @iov[%i]:%i \"%02x %02x %02x %02x\" != \"PASS\"\n",
+					i, 0, d[0], d[1], d[2], d[3]);
+				return -EIO;
 			}
+			j = 4; /* advance index past PASS header */
 		}
 
 		for (; j < msg->msg_iov[i].iov_len && size; j++) {


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [bpf-next PATCH 1/2] bpf, sockmap: add skb_adjust_room to pop bytes off ingress payload
  2020-09-26  4:27 ` [bpf-next PATCH 1/2] bpf, sockmap: add skb_adjust_room to pop bytes off ingress payload John Fastabend
@ 2020-09-29 14:59   ` Daniel Borkmann
  2020-09-29 15:41     ` John Fastabend
  2020-09-30  9:55   ` Jakub Sitnicki
  1 sibling, 1 reply; 7+ messages in thread
From: Daniel Borkmann @ 2020-09-29 14:59 UTC (permalink / raw)
  To: John Fastabend, ast; +Cc: netdev, jakub, lmb

On 9/26/20 6:27 AM, John Fastabend wrote:
> This implements a new helper skb_adjust_room() so users can push/pop
> extra bytes from a BPF_SK_SKB_STREAM_VERDICT program.
> 
> Some protocols may include headers and other information that we may
> not want to include when doing a redirect from a BPF_SK_SKB_STREAM_VERDICT
> program. One use case is to redirect TLS packets into a receive socket
> that doesn't expect TLS data. In TLS case the first 13B or so contain the
> protocol header. With KTLS the payload is decrypted so we should be able
> to redirect this to a receiving socket, but the receiving socket may not
> be expecting to receive a TLS header and discard the data. Using the
> above helper we can pop the header off and put an appropriate header on
> the payload. This allows for creating a proxy between protocols without
> extra hops through the stack or userspace.
> 
> So in order to fix this case add skb_adjust_room() so users can strip the
> header. After this the user can strip the header and an unmodified receiver
> thread will work correctly when data is redirected into the ingress path
> of a sock.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>   net/core/filter.c |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 51 insertions(+)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 4d8dc7a31a78..d232358f1dcd 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -76,6 +76,7 @@
>   #include <net/bpf_sk_storage.h>
>   #include <net/transp_v6.h>
>   #include <linux/btf_ids.h>
> +#include <net/tls.h>
>   
>   static const struct bpf_func_proto *
>   bpf_sk_base_func_proto(enum bpf_func_id func_id);
> @@ -3218,6 +3219,53 @@ static u32 __bpf_skb_max_len(const struct sk_buff *skb)
>   			  SKB_MAX_ALLOC;
>   }
>   
> +BPF_CALL_4(sk_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
> +	   u32, mode, u64, flags)
> +{
> +	unsigned int len_diff_abs = abs(len_diff);

small nit: u32

> +	bool shrink = len_diff < 0;
> +	int ret = 0;
> +
> +	if (unlikely(flags))
> +		return -EINVAL;

Parameter 'mode' is not used here, I guess we need to reject anything non-zero?

Similarly, any interaction wrt bpf_csum_level() that was needed back then for the
bpf_skb_adjust_room()?

> +	if (unlikely(len_diff_abs > 0xfffU))
> +		return -EFAULT;
> +
> +	if (!shrink) {
> +		unsigned int grow = len_diff;

nit: u32 or just directly len_diff?

> +		ret = skb_cow(skb, grow);
> +		if (likely(!ret)) {
> +			__skb_push(skb, len_diff_abs);
> +			memset(skb->data, 0, len_diff_abs);
> +		}
> +	} else {
> +		/* skb_ensure_writable() is not needed here, as we're
> +		 * already working on an uncloned skb.
> +		 */
> +		if (unlikely(!pskb_may_pull(skb, len_diff_abs)))
> +			return -ENOMEM;
> +		__skb_pull(skb, len_diff_abs);
> +	}
> +	bpf_compute_data_end_sk_skb(skb);
> +	if (tls_sw_has_ctx_rx(skb->sk)) {
> +		struct strp_msg *rxm = strp_msg(skb);
> +
> +		rxm->full_len += len_diff;

If skb_cow() failed, we still adjust rxm->full_len?

> +	}
> +	return ret;
> +}
> +
> +static const struct bpf_func_proto sk_skb_adjust_room_proto = {
> +	.func		= sk_skb_adjust_room,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_CTX,
> +	.arg2_type	= ARG_ANYTHING,
> +	.arg3_type	= ARG_ANYTHING,
> +	.arg4_type	= ARG_ANYTHING,
> +};
> +
>   BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
>   	   u32, mode, u64, flags)
>   {
> @@ -6483,6 +6531,7 @@ bool bpf_helper_changes_pkt_data(void *func)
>   	    func == bpf_skb_change_tail ||
>   	    func == sk_skb_change_tail ||
>   	    func == bpf_skb_adjust_room ||
> +	    func == sk_skb_adjust_room ||
>   	    func == bpf_skb_pull_data ||
>   	    func == sk_skb_pull_data ||
>   	    func == bpf_clone_redirect ||
> @@ -6950,6 +6999,8 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>   		return &sk_skb_change_tail_proto;
>   	case BPF_FUNC_skb_change_head:
>   		return &sk_skb_change_head_proto;
> +	case BPF_FUNC_skb_adjust_room:
> +		return &sk_skb_adjust_room_proto;
>   	case BPF_FUNC_get_socket_cookie:
>   		return &bpf_get_socket_cookie_proto;
>   	case BPF_FUNC_get_socket_uid:
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [bpf-next PATCH 1/2] bpf, sockmap: add skb_adjust_room to pop bytes off ingress payload
  2020-09-29 14:59   ` Daniel Borkmann
@ 2020-09-29 15:41     ` John Fastabend
  0 siblings, 0 replies; 7+ messages in thread
From: John Fastabend @ 2020-09-29 15:41 UTC (permalink / raw)
  To: Daniel Borkmann, John Fastabend, ast; +Cc: netdev, jakub, lmb

Daniel Borkmann wrote:
> On 9/26/20 6:27 AM, John Fastabend wrote:
> > This implements a new helper skb_adjust_room() so users can push/pop
> > extra bytes from a BPF_SK_SKB_STREAM_VERDICT program.
> > 
> > Some protocols may include headers and other information that we may
> > not want to include when doing a redirect from a BPF_SK_SKB_STREAM_VERDICT
> > program. One use case is to redirect TLS packets into a receive socket
> > that doesn't expect TLS data. In TLS case the first 13B or so contain the
> > protocol header. With KTLS the payload is decrypted so we should be able
> > to redirect this to a receiving socket, but the receiving socket may not
> > be expecting to receive a TLS header and discard the data. Using the
> > above helper we can pop the header off and put an appropriate header on
> > the payload. This allows for creating a proxy between protocols without
> > extra hops through the stack or userspace.
> > 
> > So in order to fix this case add skb_adjust_room() so users can strip the
> > header. After this the user can strip the header and an unmodified receiver
> > thread will work correctly when data is redirected into the ingress path
> > of a sock.
> > 
> > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> > ---
> >   net/core/filter.c |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 51 insertions(+)
> > 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 4d8dc7a31a78..d232358f1dcd 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -76,6 +76,7 @@
> >   #include <net/bpf_sk_storage.h>
> >   #include <net/transp_v6.h>
> >   #include <linux/btf_ids.h>
> > +#include <net/tls.h>
> >   
> >   static const struct bpf_func_proto *
> >   bpf_sk_base_func_proto(enum bpf_func_id func_id);
> > @@ -3218,6 +3219,53 @@ static u32 __bpf_skb_max_len(const struct sk_buff *skb)
> >   			  SKB_MAX_ALLOC;
> >   }
> >   
> > +BPF_CALL_4(sk_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
> > +	   u32, mode, u64, flags)
> > +{
> > +	unsigned int len_diff_abs = abs(len_diff);
> 
> small nit: u32

Sure.

> 
> > +	bool shrink = len_diff < 0;
> > +	int ret = 0;
> > +
> > +	if (unlikely(flags))
> > +		return -EINVAL;
> 
> Parameter 'mode' is not used here, I guess we need to reject anything non-zero?

Probably its not used.

> 
> Similarly, any interaction wrt bpf_csum_level() that was needed back then for the
> bpf_skb_adjust_room()?

I don't believe so because we are above csum checks at this point.
Either we will put the skb data in the receive_queue for the socket
or redirect it into sendpage.

> 
> > +	if (unlikely(len_diff_abs > 0xfffU))
> > +		return -EFAULT;
> > +
> > +	if (!shrink) {
> > +		unsigned int grow = len_diff;
> 
> nit: u32 or just directly len_diff?

Just use len_diff missed when I cleaned this up.

> 
> > +		ret = skb_cow(skb, grow);
> > +		if (likely(!ret)) {
> > +			__skb_push(skb, len_diff_abs);
> > +			memset(skb->data, 0, len_diff_abs);
> > +		}
> > +	} else {
> > +		/* skb_ensure_writable() is not needed here, as we're
> > +		 * already working on an uncloned skb.
> > +		 */
> > +		if (unlikely(!pskb_may_pull(skb, len_diff_abs)))
> > +			return -ENOMEM;
> > +		__skb_pull(skb, len_diff_abs);
> > +	}
> > +	bpf_compute_data_end_sk_skb(skb);
> > +	if (tls_sw_has_ctx_rx(skb->sk)) {
> > +		struct strp_msg *rxm = strp_msg(skb);
> > +
> > +		rxm->full_len += len_diff;
> 
> If skb_cow() failed, we still adjust rxm->full_len?

Thanks. Will just return above on error like in the else
branch. I'll send a v2 shortly.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [bpf-next PATCH 1/2] bpf, sockmap: add skb_adjust_room to pop bytes off ingress payload
  2020-09-26  4:27 ` [bpf-next PATCH 1/2] bpf, sockmap: add skb_adjust_room to pop bytes off ingress payload John Fastabend
  2020-09-29 14:59   ` Daniel Borkmann
@ 2020-09-30  9:55   ` Jakub Sitnicki
  2020-10-01  1:59     ` John Fastabend
  1 sibling, 1 reply; 7+ messages in thread
From: Jakub Sitnicki @ 2020-09-30  9:55 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, lmb, Marek Majkowski

On Sat, Sep 26, 2020 at 06:27 AM CEST, John Fastabend wrote:
> This implements a new helper skb_adjust_room() so users can push/pop
> extra bytes from a BPF_SK_SKB_STREAM_VERDICT program.
>
> Some protocols may include headers and other information that we may
> not want to include when doing a redirect from a BPF_SK_SKB_STREAM_VERDICT
> program. One use case is to redirect TLS packets into a receive socket
> that doesn't expect TLS data. In TLS case the first 13B or so contain the
> protocol header. With KTLS the payload is decrypted so we should be able
> to redirect this to a receiving socket, but the receiving socket may not
> be expecting to receive a TLS header and discard the data. Using the
> above helper we can pop the header off and put an appropriate header on
> the payload. This allows for creating a proxy between protocols without
> extra hops through the stack or userspace.

This is useful stuff. Apart from the TLS use-case, you might want to pop
off proxy headers like PROXY v1/v2 (CC Marek):

  https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt

>
> So in order to fix this case add skb_adjust_room() so users can strip the
> header. After this the user can strip the header and an unmodified receiver
> thread will work correctly when data is redirected into the ingress path
> of a sock.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  net/core/filter.c |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 4d8dc7a31a78..d232358f1dcd 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -76,6 +76,7 @@
>  #include <net/bpf_sk_storage.h>
>  #include <net/transp_v6.h>
>  #include <linux/btf_ids.h>
> +#include <net/tls.h>
>
>  static const struct bpf_func_proto *
>  bpf_sk_base_func_proto(enum bpf_func_id func_id);
> @@ -3218,6 +3219,53 @@ static u32 __bpf_skb_max_len(const struct sk_buff *skb)
>  			  SKB_MAX_ALLOC;
>  }
>
> +BPF_CALL_4(sk_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
> +	   u32, mode, u64, flags)
> +{
> +	unsigned int len_diff_abs = abs(len_diff);
> +	bool shrink = len_diff < 0;
> +	int ret = 0;
> +
> +	if (unlikely(flags))
> +		return -EINVAL;
> +	if (unlikely(len_diff_abs > 0xfffU))
> +		return -EFAULT;
> +
> +	if (!shrink) {
> +		unsigned int grow = len_diff;
> +
> +		ret = skb_cow(skb, grow);
> +		if (likely(!ret)) {
> +			__skb_push(skb, len_diff_abs);
> +			memset(skb->data, 0, len_diff_abs);
> +		}
> +	} else {
> +		/* skb_ensure_writable() is not needed here, as we're
> +		 * already working on an uncloned skb.
> +		 */

I'm trying to digest the above comment. What if:

static int __strp_recv(…)
{
        …
	while (eaten < orig_len) {
		/* Always clone since we will consume something */
		skb = skb_clone(orig_skb, GFP_ATOMIC);
                …
		head = strp->skb_head;
		if (!head) {
			head = skb;
                        …
		} else {
                        …
		}
                …
		/* Give skb to upper layer */
		strp->cb.rcv_msg(strp, head); // → sk_psock_init_strp
                …
	}
        …
}

That looks like a code path where we pass a cloned SKB.

> +		if (unlikely(!pskb_may_pull(skb, len_diff_abs)))
> +			return -ENOMEM;
> +		__skb_pull(skb, len_diff_abs);
> +	}
> +	bpf_compute_data_end_sk_skb(skb);
> +	if (tls_sw_has_ctx_rx(skb->sk)) {
> +		struct strp_msg *rxm = strp_msg(skb);
> +
> +		rxm->full_len += len_diff;
> +	}
> +	return ret;
> +}

[...]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [bpf-next PATCH 1/2] bpf, sockmap: add skb_adjust_room to pop bytes off ingress payload
  2020-09-30  9:55   ` Jakub Sitnicki
@ 2020-10-01  1:59     ` John Fastabend
  0 siblings, 0 replies; 7+ messages in thread
From: John Fastabend @ 2020-10-01  1:59 UTC (permalink / raw)
  To: Jakub Sitnicki, John Fastabend; +Cc: ast, daniel, netdev, lmb, Marek Majkowski

Jakub Sitnicki wrote:
> On Sat, Sep 26, 2020 at 06:27 AM CEST, John Fastabend wrote:
> > This implements a new helper skb_adjust_room() so users can push/pop
> > extra bytes from a BPF_SK_SKB_STREAM_VERDICT program.
> >
> > Some protocols may include headers and other information that we may
> > not want to include when doing a redirect from a BPF_SK_SKB_STREAM_VERDICT
> > program. One use case is to redirect TLS packets into a receive socket
> > that doesn't expect TLS data. In TLS case the first 13B or so contain the
> > protocol header. With KTLS the payload is decrypted so we should be able
> > to redirect this to a receiving socket, but the receiving socket may not
> > be expecting to receive a TLS header and discard the data. Using the
> > above helper we can pop the header off and put an appropriate header on
> > the payload. This allows for creating a proxy between protocols without
> > extra hops through the stack or userspace.
> 
> This is useful stuff. Apart from the TLS use-case, you might want to pop
> off proxy headers like PROXY v1/v2 (CC Marek):
> 
>   https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt

Great!

> 
> >
> > So in order to fix this case add skb_adjust_room() so users can strip the
> > header. After this the user can strip the header and an unmodified receiver
> > thread will work correctly when data is redirected into the ingress path
> > of a sock.
> >
> > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> > ---
> >  net/core/filter.c |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 51 insertions(+)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 4d8dc7a31a78..d232358f1dcd 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -76,6 +76,7 @@
> >  #include <net/bpf_sk_storage.h>
> >  #include <net/transp_v6.h>
> >  #include <linux/btf_ids.h>
> > +#include <net/tls.h>
> >
> >  static const struct bpf_func_proto *
> >  bpf_sk_base_func_proto(enum bpf_func_id func_id);
> > @@ -3218,6 +3219,53 @@ static u32 __bpf_skb_max_len(const struct sk_buff *skb)
> >  			  SKB_MAX_ALLOC;
> >  }
> >
> > +BPF_CALL_4(sk_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
> > +	   u32, mode, u64, flags)
> > +{
> > +	unsigned int len_diff_abs = abs(len_diff);
> > +	bool shrink = len_diff < 0;
> > +	int ret = 0;
> > +
> > +	if (unlikely(flags))
> > +		return -EINVAL;
> > +	if (unlikely(len_diff_abs > 0xfffU))
> > +		return -EFAULT;
> > +
> > +	if (!shrink) {
> > +		unsigned int grow = len_diff;
> > +
> > +		ret = skb_cow(skb, grow);
> > +		if (likely(!ret)) {
> > +			__skb_push(skb, len_diff_abs);
> > +			memset(skb->data, 0, len_diff_abs);
> > +		}
> > +	} else {
> > +		/* skb_ensure_writable() is not needed here, as we're
> > +		 * already working on an uncloned skb.
> > +		 */
> 
> I'm trying to digest the above comment. What if:

I'll delete the comment its not accurate. We happily write headers
from verdict programs today. Do you have a specific concern or
just noticing I was a bit careless and cut'n'pasted an incorrect
comment around.

> 
> static int __strp_recv(…)
> {
>         …
> 	while (eaten < orig_len) {
> 		/* Always clone since we will consume something */
> 		skb = skb_clone(orig_skb, GFP_ATOMIC);
>                 …
> 		head = strp->skb_head;
> 		if (!head) {
> 			head = skb;
>                         …
> 		} else {
>                         …
> 		}
>                 …
> 		/* Give skb to upper layer */
> 		strp->cb.rcv_msg(strp, head); // → sk_psock_init_strp
>                 …
> 	}
>         …
> }
> 
> That looks like a code path where we pass a cloned SKB.

Right but its there to cover the sk_eat_skb in tcp_read_sock()
otherwise

 sk_eat_skb() -> __kfree_skb() -> skb_release_all()

would go all the way to page_frag_free().

> 
> > +		if (unlikely(!pskb_may_pull(skb, len_diff_abs)))
> > +			return -ENOMEM;
> > +		__skb_pull(skb, len_diff_abs);
> > +	}
> > +	bpf_compute_data_end_sk_skb(skb);
> > +	if (tls_sw_has_ctx_rx(skb->sk)) {
> > +		struct strp_msg *rxm = strp_msg(skb);
> > +
> > +		rxm->full_len += len_diff;
> > +	}
> > +	return ret;
> > +}
> 
> [...]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-10-01  1:59 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-26  4:26 [bpf-next PATCH 0/2] Add skb_adjust_room() for SK_SKB John Fastabend
2020-09-26  4:27 ` [bpf-next PATCH 1/2] bpf, sockmap: add skb_adjust_room to pop bytes off ingress payload John Fastabend
2020-09-29 14:59   ` Daniel Borkmann
2020-09-29 15:41     ` John Fastabend
2020-09-30  9:55   ` Jakub Sitnicki
2020-10-01  1:59     ` John Fastabend
2020-09-26  4:27 ` [bpf-next PATCH 2/2] bpf, sockmap: update selftests to use skb_adjust_room John Fastabend

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).