bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next V2 0/2] bpf: Updates for BPF-helper bpf_check_mtu
@ 2021-02-18 11:49 Jesper Dangaard Brouer
  2021-02-18 11:49 ` [PATCH bpf-next V2 1/2] bpf: BPF-helper for MTU checking add length input Jesper Dangaard Brouer
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2021-02-18 11:49 UTC (permalink / raw)
  To: bpf; +Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann, Alexei Starovoitov

The FIB lookup example[1] show how the IP-header field tot_len
(iph->tot_len) is used as input to perform the MTU check. The recently
added MTU check helper bpf_check_mtu() should also support this type
of MTU check.

Lets add this feature before merge window, please. This is a followup
to 34b2021cc616 ("bpf: Add BPF-helper for MTU checking").

[1] samples/bpf/xdp_fwd_kern.c

V2: Fixed spelling and added ACKs from John
---

Jesper Dangaard Brouer (2):
      bpf: BPF-helper for MTU checking add length input
      selftests/bpf: Tests using bpf_check_mtu BPF-helper input mtu_len param


 include/uapi/linux/bpf.h                           |   17 ++--
 net/core/filter.c                                  |   12 ++-
 tools/testing/selftests/bpf/prog_tests/check_mtu.c |    4 +
 tools/testing/selftests/bpf/progs/test_check_mtu.c |   92 ++++++++++++++++++++
 4 files changed, 117 insertions(+), 8 deletions(-)

--


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH bpf-next V2 1/2] bpf: BPF-helper for MTU checking add length input
  2021-02-18 11:49 [PATCH bpf-next V2 0/2] bpf: Updates for BPF-helper bpf_check_mtu Jesper Dangaard Brouer
@ 2021-02-18 11:49 ` Jesper Dangaard Brouer
  2021-02-26 23:36   ` Daniel Borkmann
  2021-02-18 11:50 ` [PATCH bpf-next V2 2/2] selftests/bpf: Tests using bpf_check_mtu BPF-helper input mtu_len param Jesper Dangaard Brouer
  2021-02-19  6:36 ` [PATCH bpf-next V2 0/2] bpf: Updates for BPF-helper bpf_check_mtu Jesper Dangaard Brouer
  2 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2021-02-18 11:49 UTC (permalink / raw)
  To: bpf; +Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann, Alexei Starovoitov

The FIB lookup example[1] show how the IP-header field tot_len
(iph->tot_len) is used as input to perform the MTU check.

This patch extend the BPF-helper bpf_check_mtu() with the same ability
to provide the length as user parameter input, via mtu_len parameter.

[1] samples/bpf/xdp_fwd_kern.c

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 include/uapi/linux/bpf.h |   17 +++++++++++------
 net/core/filter.c        |   12 ++++++++++--
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4c24daa43bac..4ba4ef0ff63a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3850,8 +3850,7 @@ union bpf_attr {
  *
  * long bpf_check_mtu(void *ctx, u32 ifindex, u32 *mtu_len, s32 len_diff, u64 flags)
  *	Description
-
- *		Check ctx packet size against exceeding MTU of net device (based
+ *		Check packet size against exceeding MTU of net device (based
  *		on *ifindex*).  This helper will likely be used in combination
  *		with helpers that adjust/change the packet size.
  *
@@ -3868,6 +3867,14 @@ union bpf_attr {
  *		against the current net device.  This is practical if this isn't
  *		used prior to redirect.
  *
+ *		On input *mtu_len* must be a valid pointer, else verifier will
+ *		reject BPF program.  If the value *mtu_len* is initialized to
+ *		zero then the ctx packet size is use.  When value *mtu_len* is
+ *		provided as input this specify the L3 length that the MTU check
+ *		is done against. Remember XDP and TC length operate at L2, but
+ *		this value is L3 as this correlate to MTU and IP-header tot_len
+ *		values which are L3 (similar behavior as bpf_fib_lookup).
+ *
  *		The Linux kernel route table can configure MTUs on a more
  *		specific per route level, which is not provided by this helper.
  *		For route level MTU checks use the **bpf_fib_lookup**\ ()
@@ -3892,11 +3899,9 @@ union bpf_attr {
  *
  *		On return *mtu_len* pointer contains the MTU value of the net
  *		device.  Remember the net device configured MTU is the L3 size,
- *		which is returned here and XDP and TX length operate at L2.
+ *		which is returned here and XDP and TC length operate at L2.
  *		Helper take this into account for you, but remember when using
- *		MTU value in your BPF-code.  On input *mtu_len* must be a valid
- *		pointer and be initialized (to zero), else verifier will reject
- *		BPF program.
+ *		MTU value in your BPF-code.
  *
  *	Return
  *		* 0 on success, and populate MTU value in *mtu_len* pointer.
diff --git a/net/core/filter.c b/net/core/filter.c
index 7059cf604d94..fcc3bda85960 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5660,7 +5660,7 @@ BPF_CALL_5(bpf_skb_check_mtu, struct sk_buff *, skb,
 	if (unlikely(flags & ~(BPF_MTU_CHK_SEGS)))
 		return -EINVAL;
 
-	if (unlikely(flags & BPF_MTU_CHK_SEGS && len_diff))
+	if (unlikely(flags & BPF_MTU_CHK_SEGS && (len_diff || *mtu_len)))
 		return -EINVAL;
 
 	dev = __dev_via_ifindex(dev, ifindex);
@@ -5670,7 +5670,11 @@ BPF_CALL_5(bpf_skb_check_mtu, struct sk_buff *, skb,
 	mtu = READ_ONCE(dev->mtu);
 
 	dev_len = mtu + dev->hard_header_len;
-	skb_len = skb->len + len_diff; /* minus result pass check */
+
+	/* If set use *mtu_len as input, L3 as iph->tot_len (like fib_lookup) */
+	skb_len = *mtu_len ? *mtu_len + dev->hard_header_len : skb->len;
+
+	skb_len += len_diff; /* minus result pass check */
 	if (skb_len <= dev_len) {
 		ret = BPF_MTU_CHK_RET_SUCCESS;
 		goto out;
@@ -5715,6 +5719,10 @@ BPF_CALL_5(bpf_xdp_check_mtu, struct xdp_buff *, xdp,
 	/* Add L2-header as dev MTU is L3 size */
 	dev_len = mtu + dev->hard_header_len;
 
+	/* Use *mtu_len as input, L3 as iph->tot_len (like fib_lookup) */
+	if (*mtu_len)
+		xdp_len = *mtu_len + dev->hard_header_len;
+
 	xdp_len += len_diff; /* minus result pass check */
 	if (xdp_len > dev_len)
 		ret = BPF_MTU_CHK_RET_FRAG_NEEDED;



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH bpf-next V2 2/2] selftests/bpf: Tests using bpf_check_mtu BPF-helper input mtu_len param
  2021-02-18 11:49 [PATCH bpf-next V2 0/2] bpf: Updates for BPF-helper bpf_check_mtu Jesper Dangaard Brouer
  2021-02-18 11:49 ` [PATCH bpf-next V2 1/2] bpf: BPF-helper for MTU checking add length input Jesper Dangaard Brouer
@ 2021-02-18 11:50 ` Jesper Dangaard Brouer
  2021-02-19  6:36 ` [PATCH bpf-next V2 0/2] bpf: Updates for BPF-helper bpf_check_mtu Jesper Dangaard Brouer
  2 siblings, 0 replies; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2021-02-18 11:50 UTC (permalink / raw)
  To: bpf; +Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann, Alexei Starovoitov

Add tests that use mtu_len as input parameter in BPF-helper
bpf_check_mtu().

The BPF-helper is avail from both XDP and TC context. Add two tests
per context, one that tests below MTU and one that exceeds the MTU.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 tools/testing/selftests/bpf/prog_tests/check_mtu.c |    4 +
 tools/testing/selftests/bpf/progs/test_check_mtu.c |   92 ++++++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/check_mtu.c b/tools/testing/selftests/bpf/prog_tests/check_mtu.c
index 36af1c138faf..b62a39315336 100644
--- a/tools/testing/selftests/bpf/prog_tests/check_mtu.c
+++ b/tools/testing/selftests/bpf/prog_tests/check_mtu.c
@@ -128,6 +128,8 @@ static void test_check_mtu_xdp(__u32 mtu, __u32 ifindex)
 	test_check_mtu_run_xdp(skel, skel->progs.xdp_use_helper, mtu);
 	test_check_mtu_run_xdp(skel, skel->progs.xdp_exceed_mtu, mtu);
 	test_check_mtu_run_xdp(skel, skel->progs.xdp_minus_delta, mtu);
+	test_check_mtu_run_xdp(skel, skel->progs.xdp_input_len, mtu);
+	test_check_mtu_run_xdp(skel, skel->progs.xdp_input_len_exceed, mtu);
 
 cleanup:
 	test_check_mtu__destroy(skel);
@@ -187,6 +189,8 @@ static void test_check_mtu_tc(__u32 mtu, __u32 ifindex)
 	test_check_mtu_run_tc(skel, skel->progs.tc_exceed_mtu, mtu);
 	test_check_mtu_run_tc(skel, skel->progs.tc_exceed_mtu_da, mtu);
 	test_check_mtu_run_tc(skel, skel->progs.tc_minus_delta, mtu);
+	test_check_mtu_run_tc(skel, skel->progs.tc_input_len, mtu);
+	test_check_mtu_run_tc(skel, skel->progs.tc_input_len_exceed, mtu);
 cleanup:
 	test_check_mtu__destroy(skel);
 }
diff --git a/tools/testing/selftests/bpf/progs/test_check_mtu.c b/tools/testing/selftests/bpf/progs/test_check_mtu.c
index b7787b43f9db..c4a9bae96e75 100644
--- a/tools/testing/selftests/bpf/progs/test_check_mtu.c
+++ b/tools/testing/selftests/bpf/progs/test_check_mtu.c
@@ -105,6 +105,54 @@ int xdp_minus_delta(struct xdp_md *ctx)
 	return retval;
 }
 
+SEC("xdp")
+int xdp_input_len(struct xdp_md *ctx)
+{
+	int retval = XDP_PASS; /* Expected retval on successful test */
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data = (void *)(long)ctx->data;
+	__u32 ifindex = GLOBAL_USER_IFINDEX;
+	__u32 data_len = data_end - data;
+
+	/* API allow user give length to check as input via mtu_len param,
+	 * resulting MTU value is still output in mtu_len param after call.
+	 *
+	 * Input len is L3, like MTU and iph->tot_len.
+	 * Remember XDP data_len is L2.
+	 */
+	__u32 mtu_len = data_len - ETH_HLEN;
+
+	if (bpf_check_mtu(ctx, ifindex, &mtu_len, 0, 0))
+		retval = XDP_ABORTED;
+
+	global_bpf_mtu_xdp = mtu_len;
+	return retval;
+}
+
+SEC("xdp")
+int xdp_input_len_exceed(struct xdp_md *ctx)
+{
+	int retval = XDP_ABORTED; /* Fail */
+	__u32 ifindex = GLOBAL_USER_IFINDEX;
+	int err;
+
+	/* API allow user give length to check as input via mtu_len param,
+	 * resulting MTU value is still output in mtu_len param after call.
+	 *
+	 * Input length value is L3 size like MTU.
+	 */
+	__u32 mtu_len = GLOBAL_USER_MTU;
+
+	mtu_len += 1; /* Exceed with 1 */
+
+	err = bpf_check_mtu(ctx, ifindex, &mtu_len, 0, 0);
+	if (err == BPF_MTU_CHK_RET_FRAG_NEEDED)
+		retval = XDP_PASS ; /* Success in exceeding MTU check */
+
+	global_bpf_mtu_xdp = mtu_len;
+	return retval;
+}
+
 SEC("classifier")
 int tc_use_helper(struct __sk_buff *ctx)
 {
@@ -196,3 +244,47 @@ int tc_minus_delta(struct __sk_buff *ctx)
 	global_bpf_mtu_xdp = mtu_len;
 	return retval;
 }
+
+SEC("classifier")
+int tc_input_len(struct __sk_buff *ctx)
+{
+	int retval = BPF_OK; /* Expected retval on successful test */
+	__u32 ifindex = GLOBAL_USER_IFINDEX;
+
+	/* API allow user give length to check as input via mtu_len param,
+	 * resulting MTU value is still output in mtu_len param after call.
+	 *
+	 * Input length value is L3 size.
+	 */
+	__u32 mtu_len = GLOBAL_USER_MTU;
+
+	if (bpf_check_mtu(ctx, ifindex, &mtu_len, 0, 0))
+		retval = BPF_DROP;
+
+	global_bpf_mtu_xdp = mtu_len;
+	return retval;
+}
+
+SEC("classifier")
+int tc_input_len_exceed(struct __sk_buff *ctx)
+{
+	int retval = BPF_DROP; /* Fail */
+	__u32 ifindex = GLOBAL_USER_IFINDEX;
+	int err;
+
+	/* API allow user give length to check as input via mtu_len param,
+	 * resulting MTU value is still output in mtu_len param after call.
+	 *
+	 * Input length value is L3 size like MTU.
+	 */
+	__u32 mtu_len = GLOBAL_USER_MTU;
+
+	mtu_len += 1; /* Exceed with 1 */
+
+	err = bpf_check_mtu(ctx, ifindex, &mtu_len, 0, 0);
+	if (err == BPF_MTU_CHK_RET_FRAG_NEEDED)
+		retval = BPF_OK; /* Success in exceeding MTU check */
+
+	global_bpf_mtu_xdp = mtu_len;
+	return retval;
+}



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next V2 0/2] bpf: Updates for BPF-helper bpf_check_mtu
  2021-02-18 11:49 [PATCH bpf-next V2 0/2] bpf: Updates for BPF-helper bpf_check_mtu Jesper Dangaard Brouer
  2021-02-18 11:49 ` [PATCH bpf-next V2 1/2] bpf: BPF-helper for MTU checking add length input Jesper Dangaard Brouer
  2021-02-18 11:50 ` [PATCH bpf-next V2 2/2] selftests/bpf: Tests using bpf_check_mtu BPF-helper input mtu_len param Jesper Dangaard Brouer
@ 2021-02-19  6:36 ` Jesper Dangaard Brouer
  2021-02-26 22:34   ` Daniel Borkmann
  2 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2021-02-19  6:36 UTC (permalink / raw)
  To: bpf; +Cc: netdev, Daniel Borkmann, Alexei Starovoitov, brouer

On Thu, 18 Feb 2021 12:49:53 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> The FIB lookup example[1] show how the IP-header field tot_len
> (iph->tot_len) is used as input to perform the MTU check. The recently
> added MTU check helper bpf_check_mtu() should also support this type
> of MTU check.
> 
> Lets add this feature before merge window, please. This is a followup
> to 34b2021cc616 ("bpf: Add BPF-helper for MTU checking").

Which git tree should I send this against bpf-next or bpf, to keep this
change together with 34b2021cc616 ("bpf: Add BPF-helper for MTU
checking") ?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next V2 0/2] bpf: Updates for BPF-helper bpf_check_mtu
  2021-02-19  6:36 ` [PATCH bpf-next V2 0/2] bpf: Updates for BPF-helper bpf_check_mtu Jesper Dangaard Brouer
@ 2021-02-26 22:34   ` Daniel Borkmann
  2021-03-08 14:44     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Borkmann @ 2021-02-26 22:34 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, bpf; +Cc: netdev, Daniel Borkmann, Alexei Starovoitov

On 2/19/21 7:36 AM, Jesper Dangaard Brouer wrote:
> On Thu, 18 Feb 2021 12:49:53 +0100
> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> 
>> The FIB lookup example[1] show how the IP-header field tot_len
>> (iph->tot_len) is used as input to perform the MTU check. The recently
>> added MTU check helper bpf_check_mtu() should also support this type
>> of MTU check.
>>
>> Lets add this feature before merge window, please. This is a followup
>> to 34b2021cc616 ("bpf: Add BPF-helper for MTU checking").
> 
> Which git tree should I send this against bpf-next or bpf, to keep this
> change together with 34b2021cc616 ("bpf: Add BPF-helper for MTU
> checking") ?

Given this is an api change, we'll take this into bpf tree after the
pending pr was merged.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next V2 1/2] bpf: BPF-helper for MTU checking add length input
  2021-02-18 11:49 ` [PATCH bpf-next V2 1/2] bpf: BPF-helper for MTU checking add length input Jesper Dangaard Brouer
@ 2021-02-26 23:36   ` Daniel Borkmann
  2021-02-27 10:37     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Borkmann @ 2021-02-26 23:36 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, bpf
  Cc: netdev, Daniel Borkmann, Alexei Starovoitov, john.fastabend

On 2/18/21 12:49 PM, Jesper Dangaard Brouer wrote:
> The FIB lookup example[1] show how the IP-header field tot_len
> (iph->tot_len) is used as input to perform the MTU check.
> 
> This patch extend the BPF-helper bpf_check_mtu() with the same ability
> to provide the length as user parameter input, via mtu_len parameter.
> 
> [1] samples/bpf/xdp_fwd_kern.c
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Acked-by: John Fastabend <john.fastabend@gmail.com>
> ---
>   include/uapi/linux/bpf.h |   17 +++++++++++------
>   net/core/filter.c        |   12 ++++++++++--
>   2 files changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4c24daa43bac..4ba4ef0ff63a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3850,8 +3850,7 @@ union bpf_attr {
>    *
>    * long bpf_check_mtu(void *ctx, u32 ifindex, u32 *mtu_len, s32 len_diff, u64 flags)
>    *	Description
> -
> - *		Check ctx packet size against exceeding MTU of net device (based
> + *		Check packet size against exceeding MTU of net device (based
>    *		on *ifindex*).  This helper will likely be used in combination
>    *		with helpers that adjust/change the packet size.
>    *
> @@ -3868,6 +3867,14 @@ union bpf_attr {
>    *		against the current net device.  This is practical if this isn't
>    *		used prior to redirect.
>    *
> + *		On input *mtu_len* must be a valid pointer, else verifier will
> + *		reject BPF program.  If the value *mtu_len* is initialized to
> + *		zero then the ctx packet size is use.  When value *mtu_len* is
> + *		provided as input this specify the L3 length that the MTU check
> + *		is done against. Remember XDP and TC length operate at L2, but
> + *		this value is L3 as this correlate to MTU and IP-header tot_len
> + *		values which are L3 (similar behavior as bpf_fib_lookup).
> + *
>    *		The Linux kernel route table can configure MTUs on a more
>    *		specific per route level, which is not provided by this helper.
>    *		For route level MTU checks use the **bpf_fib_lookup**\ ()
> @@ -3892,11 +3899,9 @@ union bpf_attr {
>    *
>    *		On return *mtu_len* pointer contains the MTU value of the net
>    *		device.  Remember the net device configured MTU is the L3 size,
> - *		which is returned here and XDP and TX length operate at L2.
> + *		which is returned here and XDP and TC length operate at L2.
>    *		Helper take this into account for you, but remember when using
> - *		MTU value in your BPF-code.  On input *mtu_len* must be a valid
> - *		pointer and be initialized (to zero), else verifier will reject
> - *		BPF program.
> + *		MTU value in your BPF-code.
>    *
>    *	Return
>    *		* 0 on success, and populate MTU value in *mtu_len* pointer.
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 7059cf604d94..fcc3bda85960 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5660,7 +5660,7 @@ BPF_CALL_5(bpf_skb_check_mtu, struct sk_buff *, skb,
>   	if (unlikely(flags & ~(BPF_MTU_CHK_SEGS)))
>   		return -EINVAL;
>   
> -	if (unlikely(flags & BPF_MTU_CHK_SEGS && len_diff))
> +	if (unlikely(flags & BPF_MTU_CHK_SEGS && (len_diff || *mtu_len)))
>   		return -EINVAL;
>   
>   	dev = __dev_via_ifindex(dev, ifindex);
> @@ -5670,7 +5670,11 @@ BPF_CALL_5(bpf_skb_check_mtu, struct sk_buff *, skb,
>   	mtu = READ_ONCE(dev->mtu);
>   
>   	dev_len = mtu + dev->hard_header_len;
> -	skb_len = skb->len + len_diff; /* minus result pass check */
> +
> +	/* If set use *mtu_len as input, L3 as iph->tot_len (like fib_lookup) */
> +	skb_len = *mtu_len ? *mtu_len + dev->hard_header_len : skb->len;
> +
> +	skb_len += len_diff; /* minus result pass check */
>   	if (skb_len <= dev_len) {
>   		ret = BPF_MTU_CHK_RET_SUCCESS;
>   		goto out;
> @@ -5715,6 +5719,10 @@ BPF_CALL_5(bpf_xdp_check_mtu, struct xdp_buff *, xdp,
>   	/* Add L2-header as dev MTU is L3 size */
>   	dev_len = mtu + dev->hard_header_len;
>   
> +	/* Use *mtu_len as input, L3 as iph->tot_len (like fib_lookup) */
> +	if (*mtu_len)
> +		xdp_len = *mtu_len + dev->hard_header_len;
> +
>   	xdp_len += len_diff; /* minus result pass check */
>   	if (xdp_len > dev_len)
>   		ret = BPF_MTU_CHK_RET_FRAG_NEEDED;
> 

Btw, one more note on the whole bpf_*_check_mtu() helper... Last week I implemented
PMTU discovery support for clients for Cilium's XDP stand-alone LB in DSR mode, so I
was briefly considering whether to use the bpf_xdp_check_mtu() helper for retrieving
the device MTU, but then I thought to myself why having an unnecessary per-packet cost
for an extra helper call if I could just pass it in via constant instead. So I went
with the latter instead of the helper with the tradeoff to restart the Cilium agent
if someone actually changes MTU in prod which is a rare event anyway.

Looking at what bpf_xdp_check_mtu() for example really offers is retrieval of dev->mtu
as well as dev->hard_header_len and the rest can all be done inside the BPF prog itself
w/o the helper overhead. Why am I mentioning this.. because the above change is a similar
case of what could have been done /inside/ the BPF prog anyway (especially on XDP where
extra overhead should be cut where possible).

I think it got lost somewhere in the many versions of the original set where it was
mentioned before, but allowing to retrieve the dev object into BPF context and then
exposing it similarly to how we handle the case of struct bpf_tcp_sock would have been
much cleaner approach, e.g. the prog from XDP and tc context would be able to do:

   struct bpf_dev *dev = ctx->dev;

And we expose initially, for example:

   struct bpf_dev {
     __u32 mtu;
     __u32 hard_header_len;
     __u32 ifindex;
     __u32 rx_queues;
     __u32 tx_queues;
   };

And we could also have a BPF helper for XDP and tc that would fetch a /different/ dev
given we're under RCU context anyway, like ...

BPF_CALL_2(bpf_get_dev, struct xdp_buff *, xdp, u32, ifindex)
{
	return dev_get_by_index_rcu(dev_net(xdp->rxq->dev), index);
}

... returning a new dev_or_null type. With this flexibility everything else can be done
inside the prog, and later on it easily allows to expose more from dev side. Actually,
I'm inclined to code it up ...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next V2 1/2] bpf: BPF-helper for MTU checking add length input
  2021-02-26 23:36   ` Daniel Borkmann
@ 2021-02-27 10:37     ` Jesper Dangaard Brouer
  2021-03-08 22:14       ` Daniel Borkmann
  0 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2021-02-27 10:37 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: bpf, netdev, Daniel Borkmann, Alexei Starovoitov, john.fastabend,
	brouer, Lorenzo Bianconi, Marek Majtyka

On Sat, 27 Feb 2021 00:36:02 +0100
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 2/18/21 12:49 PM, Jesper Dangaard Brouer wrote:
> > The FIB lookup example[1] show how the IP-header field tot_len
> > (iph->tot_len) is used as input to perform the MTU check.
> > 
> > This patch extend the BPF-helper bpf_check_mtu() with the same ability
> > to provide the length as user parameter input, via mtu_len parameter.
> > 
> > [1] samples/bpf/xdp_fwd_kern.c
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > Acked-by: John Fastabend <john.fastabend@gmail.com>
> > ---
> >   include/uapi/linux/bpf.h |   17 +++++++++++------
> >   net/core/filter.c        |   12 ++++++++++--
> >   2 files changed, 21 insertions(+), 8 deletions(-)
> > 
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 4c24daa43bac..4ba4ef0ff63a 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -3850,8 +3850,7 @@ union bpf_attr {
> >    *
> >    * long bpf_check_mtu(void *ctx, u32 ifindex, u32 *mtu_len, s32 len_diff, u64 flags)
> >    *	Description
> > -
> > - *		Check ctx packet size against exceeding MTU of net device (based
> > + *		Check packet size against exceeding MTU of net device (based
> >    *		on *ifindex*).  This helper will likely be used in combination
> >    *		with helpers that adjust/change the packet size.
> >    *
> > @@ -3868,6 +3867,14 @@ union bpf_attr {
> >    *		against the current net device.  This is practical if this isn't
> >    *		used prior to redirect.
> >    *
> > + *		On input *mtu_len* must be a valid pointer, else verifier will
> > + *		reject BPF program.  If the value *mtu_len* is initialized to
> > + *		zero then the ctx packet size is use.  When value *mtu_len* is
> > + *		provided as input this specify the L3 length that the MTU check
> > + *		is done against. Remember XDP and TC length operate at L2, but
> > + *		this value is L3 as this correlate to MTU and IP-header tot_len
> > + *		values which are L3 (similar behavior as bpf_fib_lookup).
> > + *
> >    *		The Linux kernel route table can configure MTUs on a more
> >    *		specific per route level, which is not provided by this helper.
> >    *		For route level MTU checks use the **bpf_fib_lookup**\ ()
> > @@ -3892,11 +3899,9 @@ union bpf_attr {
> >    *
> >    *		On return *mtu_len* pointer contains the MTU value of the net
> >    *		device.  Remember the net device configured MTU is the L3 size,
> > - *		which is returned here and XDP and TX length operate at L2.
> > + *		which is returned here and XDP and TC length operate at L2.
> >    *		Helper take this into account for you, but remember when using
> > - *		MTU value in your BPF-code.  On input *mtu_len* must be a valid
> > - *		pointer and be initialized (to zero), else verifier will reject
> > - *		BPF program.
> > + *		MTU value in your BPF-code.
> >    *
> >    *	Return
> >    *		* 0 on success, and populate MTU value in *mtu_len* pointer.
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 7059cf604d94..fcc3bda85960 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5660,7 +5660,7 @@ BPF_CALL_5(bpf_skb_check_mtu, struct sk_buff *, skb,
> >   	if (unlikely(flags & ~(BPF_MTU_CHK_SEGS)))
> >   		return -EINVAL;
> >   
> > -	if (unlikely(flags & BPF_MTU_CHK_SEGS && len_diff))
> > +	if (unlikely(flags & BPF_MTU_CHK_SEGS && (len_diff || *mtu_len)))
> >   		return -EINVAL;
> >   
> >   	dev = __dev_via_ifindex(dev, ifindex);
> > @@ -5670,7 +5670,11 @@ BPF_CALL_5(bpf_skb_check_mtu, struct sk_buff *, skb,
> >   	mtu = READ_ONCE(dev->mtu);
> >   
> >   	dev_len = mtu + dev->hard_header_len;
> > -	skb_len = skb->len + len_diff; /* minus result pass check */
> > +
> > +	/* If set use *mtu_len as input, L3 as iph->tot_len (like fib_lookup) */
> > +	skb_len = *mtu_len ? *mtu_len + dev->hard_header_len : skb->len;
> > +
> > +	skb_len += len_diff; /* minus result pass check */
> >   	if (skb_len <= dev_len) {
> >   		ret = BPF_MTU_CHK_RET_SUCCESS;
> >   		goto out;
> > @@ -5715,6 +5719,10 @@ BPF_CALL_5(bpf_xdp_check_mtu, struct xdp_buff *, xdp,
> >   	/* Add L2-header as dev MTU is L3 size */
> >   	dev_len = mtu + dev->hard_header_len;
> >   
> > +	/* Use *mtu_len as input, L3 as iph->tot_len (like fib_lookup) */
> > +	if (*mtu_len)
> > +		xdp_len = *mtu_len + dev->hard_header_len;
> > +
> >   	xdp_len += len_diff; /* minus result pass check */
> >   	if (xdp_len > dev_len)
> >   		ret = BPF_MTU_CHK_RET_FRAG_NEEDED;
> >   
> 
> Btw, one more note on the whole bpf_*_check_mtu() helper... Last week I implemented
> PMTU discovery support for clients for Cilium's XDP stand-alone LB in DSR mode, so I
> was briefly considering whether to use the bpf_xdp_check_mtu() helper for retrieving
> the device MTU, but then I thought to myself why having an unnecessary per-packet cost
> for an extra helper call if I could just pass it in via constant instead. So I went
> with the latter instead of the helper with the tradeoff to restart the Cilium agent
> if someone actually changes MTU in prod which is a rare event anyway.
> 
> Looking at what bpf_xdp_check_mtu() for example really offers is retrieval of dev->mtu
> as well as dev->hard_header_len and the rest can all be done inside the BPF prog itself
> w/o the helper overhead. Why am I mentioning this.. because the above change is a similar
> case of what could have been done /inside/ the BPF prog anyway (especially on XDP where
> extra overhead should be cut where possible).

The XDP case looks super simple now, but I thinking ahead.  When
Lorenzo adds multi-buff support, then we can/must update this helper to
use another XDP length value, based on the multi-buff jumbo-frame len.

Maybe we need another helper or what you propose below. BUT we could
also allow this helper (via flag?) to ALSO check if dev support
multi-buff XDP transmit (besides MTU limit with multi-buff len).  Then
the BPF-programmer can know this packet cannot be redirected to the
device.

> I think it got lost somewhere in the many versions of the original set where it was
> mentioned before, but allowing to retrieve the dev object into BPF context and then
> exposing it similarly to how we handle the case of struct bpf_tcp_sock would have been
> much cleaner approach, e.g. the prog from XDP and tc context would be able to do:
> 
>    struct bpf_dev *dev = ctx->dev;
> 
> And we expose initially, for example:
> 
>    struct bpf_dev {
>      __u32 mtu;
>      __u32 hard_header_len;
>      __u32 ifindex;
>      __u32 rx_queues;
>      __u32 tx_queues;
>    };
> 
> And we could also have a BPF helper for XDP and tc that would fetch a /different/ dev
> given we're under RCU context anyway, like ...
> 
> BPF_CALL_2(bpf_get_dev, struct xdp_buff *, xdp, u32, ifindex)
> {
> 	return dev_get_by_index_rcu(dev_net(xdp->rxq->dev), index);
> }
> 
> ... returning a new dev_or_null type. With this flexibility everything else can be done
> inside the prog, and later on it easily allows to expose more from dev side. Actually,
> I'm inclined to code it up ...

I love the idea to retrieve the dev object into BPF context.  It is
orthogonal, and doesn't replace the MTU helpers as the packet ctx
objects (SKB and xdp_buff) are more complex, and the helper allows us
to extend them without users have to update their BPF-code (as desc
above).

I do think it makes a lot of sense to expose/retrieve dev object into
BPF context.  As I hinted about, when we implement XDP multi-buff, then
the bpf_redirect BPF-helper cannot check if the remote device support
multi-buff transmit (as it don't have packet ctx).  If we have the dev
object, the we could expose XDP features that allow us (BPF-programmer)
to check this prior to doing the redirect.

To be clear:
 * I still think *this* patch is relevant and should be applied.

I'm also on-board with retrieve the dev object into BPF context, as it
have other use-cases.
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next V2 0/2] bpf: Updates for BPF-helper bpf_check_mtu
  2021-02-26 22:34   ` Daniel Borkmann
@ 2021-03-08 14:44     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2021-03-08 14:44 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: bpf, netdev, Daniel Borkmann, Alexei Starovoitov, brouer

On Fri, 26 Feb 2021 23:34:34 +0100
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 2/19/21 7:36 AM, Jesper Dangaard Brouer wrote:
> > On Thu, 18 Feb 2021 12:49:53 +0100
> > Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> >   
> >> The FIB lookup example[1] show how the IP-header field tot_len
> >> (iph->tot_len) is used as input to perform the MTU check. The recently
> >> added MTU check helper bpf_check_mtu() should also support this type
> >> of MTU check.
> >>
> >> Lets add this feature before merge window, please. This is a followup
> >> to 34b2021cc616 ("bpf: Add BPF-helper for MTU checking").  
> > 
> > Which git tree should I send this against bpf-next or bpf, to keep this
> > change together with 34b2021cc616 ("bpf: Add BPF-helper for MTU
> > checking") ?  
> 
> Given this is an api change, we'll take this into bpf tree after the
> pending pr was merged.

That sounds great, but I noticed that they have not reached bpf-tree
yet. And the patches[1][2] disappeared[0] from patchwork as they were
archived, which confuse me.

As the patchset doesn't apply cleanly (due to whitespace in comment)
against bpf-tree, I'll resend it.

[0] https://patchwork.kernel.org/project/netdevbpf/list/?series=434987
[1] https://patchwork.kernel.org/project/netdevbpf/patch/161364899856.1250213.17435782167100828617.stgit@firesoul/
[2] https://patchwork.kernel.org/project/netdevbpf/patch/161364900363.1250213.9894483265551874755.stgit@firesoul/
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next V2 1/2] bpf: BPF-helper for MTU checking add length input
  2021-02-27 10:37     ` Jesper Dangaard Brouer
@ 2021-03-08 22:14       ` Daniel Borkmann
  0 siblings, 0 replies; 9+ messages in thread
From: Daniel Borkmann @ 2021-03-08 22:14 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: bpf, netdev, Daniel Borkmann, Alexei Starovoitov, john.fastabend,
	Lorenzo Bianconi, Marek Majtyka

On 2/27/21 11:37 AM, Jesper Dangaard Brouer wrote:
> On Sat, 27 Feb 2021 00:36:02 +0100
> Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 2/18/21 12:49 PM, Jesper Dangaard Brouer wrote:
>>> The FIB lookup example[1] show how the IP-header field tot_len
>>> (iph->tot_len) is used as input to perform the MTU check.
>>>
>>> This patch extend the BPF-helper bpf_check_mtu() with the same ability
>>> to provide the length as user parameter input, via mtu_len parameter.
>>>
>>> [1] samples/bpf/xdp_fwd_kern.c
>>>
>>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>>> Acked-by: John Fastabend <john.fastabend@gmail.com>
[...]
>> Btw, one more note on the whole bpf_*_check_mtu() helper... Last week I implemented
>> PMTU discovery support for clients for Cilium's XDP stand-alone LB in DSR mode, so I
>> was briefly considering whether to use the bpf_xdp_check_mtu() helper for retrieving
>> the device MTU, but then I thought to myself why having an unnecessary per-packet cost
>> for an extra helper call if I could just pass it in via constant instead. So I went
>> with the latter instead of the helper with the tradeoff to restart the Cilium agent
>> if someone actually changes MTU in prod which is a rare event anyway.
>>
>> Looking at what bpf_xdp_check_mtu() for example really offers is retrieval of dev->mtu
>> as well as dev->hard_header_len and the rest can all be done inside the BPF prog itself
>> w/o the helper overhead. Why am I mentioning this.. because the above change is a similar
>> case of what could have been done /inside/ the BPF prog anyway (especially on XDP where
>> extra overhead should be cut where possible).
> 
> The XDP case looks super simple now, but I thinking ahead.  When
> Lorenzo adds multi-buff support, then we can/must update this helper to
> use another XDP length value, based on the multi-buff jumbo-frame len.
> 
> Maybe we need another helper or what you propose below. BUT we could
> also allow this helper (via flag?) to ALSO check if dev support
> multi-buff XDP transmit (besides MTU limit with multi-buff len).  Then
> the BPF-programmer can know this packet cannot be redirected to the
> device.

Whether a XDP program is running on a device with multi-buff support or without
it should be transparent to this helper, in other words, the helper would have
to figure this out internally so that programs wouldn't have to be changed (in
the ideal case).

Overall, I still think that especially for the XDP case where performance matters
most, all this could have been done inside the program itself w/o the overhead of
the helper call as outlined earlier with struct bpf_dev ; adding more flags like
querying if a device supports multi-buff XDP transmit has not much to do with the
original purpose of bpf_xdp_check_mtu() anymore (aka do one thing/do it well mantra),
maybe the API should have been named bpf_xdp_check_forwardable() instead if it goes
beyond MTU .. But similarly here, struct bpf_dev property could also solve this
case if the prog developer needs more sanity checks when it is not clear whether
both devs support it, which can then also be /compiled out/ for the situation when
it /is/ known a-priori.

>> I think it got lost somewhere in the many versions of the original set where it was
>> mentioned before, but allowing to retrieve the dev object into BPF context and then
>> exposing it similarly to how we handle the case of struct bpf_tcp_sock would have been
>> much cleaner approach, e.g. the prog from XDP and tc context would be able to do:
>>
>>     struct bpf_dev *dev = ctx->dev;
>>
>> And we expose initially, for example:
>>
>>     struct bpf_dev {
>>       __u32 mtu;
>>       __u32 hard_header_len;
>>       __u32 ifindex;
>>       __u32 rx_queues;
>>       __u32 tx_queues;
>>     };
>>
>> And we could also have a BPF helper for XDP and tc that would fetch a /different/ dev
>> given we're under RCU context anyway, like ...
>>
>> BPF_CALL_2(bpf_get_dev, struct xdp_buff *, xdp, u32, ifindex)
>> {
>> 	return dev_get_by_index_rcu(dev_net(xdp->rxq->dev), index);
>> }
>>
>> ... returning a new dev_or_null type. With this flexibility everything else can be done
>> inside the prog, and later on it easily allows to expose more from dev side. Actually,
>> I'm inclined to code it up ...
> 
> I love the idea to retrieve the dev object into BPF context.  It is
> orthogonal, and doesn't replace the MTU helpers as the packet ctx
> objects (SKB and xdp_buff) are more complex, and the helper allows us
> to extend them without users have to update their BPF-code (as desc
> above).
> 
> I do think it makes a lot of sense to expose/retrieve dev object into
> BPF context.  As I hinted about, when we implement XDP multi-buff, then
> the bpf_redirect BPF-helper cannot check if the remote device support
> multi-buff transmit (as it don't have packet ctx).  If we have the dev
> object, the we could expose XDP features that allow us (BPF-programmer)
> to check this prior to doing the redirect.

Yep, that would be cleaner.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-03-08 22:15 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-18 11:49 [PATCH bpf-next V2 0/2] bpf: Updates for BPF-helper bpf_check_mtu Jesper Dangaard Brouer
2021-02-18 11:49 ` [PATCH bpf-next V2 1/2] bpf: BPF-helper for MTU checking add length input Jesper Dangaard Brouer
2021-02-26 23:36   ` Daniel Borkmann
2021-02-27 10:37     ` Jesper Dangaard Brouer
2021-03-08 22:14       ` Daniel Borkmann
2021-02-18 11:50 ` [PATCH bpf-next V2 2/2] selftests/bpf: Tests using bpf_check_mtu BPF-helper input mtu_len param Jesper Dangaard Brouer
2021-02-19  6:36 ` [PATCH bpf-next V2 0/2] bpf: Updates for BPF-helper bpf_check_mtu Jesper Dangaard Brouer
2021-02-26 22:34   ` Daniel Borkmann
2021-03-08 14:44     ` Jesper Dangaard Brouer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).