[PATCH] bpf: in bpf_skb_adjust_room correct inner protocol for vxlan


* [PATCH] bpf: in bpf_skb_adjust_room correct inner protocol for vxlan
@ 2021-02-08 11:38 huangxuesen
  2021-02-08 13:06 ` Willem de Bruijn
  0 siblings, 1 reply; 4+ messages in thread
From: huangxuesen @ 2021-02-08 11:38 UTC
  To: davem
  Cc: bpf, daniel, netdev, linux-kernel, huangxuesen, chengzhiyong, wangli

From: huangxuesen <huangxuesen@kuaishou.com>

When pushing a vxlan tunnel header, set the inner protocol in the skb to
ETH_P_TEB so that HW devices do not disable UDP tunnel segmentation
offload, just as vxlan_build_skb does.

NIC drivers may invoke vxlan_features_check to inspect the skb's
inner_protocol for vxlan packets and decide whether to disable
NETIF_F_GSO_MASK. Currently bpf_skb_adjust_room sets inner_protocol to
the original skb->protocol, which makes mlx5_core disable TSO and leads
to a huge performance degradation.

Signed-off-by: huangxuesen <huangxuesen@kuaishou.com>
Signed-off-by: chengzhiyong <chengzhiyong@kuaishou.com>
Signed-off-by: wangli <wangli09@kuaishou.com>
---
 net/core/filter.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 255aeee72402..f8d3ba3fe10f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3466,7 +3466,12 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
 		skb->inner_mac_header = inner_net - inner_mac_len;
 		skb->inner_network_header = inner_net;
 		skb->inner_transport_header = inner_trans;
-		skb_set_inner_protocol(skb, skb->protocol);
+
+		if (flags & BPF_F_ADJ_ROOM_ENCAP_L4_UDP &&
+		    inner_mac_len == ETH_HLEN)
+			skb_set_inner_protocol(skb, htons(ETH_P_TEB));
+		else
+			skb_set_inner_protocol(skb, skb->protocol);
 
 		skb->encapsulation = 1;
 		skb_set_network_header(skb, mac_len);
-- 
2.28.0



* Re: [PATCH] bpf: in bpf_skb_adjust_room correct inner protocol for vxlan
  2021-02-08 11:38 [PATCH] bpf: in bpf_skb_adjust_room correct inner protocol for vxlan huangxuesen
@ 2021-02-08 13:06 ` Willem de Bruijn
  2021-02-09 10:41   ` 黄学森
  0 siblings, 1 reply; 4+ messages in thread
From: Willem de Bruijn @ 2021-02-08 13:06 UTC
  To: huangxuesen
  Cc: David Miller, bpf, Daniel Borkmann, Network Development,
	linux-kernel, huangxuesen, chengzhiyong, wangli

On Mon, Feb 8, 2021 at 7:16 AM huangxuesen <hxseverything@gmail.com> wrote:
>
> From: huangxuesen <huangxuesen@kuaishou.com>
>
> When pushing a vxlan tunnel header, set the inner protocol in the skb to
> ETH_P_TEB so that HW devices do not disable UDP tunnel segmentation
> offload, just as vxlan_build_skb does.
>
> NIC drivers may invoke vxlan_features_check to inspect the skb's
> inner_protocol for vxlan packets and decide whether to disable
> NETIF_F_GSO_MASK. Currently bpf_skb_adjust_room sets inner_protocol to
> the original skb->protocol, which makes mlx5_core disable TSO and leads
> to a huge performance degradation.
>
> Signed-off-by: huangxuesen <huangxuesen@kuaishou.com>
> Signed-off-by: chengzhiyong <chengzhiyong@kuaishou.com>
> Signed-off-by: wangli <wangli09@kuaishou.com>
> ---
>  net/core/filter.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 255aeee72402..f8d3ba3fe10f 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3466,7 +3466,12 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>                 skb->inner_mac_header = inner_net - inner_mac_len;
>                 skb->inner_network_header = inner_net;
>                 skb->inner_transport_header = inner_trans;
> -               skb_set_inner_protocol(skb, skb->protocol);
> +
> +               if (flags & BPF_F_ADJ_ROOM_ENCAP_L4_UDP &&
> +                   inner_mac_len == ETH_HLEN)
> +                       skb_set_inner_protocol(skb, htons(ETH_P_TEB));

BPF_F_ADJ_ROOM_ENCAP_L4_UDP may be used by vxlan, but it does not imply
vxlan.

Adding ETH_HLEN bytes likely means pushing an Ethernet header, but the
same point applies: it does not imply it.

Conversely, pushing an Ethernet header is not limited to UDP encap.

This probably needs a new explicit BPF_F_ADJ_ROOM_.. flag, rather than
trying to infer from imprecise heuristics.


* Re: [PATCH] bpf: in bpf_skb_adjust_room correct inner protocol for vxlan
  2021-02-08 13:06 ` Willem de Bruijn
@ 2021-02-09 10:41   ` 黄学森
  2021-02-09 13:48     ` Willem de Bruijn
  0 siblings, 1 reply; 4+ messages in thread
From: 黄学森 @ 2021-02-09 10:41 UTC
  To: Willem de Bruijn
  Cc: David Miller, bpf, Daniel Borkmann, Network Development,
	linux-kernel, chengzhiyong, wangli

Thanks for your reply, Willem!

The original motivation for this commit is that when we use bpf_skb_adjust_room to encapsulate
vxlan packets, we find that some powerful device features are disabled.

Setting inner_protocol directly to skb->protocol is the root cause.

I understand that it’s not easy to handle every tunnel protocol in one bpf helper function. But in my
humble opinion, when pushing an Ethernet header, setting inner_protocol to ETH_P_TEB may
be better.

The flag BPF_F_ADJ_ROOM_ENCAP_L4_UDP currently covers many UDP tunnel types (e.g.
udp+mpls, geneve, vxlan). Adding a separate flag to represent vxlan looks a little
redundant. What’s your suggestion?

Thanks again for your reply!
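
As a rough illustration of the root cause described above, here is a
simplified user-space model of the kind of check a driver performs. The
names struct model_skb, model_features_check and the MODEL_F_* bits are
hypothetical and exist only for this sketch; the real
vxlan_features_check operates on struct sk_buff and inspects more than
the inner protocol, so treat this as an approximation under those
assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_IP   0x0800  /* IPv4 EtherType */
#define ETH_P_TEB  0x6558  /* Transparent Ethernet Bridging EtherType */

/* Hypothetical feature bits for this model; not the kernel's values. */
#define MODEL_F_GSO    (1u << 0)  /* stands in for NETIF_F_GSO_MASK */
#define MODEL_F_OTHER  (1u << 1)  /* any unrelated device feature */

/* Hypothetical, heavily reduced stand-in for struct sk_buff. */
struct model_skb {
	bool     encapsulation;   /* skb->encapsulation */
	bool     outer_l4_is_udp; /* outer transport header is UDP */
	uint16_t inner_protocol;  /* host byte order in this model */
};

/*
 * If a UDP-encapsulated packet does not advertise ETH_P_TEB as its
 * inner protocol, it is not treated as offloadable vxlan and the
 * segmentation offload bit is cleared.
 */
static uint32_t model_features_check(const struct model_skb *skb,
				     uint32_t features)
{
	if (!skb->encapsulation)
		return features;

	if (skb->outer_l4_is_udp && skb->inner_protocol != ETH_P_TEB)
		return features & ~MODEL_F_GSO;

	return features;
}
```

With inner_protocol left at ETH_P_IP (the original skb->protocol), the
model clears MODEL_F_GSO; with ETH_P_TEB it leaves the features
untouched.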




* Re: [PATCH] bpf: in bpf_skb_adjust_room correct inner protocol for vxlan
  2021-02-09 10:41   ` 黄学森
@ 2021-02-09 13:48     ` Willem de Bruijn
  0 siblings, 0 replies; 4+ messages in thread
From: Willem de Bruijn @ 2021-02-09 13:48 UTC
  To: 黄学森
  Cc: David Miller, bpf, Daniel Borkmann, Network Development,
	linux-kernel, chengzhiyong, wangli, Alan Maguire

On Tue, Feb 9, 2021 at 5:41 AM 黄学森 <hxseverything@gmail.com> wrote:
>
> Thanks for your reply, Willem!
>
> The original motivation for this commit is that when we use bpf_skb_adjust_room to encapsulate
> vxlan packets, we find that some powerful device features are disabled.
>
> Setting inner_protocol directly to skb->protocol is the root cause.
>
> I understand that it’s not easy to handle every tunnel protocol in one bpf helper function. But in my
> humble opinion, when pushing an Ethernet header, setting inner_protocol to ETH_P_TEB may
> be better.
>
> The flag BPF_F_ADJ_ROOM_ENCAP_L4_UDP currently covers many UDP tunnel types (e.g.
> udp+mpls, geneve, vxlan). Adding a separate flag to represent vxlan looks a little
> redundant. What’s your suggestion?

Agreed. I don't mean to add a vxlan-specific flag.

Instead, I mean a way to identify that the encapsulation includes a mac
header. To a certain extent, that already exists as of commit
58dfc900faff ("bpf: add layer 2 encap support to
bpf_skb_adjust_room"), which computes an inner_mac_len. It makes sense
that inner_protocol needs to be updated if inner_mac_len indicates a
mac header.

I just would not infer it from some imprecise measure, such as
inner_mac_len being 14. Instead, add a new explicit flag
BPF_F_ADJ_ROOM_ENCAP_L2_ETH. Update the inner protocol if the flag is
passed and inner_mac_len >= ETH_HLEN. Fail the operation if the flag is
passed and inner_mac_len is too short.
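
The rule proposed above can be sketched as a small user-space model. It
is not the kernel change itself: model_pick_inner_protocol and
MODEL_ENCAP_L2_ETH are hypothetical names, the flag's bit position is
arbitrary, and protocol values are kept in host byte order here, whereas
the helper would store them with htons().

```c
#include <errno.h>
#include <stdint.h>

#define ETH_HLEN   14      /* length of an Ethernet header in bytes */
#define ETH_P_TEB  0x6558  /* Transparent Ethernet Bridging EtherType */

/* Hypothetical stand-in for the proposed BPF_F_ADJ_ROOM_ENCAP_L2_ETH. */
#define MODEL_ENCAP_L2_ETH (1ull << 0)

/*
 * Decide the inner protocol from an explicit flag rather than from a
 * heuristic on the header length:
 *  - flag set and inner_mac_len >= ETH_HLEN: use ETH_P_TEB
 *  - flag set and inner_mac_len too short:   fail with -EINVAL
 *  - flag not set: keep the original skb protocol
 */
static int model_pick_inner_protocol(uint64_t flags, uint32_t inner_mac_len,
				     uint16_t skb_protocol,
				     uint16_t *inner_protocol)
{
	if (flags & MODEL_ENCAP_L2_ETH) {
		if (inner_mac_len < ETH_HLEN)
			return -EINVAL;
		*inner_protocol = ETH_P_TEB;
		return 0;
	}

	*inner_protocol = skb_protocol;
	return 0;
}
```

Because the caller states its intent through the flag, the decision does
not depend on UDP encap being in use or on inner_mac_len being exactly
14.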

