* [PATCH] bpf: in bpf_skb_adjust_room correct inner protocol for vxlan
From: huangxuesen @ 2021-02-08 11:38 UTC
To: davem
Cc: bpf, daniel, netdev, linux-kernel, huangxuesen, chengzhiyong, wangli

From: huangxuesen <huangxuesen@kuaishou.com>

When pushing vxlan tunnel header, set inner protocol as ETH_P_TEB in skb
to avoid HW device disabling udp tunnel segmentation offload, just like
vxlan_build_skb does.

Drivers for NIC may invoke vxlan_features_check to check the
inner_protocol in skb for vxlan packets to decide whether to disable
NETIF_F_GSO_MASK. Currently it sets inner_protocol as the original
skb->protocol, that will make mlx5_core disable TSO and lead to huge
performance degradation.

Signed-off-by: huangxuesen <huangxuesen@kuaishou.com>
Signed-off-by: chengzhiyong <chengzhiyong@kuaishou.com>
Signed-off-by: wangli <wangli09@kuaishou.com>
---
 net/core/filter.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 255aeee72402..f8d3ba3fe10f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3466,7 +3466,12 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
 	skb->inner_mac_header = inner_net - inner_mac_len;
 	skb->inner_network_header = inner_net;
 	skb->inner_transport_header = inner_trans;
-	skb_set_inner_protocol(skb, skb->protocol);
+
+	if (flags & BPF_F_ADJ_ROOM_ENCAP_L4_UDP &&
+	    inner_mac_len == ETH_HLEN)
+		skb_set_inner_protocol(skb, htons(ETH_P_TEB));
+	else
+		skb_set_inner_protocol(skb, skb->protocol);
 	skb->encapsulation = 1;
 	skb_set_network_header(skb, mac_len);

--
2.28.0
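
[Editorial sketch] To make the failure mode in the commit message concrete,
here is a small userspace model of a driver-side features check. It is not
the kernel's vxlan_features_check: struct fake_skb, the FEATURE_* bits and
model_features_check are hypothetical names, and only the inner_protocol
comparison described above is modeled.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_P_IP  0x0800 /* IPv4 EtherType */
#define ETH_P_TEB 0x6558 /* Transparent Ethernet Bridging EtherType */

/* Hypothetical feature bits standing in for netdev feature flags. */
#define FEATURE_GSO   0x1u /* segmentation offload (what the patch wants kept) */
#define FEATURE_OTHER 0x2u /* any unrelated feature */

/* Hypothetical stand-in for the few sk_buff fields this model reads. */
struct fake_skb {
	uint16_t inner_protocol; /* kept in host byte order in this model */
	int is_udp_tunnel;       /* nonzero if the outer L4 header is UDP */
};

/*
 * Model of the decision: a UDP-tunneled packet whose inner protocol is not
 * ETH_P_TEB is not recognized as vxlan, so segmentation offload is masked.
 */
static uint32_t model_features_check(const struct fake_skb *skb,
				     uint32_t features)
{
	assert(skb);

	if (skb->is_udp_tunnel && skb->inner_protocol != ETH_P_TEB)
		return features & ~FEATURE_GSO;

	return features;
}
```

Under this model, an skb whose inner_protocol was copied from skb->protocol
(for example ETH_P_IP) loses the offload bit, while one marked ETH_P_TEB
keeps it.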
* Re: [PATCH] bpf: in bpf_skb_adjust_room correct inner protocol for vxlan
From: Willem de Bruijn @ 2021-02-08 13:06 UTC
To: huangxuesen
Cc: David Miller, bpf, Daniel Borkmann, Network Development, linux-kernel, huangxuesen, chengzhiyong, wangli

On Mon, Feb 8, 2021 at 7:16 AM huangxuesen <hxseverything@gmail.com> wrote:
>
> From: huangxuesen <huangxuesen@kuaishou.com>
>
> When pushing vxlan tunnel header, set inner protocol as ETH_P_TEB in skb
> to avoid HW device disabling udp tunnel segmentation offload, just like
> vxlan_build_skb does.
>
> Drivers for NIC may invoke vxlan_features_check to check the
> inner_protocol in skb for vxlan packets to decide whether to disable
> NETIF_F_GSO_MASK. Currently it sets inner_protocol as the original
> skb->protocol, that will make mlx5_core disable TSO and lead to huge
> performance degradation.
>
> Signed-off-by: huangxuesen <huangxuesen@kuaishou.com>
> Signed-off-by: chengzhiyong <chengzhiyong@kuaishou.com>
> Signed-off-by: wangli <wangli09@kuaishou.com>
> ---
>  net/core/filter.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 255aeee72402..f8d3ba3fe10f 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3466,7 +3466,12 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>  	skb->inner_mac_header = inner_net - inner_mac_len;
>  	skb->inner_network_header = inner_net;
>  	skb->inner_transport_header = inner_trans;
> -	skb_set_inner_protocol(skb, skb->protocol);
> +
> +	if (flags & BPF_F_ADJ_ROOM_ENCAP_L4_UDP &&
> +	    inner_mac_len == ETH_HLEN)
> +		skb_set_inner_protocol(skb, htons(ETH_P_TEB));

This may be used by vxlan, but it does not imply it.

Adding ETH_HLEN bytes likely means pushing an Ethernet header, but same point.

Conversely, pushing an Ethernet header is not limited to UDP encap.

This probably needs a new explicit BPF_F_ADJ_ROOM_.. flag, rather than
trying to infer from imprecise heuristics.
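
[Editorial sketch] The objections above can be illustrated by lifting the
patch's condition into a pure function and probing it. This is a minimal
userspace sketch, not kernel code: heuristic_picks_teb and
heuristic_examples are hypothetical names, and the flag bit values are
defined locally for illustration, so check the uapi header for the
authoritative definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_HLEN 14 /* length of an Ethernet header in bytes */

/* Illustrative flag bits; the uapi header holds the authoritative values. */
#define BPF_F_ADJ_ROOM_ENCAP_L4_GRE (1ULL << 3)
#define BPF_F_ADJ_ROOM_ENCAP_L4_UDP (1ULL << 4)

/* The condition from the patch, as a pure function (hypothetical name). */
static bool heuristic_picks_teb(uint64_t flags, uint8_t inner_mac_len)
{
	return (flags & BPF_F_ADJ_ROOM_ENCAP_L4_UDP) &&
	       inner_mac_len == ETH_HLEN;
}

/* Cases showing where the inference is too broad or too narrow. */
static void heuristic_examples(void)
{
	/* UDP encap with 14 inner L2 bytes: fires for any such UDP tunnel,
	 * whether or not the tunnel is actually vxlan. */
	assert(heuristic_picks_teb(BPF_F_ADJ_ROOM_ENCAP_L4_UDP, ETH_HLEN));

	/* GRE encap that also pushes an Ethernet header: not covered. */
	assert(!heuristic_picks_teb(BPF_F_ADJ_ROOM_ENCAP_L4_GRE, ETH_HLEN));

	/* UDP encap with a longer L2 header (for instance 18 bytes): the
	 * exact-length comparison does not match. */
	assert(!heuristic_picks_teb(BPF_F_ADJ_ROOM_ENCAP_L4_UDP, ETH_HLEN + 4));
}
```

The function only sees the UDP flag and a byte count, so it cannot tell
what the caller actually pushed; that is the gap an explicit flag would
close.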
* Re: [PATCH] bpf: in bpf_skb_adjust_room correct inner protocol for vxlan
From: 黄学森 @ 2021-02-09 10:41 UTC
To: Willem de Bruijn
Cc: David Miller, bpf, Daniel Borkmann, Network Development, linux-kernel, chengzhiyong, wangli

Thanks for your reply, Willem!

The original intention of this commit is that when we use
bpf_skb_adjust_room to encapsulate Vxlan packets, we find that some
powerful device features are disabled.

Setting inner_protocol directly to skb->protocol is the root cause.

I understand that it's not easy to handle every tunnel protocol in one
bpf helper function. But in my humble opinion, when pushing an Ethernet
header, setting inner_protocol to ETH_P_TEB may be better.

Currently the flag BPF_F_ADJ_ROOM_ENCAP_L4_UDP covers many udp tunnel
types (e.g. udp+mpls, geneve, vxlan). Adding a separate flag to
represent Vxlan looks a little redundant. What's your suggestion?

Thanks again for your reply!
* Re: [PATCH] bpf: in bpf_skb_adjust_room correct inner protocol for vxlan
From: Willem de Bruijn @ 2021-02-09 13:48 UTC
To: 黄学森
Cc: David Miller, bpf, Daniel Borkmann, Network Development, linux-kernel, chengzhiyong, wangli, Alan Maguire

On Tue, Feb 9, 2021 at 5:41 AM 黄学森 <hxseverything@gmail.com> wrote:
>
> Thanks for your reply, Willem!
>
> The original intention of this commit is that when we use
> bpf_skb_adjust_room to encapsulate Vxlan packets, we find that some
> powerful device features are disabled.
>
> Setting inner_protocol directly to skb->protocol is the root cause.
>
> I understand that it's not easy to handle every tunnel protocol in one
> bpf helper function. But in my humble opinion, when pushing an Ethernet
> header, setting inner_protocol to ETH_P_TEB may be better.
>
> Currently the flag BPF_F_ADJ_ROOM_ENCAP_L4_UDP covers many udp tunnel
> types (e.g. udp+mpls, geneve, vxlan). Adding a separate flag to
> represent Vxlan looks a little redundant. What's your suggestion?

Agreed. I don't mean to add a vxlan specific flag.

Instead, I mean a way to identify that the encapsulation includes a mac
header. To a certain extent, that already exists as of commit
58dfc900faff ("bpf: add layer 2 encap support to bpf_skb_adjust_room").
That computes an inner_mac_len. It makes sense that inner_protocol needs
to be updated if inner_mac_len indicates a mac header.

I just would not infer it from some imprecise measure, such as
inner_mac_len being 14, but instead add a new explicit flag
BPF_F_ADJ_ROOM_ENCAP_L2_ETH.

Update inner protocol if the flag is passed and inner_mac_len >= ETH_HLEN.
Fail the operation if the flag is passed and inner_mac_len is too short.

> Thanks again for your reply!
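
[Editorial sketch] The rule proposed in the last two paragraphs of the
reply can be written down as a small decision function. This is a
userspace model under stated assumptions, not the eventual kernel change:
pick_inner_protocol is a hypothetical helper, the bit chosen for
BPF_F_ADJ_ROOM_ENCAP_L2_ETH is a placeholder (the thread does not assign
one), and -EINVAL is an assumed error code since the reply only says the
operation should fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define ETH_HLEN  14     /* length of an Ethernet header in bytes */
#define ETH_P_TEB 0x6558 /* Transparent Ethernet Bridging EtherType */

/* Placeholder bit for the proposed flag; the thread does not give a value. */
#define BPF_F_ADJ_ROOM_ENCAP_L2_ETH (1ULL << 6)

/*
 * Hypothetical helper: choose the inner protocol for an encap push.
 * Returns 0 and stores the protocol in *out_proto, or -EINVAL (assumed)
 * when the flag is set but the pushed L2 header is shorter than ETH_HLEN.
 * Protocol values are kept in host byte order in this model.
 */
static int pick_inner_protocol(uint64_t flags, uint8_t inner_mac_len,
			       uint16_t skb_protocol, uint16_t *out_proto)
{
	assert(out_proto);

	if (flags & BPF_F_ADJ_ROOM_ENCAP_L2_ETH) {
		/* Flag passed but no room for an Ethernet header: fail. */
		if (inner_mac_len < ETH_HLEN)
			return -EINVAL;
		/* Flag passed and an Ethernet header fits: mark as TEB. */
		*out_proto = ETH_P_TEB;
		return 0;
	}

	/* No flag: keep the existing behavior of copying skb->protocol. */
	*out_proto = skb_protocol;
	return 0;
}
```

Compared with inferring intent from UDP plus a 14-byte length, the caller
states explicitly that an Ethernet header was pushed, and the length check
only validates that claim.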