From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 530AEC433E0 for ; Tue, 9 Feb 2021 13:52:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 03AA764ED5 for ; Tue, 9 Feb 2021 13:52:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231904AbhBINwC (ORCPT ); Tue, 9 Feb 2021 08:52:02 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32814 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231956AbhBINtu (ORCPT ); Tue, 9 Feb 2021 08:49:50 -0500 Received: from mail-ej1-x633.google.com (mail-ej1-x633.google.com [IPv6:2a00:1450:4864:20::633]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D20D3C061797; Tue, 9 Feb 2021 05:49:08 -0800 (PST) Received: by mail-ej1-x633.google.com with SMTP id a9so31540402ejr.2; Tue, 09 Feb 2021 05:49:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=EFHWJqOuZ5oCfySHPR3lGIAEM6/c2S56gFx0hAWTits=; b=Ppv7Q7ejOG9vxA58OJEbZVdzNl7aX/qp2kwYiGA43KDSoKyQ/O+Xuo1h5pbXDi7Ihv m+nOm1oG/rMqw5wq7RA/qdEqVjoIzBBzxVOVhZbbSNNzSm59kmnQ8CrSYf4VHuABfoRz q7z+ZLHD5lB0VJIEvIG4s4hjEUscbocpcQ092Ui9Pqivm98hvHh+ji9xVq66RSU7/MIi iokQyZCNIaUO1P+vfSBagS6Mwol9B3pNSpFjPvzeq8LCtMnyJz1VgrLbQ1XpDeuRIbVj +asGO79iHe9LOxGKRC1ZCR8SMd6BhhNjnWuJFBbAV57WADu3Pxp1B+snMkjhRRPbILce GmtQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=EFHWJqOuZ5oCfySHPR3lGIAEM6/c2S56gFx0hAWTits=; b=fGfNVFWp464zCeDR6SJnDU8j1rXJO74woAm7MQfk5GjVP/xjI/T51eoCXKCYju700B /3okztziAcz1vAGKl6NBm77Vy87CNQnMcJKH8ol5GWqbE0lkzvQgR58SGWulX9renn+a 3TtdysE9kvMQkI7DTnUUJN2corgsneM/Nu2nms7bQroz0CRPpqQoLvPwmpjRXuXle0dg 3f/la18uNzRfD1XOMdM46TV1IgrX7Nx9U7HdbeCEYotoRkQWb7Qa5+No67blfT/vwBCb joWusy75c6MjFcSNaL9QJ3nEKRb77h2ADjsAklbOUp2K5LisOz/l/KGp5gRbstpr/1gn T55Q== X-Gm-Message-State: AOAM5319f1A5g4H1PpaURxi/fXopv2wCqb8eqZmFFMUFdxdJklRaLgcz nwIl7EvPmZsHBTHFKP23/SpvVccPCT+OrfkxSDI= X-Google-Smtp-Source: ABdhPJzW3Fvi8O1I3RdTpgDrVo+V4N7HeT6KVKLgNNjdyfYAbynwehos99I2n2hC2EeG6e4aVy/1iB1HAvNbc6zHnDs= X-Received: by 2002:a17:906:158c:: with SMTP id k12mr22774572ejd.119.1612878542051; Tue, 09 Feb 2021 05:49:02 -0800 (PST) MIME-Version: 1.0 References: <20210208113810.11118-1-hxseverything@gmail.com> <8552C5F8-8410-4E81-8AF4-7018878AFCDC@gmail.com> In-Reply-To: <8552C5F8-8410-4E81-8AF4-7018878AFCDC@gmail.com> From: Willem de Bruijn Date: Tue, 9 Feb 2021 08:48:26 -0500 Message-ID: Subject: Re: [PATCH] bpf: in bpf_skb_adjust_room correct inner protocol for vxlan To: =?UTF-8?B?6buE5a2m5qOu?= Cc: David Miller , bpf , Daniel Borkmann , Network Development , linux-kernel , chengzhiyong , wangli , Alan Maguire Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Feb 9, 2021 at 5:41 AM =E9=BB=84=E5=AD=A6=E6=A3=AE wrote: > > Appreciate for your reply Willem! > > The original intention of this commit is that when we use bpf_skb_adjust_= room to encapsulate > Vxlan packets, we find some powerful device features disabled. > > Setting the inner_protocol directly as skb->protocol is the root cause. > > I understand that it=E2=80=99s not easy to handle all tunnel protocol in = one bpf helper function. But for my > immature idea, when pushing Ethernet header, setting the inner_protocol a= s ETH_P_TEB may > be better. > > Now the flag BPF_F_ADJ_ROOM_ENCAP_L4_UDP includes many udp tunnel types( = e.g. > udp+mpls, geneve, vxlan). Adding an independent flag to represents Vxlan = looks a little > reduplicative. What=E2=80=99s your suggestion? Agreed. I don't mean to add a vxlan specific flag. Instead, a way to identify that the encapsulation includes a mac header. To a certain extent, that already exists as of commit 58dfc900faff ("bpf: add layer 2 encap support to bpf_skb_adjust_room"). That computes an inner_maclen. It makes sense that inner_protocol needs to be updated if inner_maclen indicates a mac header. I would only not infer it based on some imprecise measure, such as inner_maclen being 14. But add a new explicit flag BPF_F_ADJ_ROOM_ENCAP_L2_ETH. Update inner protocol if the flag is passed and inner_maclen >=3D ETH_HLEN. Fail the operation if the flag is passed and inner_maclen is too short. > Thanks again for your reply! > > > > > 2021=E5=B9=B42=E6=9C=888=E6=97=A5 =E4=B8=8B=E5=8D=889:06=EF=BC=8CWillem= de Bruijn =E5=86=99=E9=81=93=EF=BC=9A > > > > On Mon, Feb 8, 2021 at 7:16 AM huangxuesen wr= ote: > >> > >> From: huangxuesen > >> > >> When pushing vxlan tunnel header, set inner protocol as ETH_P_TEB in s= kb > >> to avoid HW device disabling udp tunnel segmentation offload, just lik= e > >> vxlan_build_skb does. > >> > >> Drivers for NIC may invoke vxlan_features_check to check the > >> inner_protocol in skb for vxlan packets to decide whether to disable > >> NETIF_F_GSO_MASK. Currently it sets inner_protocol as the original > >> skb->protocol, that will make mlx5_core disable TSO and lead to huge > >> performance degradation. > >> > >> Signed-off-by: huangxuesen > >> Signed-off-by: chengzhiyong > >> Signed-off-by: wangli > >> --- > >> net/core/filter.c | 7 ++++++- > >> 1 file changed, 6 insertions(+), 1 deletion(-) > >> > >> diff --git a/net/core/filter.c b/net/core/filter.c > >> index 255aeee72402..f8d3ba3fe10f 100644 > >> --- a/net/core/filter.c > >> +++ b/net/core/filter.c > >> @@ -3466,7 +3466,12 @@ static int bpf_skb_net_grow(struct sk_buff *skb= , u32 off, u32 len_diff, > >> skb->inner_mac_header =3D inner_net - inner_mac_len; > >> skb->inner_network_header =3D inner_net; > >> skb->inner_transport_header =3D inner_trans; > >> - skb_set_inner_protocol(skb, skb->protocol); > >> + > >> + if (flags & BPF_F_ADJ_ROOM_ENCAP_L4_UDP && > >> + inner_mac_len =3D=3D ETH_HLEN) > >> + skb_set_inner_protocol(skb, htons(ETH_P_TEB)); > > > > This may be used by vxlan, but it does not imply it. > > > > Adding ETH_HLEN bytes likely means pushing an Ethernet header, but same= point. > > > > Conversely, pushing an Ethernet header is not limited to UDP encap. > > > > This probably needs a new explicit BPF_F_ADJ_ROOM_.. flag, rather than > > trying to infer from imprecise heuristics. >