From: Alexander Duyck
Date: Thu, 17 Dec 2020 19:11:19 -0800
Subject: Re: [net-next v4 00/15] Add mlx5 subfunction support
To: David Ahern
Cc: Jason Gunthorpe, Saeed Mahameed, "David S. Miller", Jakub Kicinski,
 Leon Romanovsky, Netdev, linux-rdma@vger.kernel.org, David Ahern,
 Jacob Keller, Sridhar Samudrala, "Ertman, David M", Dan Williams,
 Kiran Patil, Greg KH
References: <20201216001946.GF552508@nvidia.com>
 <20201216030351.GH552508@nvidia.com>
 <20201216133309.GI552508@nvidia.com>
 <20201216175112.GJ552508@nvidia.com>
 <20201216203537.GM552508@nvidia.com>
X-Mailing-List: linux-rdma@vger.kernel.org

On Thu, Dec 17, 2020 at 5:30 PM David Ahern wrote:
>
> On 12/16/20 3:53 PM, Alexander Duyck wrote:
> > The problem in my case was based on a past experience where east-west
> > traffic became a problem and it was easily shown that bypassing the
> > NIC for traffic was significantly faster.
>
> If a deployment expects a lot of east-west traffic *within a host* why
> is it using hardware based isolation like a VF. That is a side effect of
> a design choice that is remedied by other options.

I am mostly talking about this from past experience, as I saw a few
instances when I was at Intel where it became an issue. Sales and
marketing people aren't exactly happy when you tell them "don't sell
that" in response to them trying to sell a feature into an area where it
doesn't belong. Generally they want a solution. The macvlan offload
addressed these issues, as the replication and local switching can be
handled in software.

The problem is that PCIe DMA wasn't designed to function as a network
switch fabric, and when we start talking about a 400Gb NIC trying to
handle over 256 subfunctions, hardware multicast/broadcast replication
will quickly reduce the receive/transmit throughput to gigabit speeds or
less. With 256 subfunctions a simple 60B ARP could consume more than 19KB
of PCIe bandwidth due to the packet having to be duplicated so many
times. In my mind it should be simpler to clone a single skb 256 times,
forward that to the switchdev ports, and have them perform a bypass (if
available) to deliver it to the subfunctions. That's why I was thinking
it might be a good time to look at addressing it.
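
For anyone who wants to check the replication arithmetic, here is a rough sketch. The 60B frame and the 256 subfunctions come from the message above; the message does not say how the 19KB figure was derived. The per-copy overhead of 16 bytes below is purely an assumed value (standing in for things like descriptor write-back and TLP framing), chosen only to show how a modest per-copy cost moves the total from about 15KB of raw payload to about 19KB. The function name and its parameters are hypothetical.

```python
# Back-of-the-envelope estimate of the PCIe traffic generated when the
# hardware replicates one broadcast frame to every subfunction.


def replication_bytes(frame_len, copies, per_copy_overhead=0):
    """Return total bytes moved when one frame is delivered `copies` times.

    per_copy_overhead is a hypothetical knob for descriptor and TLP
    framing cost; the email gives no value for it.
    """
    return copies * (frame_len + per_copy_overhead)


ARP_LEN = 60        # bytes, from the email
SUBFUNCTIONS = 256  # from the email

# Raw payload only: 256 copies of a 60-byte frame.
payload_only = replication_bytes(ARP_LEN, SUBFUNCTIONS)

# Same, plus an ASSUMED 16 bytes of overhead per copy (not from the email).
with_overhead = replication_bytes(ARP_LEN, SUBFUNCTIONS, per_copy_overhead=16)

print(payload_only)   # 15360
print(with_overhead)  # 19456
```

Either way the cost grows linearly with the number of subfunctions, which is the point being made: a single small broadcast turns into hundreds of separate DMA transfers, whereas cloning an skb in software shares the underlying packet data.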