Subject: Re: [PATCH RESEND iproute2-next] devlink: Add optional controller user input
To: Parav Pandit, "dsahern@gmail.com", "stephen@networkplumber.org", "netdev@vger.kernel.org"
CC: Jiri Pirko, "moyufeng@huawei.com", "linuxarm@openeuler.org"
References:
 <20210603111901.9888-1-parav@nvidia.com>
 <338a2463-eb3a-f642-a288-9ae45f721992@huawei.com>
 <8c3e48ce-f5ed-d35d-4f5e-1b572f251bd1@huawei.com>
From: Yunsheng Lin
Message-ID: <17a59ab0-be25-3588-dd1e-9497652bfe23@huawei.com>
Date: Tue, 8 Jun 2021 15:35:36 +0800
X-Mailing-List: netdev@vger.kernel.org

On 2021/6/8 13:26, Parav Pandit wrote:
>> From: Yunsheng Lin
>> Sent: Tuesday, June 8, 2021 8:58 AM
>>
>> On 2021/6/7 19:12, Parav Pandit wrote:
>>>> From: Yunsheng Lin
>>>> Sent: Monday, June 7, 2021 4:27 PM
>>>>
>>
>> [..]
>>
>>>>>
>>>>>> 2. each PF's devlink instance has three types of port, which is
>>>>>> FLAVOUR_PHYSICAL, FLAVOUR_PCI_PF and FLAVOUR_PCI_VF(supposing I
>>>>>> understand port flavour correctly).
>>>>>>
>>>>> FLAVOUR_PCI_{PF,VF,SF} belongs to eswitch (representor) side on
>>>>> switchdev device.
>>>>
>>>> If devlink instance or eswitch is in DEVLINK_ESWITCH_MODE_LEGACY
>>>> mode, the FLAVOUR_PCI_{PF,VF,SF} port instance does not need to
>>>> created?
>>> No. in eswitch legacy, there are no representor netdevice or devlink ports.
>>
>> It seems each devlink port instance corresponds to a netdevice.
>> More specificly, the devlink instance is created in the struct pci_driver' probe
>> function of a pci function, a devlink port instance is created and registered to
>> that devlink instance when a netdev of that pci function is created?
>>
> Yes.
>
>> As in diagram [1], the devlink port instance(flavour FLAVOUR_PHYSICAL) for
>> ctrl-0-pf0 is created when the netdev of ctrl-0-pf0 is created in the host of
>> smartNIC, the devlink port instance(flavour FLAVOUR_VIRTUAL) for
>> ctrl-0-pf0vfN is created when the netdev of ctrl-0-pf0vfN is created in the
>> host of smartNIC, right?
>>
> Ctrl-0-pf0vfN, ctrl-0-pf0 ports are eswitch ports. They are created where there is eswitch.
> Usually in smartnic where eswitch is located.

Does the diagram in [1] correspond to the multi-host (two-host) setup
mentioned previously?
H1.pf0.physical_port = p0.
H1.pf1.physical_port = p1.
H2.pf0.physical_port = p0.
H2.pf1.physical_port = p1.

Let's say H1 = server and H2 = smartNIC, with the pci rc connections
shown below:

             ---------------------------------------------------------
             |                                                       |
             |           --------- ---------         ------- ------- |
-----------  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
| server  |  | -------   ----/---- ---/----- ------- ---/--- ---/--- |
| pci rc  |=== | pf0 |______/________/       | pf1 |___/_______/     |
| connect |  | -------                       -------                 |
-----------  |     | controller_num=1 (no eswitch)                   |
             ------|--------------------------------------------------
             (internal wire)
                   |
             ---------------------------------------------------------
             | devlink eswitch ports and reps                        |
             | ----------------------------------------------------- |
             | |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | |
             | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
             | ----------------------------------------------------- |
             | |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | |
             | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
             | ----------------------------------------------------- |
             |                                                       |
             |                                                       |
-----------  |           --------- ---------         ------- ------- |
| smartNIC|  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
| pci rc  |==| -------   ----/---- ---/----- ------- ---/--- ---/--- |
| connect |  | | pf0 |______/________/       | pf1 |___/_______/     |
-----------  | -------                       -------                 |
             |                                                       |
             |  local controller_num=0 (eswitch)                     |
             ---------------------------------------------------------

A vanilla kernel can run on the smartNIC host, right? When the kernel
first boots and the mlx driver is not loaded yet, what the smartNIC host
sees is two PFs corresponding to ctrl-0-pf0 and ctrl-0-pf1, right?

I am not sure it is OK to leave out the VFs and SFs, but let's leave
them out for simplicity for now.

When the mlx driver is loaded, two devlink instances are created, which
correspond to ctrl-0-pf0 and ctrl-0-pf1, and two devlink port instances
(flavour FLAVOUR_PHYSICAL) are created and registered to the
corresponding devlink instances just created, right?

As the eswitch mode is set per devlink instance, let's set only the mode
of ctrl-0-pf0's devlink instance to DEVLINK_ESWITCH_MODE_SWITCHDEV. Is
the representor netdev of ctrl-1-pf0 then created, and the devlink port
instance of that representor netdev created and registered to the
devlink instance corresponding to ctrl-0-pf0?

I think I am missing something here; the above does not seem right,
because:
1. For the single-host case: the PF is not passed through to the VM, so
   the devlink port instance of a VF's representor netdev can be
   registered to the devlink instance corresponding to its PF, right?
2. But for the two-host case as above, do we need to create a devlink
   instance for the PF corresponding to ctrl-1-pf0 in the smartNIC host?

>
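
To make the "devlink eswitch ports and reps" table in the diagram
concrete, here is a minimal Python sketch that just enumerates its
cells for two controllers and two PFs. It is purely illustrative: the
function name and the vf/sf index parameters are my own assumptions, the
labels follow the diagram's ctrl-<c>-pf<p> notation rather than any
kernel naming, and it says nothing about which devlink instance each
port is registered to.

```python
from itertools import product


def eswitch_port_labels(controllers=(0, 1), pfs=(0, 1), vfs=(0,), sfs=(0,)):
    """List eswitch-side ports using the diagram's ctrl-<c>-pf<p> notation.

    Hypothetical helper for illustration only; it is not a devlink API.
    """
    labels = []
    for ctrl, pf in product(controllers, pfs):
        base = f"ctrl-{ctrl}-pf{pf}"
        labels.append(base)                               # the PF port itself
        labels.extend(f"{base}vf{vf}" for vf in vfs)      # its VF ports
        labels.extend(f"{base}sf{sf}" for sf in sfs)      # its SF ports
    return labels


if __name__ == "__main__":
    for label in eswitch_port_labels():
        print(label)
```

With one VF and one SF per PF this yields twelve labels, one per cell of
the table in the diagram.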