Date: Tue, 12 Mar 2019 15:02:39 +0100
From: Jiri Pirko
To: Jakub Kicinski
Cc: davem@davemloft.net, netdev@vger.kernel.org, oss-drivers@netronome.com
Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
Message-ID: <20190312140239.GA2455@nanopsycho>
In-Reply-To: <20190311191054.36b801d6@cakuba.hsd1.ca.comcast.net>
X-Mailing-List: netdev@vger.kernel.org

Tue, Mar 12, 2019 at 03:10:54AM CET, jakub.kicinski@netronome.com wrote:
>On Mon, 11 Mar 2019 09:52:04 +0100, Jiri Pirko wrote:
>> Fri, Mar 08, 2019 at 08:09:43PM CET, jakub.kicinski@netronome.com wrote:
>> >If the switchport is in the hypervisor then only the hypervisor can
>> >control switching/forwarding, correct?
>>
>> Correct.
>>
>> >The primary use case for partitioning within a VM (of a VF) would be
>> >containers (and DPDK)?
>>
>> Makes sense.
>>
>> >SR-IOV makes things harder.  Splitting a PF is reasonably easy to grasp.
>> >I'm trying to get a sense of is how would we control an SR-IOV
>> >environment as a whole.
>>
>> You mean orchestration?
>
>Right, orchestration.
>
>To be clear on where I'm going with this - if we want to allow VFs
>to partition themselves then they have to control what is effectively
>a "nested" switch.  A per-VF set of rules which would then get

Wait. If you allow creating VF subports (I believe that is what you meant
by VFs partitioning themselves), that does not mean they will have a
separate nested switch. They would still belong under the same one.

>"flattened" into the main eswitch rule set.  If I was to choose I'd
>really rather have this "flattening" be done on the (Linux) hypervisor
>and not in the vendor driver and firmware.

Agreed. Driver should provide one big switch. User should configure it.

>
>I'd much rather have the VM make a "give me another NIC" orchestration
>call via some high level REST API than devlink.  This makes the
>configuration strictly high level to low level:
>
>  VM -> cloud net REST API -> cloud agent -> devlink/Linux -> FW -> HW
>
>Without round trips via firmware.

Okay. So the "devlink/Linux -> FW" part is going to happen on baremetal.
Makes sense.

>
>This allows for easy policy enforcement, common code to be maintained
>in user space, in high level languages (no 0.5M LoC drivers and 10M LoC
>firmware for every driver).  It can also be used with software paths
>like VirtIO..

Agreed.

>
>Modelling and debugging a nested switch would be a nightmare.  What
>follows is that we probably shouldn't deal with partitioning of VFs,
>but rather only partition via the PF devlink instance, and reassign
>the partitions to VMs.

Agreed. That must be a misunderstanding, I never suggested nested
switches.

>
>> I originally planned to implement sriov orchestration api in devlink too.
>
>Interesting, would you mind elaborating?

I have to think about it. But something like this:

After bootup, you see only the physical port, the PF switch port and the
PF host leg.

$ devlink port show
pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2
pci/0000:05:00.0/1: type eth netdev ??? flavour pci_pf_host
                    peer pci/0000:05:00.0/10000
pci/0000:05:00.0/10000: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 0
                        switch_id 00154d130d2f peer pci/0000:05:00.0/1

To create a new PF subport under PF 0:
$ devlink dev port add pci/0000:05:00.0 flavour pci_pf pf 0

$ devlink port show
pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2
pci/0000:05:00.0/1: type eth netdev ??? flavour pci_pf_host
                    peer pci/0000:05:00.0/10000
pci/0000:05:00.0/10000: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 0
                        switch_id 00154d130d2f peer pci/0000:05:00.0/1
pci/0000:05:00.0/2: type eth netdev ??? flavour pci_pf_host                            <<<<<<<<<<<<<<<<<<
                    peer pci/0000:05:00.0/10001                                        <<<<<<<<<<<<<<<<<<
pci/0000:05:00.0/10001: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 1  <<<<<<<<<<<<<<<<<<
                        switch_id 00154d130d2f peer pci/0000:05:00.0/2                 <<<<<<<<<<<<<<<<<<

To create a new VF under PF 0:
$ devlink dev port add pci/0000:05:00.0 flavour pci_vf pf 0

$ devlink port show
pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2
pci/0000:05:00.0/1: type eth netdev ??? flavour pci_pf_host
                    peer pci/0000:05:00.0/10000
pci/0000:05:00.0/10000: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 0
                        switch_id 00154d130d2f peer pci/0000:05:00.0/1
pci/0000:05:00.0/2: type eth netdev ??? flavour pci_pf_host
                    peer pci/0000:05:00.0/10001
pci/0000:05:00.0/10001: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 1
                        switch_id 00154d130d2f peer pci/0000:05:00.0/2
pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host                       <<<<<<<<<<<<<<<<<<
                    peer pci/0000:05:00.0/10002                                   <<<<<<<<<<<<<<<<<<
pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0 flavour pci_vf pf 0 vf 0  <<<<<<<<<<<<<<<<<<
                        switch_id 00154d130d2f peer pci/0000:05:10.1/0            <<<<<<<<<<<<<<<<<<

So a new VF is created.

To delete, the user would need to use the port which is in the eswitch:

$ devlink port del pci/0000:05:00.0/2
devlink answers: Operation not permitted
$ devlink port del pci/0000:05:00.0/10001     <<<<<<<<<< this
$ devlink port del pci/0000:05:10.1/0
devlink answers: Operation not permitted
$ devlink port del pci/0000:05:00.0/10002     <<<<<<<<<< this

This actually removes the VF.

For VF subports this would work too, we just have to have the "subport"
attribute not only for PFs but also for VFs:

To create a new VF subport under PF 0 and VF 0:
$ devlink dev port add pci/0000:05:00.0 flavour pci_vf pf 0 vf 0

$ devlink port show
pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2
pci/0000:05:00.0/1: type eth netdev ??? flavour pci_pf_host
                    peer pci/0000:05:00.0/10000
pci/0000:05:00.0/10000: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 0
                        switch_id 00154d130d2f peer pci/0000:05:00.0/1
pci/0000:05:00.0/2: type eth netdev ??? flavour pci_pf_host
                    peer pci/0000:05:00.0/10001
pci/0000:05:00.0/10001: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 1
                        switch_id 00154d130d2f peer pci/0000:05:00.0/2
pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
                    peer pci/0000:05:00.0/10002
pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0 flavour pci_vf pf 0 vf 0 subport 0
                        switch_id 00154d130d2f peer pci/0000:05:10.1/0
pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host                                 <<<<<<<<<<<<<<<<<<
                    peer pci/0000:05:00.0/10003                                             <<<<<<<<<<<<<<<<<<
pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0 flavour pci_vf pf 0 vf 0 subport 1  <<<<<<<<<<<<<<<<<<
                        switch_id 00154d130d2f peer pci/0000:05:10.1/1                      <<<<<<<<<<<<<<<<<<
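
The peer/delete semantics proposed above can be illustrated with a small toy model. This is only a sketch in Python, not devlink or kernel code. The class and method names (Port, Eswitch, add_pair, delete) are hypothetical. It also assumes that deleting the eswitch-side port removes its host-side peer, which is how "This actually removes the VF" is read here.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Flavours of the host-side legs in the proposal; these cannot be deleted directly.
HOST_FLAVOURS = {"pci_pf_host", "pci_vf_host"}


@dataclass
class Port:
    handle: str                 # e.g. "pci/0000:05:00.0/10002"
    flavour: str                # e.g. "pci_vf" or "pci_vf_host"
    peer: Optional[str] = None  # handle of the port on the other end


class Eswitch:
    """Hypothetical model of the port table shown by 'devlink port show'."""

    def __init__(self) -> None:
        self.ports: Dict[str, Port] = {}

    def add_pair(self, host_handle: str, host_flavour: str,
                 sw_handle: str, sw_flavour: str) -> None:
        # A new PF subport / VF always shows up as two ports that
        # reference each other through the "peer" attribute.
        self.ports[host_handle] = Port(host_handle, host_flavour, sw_handle)
        self.ports[sw_handle] = Port(sw_handle, sw_flavour, host_handle)

    def delete(self, handle: str) -> None:
        port = self.ports[handle]
        if port.flavour in HOST_FLAVOURS:
            # Mirrors "devlink answers: Operation not permitted".
            raise PermissionError("Operation not permitted")
        # Assumption: removing the eswitch port also removes the host leg.
        del self.ports[handle]
        if port.peer is not None:
            self.ports.pop(port.peer, None)
```

With this model, deleting pci/0000:05:10.1/0 (flavour pci_vf_host) raises PermissionError, while deleting its peer pci/0000:05:00.0/10002 removes both entries.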