From: Jakub Kicinski <kuba@kernel.org>
To: Saeed Mahameed <saeed@kernel.org>
Cc: Parav Pandit <parav@nvidia.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	David Ahern <dsahern@kernel.org>,
	Jacob Keller <jacob.e.keller@intel.com>,
	Sridhar Samudrala <sridhar.samudrala@intel.com>,
	"david.m.ertman@intel.com" <david.m.ertman@intel.com>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	"kiran.patil@intel.com" <kiran.patil@intel.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	Jiri Pirko <jiri@nvidia.com>, Vu Pham <vuhuong@nvidia.com>
Subject: Re: [net-next v5 03/15] devlink: Introduce PCI SF port flavour and port attribute
Date: Fri, 18 Dec 2020 11:48:12 -0800
Message-ID: <20201218114812.28db7084@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
In-Reply-To: <ecc117632ffa36ae374fb05ed4806af2d7d55576.camel@kernel.org>

On Wed, 16 Dec 2020 20:44:21 -0800 Saeed Mahameed wrote:
> On Wed, 2020-12-16 at 15:59 -0800, Jakub Kicinski wrote:
> > On Wed, 16 Dec 2020 03:42:51 +0000 Parav Pandit wrote:  
> > > > From: Jakub Kicinski <kuba@kernel.org>
> > > > So subfunctions don't have a VF id but they may have a
> > > > controller?
> > > >    
> > > Right. SF can be on external controller.
> > >    
> > > > Can you tell us more about the use cases and deployment models
> > > > you're
> > > > intending to support? Let's not add attributes and info which
> > > > will go unused.
> > > >     
> > > External will be used the same way how it is used for PF and VF.
> > >   
> > > > How are SFs supposed to be used with SmartNICs? Are you assuming
> > > > single
> > > > domain of control?    
> > > No, it is not assumed. An SF can be deployed from the smartnic to an
> > > external host.
> > > The user has to pass the appropriate controller number and pf number
> > > attributes at creation time.  
> > 
> > My problem with this series is that I've gotten some real life
> > application exposure over the last year, and still I have no idea 
> > who is going to find this feature useful and why.
> > 
> > That's the point of my questions in the previous email - what
> > are the use cases, how are they going to operate.
> >   
> 
> The main focus of this feature is scalability: we want to run
> thousands of containers/VMs. This is useful for both the smartnic and
> bare-metal hypervisor worlds, where security and control are exclusive to
> the eswitch manager, be it the smartnic embedded CPU or the x86
> hypervisor.
> 
> The deployment model is identical to SRIOV; the only difference is the
> instantiation model of the SF, which is the main discussion point of this
> series (I hope), and which to my taste is very modest and minimal.
> After an SF is instantiated, nothing is new from that point: the SF
> exposes standard Linux interfaces (netdev/rdma) identical to what a VF
> does. Most likely you will assign them a namespace and pass them
> through to a container, or assign them (not direct assignment) to a VM
> via the virt stack, or create a vdpa instance and pass it to a virtio
> interface.
> 
> There are endless usecases for the netdev stack, for customers who want

"endless" :)

> high scale virtualized/containerized environments, with thousands of
> network functions that can deliver high speed and full offload
> accelerators, Native XDP, Crypto, encap/decap, and HW filtering and
> processing pipeline capabilities.
> 
> I have a long list of customers with various and different applications
> and i am not even talking about the rdma and vdpa customers ! those
> customers just can't wait to leave sriov behind and scale up !
> 
> This feature has a lot of value to netdev users, if only because of its
> minimal footprint in the netdev stack (to be honest, there is no change
> in netdev, only a thin API layer in devlink) and the immediate and
> effortless benefit of deploying multiple (accelerated) netdevs at scale.

The acceleration can hopefully be plumbed through the software devices.

I think your HW is capable of doing large queue sets so I'm curious
how this actually performs. We're probably talking 1000+ queues here -
the CPU will have a hard time serving so many queues. In my experiments
basically the more queues the more cache thrashing, the more interrupts,
etc. and the lower the performance.

> > It's hard to review an API without knowing the use of it. iproute2
> > is low level plumbing.
> 
> I don't know how to put this, let me try:
> A) SRIOV model
> echo 128 > /sys/class/net/eth0/device/sriov_numvfs
> unbind vf
> 
> ip set vf attribute x
> configure representor .. 
> deploy vf/netdev/rdma interface into the container

No, no, my point is that for SR-IOV it's OpenStack, libvirt etc. which
do this. I understand the manual steps. Often problems pop up when real
systems try to string the HW objects together, allocate them, learn
their capabilities, etc.

> B) SF model 
> you do (every thing under the devlink umbrella/switchdev):
> for i in {1..1024} ; do
> devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum $i
> devlink port sf $i set attribute x
> 
> # from here on, it is identical to a VF
> configure representor
> deploy sf/netdev/rdma interfaces into a container 
> 
> B is more scalable and gives the user more visibility and control.
> After you create the SFs, deployment and use cases are
> identical to the SRIOV VF use cases.
> 
> See the improvement ? :)
> 
> > Here the patch is adding the ability to apparently create a SF on 
> > a remote controller. If you haven't thought that use case through
> > just don't allow it until you know how it will work.
> 
> We have thought the use case through; it is not any different from the
> local controller use case. The code is uniform, we would need to work
> hard to block a remote controller :) ..

So the SF is always created from the eswitch controller side?
How does the host side look?

I really think that for ease of merging this we should leave 
the remote controller out at the beginning - only allow local
creation.

> > > > It seems that the way the industry is moving the major
> > > > use case for SmartNICs is bare metal.
> > > > 
> > > > I always assumed nested eswitches when thinking about SmartNICs,
> > > > what
> > > > are you intending to do?
> > > >    
> > > Mlx5 doesn't support nested eswitch. SF can be deployed on the
> > > external controller PCI function.
> > > But this interface neither limits nor enforces a nested or flat
> > > eswitch.
> > >    
> > > > What are your plans for enabling this feature in user space
> > > > project?    
> > > Do you mean K8s plugin or iproute2? Can you please tell us what
> > > user space project?  
> > 
> > That's my question. For SR-IOV it'd be all the virt stacks out there.
> > But this can't do virt. So what can it do?
> 
> you are thinking VF direct assignment. but don't forget
> virt handles netdev assignment to a vm perfectly fine and SF has a
> netdev.
> 
> And don't get me started on the weird virt handling of SRIOV VF, the
> whole thing is a big mess :) it shouldn't be a de facto standard that
> we need to follow.. 
