All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Saeed Mahameed <saeed@kernel.org>
Cc: Parav Pandit <parav@nvidia.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	David Ahern <dsahern@kernel.org>,
	Jacob Keller <jacob.e.keller@intel.com>,
	Sridhar Samudrala <sridhar.samudrala@intel.com>,
	"david.m.ertman@intel.com" <david.m.ertman@intel.com>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	"kiran.patil@intel.com" <kiran.patil@intel.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	Jiri Pirko <jiri@nvidia.com>, Vu Pham <vuhuong@nvidia.com>
Subject: Re: [net-next v5 03/15] devlink: Introduce PCI SF port flavour and port attribute
Date: Fri, 18 Dec 2020 11:48:12 -0800	[thread overview]
Message-ID: <20201218114812.28db7084@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com> (raw)
In-Reply-To: <ecc117632ffa36ae374fb05ed4806af2d7d55576.camel@kernel.org>

On Wed, 16 Dec 2020 20:44:21 -0800 Saeed Mahameed wrote:
> On Wed, 2020-12-16 at 15:59 -0800, Jakub Kicinski wrote:
> > On Wed, 16 Dec 2020 03:42:51 +0000 Parav Pandit wrote:  
> > > > From: Jakub Kicinski <kuba@kernel.org>
> > > > So subfunctions don't have a VF id but they may have a
> > > > controller?
> > > >    
> > > Right. SF can be on external controller.
> > >    
> > > > Can you tell us more about the use cases and deployment models
> > > > you're
> > > > intending to support? Let's not add attributes and info which
> > > > will go unused.
> > > >     
> > > External will be used the same way how it is used for PF and VF.
> > >   
> > > > How are SFs supposed to be used with SmartNICs? Are you assuming
> > > > single
> > > > domain of control?    
> > > No. it is not assumed. SF can be deployed from smartnic to external
> > > host.
> > > A user has to pass appropriate controller number, pf number
> > > attributes during creation time.  
> > 
> > My problem with this series is that I've gotten some real life
> > application exposure over the last year, and still I have no idea 
> > who is going to find this feature useful and why.
> > 
> > That's the point of my questions in the previous email - what
> > are the use cases, how are they going to operate.
> >   
> 
> The main focus of this feature is scale-ability we want to run
> thousands of Containers/VMs, this is useful for both smartnic and
> baremetal hypervisor worlds, where security and control is exclusive to
> the eswitch manager may it be the smarnic embedded CPU or the x86
> Hypervisor.
> 
> deployment models is identical to SRIOV, the only difference is the
> instantiation model of SF, which is the main discussion point of this
> series (i hope), which to my taste is very modest and minimal.
> after SF is instantiated from that point nothing is new, the SF is
> exposing standard linux interfaces netdev/rdma identical to what VF
> does, most likely you will assign them a namespace and pass them
> through to a container or assign them (not direct assignment) to a VM
> via the virt stack, or create a vdpa instance and pass it to a virtio
> interface.
> 
> There are endless usecases for the netdev stack, for customers who want

"endless" :)

> high scale virtualized/containerized environments, with thousands of
> network functions that can deliver high speed and full offload
> accelerators, Native XDP, Crypto, encap/decap, and HW filtering and
> processing pipeline capabilities.
> 
> I have a long list of customers with various and different applications
> and i am not even talking about the rdma and vdpa customers ! those
> customers just can't wait to leave sriov behind and scale up !
> 
> this feature has a lot of value to the netdev users only because of the
> minimal foot print to the netdev stack (to be honest there is no change
> in netdev, only a thin API layer in devlink) and the immediate and
> effortless benefits to deploy multiple (accelerated) netdevs at scale.

The acceleration can hopefully be plumbed through the software devices.

I think your HW is capable of doing large queue sets so I'm curious
how this actually performs. We're probably talking 1000+ queues here -
the CPU will have hard time serving so many queues. In my experiments
basically the more queues the more cache trashing, the more interrupts,
etc. and the lower the performance.

> > It's hard to review an API without knowing the use of it. iproute2
> > is low level plumbing.
> 
> I don't know how to put this, let me try:
> A) SRIOV model
> echo 128 > /sys/class/net/eth0/device/sriov_numvfs
> ubind vf
> 
> ip set vf attribute x
> configure representor .. 
> deploy vf/netdev/rdma interface into the container

No, no, my point is that for SR-IOV it's OpenStack, libvirt etc. which
do this. I understand the manual steps. Often problems pop up when real
systems try to string the HW objects together, allocated them, learn
their capabilities, etc.

> B) SF model 
> you do (every thing under the devlink umbrella/switchdev):
> for i in {1..1024} ; do
> devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum $i
> devlink port sf $i set attribute x
> 
> # from here on, it is identical to a VF
> configure representor
> deply sf/netdev/rdma interfaces into a container 
> 
> B is more scale-able and has more visibility and controllability  to
> the user, after you create the SFs deployment and usecases are
> identical to SRIOV VF usecases.
> 
> See the improvement ? :)
> 
> > Here the patch is adding the ability to apparently create a SF on 
> > a remote controller. If you haven't thought that use case through
> > just don't allow it until you know how it will work.
> 
> We have thought the use case through it is not any different from the 
> local controller use case. the code is uniform, we need to work hard to
> block a remote controller :) .. 

So the SF is always created from the eswitch controller side?
How does the host side look?

I really think that for ease of merging this we should leave 
the remote controller out at the beginning - only allow local
creation.

> > > > It seems that the way the industry is moving the major
> > > > use case for SmartNICs is bare metal.
> > > > 
> > > > I always assumed nested eswitches when thinking about SmartNICs,
> > > > what
> > > > are you intending to do?
> > > >    
> > > Mlx5 doesn't support nested eswitch. SF can be deployed on the
> > > external controller PCI function.
> > > But this interface neither limited nor enforcing nested or flat
> > > eswitch.
> > >    
> > > > What are your plans for enabling this feature in user space
> > > > project?    
> > > Do you mean K8s plugin or iproute2? Can you please tell us what
> > > user space project?  
> > 
> > That's my question. For SR-IOV it'd be all the virt stacks out there.
> > But this can't do virt. So what can it do?
> 
> you are thinking VF direct assignment. but don't forget
> virt handles netdev assignment to a vm perfectly fine and SF has a
> netdev.
> 
> And don't get me started on the weird virt handling of SRIOV VF, the
> whole thing is a big mess :) it shouldn't be a de facto standard that
> we need to follow.. 

  reply	other threads:[~2020-12-18 19:48 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-15  9:03 [net-next v5 00/15] Add mlx5 subfunction support Saeed Mahameed
2020-12-15  9:03 ` [net-next v5 01/15] net/mlx5: Fix compilation warning for 32-bit platform Saeed Mahameed
2020-12-15  9:03 ` [net-next v5 02/15] devlink: Prepare code to fill multiple port function attributes Saeed Mahameed
2020-12-15  9:03 ` [net-next v5 03/15] devlink: Introduce PCI SF port flavour and port attribute Saeed Mahameed
2020-12-15 23:27   ` Jakub Kicinski
2020-12-16  3:42     ` Parav Pandit
2020-12-16 23:59       ` Jakub Kicinski
2020-12-17  4:44         ` Saeed Mahameed
2020-12-18 19:48           ` Jakub Kicinski [this message]
2020-12-19  4:43             ` Parav Pandit
2020-12-15  9:03 ` [net-next v5 04/15] devlink: Support add and delete devlink port Saeed Mahameed
2020-12-16  0:29   ` Jakub Kicinski
2020-12-16  5:06     ` Parav Pandit
2020-12-15  9:03 ` [net-next v5 05/15] devlink: Support get and set state of port function Saeed Mahameed
2020-12-16  0:37   ` Jakub Kicinski
2020-12-16  5:15     ` Parav Pandit
2020-12-16 16:15       ` David Ahern
2020-12-17  0:08       ` Jakub Kicinski
2020-12-17  5:46         ` Parav Pandit
2020-12-18 19:51           ` Jakub Kicinski
2020-12-19  5:06             ` Parav Pandit
2020-12-15  9:03 ` [net-next v5 06/15] net/mlx5: Introduce vhca state event notifier Saeed Mahameed
2020-12-15  9:03 ` [net-next v5 07/15] net/mlx5: SF, Add auxiliary device support Saeed Mahameed
2020-12-16  0:43   ` Jakub Kicinski
2020-12-16  5:19     ` Parav Pandit
2020-12-17  0:11       ` Jakub Kicinski
2020-12-17  5:23         ` Parav Pandit
2020-12-18 19:58           ` Jakub Kicinski
2020-12-19  4:53             ` Parav Pandit
2020-12-19 17:43               ` Jakub Kicinski
2020-12-15  9:03 ` [net-next v5 08/15] net/mlx5: SF, Add auxiliary device driver Saeed Mahameed
2020-12-15  9:03 ` [net-next v5 09/15] net/mlx5: E-switch, Prepare eswitch to handle SF vport Saeed Mahameed
2020-12-16  0:47   ` Jakub Kicinski
2020-12-16  5:28     ` Parav Pandit
2020-12-15  9:03 ` [net-next v5 10/15] net/mlx5: E-switch, Add eswitch helpers for " Saeed Mahameed
2020-12-15  9:03 ` [net-next v5 11/15] net/mlx5: SF, Add port add delete functionality Saeed Mahameed
2020-12-16  0:51   ` Jakub Kicinski
2020-12-16  5:31     ` Parav Pandit
2020-12-15  9:03 ` [net-next v5 12/15] net/mlx5: SF, Port function state change support Saeed Mahameed
2020-12-15  9:03 ` [net-next v5 13/15] devlink: Add devlink port documentation Saeed Mahameed
2020-12-16  0:57   ` Jakub Kicinski
2020-12-16  5:40     ` Parav Pandit
2020-12-15  9:03 ` [net-next v5 14/15] devlink: Extend devlink port documentation for subfunctions Saeed Mahameed
2020-12-16  1:00   ` Jakub Kicinski
2020-12-16  3:55     ` Parav Pandit
2020-12-15  9:03 ` [net-next v5 15/15] net/mlx5: Add devlink subfunction port documentation Saeed Mahameed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201218114812.28db7084@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com \
    --to=kuba@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=davem@davemloft.net \
    --cc=david.m.ertman@intel.com \
    --cc=dsahern@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jacob.e.keller@intel.com \
    --cc=jgg@nvidia.com \
    --cc=jiri@nvidia.com \
    --cc=kiran.patil@intel.com \
    --cc=leonro@nvidia.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=parav@nvidia.com \
    --cc=saeed@kernel.org \
    --cc=sridhar.samudrala@intel.com \
    --cc=vuhuong@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.