From: Saeed Mahameed <saeed@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>, Parav Pandit <parav@nvidia.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	David Ahern <dsahern@kernel.org>,
	Jacob Keller <jacob.e.keller@intel.com>,
	Sridhar Samudrala <sridhar.samudrala@intel.com>,
	"david.m.ertman@intel.com" <david.m.ertman@intel.com>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	"kiran.patil@intel.com" <kiran.patil@intel.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	Jiri Pirko <jiri@nvidia.com>, Vu Pham <vuhuong@nvidia.com>
Subject: Re: [net-next v5 03/15] devlink: Introduce PCI SF port flavour and port attribute
Date: Wed, 16 Dec 2020 20:44:21 -0800
Message-ID: <ecc117632ffa36ae374fb05ed4806af2d7d55576.camel@kernel.org>
In-Reply-To: <20201216155945.63f07c80@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Wed, 2020-12-16 at 15:59 -0800, Jakub Kicinski wrote:
> On Wed, 16 Dec 2020 03:42:51 +0000 Parav Pandit wrote:
> > > From: Jakub Kicinski <kuba@kernel.org>
> > > So subfunctions don't have a VF id but they may have a
> > > controller?
> > >  
> > Right. SF can be on external controller.
> >  
> > > Can you tell us more about the use cases and deployment models
> > > you're
> > > intending to support? Let's not add attributes and info which
> > > will go unused.
> > >   
> > External will be used the same way how it is used for PF and VF.
> > 
> > > How are SFs supposed to be used with SmartNICs? Are you assuming
> > > single
> > > domain of control?  
> > No. it is not assumed. SF can be deployed from smartnic to external
> > host.
> > A user has to pass appropriate controller number, pf number
> > attributes during creation time.
> 
> My problem with this series is that I've gotten some real life
> application exposure over the last year, and still I have no idea 
> who is going to find this feature useful and why.
> 
> That's the point of my questions in the previous email - what
> are the use cases, how are they going to operate.
> 

The main focus of this feature is scalability: we want to run
thousands of containers/VMs. This is useful for both the SmartNIC and
bare-metal hypervisor worlds, where security and control are exclusive
to the eswitch manager, be it the SmartNIC embedded CPU or the x86
hypervisor.

The deployment model is identical to SR-IOV. The only difference is the
instantiation model of the SF, which is the main discussion point of
this series (I hope), and which to my taste is very modest and minimal.
Once the SF is instantiated, nothing from that point on is new: the SF
exposes standard Linux netdev/rdma interfaces identical to what a VF
does. Most likely you will assign them a namespace and pass them
through to a container, or assign them (not direct assignment) to a VM
via the virt stack, or create a vdpa instance and pass it to a virtio
interface.

There are endless use cases for the netdev stack, for customers who
want high-scale virtualized/containerized environments with thousands
of network functions that can deliver high speed and full offload
accelerators: native XDP, crypto, encap/decap, and HW filtering and
processing pipeline capabilities.

I have a long list of customers with various applications, and I am not
even talking about the rdma and vdpa customers! Those customers just
can't wait to leave SR-IOV behind and scale up!

This feature has a lot of value to netdev users, if only because of
the minimal footprint in the netdev stack (to be honest, there is no
change in netdev, only a thin API layer in devlink) and the immediate
and effortless benefit of deploying multiple (accelerated) netdevs at
scale.


> It's hard to review an API without knowing the use of it. iproute2
> is low level plumbing.
> 

I don't know how to put this, let me try:
A) SR-IOV model
echo 128 > /sys/class/net/eth0/device/sriov_numvfs
unbind vf

ip set vf attribute x
configure representor ..
deploy vf/netdev/rdma interface into the container

B) SF model
you do (everything under the devlink umbrella/switchdev):
for i in {1..1024} ; do
    devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum $i
    devlink port sf $i set attribute x
done

# from here on, it is identical to a VF
configure representor
deploy sf/netdev/rdma interfaces into a container

B is more scalable and gives the user more visibility and control.
After you create the SFs, deployment and use cases are identical to
SR-IOV VF use cases.

See the improvement ? :)
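
To make the creation step in model B concrete, here is a minimal
dry-run sketch. It only prints the "devlink port add" lines and does
not execute them, so no hardware is touched. The helper name
gen_sf_add_cmds and its three arguments are hypothetical, made up for
this illustration; the command line it prints is the one shown in
model B.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) one "devlink port add" per SF.
# gen_sf_add_cmds is a hypothetical helper, not part of devlink/iproute2.
gen_sf_add_cmds() {
    dev="$1"      # devlink device handle, e.g. pci/0000:06:00.0
    pfnum="$2"    # PF number the SFs are created on
    count="$3"    # how many SFs to create; sfnum runs from 1 to count
    i=1
    while [ "$i" -le "$count" ]; do
        echo "devlink port add $dev flavour pcisf pfnum $pfnum sfnum $i"
        i=$((i + 1))
    done
}

# Print the commands for three SFs on PF 0 of the example device.
gen_sf_add_cmds pci/0000:06:00.0 0 3
```

Piping the output to "sh" on the eswitch manager would run the
commands for real; setting attributes and configuring the representor
would follow per SF, as in the steps above.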

> Here the patch is adding the ability to apparently create a SF on 
> a remote controller. If you haven't thought that use case through
> just don't allow it until you know how it will work.
> 

We have thought the use case through; it is no different from the
local controller use case. The code is uniform, we would need to work
hard to block a remote controller :) ..

> > > It seems that the way the industry is moving the major
> > > use case for SmartNICs is bare metal.
> > > 
> > > I always assumed nested eswitches when thinking about SmartNICs,
> > > what
> > > are you intending to do?
> > >  
> > Mlx5 doesn't support nested eswitch. SF can be deployed on the
> > external controller PCI function.
> > But this interface neither limited nor enforcing nested or flat
> > eswitch.
> >  
> > > What are your plans for enabling this feature in user space
> > > project?  
> > Do you mean K8s plugin or iproute2? Can you please tell us what
> > user space project?
> 
> That's my question. For SR-IOV it'd be all the virt stacks out there.
> But this can't do virt. So what can it do?
> 

You are thinking of VF direct assignment, but don't forget that virt
handles netdev assignment to a VM perfectly fine, and an SF has a
netdev.

And don't get me started on the weird virt handling of SR-IOV VFs; the
whole thing is a big mess :) It shouldn't be a de facto standard that
we need to follow..

