From: Jakub Kicinski <kuba@kernel.org>
To: Saeed Mahameed <saeed@kernel.org>
Cc: Parav Pandit <parav@nvidia.com>,
"David S. Miller" <davem@davemloft.net>,
Jason Gunthorpe <jgg@nvidia.com>,
Leon Romanovsky <leonro@nvidia.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
David Ahern <dsahern@kernel.org>,
Jacob Keller <jacob.e.keller@intel.com>,
Sridhar Samudrala <sridhar.samudrala@intel.com>,
"david.m.ertman@intel.com" <david.m.ertman@intel.com>,
"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
"kiran.patil@intel.com" <kiran.patil@intel.com>,
"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
Jiri Pirko <jiri@nvidia.com>, Vu Pham <vuhuong@nvidia.com>
Subject: Re: [net-next v5 03/15] devlink: Introduce PCI SF port flavour and port attribute
Date: Fri, 18 Dec 2020 11:48:12 -0800
Message-ID: <20201218114812.28db7084@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
In-Reply-To: <ecc117632ffa36ae374fb05ed4806af2d7d55576.camel@kernel.org>
On Wed, 16 Dec 2020 20:44:21 -0800 Saeed Mahameed wrote:
> On Wed, 2020-12-16 at 15:59 -0800, Jakub Kicinski wrote:
> > On Wed, 16 Dec 2020 03:42:51 +0000 Parav Pandit wrote:
> > > > From: Jakub Kicinski <kuba@kernel.org>
> > > > So subfunctions don't have a VF id but they may have a
> > > > controller?
> > > >
> > > Right. SF can be on external controller.
> > >
> > > > Can you tell us more about the use cases and deployment models
> > > > you're intending to support? Let's not add attributes and info
> > > > which will go unused.
> > > >
> > > External controllers will be used the same way they are used for PF and VF.
> > >
> > > > How are SFs supposed to be used with SmartNICs? Are you assuming
> > > > a single domain of control?
> > > No, it is not assumed. An SF can be deployed from the SmartNIC to an
> > > external host. A user has to pass the appropriate controller number
> > > and pf number attributes at creation time.
> >
> > My problem with this series is that I've gotten some real life
> > application exposure over the last year, and still I have no idea
> > who is going to find this feature useful and why.
> >
> > That's the point of my questions in the previous email - what
> > are the use cases, how are they going to operate.
> >
>
> The main focus of this feature is scalability: we want to run
> thousands of containers/VMs. This is useful for both the SmartNIC and
> bare-metal hypervisor worlds, where security and control are exclusive
> to the eswitch manager, be it the SmartNIC embedded CPU or the x86
> hypervisor.
>
> The deployment model is identical to SR-IOV; the only difference is the
> instantiation model of the SF, which is the main discussion point of
> this series (I hope), and which to my taste is very modest and minimal.
> Once an SF is instantiated, nothing is new from that point on: the SF
> exposes standard Linux netdev/rdma interfaces, identical to what a VF
> does. Most likely you will assign them to a namespace and pass them
> through to a container, or assign them (not direct assignment) to a VM
> via the virt stack, or create a vdpa instance and pass it to a virtio
> interface.
>
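The "assign them to a namespace" step for an SF netdev uses the same standard iproute2 commands as for any other netdev. A minimal sketch, assuming a hypothetical helper name and hypothetical netdev/namespace names; DRY_RUN=1 prints the commands instead of running them (running them for real needs root):

```shell
# Hypothetical helpers, not part of the SF series.
# run: execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# assign_to_netns DEV NETNS: move DEV into the container's network
# namespace NETNS, then bring it up from inside that namespace.
assign_to_netns() {
  local dev=$1 ns=$2
  run ip link set dev "$dev" netns "$ns"
  run ip -n "$ns" link set dev "$dev" up
}
```

For example, `DRY_RUN=1 assign_to_netns sf0 ctr1` prints the two `ip` commands without touching the system.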
> There are endless use cases for the netdev stack, for customers who want
"endless" :)
> high-scale virtualized/containerized environments, with thousands of
> network functions that can deliver high-speed and full-offload
> accelerators, native XDP, crypto, encap/decap, and HW filtering and
> processing pipeline capabilities.
>
> I have a long list of customers with various different applications,
> and I am not even talking about the RDMA and vDPA customers! Those
> customers just can't wait to leave SR-IOV behind and scale up!
>
> This feature has a lot of value to netdev users because of its minimal
> footprint in the netdev stack (to be honest, there is no change in
> netdev, only a thin API layer in devlink) and the immediate, effortless
> benefit of deploying multiple (accelerated) netdevs at scale.
The acceleration can hopefully be plumbed through the software devices.
I think your HW is capable of doing large queue sets, so I'm curious
how this actually performs. We're probably talking 1000+ queues here;
the CPU will have a hard time serving so many queues. In my experiments,
basically, the more queues, the more cache thrashing, the more
interrupts, etc., and the lower the performance.
> > It's hard to review an API without knowing the use of it. iproute2
> > is low level plumbing.
>
> I don't know how to put this, let me try:
> A) SRIOV model
> echo 128 > /sys/class/net/eth0/device/sriov_numvfs
> unbind vf
>
> ip set vf attribute x
> configure representor ..
> deploy vf/netdev/rdma interface into the container
No, no, my point is that for SR-IOV it's OpenStack, libvirt etc. which
do this. I understand the manual steps. Often problems pop up when real
systems try to string the HW objects together, allocate them, learn
their capabilities, etc.
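The first manual steps of the SR-IOV model (A) quoted above go through PCI sysfs. A minimal sketch, assuming hypothetical helper names and an optional sysfs root argument so it can be tried on a fake directory tree. The kernel refuses to change a non-zero VF count directly to a different non-zero value (EBUSY), so the helper resets to 0 first. The virtfnN links under the PF's device directory point at each VF's PCI device; that address is what would be written to the bound driver's /sys/bus/pci/drivers/<driver>/unbind file:

```shell
# set_numvfs IFACE COUNT [SYSFS_ROOT]: set the VF count of IFACE's PF.
set_numvfs() {
  local iface=$1 want=$2 root=${3:-/sys}
  local attr="$root/class/net/$iface/device/sriov_numvfs"
  local cur
  cur=$(cat "$attr") || return 1
  # Changing non-zero -> different non-zero fails, so disable VFs first.
  if [ "$cur" -ne 0 ] && [ "$cur" -ne "$want" ]; then
    echo 0 > "$attr" || return 1
  fi
  echo "$want" > "$attr"
}

# list_vf_bdfs IFACE [SYSFS_ROOT]: print the PCI address of each VF.
# Glob order is lexical, so virtfn10 sorts before virtfn2.
list_vf_bdfs() {
  local iface=$1 root=${2:-/sys} link
  for link in "$root/class/net/$iface/device"/virtfn*; do
    [ -L "$link" ] || continue   # skip the literal pattern if nothing matched
    basename "$(readlink "$link")"
  done
}
```

On a real system, `set_numvfs eth0 128` (as root) is the equivalent of the quoted `echo 128 > /sys/class/net/eth0/device/sriov_numvfs`.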
> B) SF model
> you do (everything under the devlink umbrella/switchdev):
> for i in {1..1024} ; do
> devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum $i
> devlink port sf $i set attribute x
>
> # from here on, it is identical to a VF
> configure representor
> deploy sf/netdev/rdma interfaces into a container
> done
>
> B is more scalable and gives the user more visibility and control;
> after you create the SFs, deployment and use cases are identical to
> SR-IOV VF use cases.
>
> See the improvement ? :)
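The creation loop of the SF model (B) can be sketched as a dry run that only prints the commands, so the list can be reviewed before anything is created. It uses exactly the `devlink port add` syntax quoted in this thread for v5; the syntax that finally lands in iproute2 may differ. The helper name is hypothetical, and the placeholder `devlink port sf $i set attribute x` step is deliberately left out:

```shell
# print_sf_add_cmds DEV PFNUM FIRST LAST: print one SF creation command
# per sfnum in [FIRST, LAST], using the syntax proposed in this series.
print_sf_add_cmds() {
  local dev=$1 pfnum=$2 i=$3 last=$4
  while [ "$i" -le "$last" ]; do
    printf 'devlink port add %s flavour pcisf pfnum %s sfnum %s\n' \
      "$dev" "$pfnum" "$i"
    i=$(( i + 1 ))
  done
}
```

For example, `print_sf_add_cmds pci/0000:06:00.0 0 1 1024` prints the 1024 commands from the quoted loop.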
>
> > Here the patch is adding the ability to apparently create a SF on
> > a remote controller. If you haven't thought that use case through
> > just don't allow it until you know how it will work.
>
> We have thought the use case through; it is not any different from the
> local controller use case. The code is uniform; we would need to work
> hard to block a remote controller :) ..
So the SF is always created from the eswitch controller side?
How does the host side look?
I really think that for ease of merging this we should leave
the remote controller out at the beginning - only allow local
creation.
> > > > It seems that the way the industry is moving the major
> > > > use case for SmartNICs is bare metal.
> > > >
> > > > I always assumed nested eswitches when thinking about SmartNICs;
> > > > what are you intending to do?
> > > >
> > > Mlx5 doesn't support nested eswitch. An SF can be deployed on the
> > > external controller PCI function.
> > > But this interface is neither limited to nor enforcing a nested or
> > > flat eswitch.
> > >
> > > > What are your plans for enabling this feature in a user space
> > > > project?
> > > Do you mean K8s plugin or iproute2? Can you please tell us what
> > > user space project?
> >
> > That's my question. For SR-IOV it'd be all the virt stacks out there.
> > But this can't do virt. So what can it do?
>
> You are thinking of VF direct assignment, but don't forget that
> virt handles netdev assignment to a VM perfectly fine, and an SF has a
> netdev.
>
> And don't get me started on the weird virt handling of SR-IOV VFs; the
> whole thing is a big mess :) It shouldn't be a de facto standard that
> we need to follow..
Thread overview: 46+ messages
2020-12-15 9:03 [net-next v5 00/15] Add mlx5 subfunction support Saeed Mahameed
2020-12-15 9:03 ` [net-next v5 01/15] net/mlx5: Fix compilation warning for 32-bit platform Saeed Mahameed
2020-12-15 9:03 ` [net-next v5 02/15] devlink: Prepare code to fill multiple port function attributes Saeed Mahameed
2020-12-15 9:03 ` [net-next v5 03/15] devlink: Introduce PCI SF port flavour and port attribute Saeed Mahameed
2020-12-15 23:27 ` Jakub Kicinski
2020-12-16 3:42 ` Parav Pandit
2020-12-16 23:59 ` Jakub Kicinski
2020-12-17 4:44 ` Saeed Mahameed
2020-12-18 19:48 ` Jakub Kicinski [this message]
2020-12-19 4:43 ` Parav Pandit
2020-12-15 9:03 ` [net-next v5 04/15] devlink: Support add and delete devlink port Saeed Mahameed
2020-12-16 0:29 ` Jakub Kicinski
2020-12-16 5:06 ` Parav Pandit
2020-12-15 9:03 ` [net-next v5 05/15] devlink: Support get and set state of port function Saeed Mahameed
2020-12-16 0:37 ` Jakub Kicinski
2020-12-16 5:15 ` Parav Pandit
2020-12-16 16:15 ` David Ahern
2020-12-17 0:08 ` Jakub Kicinski
2020-12-17 5:46 ` Parav Pandit
2020-12-18 19:51 ` Jakub Kicinski
2020-12-19 5:06 ` Parav Pandit
2020-12-15 9:03 ` [net-next v5 06/15] net/mlx5: Introduce vhca state event notifier Saeed Mahameed
2020-12-15 9:03 ` [net-next v5 07/15] net/mlx5: SF, Add auxiliary device support Saeed Mahameed
2020-12-16 0:43 ` Jakub Kicinski
2020-12-16 5:19 ` Parav Pandit
2020-12-17 0:11 ` Jakub Kicinski
2020-12-17 5:23 ` Parav Pandit
2020-12-18 19:58 ` Jakub Kicinski
2020-12-19 4:53 ` Parav Pandit
2020-12-19 17:43 ` Jakub Kicinski
2020-12-15 9:03 ` [net-next v5 08/15] net/mlx5: SF, Add auxiliary device driver Saeed Mahameed
2020-12-15 9:03 ` [net-next v5 09/15] net/mlx5: E-switch, Prepare eswitch to handle SF vport Saeed Mahameed
2020-12-16 0:47 ` Jakub Kicinski
2020-12-16 5:28 ` Parav Pandit
2020-12-15 9:03 ` [net-next v5 10/15] net/mlx5: E-switch, Add eswitch helpers for " Saeed Mahameed
2020-12-15 9:03 ` [net-next v5 11/15] net/mlx5: SF, Add port add delete functionality Saeed Mahameed
2020-12-16 0:51 ` Jakub Kicinski
2020-12-16 5:31 ` Parav Pandit
2020-12-15 9:03 ` [net-next v5 12/15] net/mlx5: SF, Port function state change support Saeed Mahameed
2020-12-15 9:03 ` [net-next v5 13/15] devlink: Add devlink port documentation Saeed Mahameed
2020-12-16 0:57 ` Jakub Kicinski
2020-12-16 5:40 ` Parav Pandit
2020-12-15 9:03 ` [net-next v5 14/15] devlink: Extend devlink port documentation for subfunctions Saeed Mahameed
2020-12-16 1:00 ` Jakub Kicinski
2020-12-16 3:55 ` Parav Pandit
2020-12-15 9:03 ` [net-next v5 15/15] net/mlx5: Add devlink subfunction port documentation Saeed Mahameed