All of lore.kernel.org
 help / color / mirror / Atom feed
From: Saeed Mahameed <saeed@kernel.org>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Netdev <netdev@vger.kernel.org>,
	linux-rdma@vger.kernel.org, David Ahern <dsahern@kernel.org>,
	Jacob Keller <jacob.e.keller@intel.com>,
	Sridhar Samudrala <sridhar.samudrala@intel.com>,
	"Ertman, David M" <david.m.ertman@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Kiran Patil <kiran.patil@intel.com>,
	Greg KH <gregkh@linuxfoundation.org>
Subject: Re: [net-next v4 00/15] Add mlx5 subfunction support
Date: Mon, 14 Dec 2020 22:15:34 -0800	[thread overview]
Message-ID: <608505778d76b1b01cb3e8d19ecda5b8578f0f79.camel@kernel.org> (raw)
In-Reply-To: <CAKgT0UejoduCB6nYFV2atJ4fa4=v9-dsxNh4kNJNTtoHFd1DuQ@mail.gmail.com>

On Mon, 2020-12-14 at 17:53 -0800, Alexander Duyck wrote:
> On Mon, Dec 14, 2020 at 1:49 PM Saeed Mahameed <saeed@kernel.org>
> wrote:
> > Hi Dave, Jakub, Jason,
> > 
> > This series form Parav was the theme of this mlx5 release cycle,
> > we've been waiting anxiously for the auxbus infrastructure to make
> > it into
> > the kernel, and now as the auxbus is in and all the stars are
> > aligned, I
> > can finally submit this V2 of the devlink and mlx5 subfunction
> > support.
> > 
> > Subfunctions came to solve the scaling issue of virtualization
> > and switchdev environments, where SRIOV failed to deliver and users
> > ran
> > out of VFs very quickly as SRIOV demands huge amount of physical
> > resources
> > in both of the servers and the NIC.
> > 
> > Subfunction provide the same functionality as SRIOV but in a very
> > lightweight manner, please see the thorough and detailed
> > documentation from Parav below, in the commit messages and the
> > Networking documentation patches at the end of this series.
> > 
> 
> Just to clarify a few things for myself. You mention virtualization
> and SR-IOV in your patch description but you cannot support direct
> assignment with this correct? The idea here is simply logical
> partitioning of an existing network interface, correct? So this isn't
> so much a solution for virtualization, but may work better for
> containers. I view this as an important distinction to make as the

at the current state yes, but the SF solution can be extended to
support direct assignment, so this is why i think SF solution can do
better and eventually replace SRIOV.
also many customers are currently using SRIOV with containers to get
the performance and isolation features since there was no other
options.

> first thing that came to mind when I read this was mediated devices
> which is similar, but focused only on the virtualization case:
> https://www.kernel.org/doc/html/v5.9/driver-api/vfio-mediated-device.html
> 
> > Parav Pandit Says:
> > =================
> > 
> > This patchset introduces support for mlx5 subfunction (SF).
> > 
> > A subfunction is a lightweight function that has a parent PCI
> > function on
> > which it is deployed. mlx5 subfunction has its own function
> > capabilities
> > and its own resources. This means a subfunction has its own
> > dedicated
> > queues(txq, rxq, cq, eq). These queues are neither shared nor
> > stealed from
> > the parent PCI function.
> 
> Rather than calling this a subfunction, would it make more sense to
> call it something such as a queue set? It seems like this is exposing
> some of the same functionality we did in the Intel drivers such as
> ixgbe and i40e via the macvlan offload interface. However the
> ixgbe/i40e hardware was somewhat limited in that we were only able to
> expose Ethernet interfaces via this sort of VMQ/VMDQ feature, and
> even
> with that we have seen some limitations to the interface. It sounds
> like you are able to break out RDMA capable devices this way as well.
> So in terms of ways to go I would argue this is likely better. 

We've discussed this thoroughly on V0, the SF solutions is closer to a
VF than a VMDQ, this is not just a set of queues.

https://lore.kernel.org/linux-rdma/421951d99a33d28b91f2b2997409d0c97fa5a98a.camel@kernel.org/

> However
> one downside is that we are going to end up seeing each subfunction
> being different from driver to driver and vendor to vendor which I
> would argue was also one of the problems with SR-IOV as you end up
> with a bit of vendor lock-in as a result of this feature since each
> vendor will be providing a different interface.
> 

I disagree, SFs are tightly coupled with switchdev model and devlink
functions port, they are backed with the a well defined model, i can
say the same about sriov with switchdev mode, this sort of vendor lock-
in issues is eliminated when you migrate to switchdev mode.

> > When subfunction is RDMA capable, it has its own QP1, GID table and
> > rdma
> > resources neither shared nor stealed from the parent PCI function.
> > 
> > A subfunction has dedicated window in PCI BAR space that is not
> > shared
> > with ther other subfunctions or parent PCI function. This ensures
> > that all
> > class devices of the subfunction accesses only assigned PCI BAR
> > space.
> > 
> > A Subfunction supports eswitch representation through which it
> > supports tc
> > offloads. User must configure eswitch to send/receive packets
> > from/to
> > subfunction port.
> > 
> > Subfunctions share PCI level resources such as PCI MSI-X IRQs with
> > their other subfunctions and/or with its parent PCI function.
> 
> This piece to the architecture for this has me somewhat concerned. If
> all your resources are shared and you are allowing devices to be

not all, only PCI MSIX, for now..

> created incrementally you either have to pre-partition the entire
> function which usually results in limited resources for your base
> setup, or free resources from existing interfaces and redistribute
> them as things change. I would be curious which approach you are
> taking here? So for example if you hit a certain threshold will you
> need to reset the port and rebalance the IRQs between the various
> functions?
> 

Currently SFs will use whatever IRQs the PF has pre-allocated for
itself, so there is no IRQ limit issue at the moment, we are
considering a dynamic IRQ pool with dynamic balancing, or even better
us the IMS approach, which perfectly fits the SF architecture. 
https://patchwork.kernel.org/project/linux-pci/cover/1568338328-22458-1-git-send-email-megha.dey@linux.intel.com/

for internal resources the are fully isolated (not shared) and
they are internally managed by FW exactly like a VF internal resources.




  parent reply	other threads:[~2020-12-15  6:16 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-14 21:43 [net-next v4 00/15] Add mlx5 subfunction support Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 01/15] net/mlx5: Fix compilation warning for 32-bit platform Saeed Mahameed
2020-12-14 22:31   ` Alexander Duyck
2020-12-14 22:45     ` Saeed Mahameed
2020-12-15  4:59     ` Leon Romanovsky
2020-12-14 21:43 ` [net-next v4 02/15] devlink: Prepare code to fill multiple port function attributes Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 03/15] devlink: Introduce PCI SF port flavour and port attribute Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 04/15] devlink: Support add and delete devlink port Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 05/15] devlink: Support get and set state of port function Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 06/15] net/mlx5: Introduce vhca state event notifier Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 07/15] net/mlx5: SF, Add auxiliary device support Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 08/15] net/mlx5: SF, Add auxiliary device driver Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 09/15] net/mlx5: E-switch, Prepare eswitch to handle SF vport Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 10/15] net/mlx5: E-switch, Add eswitch helpers for " Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 11/15] net/mlx5: SF, Add port add delete functionality Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 12/15] net/mlx5: SF, Port function state change support Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 13/15] devlink: Add devlink port documentation Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 14/15] devlink: Extend devlink port documentation for subfunctions Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 15/15] net/mlx5: Add devlink subfunction port documentation Saeed Mahameed
2020-12-15  1:53 ` [net-next v4 00/15] Add mlx5 subfunction support Alexander Duyck
2020-12-15  2:44   ` David Ahern
2020-12-15 16:16     ` Alexander Duyck
2020-12-15 16:59       ` Parav Pandit
2020-12-15  5:48   ` Parav Pandit
2020-12-15 18:47     ` Alexander Duyck
2020-12-15 20:05       ` Saeed Mahameed
2020-12-15 21:03       ` Jason Gunthorpe
2020-12-16  1:12       ` Edwin Peer
2020-12-16  2:39         ` Jason Gunthorpe
2020-12-16  3:12         ` Alexander Duyck
2020-12-15 20:59     ` David Ahern
2020-12-15  6:15   ` Saeed Mahameed [this message]
2020-12-15 19:12     ` Alexander Duyck
2020-12-15 20:35       ` Saeed Mahameed
2020-12-15 21:28         ` Jakub Kicinski
2020-12-16  6:50           ` Leon Romanovsky
2020-12-16 17:59             ` Saeed Mahameed
2020-12-15 21:41         ` Alexander Duyck
2020-12-16  0:19           ` Jason Gunthorpe
2020-12-16  2:19             ` Alexander Duyck
2020-12-16  3:03               ` Jason Gunthorpe
2020-12-16  4:13                 ` Alexander Duyck
2020-12-16  4:45                   ` Parav Pandit
2020-12-16 13:33                   ` Jason Gunthorpe
2020-12-16 16:31                     ` Alexander Duyck
2020-12-16 17:51                       ` Jason Gunthorpe
2020-12-16 19:27                         ` Alexander Duyck
2020-12-16 20:35                           ` Jason Gunthorpe
2020-12-16 22:53                             ` Alexander Duyck
2020-12-17  0:38                               ` Jason Gunthorpe
2020-12-17 18:48                                 ` Alexander Duyck
2020-12-17 19:40                                   ` Jason Gunthorpe
2020-12-17 21:05                                     ` Alexander Duyck
2020-12-18  0:08                                       ` Jason Gunthorpe
2020-12-18  1:30                               ` David Ahern
2020-12-18  3:11                                 ` Alexander Duyck
2020-12-18  3:55                                   ` David Ahern
2020-12-18 15:54                                     ` Alexander Duyck
2020-12-18  5:20                                   ` Parav Pandit
2020-12-18  5:36                                     ` Parav Pandit
2020-12-18 16:01                                     ` Alexander Duyck
2020-12-18 18:01                                       ` Parav Pandit
2020-12-18 19:22                                         ` Alexander Duyck
2020-12-18 20:18                                           ` Jason Gunthorpe
2020-12-19  0:03                                             ` Alexander Duyck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=608505778d76b1b01cb3e8d19ecda5b8578f0f79.camel@kernel.org \
    --to=saeed@kernel.org \
    --cc=alexander.duyck@gmail.com \
    --cc=dan.j.williams@intel.com \
    --cc=davem@davemloft.net \
    --cc=david.m.ertman@intel.com \
    --cc=dsahern@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jacob.e.keller@intel.com \
    --cc=jgg@nvidia.com \
    --cc=kiran.patil@intel.com \
    --cc=kuba@kernel.org \
    --cc=leonro@nvidia.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=sridhar.samudrala@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.