netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Saeed Mahameed <saeed@kernel.org>, Jakub Kicinski <kuba@kernel.org>
Cc: Parav Pandit <parav@nvidia.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	Jiri Pirko <jiri@nvidia.com>,
	"dledford@redhat.com" <dledford@redhat.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	"davem@davemloft.net" <davem@davemloft.net>
Subject: Re: [PATCH net-next 00/13] Add mlx5 subfunction support
Date: Thu, 19 Nov 2020 10:00:17 -0400	[thread overview]
Message-ID: <20201119140017.GN244516@ziepe.ca> (raw)
In-Reply-To: <28239ff66a27c0ddf8be4f1461e27b0ac0b02871.camel@kernel.org>

On Wed, Nov 18, 2020 at 10:22:51PM -0800, Saeed Mahameed wrote:
> > I think the biggest missing piece in my understanding is what's the
> > technical difference between an SF and a VDPA device.
> 
> Same difference as between a VF and netdev.
> SF == VF, so a full HW function.
> VDPA/RDMA/netdev/SCSI/nvme/etc.. are just interfaces (ULPs) sharing the
> same functions as always been, nothing new about this.

All the implementation details are very different, but this white
paper from Intel goes into some detail the basic elements and rational
for the SF concept:

https://software.intel.com/content/dam/develop/public/us/en/documents/intel-scalable-io-virtualization-technical-specification.pdf

What we are calling a sub-function here is a close cousin to what
Intel calls an Assignable Device Interface. I expect to see other
drivers following this general pattern eventually.

A SF will eventually be assignable to a VM and the VM won't be able to
tell the difference between a VF or SF providing the assignable PCI
resources.

VDPA is also assignable to a guest, but the key difference between
mlx5's SF and VDPA is what guest driver binds to the virtual PCI
function. For a SF the guest will bind mlx5_core, for VDPA the guest
will bind virtio-net.

So, the driver stack for a VM using VDPA might be

 Physical device [pci] -> mlx5_core -> [aux] -> SF -> [aux] ->  mlx5_core -> [aux] -> mlx5_vdpa -> QEMU -> |VM| -> [pci] -> virtio_net

When Parav is talking about creating VDPA devices he means attaching
the VDPA accelerator subsystem to a mlx5_core, where ever that
mlx5_core might be attached to.

To your other remark:

> > What are you NAK'ing?
> Spawning multiple netdevs from one device by slicing up its queues.

This is a bit vauge. In SRIOV a device spawns multiple netdevs for a
physical port by "slicing up its physical queues" - where do you see
the cross over between VMDq (bad) and SRIOV (ok)?

I thought the issue with VMDq was more on the horrid management to
configure the traffic splitting, not the actual splitting itself?

In classic SRIOV the traffic is split by a simple non-configurable HW
switch based on MAC address of the VF.

mlx5 already has the extended version of that idea, we can run in
switchdev mode and use switchdev to configure the HW switch. Now
configurable switchdev rules split the traffic for VFs.

This SF step replaces the VF in the above, but everything else is the
same. The switchdev still splits the traffic, it still ends up in same
nested netdev queue structure & RSS a VF/PF would use, etc, etc. No
queues are "stolen" to create the nested netdev.

From the driver perspective there is no significant difference between
sticking a netdev on a mlx5 VF or sticking a netdev on a mlx5 SF. A SF
netdev is not going in and doing deep surgery to the PF netdev to
steal queues or something.

Both VF and SF will be eventually assignable to guests, both can
support all the accelerator subsystems - VDPA, RDMA, etc. Both can
support netdev.

Compared to VMDq, I think it is really no comparison. SF/ADI is an
evolution of a SRIOV VF from something PCI-SGI controlled to something
device specific and lighter weight.

SF/ADI come with a architectural security boundary suitable for
assignment to an untrusted guest. It is not just a jumble of queues.

VMDq is .. not that.

Actually it has been one of the open debates in the virtualization
userspace world. The approach to use switchdev to control the traffic
splitting to VMs is elegant but many drivers are are not following
this design. :(

Finally, in the mlx5 model VDPA is just an "application". It asks the
device to create a 'RDMA' raw ethernet packet QP that is uses rings
formed in the virtio-net specification. We can create it in the kernel
using mlx5_vdpa, and we can create it in userspace through the RDMA
subsystem. Like any "RDMA" application it is contained by the security
boundary of the PF/VF/SF the mlx5_core is running on.

Jason

  reply	other threads:[~2020-11-19 14:00 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-12 19:24 [PATCH net-next 00/13] Add mlx5 subfunction support Parav Pandit
2020-11-12 19:24 ` [PATCH net-next 01/13] devlink: Prepare code to fill multiple port function attributes Parav Pandit
2020-11-12 19:24 ` [PATCH net-next 02/13] devlink: Introduce PCI SF port flavour and port attribute Parav Pandit
2020-11-12 19:24 ` [PATCH net-next 03/13] devlink: Support add and delete devlink port Parav Pandit
2020-11-18 16:21   ` David Ahern
2020-11-18 17:02     ` Parav Pandit
2020-11-18 18:03       ` David Ahern
2020-11-18 18:38         ` Jason Gunthorpe
2020-11-18 19:36           ` David Ahern
2020-11-18 20:42             ` Jason Gunthorpe
2020-11-18 19:22         ` Parav Pandit
2020-11-19  0:41           ` Jacob Keller
2020-11-19  1:17             ` David Ahern
2020-11-19  1:56               ` Samudrala, Sridhar
2020-11-19  0:52       ` Jacob Keller
2020-11-12 19:24 ` [PATCH net-next 04/13] devlink: Support get and set state of port function Parav Pandit
2020-11-12 19:24 ` [PATCH net-next 05/13] devlink: Avoid global devlink mutex, use per instance reload lock Parav Pandit
2020-11-12 19:24 ` [PATCH net-next 06/13] devlink: Introduce devlink refcount to reduce scope of global devlink_mutex Parav Pandit
2020-11-12 19:24 ` [PATCH net-next 07/13] net/mlx5: SF, Add auxiliary device support Parav Pandit
2020-12-07  2:48   ` David Ahern
2020-12-07  4:53     ` Parav Pandit
2020-11-12 19:24 ` [PATCH net-next 08/13] net/mlx5: SF, Add auxiliary device driver Parav Pandit
2020-11-12 19:24 ` [PATCH net-next 09/13] net/mlx5: E-switch, Prepare eswitch to handle SF vport Parav Pandit
2020-11-12 19:24 ` [PATCH net-next 10/13] net/mlx5: E-switch, Add eswitch helpers for " Parav Pandit
2020-11-12 19:24 ` [PATCH net-next 11/13] net/mlx5: SF, Add SF configuration hardware commands Parav Pandit
2020-11-12 19:24 ` [PATCH net-next 12/13] net/mlx5: SF, Add port add delete functionality Parav Pandit
2020-11-12 19:24 ` [PATCH net-next 13/13] net/mlx5: SF, Port function state change support Parav Pandit
2020-11-16 22:52 ` [PATCH net-next 00/13] Add mlx5 subfunction support Jakub Kicinski
2020-11-17  0:06   ` Saeed Mahameed
2020-11-17  1:58     ` Jakub Kicinski
2020-11-17  4:08       ` Parav Pandit
2020-11-17 17:11         ` Jakub Kicinski
2020-11-17 18:49           ` Jason Gunthorpe
2020-11-19  2:14             ` Jakub Kicinski
2020-11-19  4:35               ` David Ahern
2020-11-19  5:57                 ` Saeed Mahameed
2020-11-20  1:31                   ` Jakub Kicinski
2020-11-25  5:33                   ` David Ahern
2020-11-25  6:00                     ` Parav Pandit
2020-11-25 14:37                       ` David Ahern
2020-11-20  1:29                 ` Jakub Kicinski
2020-11-20 17:58                   ` Alexander Duyck
2020-11-20 19:04                     ` Samudrala, Sridhar
2020-11-23 21:51                       ` Saeed Mahameed
2020-11-24  7:01                       ` Jason Wang
2020-11-24  7:05                         ` Jason Wang
2020-11-19  6:12               ` Saeed Mahameed
2020-11-19  8:25                 ` Parav Pandit
2020-11-20  1:35                 ` Jakub Kicinski
2020-11-20  3:34                   ` Parav Pandit
2020-11-17 18:50           ` Parav Pandit
2020-11-19  2:23             ` Jakub Kicinski
2020-11-19  6:22               ` Saeed Mahameed
2020-11-19 14:00                 ` Jason Gunthorpe [this message]
2020-11-20  3:35                   ` Jakub Kicinski
2020-11-20  3:50                     ` Parav Pandit
2020-11-20 16:16                     ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201119140017.GN244516@ziepe.ca \
    --to=jgg@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=dledford@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jiri@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=leonro@nvidia.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=parav@nvidia.com \
    --cc=saeed@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).