All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Saeed Mahameed <saeed@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Leon Romanovsky <leonro@nvidia.com>,
	Netdev <netdev@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
	David Ahern <dsahern@kernel.org>,
	Jacob Keller <jacob.e.keller@intel.com>,
	Sridhar Samudrala <sridhar.samudrala@intel.com>,
	"Ertman, David M" <david.m.ertman@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Kiran Patil <kiran.patil@intel.com>,
	Greg KH <gregkh@linuxfoundation.org>
Subject: Re: [net-next v4 00/15] Add mlx5 subfunction support
Date: Wed, 16 Dec 2020 09:33:09 -0400	[thread overview]
Message-ID: <20201216133309.GI552508@nvidia.com> (raw)
In-Reply-To: <CAKgT0UcwP67ihaTWLY1XsVKEgysa3HnjDn_q=Sgvqnt=Uc7YQg@mail.gmail.com>

On Tue, Dec 15, 2020 at 08:13:21PM -0800, Alexander Duyck wrote:

> > > Ugh, don't get me started on switchdev. The biggest issue as I see it
> > > with switchev is that you have to have a true switch in order to
> > > really be able to use it.
> >
> > That cuts both ways, suggesting HW with a true switch model itself
> > with VMDq is equally problematic.
> 
> Yes and no. For example the macvlan offload I had setup could be
> configured both ways and it made use of VMDq. I'm not necessarily
> arguing that we need to do VMDq here, however at the same time saying
> that this is only meant to replace SR-IOV becomes problematic since we
> already have SR-IOV so why replace it with something that has many of
> the same limitations?

Why? Because SR-IOV is the *only* option for many use cases. Still. I
said this already, something more generic does not magicaly eliminate
SR-IOV.

The SIOV ADI model is a small refinement to the existing VF scheme, it
is completely parallel to making more generic things.

It is not "repeating mistakes" it is accepting the limitations of
SR-IOV because benefits exist and applications need those benefits.
 
> That said I understand your argument, however I view the elimination
> of SR-IOV to be something we do after we get this interface right and
> can justify doing so. 

Elimination of SR-IOV isn't even a goal here!

> Also it might be useful to call out the flavours and planned flavours
> in the cover page. Admittedly the description is somewhat lacking in
> that regard.

This is more of a general switchdev remark though. In the swithdev
model you have a the switch and a switch port. Each port has a
swichdev representor on the switch side and a "user port" of some
kind.

It can be a physical thing:
 - SFP
 - QSFP
 - WiFi Antennae

It could be a semi-physical thing outside the view of the kernel:
 - SmartNIC VF/SF attached to another CPU

It can be a semi-physical thing in view of this kernel:
 - SRIOV VF (struct pci device)
 - SF (struct aux device)

It could be a SW construct in this kernel:
 - netdev (struct net device)

*all* of these different port types are needed. Probably more down the
road!

Notice I don't have VPDA, VF/SF netdev, or virtio-mdev as a "user
port" type here. Instead creating the user port pci or aux device
allows the user to use the Linux driver model to control what happens
to the pci/aux device next.

> I would argue that is one of the reasons why this keeps being
> compared to either VMDq or VMQ as it is something that SR-IOV has
> yet to fully replace and has many features that would be useful in
> an interface that is a subpartition of an existing interface.

In what sense does switchdev and a VF not fully replace macvlan VMDq?

> The Intel drivers still have the macvlan as the assignable ADI and
> make use of VMDq to enable it.

Is this in-tree or only in the proprietary driver? AFAIK there is no
in-tree way to extract the DMA queue from the macvlan netdev into
userspace..

Remeber all this VF/SF/VDPA stuff results in a HW dataplane, not a SW
one. It doesn't really make sense to compare a SW dataplane to a HW
one. HW dataplanes come with limitations and require special driver
code.

> The limitation as I see it is that the macvlan interface doesn't allow
> for much in the way of custom offloads and the Intel hardware doesn't
> support switchdev. As such it is good for a basic interface, but
> doesn't really do well in terms of supporting advanced vendor-specific
> features.

I don't know what it is that prevents Intel from modeling their
selector HW in switchdev, but I think it is on them to work with the
switchdev folks to figure something out.

I'm a bit surprised HW that can do macvlan can't be modeled with
switchdev? What is missing?

> > That is goal here. This is not about creating just a netdev, this is
> > about the whole kit: rdma, netdev, vdpa virtio-net, virtio-mdev.
> 
> One issue is right now we are only seeing the rdma and netdev. It is
> kind of backwards as it is using the ADIs on the host when this was
> really meant to be used for things like mdev.

This is second 15 patch series on this path already. It is not
possible to pack every single thing into this series. This is the
micro step of introducing the SF idea and using SF==VF to show how the
driver stack works. The minimal changing to the existing drivers
implies this can support an ADI as well.

Further, this does already show an ADI! vdpa_mlx5 will run on the
VF/SF and eventually causes qemu to build a virtio-net ADI that
directly passes HW DMA rings into the guest.

Isn't this exactly the kind of generic SRIOV replacement option you
have been asking for? Doesn't this completely supersede stuff built on
macvlan?

> expected to work. The swtichdev API puts some restrictions in place
> but there still ends up being parts without any definition.

I'm curious what you see as needing definition here? 

The SRIOV model has the HW register programming API is device
specific.

The switchdev model is: no matter what HW register programing is done
on the VF/SF all the packets tx/rx'd will flow through the switchdev.

The purpose of switchdev/SRIOV/SIOV has never been to define a single
"one register set to rule them all".

That is the area that VDPA virtio-net and others are covering.

Jason

  parent reply	other threads:[~2020-12-16 13:34 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-14 21:43 [net-next v4 00/15] Add mlx5 subfunction support Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 01/15] net/mlx5: Fix compilation warning for 32-bit platform Saeed Mahameed
2020-12-14 22:31   ` Alexander Duyck
2020-12-14 22:45     ` Saeed Mahameed
2020-12-15  4:59     ` Leon Romanovsky
2020-12-14 21:43 ` [net-next v4 02/15] devlink: Prepare code to fill multiple port function attributes Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 03/15] devlink: Introduce PCI SF port flavour and port attribute Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 04/15] devlink: Support add and delete devlink port Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 05/15] devlink: Support get and set state of port function Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 06/15] net/mlx5: Introduce vhca state event notifier Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 07/15] net/mlx5: SF, Add auxiliary device support Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 08/15] net/mlx5: SF, Add auxiliary device driver Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 09/15] net/mlx5: E-switch, Prepare eswitch to handle SF vport Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 10/15] net/mlx5: E-switch, Add eswitch helpers for " Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 11/15] net/mlx5: SF, Add port add delete functionality Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 12/15] net/mlx5: SF, Port function state change support Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 13/15] devlink: Add devlink port documentation Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 14/15] devlink: Extend devlink port documentation for subfunctions Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 15/15] net/mlx5: Add devlink subfunction port documentation Saeed Mahameed
2020-12-15  1:53 ` [net-next v4 00/15] Add mlx5 subfunction support Alexander Duyck
2020-12-15  2:44   ` David Ahern
2020-12-15 16:16     ` Alexander Duyck
2020-12-15 16:59       ` Parav Pandit
2020-12-15  5:48   ` Parav Pandit
2020-12-15 18:47     ` Alexander Duyck
2020-12-15 20:05       ` Saeed Mahameed
2020-12-15 21:03       ` Jason Gunthorpe
2020-12-16  1:12       ` Edwin Peer
2020-12-16  2:39         ` Jason Gunthorpe
2020-12-16  3:12         ` Alexander Duyck
2020-12-15 20:59     ` David Ahern
2020-12-15  6:15   ` Saeed Mahameed
2020-12-15 19:12     ` Alexander Duyck
2020-12-15 20:35       ` Saeed Mahameed
2020-12-15 21:28         ` Jakub Kicinski
2020-12-16  6:50           ` Leon Romanovsky
2020-12-16 17:59             ` Saeed Mahameed
2020-12-15 21:41         ` Alexander Duyck
2020-12-16  0:19           ` Jason Gunthorpe
2020-12-16  2:19             ` Alexander Duyck
2020-12-16  3:03               ` Jason Gunthorpe
2020-12-16  4:13                 ` Alexander Duyck
2020-12-16  4:45                   ` Parav Pandit
2020-12-16 13:33                   ` Jason Gunthorpe [this message]
2020-12-16 16:31                     ` Alexander Duyck
2020-12-16 17:51                       ` Jason Gunthorpe
2020-12-16 19:27                         ` Alexander Duyck
2020-12-16 20:35                           ` Jason Gunthorpe
2020-12-16 22:53                             ` Alexander Duyck
2020-12-17  0:38                               ` Jason Gunthorpe
2020-12-17 18:48                                 ` Alexander Duyck
2020-12-17 19:40                                   ` Jason Gunthorpe
2020-12-17 21:05                                     ` Alexander Duyck
2020-12-18  0:08                                       ` Jason Gunthorpe
2020-12-18  1:30                               ` David Ahern
2020-12-18  3:11                                 ` Alexander Duyck
2020-12-18  3:55                                   ` David Ahern
2020-12-18 15:54                                     ` Alexander Duyck
2020-12-18  5:20                                   ` Parav Pandit
2020-12-18  5:36                                     ` Parav Pandit
2020-12-18 16:01                                     ` Alexander Duyck
2020-12-18 18:01                                       ` Parav Pandit
2020-12-18 19:22                                         ` Alexander Duyck
2020-12-18 20:18                                           ` Jason Gunthorpe
2020-12-19  0:03                                             ` Alexander Duyck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201216133309.GI552508@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=alexander.duyck@gmail.com \
    --cc=dan.j.williams@intel.com \
    --cc=davem@davemloft.net \
    --cc=david.m.ertman@intel.com \
    --cc=dsahern@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jacob.e.keller@intel.com \
    --cc=kiran.patil@intel.com \
    --cc=kuba@kernel.org \
    --cc=leonro@nvidia.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=saeed@kernel.org \
    --cc=sridhar.samudrala@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.