All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Saeed Mahameed <saeed@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Leon Romanovsky <leonro@nvidia.com>,
	Netdev <netdev@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
	David Ahern <dsahern@kernel.org>,
	Jacob Keller <jacob.e.keller@intel.com>,
	Sridhar Samudrala <sridhar.samudrala@intel.com>,
	"Ertman, David M" <david.m.ertman@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Kiran Patil <kiran.patil@intel.com>,
	Greg KH <gregkh@linuxfoundation.org>
Subject: Re: [net-next v4 00/15] Add mlx5 subfunction support
Date: Tue, 15 Dec 2020 23:03:51 -0400	[thread overview]
Message-ID: <20201216030351.GH552508@nvidia.com> (raw)
In-Reply-To: <CAKgT0UeLBzqh=7gTLtqpOaw7HTSjG+AjXB7EkYBtwA6EJBccbg@mail.gmail.com>

On Tue, Dec 15, 2020 at 06:19:18PM -0800, Alexander Duyck wrote:

> > > I would really like to see is a solid standardization of what this is.
> > > Otherwise the comparison is going to be made. Especially since a year
> > > ago Mellanox was pushing this as an mdev type interface.
> >
> > mdev was NAK'd too.
> >
> > mdev is only for creating /dev/vfio/*.
> 
> Agreed. However my worry is that as we start looking to make this
> support virtualization it will still end up swinging more toward
> mdev.

Of course. mdev is also the only way to create a /dev/vfio/* :)

So all paths that want to use vfio must end up creating a mdev.

Here we would choose to create the mdev on top of the SF aux device.
There isn't really anything mlx5 specific about that decision. 

The SF models the vendor specific ADI in the driver model.

> It isn't so much about right or wrong but he use cases. My experience
> has been that SR-IOV ends up being used for very niche use cases where
> you are direct assigning it into either DPDK or some NFV VM and you
> are essentially building the application around the NIC. It is all
> well and good, but for general virtualization it never really caught
> on.

Sure
 
> > So encourage other vendors to support the switchdev model for managing
> > VFs and ADIs!
> 
> Ugh, don't get me started on switchdev. The biggest issue as I see it
> with switchev is that you have to have a true switch in order to
> really be able to use it. 

That cuts both ways, suggesting HW with a true switch model itself
with VMDq is equally problematic.

> As such dumbed down hardware like the ixgbe for instance cannot use
> it since it defaults to outputting anything that doesn't have an
> existing rule to the external port. If we could tweak the design to
> allow for more dumbed down hardware it would probably be much easier
> to get wider adoption.

I'd agree with this

> interface, but keep the SF interface simple. Then you can back it with
> whatever you want, but without having to have a vendor specific
> version of the interface being plugged into the guest or container.

The entire point *is* to create the vendor version because that serves
the niche cases where SRIOV assignment is already being used.

Having a general solution that can't do vendor SRIOV is useful for
other application, but doesn't eliminate the need for the SRIOV case.

> One of the reasons why virtio-net is being pushed as a common
> interface for vendors is for this reason. It is an interface that can
> be emulated by software or hardware and it allows the guest to run on
> any arbitrary hardware.

Yes, and there is mlx5_vdpa to support this usecase, and it binds to
the SF. Of course all of that is vendor specific too, the driver to
convert HW specifc register programming into a virio-net ADI has to
live *somewhere*

> It has plenty to do with this series. This topic has been under
> discussion since something like 2017 when Mellanox first brought it up
> at Netdev 2.1. At the time I told them they should implement this as a
> veth offload. 

veth doesn't give an ADI, it is useless for these niche cases.

veth offload might be interesting for some container case, but feels
like writing an enormous amount of code to accomplish nothing new...

> Then it becomes obvious what the fallback becomes as you can place
> packets into one end of a veth and it comes out the other, just like
> a switchdev representor and the SF in this case. It would make much
> more sense to do it this way rather than setting up yet another
> vendor proprietary interface pair.

I agree it makes sense to have an all SW veth-like option, but I
wouldn't try to make that as the entry point for all the HW
acceleration or to serve the niche SRIOV use cases, or to represent an
ADI.

It just can't do that and it would make a huge mess if you tried to
force it. Didn't Intel already try this once with trying to use the
macvlan netdev and its queue offload to build an ADI?

> > Anyhow, if such a thing exists someday it could make sense to
> > automatically substitute the HW version using a SF, if available.
> 
> The main problem as I see it is the fact that the SF interface is
> bound too tightly to the hardware. 

That is goal here. This is not about creating just a netdev, this is
about the whole kit: rdma, netdev, vdpa virtio-net, virtio-mdev.

The SF has to support all of that completely. Focusing only on the
one use case of netdevs in containers misses the bigger picture. 

Yes, lots of this stuff is niche, but niche stuff needs to be
supported too.

> Yes, it is a standard feature set for the control plane. However for
> the data-path it is somewhat limited as I feel it only describes what
> goes through the switch.

Sure, I think that is its main point.

> Not the interfaces that are exposed as the endpoints. 

It came from modeling physical HW so the endports are 'physical'
things like actual HW switch ports, or SRIOV VFs, ADI, etc.

> It is the problem of that last bit and how it is handled that can
> make things ugly. For example the multicast/broadcast replication
> problem that just occurred to me while writing this up.  The fact is
> for east-west traffic there has always been a problem with the
> switchdev model as it limits everything to PCIe/DMA so there are
> cases where software switches can outperform the hardware ones.

Yes, but, mixing CPU and DMA in the same packet delivery scheme is
very complicated :)

Jason

  reply	other threads:[~2020-12-16  3:05 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-14 21:43 [net-next v4 00/15] Add mlx5 subfunction support Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 01/15] net/mlx5: Fix compilation warning for 32-bit platform Saeed Mahameed
2020-12-14 22:31   ` Alexander Duyck
2020-12-14 22:45     ` Saeed Mahameed
2020-12-15  4:59     ` Leon Romanovsky
2020-12-14 21:43 ` [net-next v4 02/15] devlink: Prepare code to fill multiple port function attributes Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 03/15] devlink: Introduce PCI SF port flavour and port attribute Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 04/15] devlink: Support add and delete devlink port Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 05/15] devlink: Support get and set state of port function Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 06/15] net/mlx5: Introduce vhca state event notifier Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 07/15] net/mlx5: SF, Add auxiliary device support Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 08/15] net/mlx5: SF, Add auxiliary device driver Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 09/15] net/mlx5: E-switch, Prepare eswitch to handle SF vport Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 10/15] net/mlx5: E-switch, Add eswitch helpers for " Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 11/15] net/mlx5: SF, Add port add delete functionality Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 12/15] net/mlx5: SF, Port function state change support Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 13/15] devlink: Add devlink port documentation Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 14/15] devlink: Extend devlink port documentation for subfunctions Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 15/15] net/mlx5: Add devlink subfunction port documentation Saeed Mahameed
2020-12-15  1:53 ` [net-next v4 00/15] Add mlx5 subfunction support Alexander Duyck
2020-12-15  2:44   ` David Ahern
2020-12-15 16:16     ` Alexander Duyck
2020-12-15 16:59       ` Parav Pandit
2020-12-15  5:48   ` Parav Pandit
2020-12-15 18:47     ` Alexander Duyck
2020-12-15 20:05       ` Saeed Mahameed
2020-12-15 21:03       ` Jason Gunthorpe
2020-12-16  1:12       ` Edwin Peer
2020-12-16  2:39         ` Jason Gunthorpe
2020-12-16  3:12         ` Alexander Duyck
2020-12-15 20:59     ` David Ahern
2020-12-15  6:15   ` Saeed Mahameed
2020-12-15 19:12     ` Alexander Duyck
2020-12-15 20:35       ` Saeed Mahameed
2020-12-15 21:28         ` Jakub Kicinski
2020-12-16  6:50           ` Leon Romanovsky
2020-12-16 17:59             ` Saeed Mahameed
2020-12-15 21:41         ` Alexander Duyck
2020-12-16  0:19           ` Jason Gunthorpe
2020-12-16  2:19             ` Alexander Duyck
2020-12-16  3:03               ` Jason Gunthorpe [this message]
2020-12-16  4:13                 ` Alexander Duyck
2020-12-16  4:45                   ` Parav Pandit
2020-12-16 13:33                   ` Jason Gunthorpe
2020-12-16 16:31                     ` Alexander Duyck
2020-12-16 17:51                       ` Jason Gunthorpe
2020-12-16 19:27                         ` Alexander Duyck
2020-12-16 20:35                           ` Jason Gunthorpe
2020-12-16 22:53                             ` Alexander Duyck
2020-12-17  0:38                               ` Jason Gunthorpe
2020-12-17 18:48                                 ` Alexander Duyck
2020-12-17 19:40                                   ` Jason Gunthorpe
2020-12-17 21:05                                     ` Alexander Duyck
2020-12-18  0:08                                       ` Jason Gunthorpe
2020-12-18  1:30                               ` David Ahern
2020-12-18  3:11                                 ` Alexander Duyck
2020-12-18  3:55                                   ` David Ahern
2020-12-18 15:54                                     ` Alexander Duyck
2020-12-18  5:20                                   ` Parav Pandit
2020-12-18  5:36                                     ` Parav Pandit
2020-12-18 16:01                                     ` Alexander Duyck
2020-12-18 18:01                                       ` Parav Pandit
2020-12-18 19:22                                         ` Alexander Duyck
2020-12-18 20:18                                           ` Jason Gunthorpe
2020-12-19  0:03                                             ` Alexander Duyck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201216030351.GH552508@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=alexander.duyck@gmail.com \
    --cc=dan.j.williams@intel.com \
    --cc=davem@davemloft.net \
    --cc=david.m.ertman@intel.com \
    --cc=dsahern@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jacob.e.keller@intel.com \
    --cc=kiran.patil@intel.com \
    --cc=kuba@kernel.org \
    --cc=leonro@nvidia.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=saeed@kernel.org \
    --cc=sridhar.samudrala@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.