All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexander Duyck <alexander.duyck@gmail.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Saeed Mahameed <saeed@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Leon Romanovsky <leonro@nvidia.com>,
	Netdev <netdev@vger.kernel.org>,
	linux-rdma@vger.kernel.org, David Ahern <dsahern@kernel.org>,
	Jacob Keller <jacob.e.keller@intel.com>,
	Sridhar Samudrala <sridhar.samudrala@intel.com>,
	"Ertman, David M" <david.m.ertman@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Kiran Patil <kiran.patil@intel.com>,
	Greg KH <gregkh@linuxfoundation.org>
Subject: Re: [net-next v4 00/15] Add mlx5 subfunction support
Date: Tue, 15 Dec 2020 18:19:18 -0800	[thread overview]
Message-ID: <CAKgT0UeLBzqh=7gTLtqpOaw7HTSjG+AjXB7EkYBtwA6EJBccbg@mail.gmail.com> (raw)
In-Reply-To: <20201216001946.GF552508@nvidia.com>

On Tue, Dec 15, 2020 at 4:20 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Dec 15, 2020 at 01:41:04PM -0800, Alexander Duyck wrote:
>
> > > not just devlink and switchdev, auxbus was also introduced to
> > > standardize some of the interfaces.
> >
> > The auxbus is just there to make up for the fact that there isn't
> > another bus type for this though. I would imagine otherwise this would
> > be on some sort of platform bus.
>
> Please lets not start this again. This was gone over with Greg for
> literally a year and a half and he explicitly NAK'd platform bus for
> this purpose.
>
> Aux bus exists to connect different kernel subsystems that touch the
> same HW block together. Here we have the mlx5_core subsystem, vdpa,
> rdma, and netdev all being linked together using auxbus.
>
> It is kind of like what MFD does, but again, using MFD for this was
> also NAK'd by Greg.

Sorry I wasn't saying we needed to use MFD or platform bus. I'm aware
of the discussions I was just saying there are some parallels, not
that we needed to go through and use that.

> At the very worst we might sometime find out there is some common
> stuff between ADIs that we might get an ADI bus, but I'm not
> optimistic. So far it looks like there is no commonality.
>
> Aux bus has at least 4 users already in various stages of submission,
> and many other target areas that should be replaced by it.

Aux bus is fine and I am happy with it. I was just saying that auxbus
isn't something that should really be used to say that this is
significantly different from mdev as they both rely on a bus topology.

> > I would really like to see is a solid standardization of what this is.
> > Otherwise the comparison is going to be made. Especially since a year
> > ago Mellanox was pushing this as an mdev type interface.
>
> mdev was NAK'd too.
>
> mdev is only for creating /dev/vfio/*.

Agreed. However my worry is that as we start looking to make this
support virtualization it will still end up swinging more toward mdev.
I would much prefer to make certain that any mdev is a flavor and
doesn't end up coloring the entire interface.

> > That is all well and good. However if we agree that SR-IOV wasn't done
> > right saying that you are spinning up something that works just like
> > SR-IOV isn't all that appealing, is it?
>
> Fitting into some universal least-common-denominator was never a goal
> for SR-IOV, so I wouldn't agree it was done wrong.

It isn't so much about right or wrong but he use cases. My experience
has been that SR-IOV ends up being used for very niche use cases where
you are direct assigning it into either DPDK or some NFV VM and you
are essentially building the application around the NIC. It is all
well and good, but for general virtualization it never really caught
on.

My thought is that if we are going to do this we need to do this in a
way that improves the usability, otherwise we are looking at more
niche use cases.

> > I am talking about my perspective. From what I have seen, one-off
> > features that are only available from specific vendors are a pain to
> > deal with and difficult to enable when you have to support multiple
> > vendors within your ecosystem. What you end up going for is usually
> > the lowest common denominator because you ideally want to be able to
> > configure all your devices the same and have one recipe for setup.
>
> So encourage other vendors to support the switchdev model for managing
> VFs and ADIs!

Ugh, don't get me started on switchdev. The biggest issue as I see it
with switchev is that you have to have a true switch in order to
really be able to use it. As such dumbed down hardware like the ixgbe
for instance cannot use it since it defaults to outputting anything
that doesn't have an existing rule to the external port. If we could
tweak the design to allow for more dumbed down hardware it would
probably be much easier to get wider adoption.

Honestly, the switchdev interface isn't what I was talking about. I
was talking about the SF interface, not the switchdev side of it. In
my mind you can place your complexity into the switchdev side of the
interface, but keep the SF interface simple. Then you can back it with
whatever you want, but without having to have a vendor specific
version of the interface being plugged into the guest or container.

One of the reasons why virtio-net is being pushed as a common
interface for vendors is for this reason. It is an interface that can
be emulated by software or hardware and it allows the guest to run on
any arbitrary hardware.

> > I'm not saying you cannot enable those features. However at the same
> > time I am saying it would be nice to have a vendor neutral way of
> > dealing with those if we are going to support SF, ideally with some
> > sort of software fallback that may not perform as well but will at
> > least get us the same functionality.
>
> Is it really true there is no way to create a software device on a
> switchdev today? I looked for a while and couldn't find
> anything. openvswitch can do this, so it does seem like a gap, but
> this has nothing to do with this series.

It has plenty to do with this series. This topic has been under
discussion since something like 2017 when Mellanox first brought it up
at Netdev 2.1. At the time I told them they should implement this as a
veth offload. Then it becomes obvious what the fallback becomes as you
can place packets into one end of a veth and it comes out the other,
just like a switchdev representor and the SF in this case. It would
make much more sense to do it this way rather than setting up yet
another vendor proprietary interface pair.

> A software switchdev path should still end up with the representor and
> user facing netdev, and the behavior of the two netdevs should be
> identical to the VF switchdev flow we already have today.
>
> SF doesn't change any of this, it just shines a light that, yes,
> people actually have been using VFs with netdevs in containers and
> switchdev, as part of their operations.
>
> FWIW, I view this as a positive because it shows the switchdev model
> is working very well and seeing adoption beyond the original idea of
> controlling VMs with SRIOV.

PF/VF isolation is a given. So the existing switchdev behavior is fine
in that regard. I wouldn't expect that to change. The fact is we
actually support something similar in the macvlan approach we put in
the Intel drivers since the macvlan itself provided an option for
isolation or to hairpin the traffic if you wanted to allow the VMDq
instances to be bypassed. That was another thing I view as a huge
feature as broadcast/multicast traffic can really get ugly when the
devices are all separate pipelines and being able to switch that off
and just do software replication and hairpinning can be extremely
useful.

> > I'm trying to remember which netdev conference it was. I referred to
> > this as a veth switchdev offload when something like this was first
> > brought up.
>
> Sure, though I think the way you'd create such a thing might be
> different. These APIs are really about creating an ADI that might be
> assigned to a VM and never have a netdev.
>
> It would be nonsense to create a veth-switchdev thing with out a
> netdev, and there have been various past attempts already NAK'd to
> transform a netdev into an ADI.
>
> Anyhow, if such a thing exists someday it could make sense to
> automatically substitute the HW version using a SF, if available.

The main problem as I see it is the fact that the SF interface is
bound too tightly to the hardware. The direct pass-thru with no
hairpin is always an option but if we are going to have an interface
where both ends are in the same system there are many cases where
always pushing all the packets off to the hardware don't necessarily
make sense.

> > could address those needs would be a good way to go for this as it
> > would force everyone to come together and define a standardized
> > feature set that all of the vendors would want to expose.
>
> I would say switchdev is already the standard feature set.

Yes, it is a standard feature set for the control plane. However for
the data-path it is somewhat limited as I feel it only describes what
goes through the switch. Not the interfaces that are exposed as the
endpoints. It is the problem of that last bit and how it is handled
that can make things ugly. For example the multicast/broadcast
replication problem that just occurred to me while writing this up.
The fact is for east-west traffic there has always been a problem with
the switchdev model as it limits everything to PCIe/DMA so there are
cases where software switches can outperform the hardware ones.

  reply	other threads:[~2020-12-16  2:20 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-14 21:43 [net-next v4 00/15] Add mlx5 subfunction support Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 01/15] net/mlx5: Fix compilation warning for 32-bit platform Saeed Mahameed
2020-12-14 22:31   ` Alexander Duyck
2020-12-14 22:45     ` Saeed Mahameed
2020-12-15  4:59     ` Leon Romanovsky
2020-12-14 21:43 ` [net-next v4 02/15] devlink: Prepare code to fill multiple port function attributes Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 03/15] devlink: Introduce PCI SF port flavour and port attribute Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 04/15] devlink: Support add and delete devlink port Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 05/15] devlink: Support get and set state of port function Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 06/15] net/mlx5: Introduce vhca state event notifier Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 07/15] net/mlx5: SF, Add auxiliary device support Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 08/15] net/mlx5: SF, Add auxiliary device driver Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 09/15] net/mlx5: E-switch, Prepare eswitch to handle SF vport Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 10/15] net/mlx5: E-switch, Add eswitch helpers for " Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 11/15] net/mlx5: SF, Add port add delete functionality Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 12/15] net/mlx5: SF, Port function state change support Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 13/15] devlink: Add devlink port documentation Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 14/15] devlink: Extend devlink port documentation for subfunctions Saeed Mahameed
2020-12-14 21:43 ` [net-next v4 15/15] net/mlx5: Add devlink subfunction port documentation Saeed Mahameed
2020-12-15  1:53 ` [net-next v4 00/15] Add mlx5 subfunction support Alexander Duyck
2020-12-15  2:44   ` David Ahern
2020-12-15 16:16     ` Alexander Duyck
2020-12-15 16:59       ` Parav Pandit
2020-12-15  5:48   ` Parav Pandit
2020-12-15 18:47     ` Alexander Duyck
2020-12-15 20:05       ` Saeed Mahameed
2020-12-15 21:03       ` Jason Gunthorpe
2020-12-16  1:12       ` Edwin Peer
2020-12-16  2:39         ` Jason Gunthorpe
2020-12-16  3:12         ` Alexander Duyck
2020-12-15 20:59     ` David Ahern
2020-12-15  6:15   ` Saeed Mahameed
2020-12-15 19:12     ` Alexander Duyck
2020-12-15 20:35       ` Saeed Mahameed
2020-12-15 21:28         ` Jakub Kicinski
2020-12-16  6:50           ` Leon Romanovsky
2020-12-16 17:59             ` Saeed Mahameed
2020-12-15 21:41         ` Alexander Duyck
2020-12-16  0:19           ` Jason Gunthorpe
2020-12-16  2:19             ` Alexander Duyck [this message]
2020-12-16  3:03               ` Jason Gunthorpe
2020-12-16  4:13                 ` Alexander Duyck
2020-12-16  4:45                   ` Parav Pandit
2020-12-16 13:33                   ` Jason Gunthorpe
2020-12-16 16:31                     ` Alexander Duyck
2020-12-16 17:51                       ` Jason Gunthorpe
2020-12-16 19:27                         ` Alexander Duyck
2020-12-16 20:35                           ` Jason Gunthorpe
2020-12-16 22:53                             ` Alexander Duyck
2020-12-17  0:38                               ` Jason Gunthorpe
2020-12-17 18:48                                 ` Alexander Duyck
2020-12-17 19:40                                   ` Jason Gunthorpe
2020-12-17 21:05                                     ` Alexander Duyck
2020-12-18  0:08                                       ` Jason Gunthorpe
2020-12-18  1:30                               ` David Ahern
2020-12-18  3:11                                 ` Alexander Duyck
2020-12-18  3:55                                   ` David Ahern
2020-12-18 15:54                                     ` Alexander Duyck
2020-12-18  5:20                                   ` Parav Pandit
2020-12-18  5:36                                     ` Parav Pandit
2020-12-18 16:01                                     ` Alexander Duyck
2020-12-18 18:01                                       ` Parav Pandit
2020-12-18 19:22                                         ` Alexander Duyck
2020-12-18 20:18                                           ` Jason Gunthorpe
2020-12-19  0:03                                             ` Alexander Duyck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKgT0UeLBzqh=7gTLtqpOaw7HTSjG+AjXB7EkYBtwA6EJBccbg@mail.gmail.com' \
    --to=alexander.duyck@gmail.com \
    --cc=dan.j.williams@intel.com \
    --cc=davem@davemloft.net \
    --cc=david.m.ertman@intel.com \
    --cc=dsahern@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jacob.e.keller@intel.com \
    --cc=jgg@nvidia.com \
    --cc=kiran.patil@intel.com \
    --cc=kuba@kernel.org \
    --cc=leonro@nvidia.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=saeed@kernel.org \
    --cc=sridhar.samudrala@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.