KVM Archive on lore.kernel.org
 help / color / Atom feed
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Parav Pandit <parav@mellanox.com>, Jiri Pirko <jiri@resnulli.us>,
	David M <david.m.ertman@intel.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Saeed Mahameed <saeedm@mellanox.com>,
	"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
	"leon@kernel.org" <leon@kernel.org>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	Jiri Pirko <jiri@mellanox.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	Or Gerlitz <gerlitz.or@gmail.com>
Subject: Re: [PATCH net-next 00/19] Mellanox, mlx5 sub function support
Date: Fri, 8 Nov 2019 20:44:26 -0400
Message-ID: <20191109004426.GB31761@ziepe.ca> (raw)
In-Reply-To: <20191108134559.42fbceff@cakuba>

On Fri, Nov 08, 2019 at 01:45:59PM -0800, Jakub Kicinski wrote:

> > IMHO, mdev has amdev_parent_ops structure clearly intended to link it
> > to vfio, so using a mdev for something not related to vfio seems like
> > a poor choice.
> 
> Yes, my suggestion to use mdev was entirely based on the premise that
> the purpose of this work is to get vfio working.. otherwise I'm unclear
> as to why we'd need a bus in the first place. If this is just for
> containers - we have macvlan offload for years now, with no need for a
> separate device.

This SF thing is a full fledged VF function, it is not at all like
macvlan. This is perhaps less important for the netdev part of the
world, but the difference is very big for the RDMA side, and should
enable VFIO too..
 
> On the RDMA/Intel front, would you mind explaining what the main
> motivation for the special buses is? I'm a little confurious.

Well, the issue is driver binding. For years we have had these
multi-function netdev drivers that have a single PCI device which must
bind into multiple subsystems, ie mlx5 does netdev and RDMA, the cxgb
drivers do netdev, RDMA, SCSI initiator, SCSI target, etc. [And I
expect when NVMe over TCP rolls out we will have drivers like cxgb4
binding to 6 subsytems in total!]

Today most of this is a big hack where the PCI device binds to the
netdev driver and then the other drivers in different subsystems
'discover' that an appropriate netdev is plugged in using various
unique, hacky and ugly means. For instance cxgb4 duplicates a chunk of
the device core, see cxgb4_register_uld() for example. Other drivers
try to use netdev notifiers, and various other wild things.

So, the general concept is to use the driver model to manage driver
binding. A multi-subsystem driver would have several parts:

- A pci_driver which binds to the pci_device (the core)
  It creates, on a bus, struct ??_device's for the other subsystems
  that this HW supports. ie if the chip supports netdev then a 
  ??_device that binds to the netdev driver is created, same for RDMA

- A ??_driver in netdev binds to the device and accesses the core API
- A ??_driver in RDMA binds to the device and accesses the core API
- A ??_driver in SCSI binds to the device and accesses the core API

Now the driver model directly handles all binding, autoloading,
discovery, etc, and 'netdev' is just another consumer of 'core'
functionality.

For something like mlx5 the 'core' is the stuff in
drivers/net/ethernet/mellanox/mlx5/core/*.c, give or take. It is
broadly generic stuff like send commands, create queues, manage HW
resources, etc.

There has been some lack of clarity on what the ?? should be. People
have proposed platform and MFD, and those seem to be no-goes. So, it
looks like ?? will be a mlx5_driver on a mlx5_bus, and Intel will use
an ice_driver on a ice_bus, ditto for cxgb4, if I understand Greg's
guidance.

Though I'm wondering if we should have a 'multi_subsystem_device' that
was really just about passing a 'void *core_handle' from the 'core'
(ie the bus) to the driver (ie RDMA, netdev, etc). 

It seems weakly defined, but also exactly what every driver doing this
needs.. It is basically what this series is abusing mdev to accomplish.

> My understanding is MFD was created to help with cases where single
> device has multiple pieces of common IP in it. 

MFD really seems to be good at splitting a device when the HW is
orthogonal at the register level. Ie you might have regs 100-200 for
ethernet and 200-300 for RDMA.

But this is not how modern HW works, the functional division is more
subtle and more software based. ie on most devices a netdev and rdma
queue are nearly the same, just a few settings make them function
differently.

So what is needed isn't a split of register set like MFD specializes
in, but a unique per-driver API between the 'core' and 'subsystem'
parts of the multi-subsystem device.

> Do modern RDMA cards really share IP across generations? 

What is a generation? Mellanox has had a stable RDMA driver across
many sillicon generations. Intel looks like their new driver will
support at least the last two or more sillicon generations..

RDMA drivers are monstrous complex things, there is a big incentive to
not respin them every time a new chip comes out.

> Is there a need to reload the drivers for the separate pieces (I
> wonder if the devlink reload doesn't belong to the device model :().

Yes, it is already done, but without driver model support the only way
to reload the rdma driver is to unload the entire module as there is
no 'unbind'

Jason

  reply index

Thread overview: 132+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-07 16:04 Parav Pandit
2019-11-07 16:08 ` [PATCH net-next 01/19] net/mlx5: E-switch, Move devlink port close to eswitch port Parav Pandit
2019-11-07 16:08   ` [PATCH net-next 02/19] net/mlx5: E-Switch, Add SF vport, vport-rep support Parav Pandit
2019-11-07 16:08   ` [PATCH net-next 03/19] net/mlx5: Introduce SF table framework Parav Pandit
2019-11-07 16:08   ` [PATCH net-next 04/19] net/mlx5: Introduce SF life cycle APIs to allocate/free Parav Pandit
2019-11-07 16:08   ` [PATCH net-next 05/19] net/mlx5: E-Switch, Enable/disable SF's vport during SF life cycle Parav Pandit
2019-11-07 16:08   ` [PATCH net-next 06/19] net/mlx5: Add support for mediated devices in switchdev mode Parav Pandit
2019-11-08 10:32     ` Jiri Pirko
2019-11-08 16:03       ` Parav Pandit
2019-11-08 16:22         ` Jiri Pirko
2019-11-08 16:29           ` Parav Pandit
2019-11-08 18:01             ` Jiri Pirko
2019-11-08 18:04             ` Jiri Pirko
2019-11-08 18:21               ` Parav Pandit
2019-11-07 16:08   ` [PATCH net-next 07/19] vfio/mdev: Introduce sha1 based mdev alias Parav Pandit
2019-11-08 11:04     ` Jiri Pirko
2019-11-08 15:59       ` Parav Pandit
2019-11-08 16:28         ` Jiri Pirko
2019-11-08 11:10     ` Cornelia Huck
2019-11-08 16:03       ` Parav Pandit
2019-11-07 16:08   ` [PATCH net-next 08/19] vfio/mdev: Make mdev alias unique among all mdevs Parav Pandit
2019-11-08 10:49     ` Jiri Pirko
2019-11-08 15:13       ` Parav Pandit
2019-11-07 16:08   ` [PATCH net-next 09/19] vfio/mdev: Expose mdev alias in sysfs tree Parav Pandit
2019-11-08 13:22     ` Jiri Pirko
2019-11-08 18:03       ` Alex Williamson
2019-11-08 18:16         ` Jiri Pirko
2019-11-07 16:08   ` [PATCH net-next 10/19] vfio/mdev: Introduce an API mdev_alias Parav Pandit
2019-11-07 16:08   ` [PATCH net-next 11/19] vfio/mdev: Improvise mdev life cycle and parent removal scheme Parav Pandit
2019-11-08 13:01     ` Cornelia Huck
2019-11-08 16:12       ` Parav Pandit
2019-11-07 16:08   ` [PATCH net-next 12/19] devlink: Introduce mdev port flavour Parav Pandit
2019-11-07 20:38     ` Jakub Kicinski
2019-11-07 21:03       ` Parav Pandit
2019-11-08  1:17         ` Jakub Kicinski
2019-11-08  1:44           ` Parav Pandit
2019-11-08  2:20             ` Jakub Kicinski
2019-11-08  2:31               ` Parav Pandit
2019-11-08  9:46                 ` Jiri Pirko
2019-11-08 15:45                   ` Parav Pandit
2019-11-08 16:31                     ` Jiri Pirko
2019-11-08 16:43                       ` Parav Pandit
2019-11-08 18:11                         ` Jiri Pirko
2019-11-08 18:23                           ` Parav Pandit
2019-11-08 18:34                             ` Jiri Pirko
2019-11-08 18:56                               ` Parav Pandit
2019-11-08  9:30               ` Jiri Pirko
2019-11-08 15:41                 ` Parav Pandit
2019-11-07 16:08   ` [PATCH net-next 13/19] net/mlx5: Register SF devlink port Parav Pandit
2019-11-07 16:08   ` [PATCH net-next 14/19] net/mlx5: Share irqs between SFs and parent PCI device Parav Pandit
2019-11-07 16:08   ` [PATCH net-next 15/19] net/mlx5: Add load/unload routines for SF driver binding Parav Pandit
2019-11-08  9:48     ` Jiri Pirko
2019-11-08 11:13       ` Jiri Pirko
2019-11-07 16:08   ` [PATCH net-next 16/19] net/mlx5: Implement dma ops and params for mediated device Parav Pandit
2019-11-07 20:42     ` Jakub Kicinski
2019-11-07 21:30       ` Parav Pandit
2019-11-08  1:16         ` Jakub Kicinski
2019-11-08  6:37     ` Christoph Hellwig
2019-11-08 15:29       ` Parav Pandit
2019-11-07 16:08   ` [PATCH net-next 17/19] net/mlx5: Add mdev driver to bind to mdev devices Parav Pandit
2019-11-07 16:08   ` [PATCH net-next 18/19] Documentation: net: mlx5: Add mdev usage documentation Parav Pandit
2019-11-07 16:08   ` [PATCH net-next 19/19] mtty: Optionally support mtty alias Parav Pandit
2019-11-08  6:26     ` Leon Romanovsky
2019-11-08 10:45     ` Jiri Pirko
2019-11-08 15:08       ` Parav Pandit
2019-11-08 15:15         ` Jiri Pirko
2019-11-08 13:46     ` Cornelia Huck
2019-11-08 15:10       ` Parav Pandit
2019-11-08 15:28         ` Cornelia Huck
2019-11-08 15:30           ` Parav Pandit
2019-11-08 17:54             ` Alex Williamson
2019-11-08  9:51   ` [PATCH net-next 01/19] net/mlx5: E-switch, Move devlink port close to eswitch port Jiri Pirko
2019-11-08 15:50     ` Parav Pandit
2019-11-07 17:03 ` [PATCH net-next 00/19] Mellanox, mlx5 sub function support Leon Romanovsky
2019-11-07 20:10   ` Parav Pandit
2019-11-08  6:20     ` Leon Romanovsky
2019-11-08 15:01       ` Parav Pandit
2019-11-07 20:32 ` Jakub Kicinski
2019-11-07 20:52   ` Parav Pandit
2019-11-08  1:16     ` Jakub Kicinski
2019-11-08  1:49       ` Parav Pandit
2019-11-08  2:12         ` Jakub Kicinski
2019-11-08 12:12   ` Jiri Pirko
2019-11-08 14:40     ` Jason Gunthorpe
2019-11-08 15:40       ` Parav Pandit
2019-11-08 19:12         ` Jakub Kicinski
2019-11-08 20:12           ` Jason Gunthorpe
2019-11-08 20:20             ` Parav Pandit
2019-11-08 20:32               ` Jason Gunthorpe
2019-11-08 20:52                 ` gregkh
2019-11-08 20:34             ` Alex Williamson
2019-11-08 21:05               ` Jason Gunthorpe
2019-11-08 21:19                 ` gregkh
2019-11-08 21:52                 ` Alex Williamson
2019-11-08 22:48                   ` Parav Pandit
2019-11-09  0:57                     ` Jason Gunthorpe
2019-11-09 17:41                       ` Jakub Kicinski
2019-11-10 19:04                         ` Jason Gunthorpe
2019-11-10 19:48                       ` Parav Pandit
2019-11-11 14:17                         ` Jiri Pirko
2019-11-11 14:58                           ` Parav Pandit
2019-11-11 15:06                             ` Jiri Pirko
2019-11-19  4:51                               ` Parav Pandit
2019-11-09  0:12                   ` Jason Gunthorpe
2019-11-09  0:45                     ` Parav Pandit
2019-11-11  2:19                 ` Jason Wang
2019-11-08 21:45             ` Jakub Kicinski
2019-11-09  0:44               ` Jason Gunthorpe [this message]
2019-11-09  8:46                 ` gregkh
2019-11-09 11:18                   ` Jiri Pirko
2019-11-09 17:28                     ` Jakub Kicinski
2019-11-10  9:16                     ` gregkh
2019-11-09 17:27                 ` Jakub Kicinski
2019-11-10  9:18                   ` gregkh
2019-11-11  3:46                     ` Jakub Kicinski
2019-11-11  5:18                       ` Parav Pandit
2019-11-11 13:30                       ` Jiri Pirko
2019-11-11 14:14                         ` gregkh
2019-11-11 14:37                           ` Jiri Pirko
2019-11-10 19:37                   ` Jason Gunthorpe
2019-11-11  3:57                     ` Jakub Kicinski
2019-11-08 16:06     ` Parav Pandit
2019-11-08 19:06     ` Jakub Kicinski
2019-11-08 19:34       ` Parav Pandit
2019-11-08 19:48         ` Jakub Kicinski
2019-11-08 19:41       ` Jiri Pirko
2019-11-08 20:40         ` Parav Pandit
2019-11-08 21:21         ` Jakub Kicinski
2019-11-08 21:39           ` Jiri Pirko
2019-11-08 21:51             ` Jakub Kicinski
2019-11-08 22:21               ` Jiri Pirko
2019-11-07 23:57 ` David Miller

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191109004426.GB31761@ziepe.ca \
    --to=jgg@ziepe.ca \
    --cc=alex.williamson@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=davem@davemloft.net \
    --cc=david.m.ertman@intel.com \
    --cc=gerlitz.or@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jakub.kicinski@netronome.com \
    --cc=jiri@mellanox.com \
    --cc=jiri@resnulli.us \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=leon@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=parav@mellanox.com \
    --cc=saeedm@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git