All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: Parav Pandit <parav@mellanox.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"michal.lkml@markovi.net" <michal.lkml@markovi.net>,
	"davem@davemloft.net" <davem@davemloft.net>,
	Jiri Pirko <jiri@mellanox.com>,
	Jakub Kicinski <jakub.kicinski@netronome.com>
Subject: Re: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to subdev devices
Date: Tue, 5 Mar 2019 20:27:29 +0100	[thread overview]
Message-ID: <20190305192729.GA17047@kroah.com> (raw)
In-Reply-To: <VI1PR0501MB2271AA91B5D29CA69778268BD1720@VI1PR0501MB2271.eurprd05.prod.outlook.com>

On Tue, Mar 05, 2019 at 05:57:58PM +0000, Parav Pandit wrote:
> 
> 
> > -----Original Message-----
> > From: Greg KH <gregkh@linuxfoundation.org>
> > Sent: Tuesday, March 5, 2019 1:14 AM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> > michal.lkml@markovi.net; davem@davemloft.net; Jiri Pirko
> > <jiri@mellanox.com>; Jakub Kicinski <jakub.kicinski@netronome.com>
> > Subject: Re: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to
> > subdev devices
> > 
> > On Fri, Mar 01, 2019 at 05:21:13PM +0000, Parav Pandit wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Greg KH <gregkh@linuxfoundation.org>
> > > > Sent: Friday, March 1, 2019 1:22 AM
> > > > To: Parav Pandit <parav@mellanox.com>
> > > > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > michal.lkml@markovi.net; davem@davemloft.net; Jiri Pirko
> > > > <jiri@mellanox.com>
> > > > Subject: Re: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind
> > > > to subdev devices
> > > >
> > > > On Thu, Feb 28, 2019 at 11:37:52PM -0600, Parav Pandit wrote:
> > > > > Add a subdev driver to probe the subdev devices and create fake
> > > > > netdevice for it.
> > > >
> > > > So I'm guessing here is the "meat" of the whole goal here?
> > > >
> > > > You just want multiple netdevices per PCI device?  Why can't you do
> > > > that today in your PCI driver?
> > > >
> > > Yes, but it just not multiple netdevices.
> > > Let me please elaborate in detail.
> > >
> > > There is a swichdev mode of a PCI function for netdevices.
> > > In this mode a given netdev has additional control netdev (called
> > representor netdevice = rep-ndev).
> > > This rep-ndev is attached to OVS for adding rules, offloads etc using
> > standard tc, netfilter infra.
> > > Currently this rep-ndev controls switch side of the settings, but not the
> > host side of netdev.
> > > So there is discussion to create another netdev or devlink port..
> > >
> > > Additionally this subdev has optional rdma device too.
> > >
> > > And when we are in switchdev mode, this rdma dev has similar rdma rep
> > device for control.
> > >
> > > In some cases we actually don't create netdev when it is in InfiniBand
> > mode.
> > > Here there is PCI device->rdma_device.
> > >
> > > In other case, a given sub device for rdma is dual port device, having
> > netdevice for each that can use existing netdev->dev_port.
> > >
> > > Creating 4 devices of two different classes using one iproute2/ip or
> > iproute2/rdma command is horrible thing to do.
> > 
> > Why is that?
> > 
> When user creates the device, user tool needs to return a device handle that got created.
> Creating multiple devices doesn't make sense. I haven't seen any tool doing such crazy thing.

And what do you mean by "device handle"?  All you get here is a sysfs
device tree.

> > > In case if this sub device has to be a passthrough device, ip link command
> > will fail badly that day, because we are creating some sub device which is not
> > even a netdevice.
> > 
> > But it is a network device, right?
> > 
> When there is passthrough subdevice, there won't be netdevice created.
> We don't want to create passthrough subdevice using iproute2/ip tool which primarily works on netdevices.

I don't know enough networking to claim anything here, so I'll ignore
this :)

> > > So iproute2/devlink which works on bus+device, mainly PCI today, seems
> > right abstraction point to create sub devices.
> > > This also extends to map ports of the device, health, registers debug, etc
> > rich infrastructure that is already built.
> > >
> > > Additionally, we don't want mlx driver and other drivers to go through its
> > child devices (split logic in netdev and rdma) for power management.
> > 
> > And how is power management going to work with your new devices?  All
> > you have here is a tiny shim around a driver bus, 
> So subdevices power management is done before their parent's.
> Vendor driver doesn't need to iterate its child devices to suspend/resume it.

True, so we can just autosuspend these "children" device and the "vendor
driver" is not going to care?  You are going to care as you are talking
to the same PCI device.  This goes to the other question about "how are
you sharing PCI device resources?"

> > I do not see any new
> > functionality, and as others have said, no way to actually share, or split up,
> > the PCI resources.
> > 
> devlink tool create command will be able to accept more parameters during device creation time to share and split PCI resources.
> This is just the start of the development and RFC is to agree on direction.
> devlink tool has parameters options that can be queried/set and existing infra will be used for granular device config.

Pointers to this beast?

> > > Kernel core code does that well today, that we like to leverage through
> > subdev bus or mfd pm callbacks.
> > >
> > > So it is lot more than just creating netdevices.
> > 
> > But that's all you are showing here :)
> > 
> Starting use case is netdev and rdma, but we don't want to create new
> tools few months/a year later for passthrough mode or for different
> link layers etc.

And I don't want to see duplicated driver model code happen either,
which is why I point out the MFD layer :)

> > > > What problem are you trying to solve that others also are having
> > > > that requires all of this?
> > > >
> > > > Adding a new bus type and subsystem is fine, but usually we want
> > > > more than just one user of it, as this does not really show how it
> > > > is exercised very well.
> > > This subdev and devlink infrastructure solves this problem of creating
> > smaller sub devices out of one PCI device.
> > > Someone has to start.. :-)
> > 
> > That is what a mfd should allow you to do.
> > 
> I did cursory look at mfd.
> It lacks removing specific devices, but that is small. It can be
> enhanced to remove specific mfd device.

That should be easy enough, work with the MFD developers.  I think
something like that should work today as you can use USB devices with
MFD, right?

> > 
> > No, do not abuse a platform device. 
> Yes. that is my point mfd devices are platform devices.
> mfd creates platform devices. and to match to it, platfrom_register_driver() have to be called to bind to it.
> I do not know currently if we have the flexibility to say that instead of binding X driver, bind Y driver for platform devices.

try it :)

> > You should be able to just use a normal
> > PCI device for this just fine, and if not, we should be able to make the
> > needed changes to mfd for that.
> > 
> Ok. so parent pci device and mfd devices.
> mfd seems to fit this use case.
> Do you think 'Platform devices' section is stale in [1] for autonomy, host bridge, soc platform etc points?

Nope, they are still horrible things and I hate them :)

Maybe we should just make MFD create "virtual" devices (bare ones, no
need for platform stuff), and that would solve the issue of the platform
device bloat being drug around everywhere.

> Should we update the documentation to indicate that it can be used for
> non-autonomous, user created devices and it can be used for creating
> devices on top of PCI parent device etc?

Nope, leave it alone please.

thanks,

greg k-h

  reply	other threads:[~2019-03-05 19:27 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-01  5:37 [RFC net-next 0/8] Introducing subdev bus and devlink extension Parav Pandit
2019-03-01  5:37 ` [RFC net-next 1/8] subdev: Introducing subdev bus Parav Pandit
2019-03-01  7:17   ` Greg KH
2019-03-01 16:35     ` Parav Pandit
2019-03-01 17:00       ` Greg KH
2019-03-26 11:48     ` Lorenzo Pieralisi
2019-03-01  5:37 ` [RFC net-next 2/8] subdev: Introduce pm callbacks Parav Pandit
2019-03-01  5:37 ` [RFC net-next 3/8] modpost: Add support for subdev device id table Parav Pandit
2019-03-01  5:37 ` [RFC net-next 4/8] devlink: Introduce and use devlink_init/cleanup() in alloc/free Parav Pandit
2019-03-01  5:37 ` [RFC net-next 5/8] devlink: Add variant of devlink_register/unregister Parav Pandit
2019-03-01  5:37 ` [RFC net-next 6/8] devlink: Add support for devlink subdev lifecycle Parav Pandit
2019-03-01  5:37 ` [RFC net-next 7/8] net/mlx5: Add devlink subdev life cycle command support Parav Pandit
2019-03-01  7:18   ` Greg KH
2019-03-01 16:04     ` Parav Pandit
2019-03-01  5:37 ` [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to subdev devices Parav Pandit
2019-03-01  7:21   ` Greg KH
2019-03-01 17:21     ` Parav Pandit
2019-03-05  7:13       ` Greg KH
2019-03-05 17:57         ` Parav Pandit
2019-03-05 19:27           ` Greg KH [this message]
2019-03-05 21:37             ` Parav Pandit
2019-03-01 22:12   ` Saeed Mahameed
2019-03-04 16:45     ` Parav Pandit
2019-03-01 20:03 ` [RFC net-next 0/8] Introducing subdev bus and devlink extension Jakub Kicinski
2019-03-04  4:41   ` Parav Pandit
2019-03-05  1:35     ` Jakub Kicinski
2019-03-05 19:46       ` Parav Pandit
2019-03-05 22:39         ` Kirti Wankhede
2019-03-05 23:17           ` Parav Pandit
2019-03-05 23:44             ` Parav Pandit
2019-03-06  0:44               ` Parav Pandit
2019-03-06  3:51                 ` Kirti Wankhede
2019-03-06  5:42                   ` Parav Pandit
2019-03-07 19:04                     ` Kirti Wankhede
2019-03-07 20:27                       ` Parav Pandit
2019-03-07 20:53                         ` Kirti Wankhede
2019-03-07 21:02                           ` Parav Pandit
2019-03-07 21:07                             ` Kirti Wankhede
2019-03-07 21:21                               ` Parav Pandit
2019-03-07 22:01                                 ` Kirti Wankhede
2019-03-07 22:31                                   ` Parav Pandit
2019-03-08 12:19                                     ` Kirti Wankhede
2019-03-08 17:09                                       ` Parav Pandit
2019-03-05  1:45     ` Jakub Kicinski
2019-03-05 16:52       ` Parav Pandit
2021-05-31 10:36         ` moyufeng
2021-06-01  5:37           ` Jakub Kicinski
2021-06-01  7:33             ` Yunsheng Lin
2021-06-01 21:34               ` Jakub Kicinski
2021-06-02  2:24                 ` Yunsheng Lin
2021-06-02 16:34                   ` Jakub Kicinski
2021-06-03  3:46                     ` Yunsheng Lin
2021-06-03 17:53                       ` Jakub Kicinski
2021-06-04  1:18                         ` Yunsheng Lin
2021-06-04 18:41                           ` Jakub Kicinski
2021-06-07  1:36                             ` Yunsheng Lin
2021-06-07 19:46                               ` Jakub Kicinski
2021-06-08 12:10                                 ` Yunsheng Lin
2021-06-08 17:29                                   ` Jakub Kicinski
2021-06-09  9:16                                     ` Yunsheng Lin
2021-06-09  9:38                                       ` Parav Pandit
2021-06-09 11:05                                         ` Yunsheng Lin
2021-06-09 11:59                                           ` Parav Pandit
2021-06-09 12:30                                             ` Yunsheng Lin
2021-06-09 13:45                                               ` Parav Pandit
2021-06-10  7:04                                                 ` Yunsheng Lin
2021-06-10  7:17                                                   ` Parav Pandit
2021-06-09 16:40                                       ` Jakub Kicinski
2021-06-10  6:52                                         ` Yunsheng Lin
2021-06-09  9:52                                   ` Parav Pandit
2021-06-09 11:16                                     ` Yunsheng Lin
2021-06-09 12:00                                       ` Parav Pandit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190305192729.GA17047@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=davem@davemloft.net \
    --cc=jakub.kicinski@netronome.com \
    --cc=jiri@mellanox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michal.lkml@markovi.net \
    --cc=netdev@vger.kernel.org \
    --cc=parav@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.