linux-kernel.vger.kernel.org archive mirror
From: Parav Pandit <parav@mellanox.com>
To: Kirti Wankhede <kwankhede@nvidia.com>,
	Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Or Gerlitz <gerlitz.or@gmail.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"michal.lkml@markovi.net" <michal.lkml@markovi.net>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	Jiri Pirko <jiri@mellanox.com>
Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
Date: Tue, 5 Mar 2019 23:17:13 +0000
Message-ID: <VI1PR0501MB2271962A3044DD29CE0E4C0CD1720@VI1PR0501MB2271.eurprd05.prod.outlook.com>
In-Reply-To: <54d846bc-cfa5-6665-efcb-a6c85e87763b@nvidia.com>

Hi Kirti,

> -----Original Message-----
> From: Kirti Wankhede <kwankhede@nvidia.com>
> Sent: Tuesday, March 5, 2019 4:40 PM
> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
> <jakub.kicinski@netronome.com>
> Cc: Or Gerlitz <gerlitz.or@gmail.com>; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; michal.lkml@markovi.net; davem@davemloft.net;
> gregkh@linuxfoundation.org; Jiri Pirko <jiri@mellanox.com>
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> 
> 
> >> On Mon, 4 Mar 2019 04:41:01 +0000, Parav Pandit wrote:
> >>>> -----Original Message-----
> >>>> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> >>>> Sent: Friday, March 1, 2019 2:04 PM
> >>>> To: Parav Pandit <parav@mellanox.com>; Or Gerlitz
> >>>> <gerlitz.or@gmail.com>
> >>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> >>>> michal.lkml@markovi.net; davem@davemloft.net;
> >>>> gregkh@linuxfoundation.org; Jiri Pirko <jiri@mellanox.com>
> >>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> >>>> extension
> >>>>
> >>>> On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote:
> >>>>> Requirements for above use cases:
> >>>>> --------------------------------
> >>>>> 1. We need a generic user interface & core APIs to create sub
> >>>>>    devices from a parent pci device but should be generic enough for
> >>>>>    other parent devices
> >>>>> 2. Interface should be vendor agnostic
> >>>>> 3. User should be able to set device params at creation time
> >>>>> 4. In future if needed, tool should be able to create passthrough
> >>>>>    device to map to a virtual machine
> >>>>
> >>>> Like a mediated device?
> >>>
> >>> Yes.
> >>>
> >>>> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> >>>> https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/Mediated-Devices-Better-Userland-IO.pdf
> >>>>
> >>>> Other than pass-through it is entirely unclear to me why you'd need
> >>>> a bus. (Or should I say VM pass through or DPDK?)  Could you clarify
> >>>> why the need for a bus?
> >>>>
> >>> A bus follows the standard Linux kernel device driver model to attach
> >>> a driver to a specific device. With my limited understanding, a platform
> >>> device looks like a hack/abuse of it based on the documentation [1],
> >>> but it could be an alternative to a bus if it looks fine to
> >>> Greg and others.
> >>
> >> I grok from this text that the main advantage you see is the ability
> >> to choose a driver for the subdevice.
> >>
> > Yes.
> >
> >>>> My thinking is that we should allow spawning subports in devlink
> >>>> and if user specifies "passthrough" the device spawned would be an
> >>>> mdev.
> >>>
> >>> devlink device is a much more comprehensive way to create sub-devices
> >>> than sub-ports, for at least the reasons below.
> >>>
> >>> 1. devlink device already defines the device->port relation, which
> >>> enables creating a multiport device.
> >>
> >> I presume that by devlink device you mean devlink instance?  Yes,
> >> this part I'm following.
> >>
> > Yes -> 'struct devlink'
> >>> subport breaks that.
> >>
> >> Breaks what?  The ability to create a devlink instance with multiple ports?
> >>
> > Right.
> >
> >>> 2. With the bus model, it enables us to load a driver of the same vendor
> >>> or a generic one such as vfio in future.
> >>
> 
> You can achieve this with mdev as well.
> 
> >> Yes, sorry, I'm not an expert on mdevs, but isn't that the goal of those?
> >> Could you go into more detail why not just use mdevs?
> >>
> > I am a novice at the mdev level too, whether mdev or vfio mdev.
> > Currently, by default, we bind to the same vendor driver, but when the
> > device is created as a passthrough device, the vendor driver won't create
> > a netdevice or rdma device for it.
> > Instead, vfio/mdev or whatever mature driver is available would bind at
> > that point.
> >
> 
> Using the mdev framework, if you want to partition a physical device into
> multiple logical devices, you can bind those devices to the same vendor
> driver through vfio-mdev, whereas if you want to pass through the device,
> bind it to vfio-pci. If I understand correctly, that is what you are
> looking for.
> 
> 
We cannot bind the whole PCI device to vfio-pci. The reason is that
a given PCI device already has protocol devices on it, such as netdevs
and an rdma device. The device is partitioned while those protocol
devices exist and while the mlx5_core and mlx5_ib drivers are loaded on it.
We also need to connect these objects correctly to the eswitch exposed
by the devlink interface (net/core/devlink.c), which supports eswitch
binding, health, registers, parameters and ports.
It also supports existing PCI VFs.

I don't think we want to replicate all of this in the mdev subsystem [1].

[1] https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
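To make the vfio-pci point concrete, here is a toy Python model. It is a
hypothetical sketch, not kernel code: `PciFunction` and its methods are
illustrative names, and the device names are only examples. It shows that
whole-function assignment requires unbinding the current driver first,
which removes the protocol devices that driver created, while carving out a
sub-device keeps the parent driver and its devices in place.

```python
# Toy model (hypothetical, not kernel code) of whole-function assignment
# versus partitioning a PCI function that already has protocol devices.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PciFunction:
    """A PCI function with one bound driver plus what that driver created."""
    name: str
    driver: Optional[str] = None
    protocol_devs: List[str] = field(default_factory=list)  # netdevs, rdma devs
    subdevs: List[str] = field(default_factory=list)

    def unbind(self) -> None:
        # Unbinding the driver tears down everything it created.
        self.driver = None
        self.protocol_devs.clear()
        self.subdevs.clear()

    def bind_whole(self, driver: str) -> None:
        # Whole-function assignment (e.g. to vfio-pci): one owner at a time,
        # so the current driver has to be unbound first.
        if self.driver is not None:
            self.unbind()
        self.driver = driver

    def add_subdev(self, subdev: str) -> None:
        # Partitioning needs the parent driver to stay bound.
        if self.driver is None:
            raise RuntimeError("parent driver must be bound to partition")
        self.subdevs.append(subdev)


if __name__ == "__main__":
    pf = PciFunction("0000:03:00.0", "mlx5_core", ["eth0", "mlx5_0"])
    pf.add_subdev("sub0")             # existing protocol devices survive
    print(pf.protocol_devs, pf.subdevs)
    pf.bind_whole("vfio-pci")         # everything built on the PF is gone
    print(pf.driver, pf.protocol_devs, pf.subdevs)
```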

So the devlink interface is a natural progression for migrating users
from managing VFs to managing non-VF sub-devices.

However, in the future, I believe we would create mediated devices on user
request, so that mdev modules can be used to map them to a VM.

Also, 'mdev_bus' is created as a class and not as a bus. This prevents
using the devlink interface, whose handle is a bus name plus a device name.
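devlink addresses an instance by a bus name and a device name; the devlink(8)
tool writes this as BUS_NAME/BUS_ADDRESS, for example pci/0000:03:00.0. A
minimal sketch of why that handle needs a bus, assuming a simplified device
record: `Device`, `devlink_handle()` and `parse_handle()` are hypothetical
illustrations, not kernel or iproute2 code.

```python
# Hypothetical sketch: a devlink-style handle is "bus_name/dev_name".
# A device that only belongs to a class has no bus name to put there.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Device:
    name: str
    bus: Optional[str] = None           # e.g. "pci"; None if not on any bus
    device_class: Optional[str] = None  # class membership alone gives no bus name


def devlink_handle(dev: Device) -> str:
    """Build a handle for dev, failing if the device has no bus."""
    if dev.bus is None:
        raise ValueError(f"{dev.name}: no bus, cannot form a devlink handle")
    return f"{dev.bus}/{dev.name}"


def parse_handle(handle: str) -> Tuple[str, str]:
    """Split "bus_name/dev_name" at the first '/'."""
    bus, sep, name = handle.partition("/")
    if not sep or not bus or not name:
        raise ValueError(f"malformed handle: {handle!r}")
    return bus, name
```

In this sketch, `devlink_handle(Device("0000:03:00.0", bus="pci"))` yields
`pci/0000:03:00.0`, while a device created with only `device_class` set is
rejected.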

So one option is to change mdev from a class to a bus.
devlink would create mdevs on that bus, and by default the mdev driver would
probe these devices on the host system.
If told to do passthrough, a different driver would expose them to a VM.
How feasible is this?
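A toy model of that proposal, assuming a simplified driver core: `Bus`,
`SubDevice`, `Driver` and the driver names `host_drv` and `passthrough_drv`
are hypothetical stand-ins (for the vendor driver and a vfio/mdev-style
driver), not real kernel APIs. Each driver supplies a match function, and a
device binds to the first driver that matches, roughly like the kernel's
match-then-probe flow.

```python
# Hypothetical sketch: sub-devices live on a bus; the host driver binds by
# default, and a device created with passthrough=True matches another driver.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SubDevice:
    name: str
    passthrough: bool = False   # recorded at creation time (e.g. via devlink)
    driver: Optional[str] = None


@dataclass
class Driver:
    name: str
    match: Callable[[SubDevice], bool]


class Bus:
    def __init__(self, name: str) -> None:
        self.name = name
        self.drivers: List[Driver] = []
        self.devices: List[SubDevice] = []

    def register_driver(self, drv: Driver) -> None:
        self.drivers.append(drv)
        for dev in self.devices:  # try to bind devices already on the bus
            self._probe(dev)

    def add_device(self, dev: SubDevice) -> None:
        self.devices.append(dev)
        self._probe(dev)

    def _probe(self, dev: SubDevice) -> None:
        # Already-bound devices are left alone; otherwise first match wins.
        if dev.driver is not None:
            return
        for drv in self.drivers:
            if drv.match(dev):
                dev.driver = drv.name
                return
```

In this sketch the devlink side only records the passthrough request when the
device is created; which driver ends up bound is decided by the bus, the same
way as for any other device, and a device added before its driver loads is
bound once that driver registers.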

> >>> 3. Devices live on the bus, mapping a subport to 'struct device' is
> >>> not intuitive.
> >>
> >> Are you saying that the main devlink instance would not have any port
> >> information for the subdevices?
> >>
> > Right, this newly created devlink device is the control point of its port(s).
> >
> >> Devices live on a bus.  Software constructs - depend on how one wants
> >> to model them - don't have to.
> >>
> >>> 4. sub-device allows to use existing devlink port, registers, health
> >>> infrastructure to sub devices, which otherwise need to be duplicated
> >>> for ports.
> >>
> >> Health stuff is not tied to a port, I'm not following you.  You can
> >> create a reporter per port, per ACL rule or per SB or per whatever your
> >> heart desires..
> >>
> > Instead of creating multiple reporters and inventing these reporter
> > naming schemes, creating a devlink instance leverages all health
> > reporting done for a devlink instance.
> > So whatever is done for instance A (parent) can be available for
> > instance B (subdev).
> >
> >>> 5. Even though current devlink devices are networking devices, nothing
> >>> restricts them to be that way. So subport is a restricted
> >>> view.
> >>> 6. devlink device already covers
> >>> port sub-object, hence creating devlink device is desired.
> >>>
> >>>>> 5. A device can have multiple ports
> >>>>
> >>>> What does this mean, in practice?  You want to spawn a subdev which
> >>>> can access both ports?  That'd be for RDMA use cases, more than
> >>>> Ethernet, right?  (Just clarifying :))
> >>>>
> >>> Yep, you got it right. :-)
> >>>
> >>>>> So how is it done?
> >>>>> ------------------
> >>>>> (a) user in control
> >>>>> To address above requirements, a generic tool iproute2/devlink is
> >>>>> extended for sub device's life cycle.
> >>>>> However a devlink tool and its kernel counterpart is not
> >>>>> sufficient to create protocol agnostic devices on an existing PCI
> >>>>> bus.
> >>>>
> >>>> "Protocol agnostic"?...  What does that mean?
> >>>>
> >>> Devlink works on a bus/device model. It doesn't matter what class of
> >>> device it is. For example, for pci the class can be anything. So newly
> >>> created sub-devices are not limited to netdev/rdma devices. It's
> >>> agnostic to protocol. More importantly, we don't want to create
> >>> these sub-devices with bus type 'pci', because, as described below,
> >>> PCI has its own addressing scheme and the pci bus must not have
> >>> mix-and-match devices.
> >>>
> >>> So probably better wording would be: 'a devlink tool and its kernel
> >>> counterpart is not sufficient to create sub-devices of the same class
> >>> as that of the PCI device'.
> >>
> >> Let me clarify - for networking devices the partition will most
> >> likely end up as a subport, but it's not a requirement that each
> >> partition must be a subport..
> >> The question was about the necessity to invent a new bus, and have
> >> every resource have a struct device..
> >>
> >
> > A device object and bus connecting all software objects correctly.
> > This includes:
> > 1. devlink bus/name handle based access
> > 2. matching such device in sysfs
> > 3. parent child hierarchy in sysfs
> > 4. ability to bind different driver
> > 5. multi-ports per device
> > 6. still usable for single port use case
> > 7. parameters setting at devlink instance level
> > 8. parent-child relation handling power mgmt
> > 9. follows standard linux driver model
> >
> > Some are achievable through mfd too, instead of subdev bus.
> > Will follow Greg's guidance on this.
> >
> 
> I think you can achieve all the above points with mdev framework as well.
> Check samples at samples/vfio-mdev/ in kernel for quick understanding.
> 
> Thanks,
> Kirti

