linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Parav Pandit <parav@mellanox.com>
To: Kirti Wankhede <kwankhede@nvidia.com>,
	Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Or Gerlitz <gerlitz.or@gmail.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"michal.lkml@markovi.net" <michal.lkml@markovi.net>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	Jiri Pirko <jiri@mellanox.com>,
	Alex Williamson <alex.williamson@redhat.com>
Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
Date: Thu, 7 Mar 2019 22:31:37 +0000	[thread overview]
Message-ID: <VI1PR0501MB22713073D559C8A6B656ECC6D14C0@VI1PR0501MB2271.eurprd05.prod.outlook.com> (raw)
In-Reply-To: <931d95b3-7153-4bfc-353b-7226d4ea9465@nvidia.com>



> -----Original Message-----
> From: Kirti Wankhede <kwankhede@nvidia.com>
> Sent: Thursday, March 7, 2019 4:02 PM
> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
> <jakub.kicinski@netronome.com>
> Cc: Or Gerlitz <gerlitz.or@gmail.com>; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; michal.lkml@markovi.net; davem@davemloft.net;
> gregkh@linuxfoundation.org; Jiri Pirko <jiri@mellanox.com>; Alex
> Williamson <alex.williamson@redhat.com>
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> 
> 
> On 3/8/2019 2:51 AM, Parav Pandit wrote:
> >
> >
> >> -----Original Message-----
> >> From: Kirti Wankhede <kwankhede@nvidia.com>
> >> Sent: Thursday, March 7, 2019 3:08 PM
> >> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
> >> <jakub.kicinski@netronome.com>
> >> Cc: Or Gerlitz <gerlitz.or@gmail.com>; netdev@vger.kernel.org; linux-
> >> kernel@vger.kernel.org; michal.lkml@markovi.net;
> davem@davemloft.net;
> >> gregkh@linuxfoundation.org; Jiri Pirko <jiri@mellanox.com>; Alex
> >> Williamson <alex.williamson@redhat.com>
> >> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> >> extension
> >>
> >>
> >>
> >> On 3/8/2019 2:32 AM, Parav Pandit wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Kirti Wankhede <kwankhede@nvidia.com>
> >>>> Sent: Thursday, March 7, 2019 2:54 PM
> >>>> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
> >>>> <jakub.kicinski@netronome.com>
> >>>> Cc: Or Gerlitz <gerlitz.or@gmail.com>; netdev@vger.kernel.org;
> >>>> linux- kernel@vger.kernel.org; michal.lkml@markovi.net;
> >> davem@davemloft.net;
> >>>> gregkh@linuxfoundation.org; Jiri Pirko <jiri@mellanox.com>; Alex
> >>>> Williamson <alex.williamson@redhat.com>
> >>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> >>>> extension
> >>>>
> >>>>
> >>>>
> >>>> <snip>
> >>>>
> >>>>>>>
> >>>>>>> Yes. I got my patches to adapt to mdev way. Will be posting RFC
> >>>>>>> v2
> >> soon.
> >>>>>>> Will wait for a day to receive more comments/views from Greg and
> >>>> others.
> >>>>>>>
> >>>>>>> As I explained in this cover-letter and discussion, First use
> >>>>>>> case is to create and use mdevs in the host (and not in VM).
> >>>>>>> Later on, I am sure once we have mdevs available, VM users will
> >>>>>>> likely use
> >>>>>> it.
> >>>>>>>
> >>>>>>> So, mlx5_core driver will have two components as starting point.
> >>>>>>>
> >>>>>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
> >>>>>>> This is mdev device life cycle driver which will do,
> >>>>>>> mdev_register_device()
> >>>>>> and implements mlx5_mdev_ops.
> >>>>>>>
> >>>>>> Ok. I would suggest not use mdev.c file name, may be add device
> >>>>>> name, something like mlx_mdev.c or vfio_mlx.c
> >>>>>>
> >>>>> mlx5/core is coding convention is not following to prefix mlx to
> >>>>> its
> >>>>> 40+
> >>>> files.
> >>>>>
> >>>>> it uses actual subsystem or functionality name, such as, sriov.c
> >>>>> eswitch.c fw.c en_tc.c (en for Ethernet) lag.c so, mdev.c aligns
> >>>>> to rest of the 40+ files.
> >>>>>
> >>>>>
> >>>>>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
> >>>>>>> This is mdev device driver which does mdev_register_driver() and
> >>>>>>> probe() creates netdev by heavily reusing existing code of the
> >>>>>>> PF
> >> device.
> >>>>>>> These drivers will not be placed under drivers/vfio/mdev,
> >>>>>>> because this is
> >>>>>> not a vfio driver.
> >>>>>>> This is fine, right?
> >>>>>>>
> >>>>>>
> >>>>>> I'm not too familiar with netdev, but can you create netdev on
> >>>>>> open() call on mlx mdev device? Then you don't have to write mdev
> >>>>>> device
> >>>> driver.
> >>>>>>
> >>>>> Who invokes open() and release()?
> >>>>> I believe it is the qemu would do open(), release, read/write/mmap?
> >>>>>
> >>>>> Assuming that is the case,
> >>>>> I think its incorrect to create netdev in open.
> >>>>> Because when we want to map the mdev to VM using above mdev
> calls,
> >>>>> we
> >>>> actually wont be creating netdev in host.
> >>>>> Instead, some queues etc will be setup as part of these calls.
> >>>>>
> >>>>> By default this created mdev is bound to vfio_mdev.
> >>>>> And once we unbind the device from this driver, we need to bind to
> >>>>> mlx5
> >>>> driver so that driver can create the netdev etc.
> >>>>>
> >>>>> Or did I get open() and friends call wrong?
> >>>>>
> >>>>
> >>>> In 'struct mdev_parent_ops' there are create() and remove(). When
> >>>> user creates mdev device by writing UUID to create sysfs, vendor
> >>>> driver's
> >>>> create() callback gets called. This should be used to
> >>>> allocate/commit
> >>> Yes. I am already past that stage.
> >>>
> >>>> resources from parent device and on remove() callback free those
> >> resources.
> >>>> So there is no need to bind mlx5 driver to that mdev device.
> >>>>
> >>> If we don't bind mlx5 driver, vfio_mdev driver is bound to it. Such
> >>> driver
> >> won't create netdev.
> >>
> >> Doesn't need to.
> >>
> >> Create netdev from create() callback.
> >>
> > I strongly believe this is incorrect way to use create() API.
> > Because,
> > mdev is mediated device from its primary pci device. It is not a protocol
> device.
> >
> > It it also incorrect to tell user that vfio_mdev driver is bound to this mdev
> and mlx5_core driver creating netdev on top of mdev.
> >
> 
> vfio_mdev is generic common driver.
> Vendor driver who want to partition its device should handle its child
> creation and its life cycle. What is wrong in that? Why netdev has to be
> created from probe() only and not from create()?
> 
I am not suggesting to invent any new probe() method.
create() is generic mdev creation entry point.
When create() is implemented by vendor driver, vendor driver doesn't know if this mdev will be provisioned for VM or for host.
So it must do only generic mdev init sequence.
This means, it cannot create netdev here. As simple as that.

When user wants to use this mdev in a host, user will first unbind it from vfio_mdev driver and binds this mdev to mlx5_driver.
probe() of mlx5_core driver is called who did mdev_register_driver.
At this point netdev is created.

If user wants to use this mdev for VM, than vfio_mdev driver and qemu will control it via open/release friend functions.

> > When we want to map this mdev to VM, what should create() do?
> 
> Mediated device should be created before it is mapped to VM.
> 
Of course.
Let me rephrase the question: 
what shouldn't be done by create() when it wants to map to VM?
Answer is: it shouldn't create a netdev, but do necessary initialization so that it can be mapped to VM.
Because netdev will be created inside the VM not in the host.

create() simply doesn't know during creation time, where this mdev will be used (VM or host).
So it doesn't make any sense to create netdev in create().

I hope it's clear now.

> If you look at the sequence of mdev device creation:
> - 'struct device' is created with bus 'mdev_bus_type'
> - register the device - device_register(&mdev->dev) -> which calls vfio_mdev's
> probe() -> common code for all vendor drivers
> - mdev_device_create_ops() -> calls vendor driver's create() -> this is for
> vendor specific allocation and initialization. This is the callback from where
> you can do what you want to do for mdev device creation and initialization.
> Why it has to be named as probe()?

I do not intent to create any new probe().
I think I explained the flow well above - 
i.e. role of mdev_driver->probe() vs mdev_device->create().

> 
> > We will have to shift the code from create() to mdev_device_driver()-
> >probe() to address a use case of selectively mapping a mdev to VM or to
> host and implement appropriate open/close etc functions for VM case.
> >
> > So why not start correctly from the beginning?
> >
> 
> What is wrong with current implementation which is being used and tested
> for multiple devices?
> 
Oh, nothing wrong in current implementation.
Which current implementation provisions mdev in host (and not in guest VM)?

I am just using right code split of already available mdev.
When user wants to map a device to VM, attach vfio_mdev driver and create vfio_device.
When user wants to use a device in host, don't attach vfio_mdev driver, instead attach, appropriate driver what owns this mdev.

Again,  I am not inventing any new probe().
We will use all existing infra of mdev and core kernel to bind/unbind driver with device.

  reply	other threads:[~2019-03-07 22:31 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-01  5:37 [RFC net-next 0/8] Introducing subdev bus and devlink extension Parav Pandit
2019-03-01  5:37 ` [RFC net-next 1/8] subdev: Introducing subdev bus Parav Pandit
2019-03-01  7:17   ` Greg KH
2019-03-01 16:35     ` Parav Pandit
2019-03-01 17:00       ` Greg KH
2019-03-26 11:48     ` Lorenzo Pieralisi
2019-03-01  5:37 ` [RFC net-next 2/8] subdev: Introduce pm callbacks Parav Pandit
2019-03-01  5:37 ` [RFC net-next 3/8] modpost: Add support for subdev device id table Parav Pandit
2019-03-01  5:37 ` [RFC net-next 4/8] devlink: Introduce and use devlink_init/cleanup() in alloc/free Parav Pandit
2019-03-01  5:37 ` [RFC net-next 5/8] devlink: Add variant of devlink_register/unregister Parav Pandit
2019-03-01  5:37 ` [RFC net-next 6/8] devlink: Add support for devlink subdev lifecycle Parav Pandit
2019-03-01  5:37 ` [RFC net-next 7/8] net/mlx5: Add devlink subdev life cycle command support Parav Pandit
2019-03-01  7:18   ` Greg KH
2019-03-01 16:04     ` Parav Pandit
2019-03-01  5:37 ` [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to subdev devices Parav Pandit
2019-03-01  7:21   ` Greg KH
2019-03-01 17:21     ` Parav Pandit
2019-03-05  7:13       ` Greg KH
2019-03-05 17:57         ` Parav Pandit
2019-03-05 19:27           ` Greg KH
2019-03-05 21:37             ` Parav Pandit
2019-03-01 22:12   ` Saeed Mahameed
2019-03-04 16:45     ` Parav Pandit
2019-03-01 20:03 ` [RFC net-next 0/8] Introducing subdev bus and devlink extension Jakub Kicinski
2019-03-04  4:41   ` Parav Pandit
2019-03-05  1:35     ` Jakub Kicinski
2019-03-05 19:46       ` Parav Pandit
2019-03-05 22:39         ` Kirti Wankhede
2019-03-05 23:17           ` Parav Pandit
2019-03-05 23:44             ` Parav Pandit
2019-03-06  0:44               ` Parav Pandit
2019-03-06  3:51                 ` Kirti Wankhede
2019-03-06  5:42                   ` Parav Pandit
2019-03-07 19:04                     ` Kirti Wankhede
2019-03-07 20:27                       ` Parav Pandit
2019-03-07 20:53                         ` Kirti Wankhede
2019-03-07 21:02                           ` Parav Pandit
2019-03-07 21:07                             ` Kirti Wankhede
2019-03-07 21:21                               ` Parav Pandit
2019-03-07 22:01                                 ` Kirti Wankhede
2019-03-07 22:31                                   ` Parav Pandit [this message]
2019-03-08 12:19                                     ` Kirti Wankhede
2019-03-08 17:09                                       ` Parav Pandit
2019-03-05  1:45     ` Jakub Kicinski
2019-03-05 16:52       ` Parav Pandit
2021-05-31 10:36         ` moyufeng
2021-06-01  5:37           ` Jakub Kicinski
2021-06-01  7:33             ` Yunsheng Lin
2021-06-01 21:34               ` Jakub Kicinski
2021-06-02  2:24                 ` Yunsheng Lin
2021-06-02 16:34                   ` Jakub Kicinski
2021-06-03  3:46                     ` Yunsheng Lin
2021-06-03 17:53                       ` Jakub Kicinski
2021-06-04  1:18                         ` Yunsheng Lin
2021-06-04 18:41                           ` Jakub Kicinski
2021-06-07  1:36                             ` Yunsheng Lin
2021-06-07 19:46                               ` Jakub Kicinski
2021-06-08 12:10                                 ` Yunsheng Lin
2021-06-08 17:29                                   ` Jakub Kicinski
2021-06-09  9:16                                     ` Yunsheng Lin
2021-06-09  9:38                                       ` Parav Pandit
2021-06-09 11:05                                         ` Yunsheng Lin
2021-06-09 11:59                                           ` Parav Pandit
2021-06-09 12:30                                             ` Yunsheng Lin
2021-06-09 13:45                                               ` Parav Pandit
2021-06-10  7:04                                                 ` Yunsheng Lin
2021-06-10  7:17                                                   ` Parav Pandit
2021-06-09 16:40                                       ` Jakub Kicinski
2021-06-10  6:52                                         ` Yunsheng Lin
2021-06-09  9:52                                   ` Parav Pandit
2021-06-09 11:16                                     ` Yunsheng Lin
2021-06-09 12:00                                       ` Parav Pandit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=VI1PR0501MB22713073D559C8A6B656ECC6D14C0@VI1PR0501MB2271.eurprd05.prod.outlook.com \
    --to=parav@mellanox.com \
    --cc=alex.williamson@redhat.com \
    --cc=davem@davemloft.net \
    --cc=gerlitz.or@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jakub.kicinski@netronome.com \
    --cc=jiri@mellanox.com \
    --cc=kwankhede@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michal.lkml@markovi.net \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).