[RFC,v4,0/3] vhost: introduce mdev based hardware backend
mbox series

Message ID 20190917010204.30376-1-tiwei.bie@intel.com
Headers show
Series
  • vhost: introduce mdev based hardware backend
Related show

Message

Tiwei Bie Sept. 17, 2019, 1:02 a.m. UTC
This RFC is to demonstrate below ideas,

a) Build vhost-mdev on top of the same abstraction defined in
   the virtio-mdev series [1];

b) Introduce /dev/vhost-mdev to do vhost ioctls and support
   setting mdev device as backend;

Now the userspace API looks like this:

- Userspace generates a compatible mdev device;

- Userspace opens this mdev device with VFIO API (including
  doing IOMMU programming for this mdev device with VFIO's
  container/group based interface);

- Userspace opens /dev/vhost-mdev and gets vhost fd;

- Userspace uses vhost ioctls to setup vhost (userspace should
  do VHOST_MDEV_SET_BACKEND ioctl with VFIO group fd and device
  fd first before doing other vhost ioctls);

Only compile test has been done for this series for now.

RFCv3: https://patchwork.kernel.org/patch/11117785/

[1] https://lkml.org/lkml/2019/9/10/135

Tiwei Bie (3):
  vfio: support getting vfio device from device fd
  vfio: support checking vfio driver by device ops
  vhost: introduce mdev based hardware backend

 drivers/vfio/mdev/vfio_mdev.c    |   3 +-
 drivers/vfio/vfio.c              |  32 +++
 drivers/vhost/Kconfig            |   9 +
 drivers/vhost/Makefile           |   3 +
 drivers/vhost/mdev.c             | 462 +++++++++++++++++++++++++++++++
 drivers/vhost/vhost.c            |  39 ++-
 drivers/vhost/vhost.h            |   6 +
 include/linux/vfio.h             |  11 +
 include/uapi/linux/vhost.h       |  10 +
 include/uapi/linux/vhost_types.h |   5 +
 10 files changed, 573 insertions(+), 7 deletions(-)
 create mode 100644 drivers/vhost/mdev.c

Comments

Jason Wang Sept. 17, 2019, 1:29 a.m. UTC | #1
On 2019/9/17 上午9:02, Tiwei Bie wrote:
> This RFC is to demonstrate below ideas,
>
> a) Build vhost-mdev on top of the same abstraction defined in
>     the virtio-mdev series [1];
>
> b) Introduce /dev/vhost-mdev to do vhost ioctls and support
>     setting mdev device as backend;
>
> Now the userspace API looks like this:
>
> - Userspace generates a compatible mdev device;
>
> - Userspace opens this mdev device with VFIO API (including
>    doing IOMMU programming for this mdev device with VFIO's
>    container/group based interface);
>
> - Userspace opens /dev/vhost-mdev and gets vhost fd;
>
> - Userspace uses vhost ioctls to setup vhost (userspace should
>    do VHOST_MDEV_SET_BACKEND ioctl with VFIO group fd and device
>    fd first before doing other vhost ioctls);
>
> Only compile test has been done for this series for now.
>
> RFCv3: https://patchwork.kernel.org/patch/11117785/
>
> [1] https://lkml.org/lkml/2019/9/10/135


Thanks a lot for the patches.

Per Michael request, the API in [1] might need some tweak, I want to 
introduce some device specific parent_ops instead of vfio specific one. 
This RFC has been posted at https://lkml.org/lkml/2019/9/12/151.


>
> Tiwei Bie (3):
>    vfio: support getting vfio device from device fd
>    vfio: support checking vfio driver by device ops
>    vhost: introduce mdev based hardware backend
>
>   drivers/vfio/mdev/vfio_mdev.c    |   3 +-
>   drivers/vfio/vfio.c              |  32 +++
>   drivers/vhost/Kconfig            |   9 +
>   drivers/vhost/Makefile           |   3 +
>   drivers/vhost/mdev.c             | 462 +++++++++++++++++++++++++++++++
>   drivers/vhost/vhost.c            |  39 ++-
>   drivers/vhost/vhost.h            |   6 +
>   include/linux/vfio.h             |  11 +
>   include/uapi/linux/vhost.h       |  10 +
>   include/uapi/linux/vhost_types.h |   5 +
>   10 files changed, 573 insertions(+), 7 deletions(-)
>   create mode 100644 drivers/vhost/mdev.c
>
Jason Wang Sept. 17, 2019, 3:32 a.m. UTC | #2
On 2019/9/17 上午9:02, Tiwei Bie wrote:
> This RFC is to demonstrate below ideas,
>
> a) Build vhost-mdev on top of the same abstraction defined in
>     the virtio-mdev series [1];
>
> b) Introduce /dev/vhost-mdev to do vhost ioctls and support
>     setting mdev device as backend;
>
> Now the userspace API looks like this:
>
> - Userspace generates a compatible mdev device;
>
> - Userspace opens this mdev device with VFIO API (including
>    doing IOMMU programming for this mdev device with VFIO's
>    container/group based interface);
>
> - Userspace opens /dev/vhost-mdev and gets vhost fd;
>
> - Userspace uses vhost ioctls to setup vhost (userspace should
>    do VHOST_MDEV_SET_BACKEND ioctl with VFIO group fd and device
>    fd first before doing other vhost ioctls);
>
> Only compile test has been done for this series for now.


Have a hard thought on the architecture:

1) Create a vhost char device and pass vfio mdev device fd to it as a 
backend and translate vhost-mdev ioctl to virtio mdev transport (e.g 
read/write). DMA was done through the VFIO DMA mapping on the container 
that is attached.

We have two more choices:

2) Use vfio-mdev but do not create vhost-mdev device, instead, just 
implement vhost ioctl on vfio_device_ops, and translate them into 
virtio-mdev transport or just pass ioctl to parent.

3) Don't use vfio-mdev, create a new vhost-mdev driver, during probe 
still try to add dev to vfio group and talk to parent with device 
specific ops

So I have some questions:

1) Compared to method 2, what's the advantage of creating a new vhost 
char device? I guess it's for keep the API compatibility?

2) For method 2, is there any easy way for user/admin to distinguish e.g 
ordinary vfio-mdev for vhost from ordinary vfio-mdev?  I saw you 
introduce ops matching helper but it's not friendly to management.

3) A drawback of 1) and 2) is that it must follow vfio_device_ops that 
assumes the parameter comes from userspace, it prevents support kernel 
virtio drivers.

4) So comes the idea of method 3, since it register a new vhost-mdev 
driver, we can use device specific ops instead of VFIO ones, then we can 
have a common API between vDPA parent and vhost-mdev/virtio-mdev drivers.

What's your thoughts?

Thanks


>
> RFCv3: https://patchwork.kernel.org/patch/11117785/
>
> [1] https://lkml.org/lkml/2019/9/10/135
>
> Tiwei Bie (3):
>    vfio: support getting vfio device from device fd
>    vfio: support checking vfio driver by device ops
>    vhost: introduce mdev based hardware backend
>
>   drivers/vfio/mdev/vfio_mdev.c    |   3 +-
>   drivers/vfio/vfio.c              |  32 +++
>   drivers/vhost/Kconfig            |   9 +
>   drivers/vhost/Makefile           |   3 +
>   drivers/vhost/mdev.c             | 462 +++++++++++++++++++++++++++++++
>   drivers/vhost/vhost.c            |  39 ++-
>   drivers/vhost/vhost.h            |   6 +
>   include/linux/vfio.h             |  11 +
>   include/uapi/linux/vhost.h       |  10 +
>   include/uapi/linux/vhost_types.h |   5 +
>   10 files changed, 573 insertions(+), 7 deletions(-)
>   create mode 100644 drivers/vhost/mdev.c
>
Tiwei Bie Sept. 17, 2019, 10:58 a.m. UTC | #3
On Tue, Sep 17, 2019 at 11:32:03AM +0800, Jason Wang wrote:
> On 2019/9/17 上午9:02, Tiwei Bie wrote:
> > This RFC is to demonstrate below ideas,
> > 
> > a) Build vhost-mdev on top of the same abstraction defined in
> >     the virtio-mdev series [1];
> > 
> > b) Introduce /dev/vhost-mdev to do vhost ioctls and support
> >     setting mdev device as backend;
> > 
> > Now the userspace API looks like this:
> > 
> > - Userspace generates a compatible mdev device;
> > 
> > - Userspace opens this mdev device with VFIO API (including
> >    doing IOMMU programming for this mdev device with VFIO's
> >    container/group based interface);
> > 
> > - Userspace opens /dev/vhost-mdev and gets vhost fd;
> > 
> > - Userspace uses vhost ioctls to setup vhost (userspace should
> >    do VHOST_MDEV_SET_BACKEND ioctl with VFIO group fd and device
> >    fd first before doing other vhost ioctls);
> > 
> > Only compile test has been done for this series for now.
> 
> 
> Have a hard thought on the architecture:

Thanks a lot! Do appreciate it!

> 
> 1) Create a vhost char device and pass vfio mdev device fd to it as a
> backend and translate vhost-mdev ioctl to virtio mdev transport (e.g
> read/write). DMA was done through the VFIO DMA mapping on the container that
> is attached.

Yeah, that's what we are doing in this series.

> 
> We have two more choices:
> 
> 2) Use vfio-mdev but do not create vhost-mdev device, instead, just
> implement vhost ioctl on vfio_device_ops, and translate them into
> virtio-mdev transport or just pass ioctl to parent.

Yeah. Instead of introducing /dev/vhost-mdev char device, do
vhost ioctls on VFIO device fd directly. That's what we did
in RFC v3.

> 
> 3) Don't use vfio-mdev, create a new vhost-mdev driver, during probe still
> try to add dev to vfio group and talk to parent with device specific ops

If my understanding is correct, this means we need to introduce
a new VFIO device driver to replace the existing vfio-mdev driver
in our case. Below is a quick draft just to show my understanding:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/vfio.h>
#include <linux/mdev.h>

#include "mdev_private.h"

/* XXX: we need a proper way to include below vhost header. */
#include "../../vhost/vhost.h"

static int vfio_vhost_mdev_open(void *device_data)
{
	if (!try_module_get(THIS_MODULE))
		return -ENODEV;

	/* ... */
	vhost_dev_init(...);

	return 0;
}

static void vfio_vhost_mdev_release(void *device_data)
{
	/* ... */
	module_put(THIS_MODULE);
}

static long vfio_vhost_mdev_unlocked_ioctl(void *device_data,
					   unsigned int cmd, unsigned long arg)
{
	struct mdev_device *mdev = device_data;
	struct mdev_parent *parent = mdev->parent;

	/*
	 * Use vhost ioctls.
	 *
	 * We will have a different parent_ops design.
	 * And potentially, we can share the same parent_ops
	 * with virtio_mdev.
	 */
	switch (cmd) {
	case VHOST_GET_FEATURES:
		parent->ops->get_features(mdev, ...);
		break;
	/* ... */
	}

	return 0;
}

static ssize_t vfio_vhost_mdev_read(void *device_data, char __user *buf,
				    size_t count, loff_t *ppos)
{
	/* ... */
	return 0;
}

static ssize_t vfio_vhost_mdev_write(void *device_data, const char __user *buf,
				     size_t count, loff_t *ppos)
{
	/* ... */
	return 0;
}

static int vfio_vhost_mdev_mmap(void *device_data, struct vm_area_struct *vma)
{
	/* ... */
	return 0;
}

static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
	.name		= "vfio-vhost-mdev",
	.open		= vfio_vhost_mdev_open,
	.release	= vfio_vhost_mdev_release,
	.ioctl		= vfio_vhost_mdev_unlocked_ioctl,
	.read		= vfio_vhost_mdev_read,
	.write		= vfio_vhost_mdev_write,
	.mmap		= vfio_vhost_mdev_mmap,
};

static int vfio_vhost_mdev_probe(struct device *dev)
{
	struct mdev_device *mdev = to_mdev_device(dev);

	/* ... */
	return vfio_add_group_dev(dev, &vfio_vhost_mdev_dev_ops, mdev);
}

static void vfio_vhost_mdev_remove(struct device *dev)
{
	/* ... */
	vfio_del_group_dev(dev);
}

static struct mdev_driver vfio_vhost_mdev_driver = {
	.name	= "vfio_vhost_mdev",
	.probe	= vfio_vhost_mdev_probe,
	.remove	= vfio_vhost_mdev_remove,
};

static int __init vfio_vhost_mdev_init(void)
{
	return mdev_register_driver(&vfio_vhost_mdev_driver, THIS_MODULE);
}
module_init(vfio_vhost_mdev_init)

static void __exit vfio_vhost_mdev_exit(void)
{
	mdev_unregister_driver(&vfio_vhost_mdev_driver);
}
module_exit(vfio_vhost_mdev_exit)

> 
> So I have some questions:
> 
> 1) Compared to method 2, what's the advantage of creating a new vhost char
> device? I guess it's for keep the API compatibility?

One benefit is that we can avoid doing vhost ioctls on
VFIO device fd.

> 
> 2) For method 2, is there any easy way for user/admin to distinguish e.g
> ordinary vfio-mdev for vhost from ordinary vfio-mdev?

I think device-api could be a choice.

> I saw you introduce
> ops matching helper but it's not friendly to management.

The ops matching helper is just to check whether a given
vfio-device is based on a mdev device.

> 
> 3) A drawback of 1) and 2) is that it must follow vfio_device_ops that
> assumes the parameter comes from userspace, it prevents support kernel
> virtio drivers.
> 
> 4) So comes the idea of method 3, since it register a new vhost-mdev driver,
> we can use device specific ops instead of VFIO ones, then we can have a
> common API between vDPA parent and vhost-mdev/virtio-mdev drivers.

As the above draft shows, this requires introducing a new
VFIO device driver. I think Alex's opinion matters here.

Thanks,
Tiwei

> 
> What's your thoughts?
> 
> Thanks
> 
> 
> > 
> > RFCv3: https://patchwork.kernel.org/patch/11117785/
> > 
> > [1] https://lkml.org/lkml/2019/9/10/135
> > 
> > Tiwei Bie (3):
> >    vfio: support getting vfio device from device fd
> >    vfio: support checking vfio driver by device ops
> >    vhost: introduce mdev based hardware backend
> > 
> >   drivers/vfio/mdev/vfio_mdev.c    |   3 +-
> >   drivers/vfio/vfio.c              |  32 +++
> >   drivers/vhost/Kconfig            |   9 +
> >   drivers/vhost/Makefile           |   3 +
> >   drivers/vhost/mdev.c             | 462 +++++++++++++++++++++++++++++++
> >   drivers/vhost/vhost.c            |  39 ++-
> >   drivers/vhost/vhost.h            |   6 +
> >   include/linux/vfio.h             |  11 +
> >   include/uapi/linux/vhost.h       |  10 +
> >   include/uapi/linux/vhost_types.h |   5 +
> >   10 files changed, 573 insertions(+), 7 deletions(-)
> >   create mode 100644 drivers/vhost/mdev.c
> >
Jason Wang Sept. 18, 2019, 5:51 a.m. UTC | #4
On 2019/9/17 下午6:58, Tiwei Bie wrote:
> On Tue, Sep 17, 2019 at 11:32:03AM +0800, Jason Wang wrote:
>> On 2019/9/17 上午9:02, Tiwei Bie wrote:
>>> This RFC is to demonstrate below ideas,
>>>
>>> a) Build vhost-mdev on top of the same abstraction defined in
>>>      the virtio-mdev series [1];
>>>
>>> b) Introduce /dev/vhost-mdev to do vhost ioctls and support
>>>      setting mdev device as backend;
>>>
>>> Now the userspace API looks like this:
>>>
>>> - Userspace generates a compatible mdev device;
>>>
>>> - Userspace opens this mdev device with VFIO API (including
>>>     doing IOMMU programming for this mdev device with VFIO's
>>>     container/group based interface);
>>>
>>> - Userspace opens /dev/vhost-mdev and gets vhost fd;
>>>
>>> - Userspace uses vhost ioctls to setup vhost (userspace should
>>>     do VHOST_MDEV_SET_BACKEND ioctl with VFIO group fd and device
>>>     fd first before doing other vhost ioctls);
>>>
>>> Only compile test has been done for this series for now.
>>
>> Have a hard thought on the architecture:
> Thanks a lot! Do appreciate it!
>
>> 1) Create a vhost char device and pass vfio mdev device fd to it as a
>> backend and translate vhost-mdev ioctl to virtio mdev transport (e.g
>> read/write). DMA was done through the VFIO DMA mapping on the container that
>> is attached.
> Yeah, that's what we are doing in this series.
>
>> We have two more choices:
>>
>> 2) Use vfio-mdev but do not create vhost-mdev device, instead, just
>> implement vhost ioctl on vfio_device_ops, and translate them into
>> virtio-mdev transport or just pass ioctl to parent.
> Yeah. Instead of introducing /dev/vhost-mdev char device, do
> vhost ioctls on VFIO device fd directly. That's what we did
> in RFC v3.
>
>> 3) Don't use vfio-mdev, create a new vhost-mdev driver, during probe still
>> try to add dev to vfio group and talk to parent with device specific ops
> If my understanding is correct, this means we need to introduce
> a new VFIO device driver to replace the existing vfio-mdev driver
> in our case. Below is a quick draft just to show my understanding:
>
> #include <linux/init.h>
> #include <linux/module.h>
> #include <linux/device.h>
> #include <linux/kernel.h>
> #include <linux/slab.h>
> #include <linux/vfio.h>
> #include <linux/mdev.h>
>
> #include "mdev_private.h"
>
> /* XXX: we need a proper way to include below vhost header. */
> #include "../../vhost/vhost.h"
>
> static int vfio_vhost_mdev_open(void *device_data)
> {
> 	if (!try_module_get(THIS_MODULE))
> 		return -ENODEV;
>
> 	/* ... */
> 	vhost_dev_init(...);
>
> 	return 0;
> }
>
> static void vfio_vhost_mdev_release(void *device_data)
> {
> 	/* ... */
> 	module_put(THIS_MODULE);
> }
>
> static long vfio_vhost_mdev_unlocked_ioctl(void *device_data,
> 					   unsigned int cmd, unsigned long arg)
> {
> 	struct mdev_device *mdev = device_data;
> 	struct mdev_parent *parent = mdev->parent;
>
> 	/*
> 	 * Use vhost ioctls.
> 	 *
> 	 * We will have a different parent_ops design.
> 	 * And potentially, we can share the same parent_ops
> 	 * with virtio_mdev.
> 	 */
> 	switch (cmd) {
> 	case VHOST_GET_FEATURES:
> 		parent->ops->get_features(mdev, ...);
> 		break;
> 	/* ... */
> 	}
>
> 	return 0;
> }
>
> static ssize_t vfio_vhost_mdev_read(void *device_data, char __user *buf,
> 				    size_t count, loff_t *ppos)
> {
> 	/* ... */
> 	return 0;
> }
>
> static ssize_t vfio_vhost_mdev_write(void *device_data, const char __user *buf,
> 				     size_t count, loff_t *ppos)
> {
> 	/* ... */
> 	return 0;
> }
>
> static int vfio_vhost_mdev_mmap(void *device_data, struct vm_area_struct *vma)
> {
> 	/* ... */
> 	return 0;
> }
>
> static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
> 	.name		= "vfio-vhost-mdev",
> 	.open		= vfio_vhost_mdev_open,
> 	.release	= vfio_vhost_mdev_release,
> 	.ioctl		= vfio_vhost_mdev_unlocked_ioctl,
> 	.read		= vfio_vhost_mdev_read,
> 	.write		= vfio_vhost_mdev_write,
> 	.mmap		= vfio_vhost_mdev_mmap,
> };
>
> static int vfio_vhost_mdev_probe(struct device *dev)
> {
> 	struct mdev_device *mdev = to_mdev_device(dev);
>
> 	/* ... */
> 	return vfio_add_group_dev(dev, &vfio_vhost_mdev_dev_ops, mdev);
> }
>
> static void vfio_vhost_mdev_remove(struct device *dev)
> {
> 	/* ... */
> 	vfio_del_group_dev(dev);
> }
>
> static struct mdev_driver vfio_vhost_mdev_driver = {
> 	.name	= "vfio_vhost_mdev",
> 	.probe	= vfio_vhost_mdev_probe,
> 	.remove	= vfio_vhost_mdev_remove,
> };
>
> static int __init vfio_vhost_mdev_init(void)
> {
> 	return mdev_register_driver(&vfio_vhost_mdev_driver, THIS_MODULE);
> }
> module_init(vfio_vhost_mdev_init)
>
> static void __exit vfio_vhost_mdev_exit(void)
> {
> 	mdev_unregister_driver(&vfio_vhost_mdev_driver);
> }
> module_exit(vfio_vhost_mdev_exit)


Yes, something like this basically.


>> So I have some questions:
>>
>> 1) Compared to method 2, what's the advantage of creating a new vhost char
>> device? I guess it's for keep the API compatibility?
> One benefit is that we can avoid doing vhost ioctls on
> VFIO device fd.


Yes, but any benefit from doing this?


>
>> 2) For method 2, is there any easy way for user/admin to distinguish e.g
>> ordinary vfio-mdev for vhost from ordinary vfio-mdev?
> I think device-api could be a choice.


Ok.


>
>> I saw you introduce
>> ops matching helper but it's not friendly to management.
> The ops matching helper is just to check whether a given
> vfio-device is based on a mdev device.
>
>> 3) A drawback of 1) and 2) is that it must follow vfio_device_ops that
>> assumes the parameter comes from userspace, it prevents support kernel
>> virtio drivers.
>>
>> 4) So comes the idea of method 3, since it register a new vhost-mdev driver,
>> we can use device specific ops instead of VFIO ones, then we can have a
>> common API between vDPA parent and vhost-mdev/virtio-mdev drivers.
> As the above draft shows, this requires introducing a new
> VFIO device driver. I think Alex's opinion matters here.


Yes, it is.

Thanks


> Thanks,
> Tiwei
>
>> What's your thoughts?
>>
>> Thanks
>>
>>
>>> RFCv3: https://patchwork.kernel.org/patch/11117785/
>>>
>>> [1] https://lkml.org/lkml/2019/9/10/135
>>>
>>> Tiwei Bie (3):
>>>     vfio: support getting vfio device from device fd
>>>     vfio: support checking vfio driver by device ops
>>>     vhost: introduce mdev based hardware backend
>>>
>>>    drivers/vfio/mdev/vfio_mdev.c    |   3 +-
>>>    drivers/vfio/vfio.c              |  32 +++
>>>    drivers/vhost/Kconfig            |   9 +
>>>    drivers/vhost/Makefile           |   3 +
>>>    drivers/vhost/mdev.c             | 462 +++++++++++++++++++++++++++++++
>>>    drivers/vhost/vhost.c            |  39 ++-
>>>    drivers/vhost/vhost.h            |   6 +
>>>    include/linux/vfio.h             |  11 +
>>>    include/uapi/linux/vhost.h       |  10 +
>>>    include/uapi/linux/vhost_types.h |   5 +
>>>    10 files changed, 573 insertions(+), 7 deletions(-)
>>>    create mode 100644 drivers/vhost/mdev.c
>>>
Michael S. Tsirkin Sept. 18, 2019, 2:32 p.m. UTC | #5
On Wed, Sep 18, 2019 at 01:51:21PM +0800, Jason Wang wrote:
> 
> On 2019/9/17 下午6:58, Tiwei Bie wrote:
> > On Tue, Sep 17, 2019 at 11:32:03AM +0800, Jason Wang wrote:
> > > On 2019/9/17 上午9:02, Tiwei Bie wrote:
> > > > This RFC is to demonstrate below ideas,
> > > > 
> > > > a) Build vhost-mdev on top of the same abstraction defined in
> > > >      the virtio-mdev series [1];
> > > > 
> > > > b) Introduce /dev/vhost-mdev to do vhost ioctls and support
> > > >      setting mdev device as backend;
> > > > 
> > > > Now the userspace API looks like this:
> > > > 
> > > > - Userspace generates a compatible mdev device;
> > > > 
> > > > - Userspace opens this mdev device with VFIO API (including
> > > >     doing IOMMU programming for this mdev device with VFIO's
> > > >     container/group based interface);
> > > > 
> > > > - Userspace opens /dev/vhost-mdev and gets vhost fd;
> > > > 
> > > > - Userspace uses vhost ioctls to setup vhost (userspace should
> > > >     do VHOST_MDEV_SET_BACKEND ioctl with VFIO group fd and device
> > > >     fd first before doing other vhost ioctls);
> > > > 
> > > > Only compile test has been done for this series for now.
> > > 
> > > Have a hard thought on the architecture:
> > Thanks a lot! Do appreciate it!
> > 
> > > 1) Create a vhost char device and pass vfio mdev device fd to it as a
> > > backend and translate vhost-mdev ioctl to virtio mdev transport (e.g
> > > read/write). DMA was done through the VFIO DMA mapping on the container that
> > > is attached.
> > Yeah, that's what we are doing in this series.
> > 
> > > We have two more choices:
> > > 
> > > 2) Use vfio-mdev but do not create vhost-mdev device, instead, just
> > > implement vhost ioctl on vfio_device_ops, and translate them into
> > > virtio-mdev transport or just pass ioctl to parent.
> > Yeah. Instead of introducing /dev/vhost-mdev char device, do
> > vhost ioctls on VFIO device fd directly. That's what we did
> > in RFC v3.
> > 
> > > 3) Don't use vfio-mdev, create a new vhost-mdev driver, during probe still
> > > try to add dev to vfio group and talk to parent with device specific ops
> > If my understanding is correct, this means we need to introduce
> > a new VFIO device driver to replace the existing vfio-mdev driver
> > in our case. Below is a quick draft just to show my understanding:
> > 
> > #include <linux/init.h>
> > #include <linux/module.h>
> > #include <linux/device.h>
> > #include <linux/kernel.h>
> > #include <linux/slab.h>
> > #include <linux/vfio.h>
> > #include <linux/mdev.h>
> > 
> > #include "mdev_private.h"
> > 
> > /* XXX: we need a proper way to include below vhost header. */
> > #include "../../vhost/vhost.h"
> > 
> > static int vfio_vhost_mdev_open(void *device_data)
> > {
> > 	if (!try_module_get(THIS_MODULE))
> > 		return -ENODEV;
> > 
> > 	/* ... */
> > 	vhost_dev_init(...);
> > 
> > 	return 0;
> > }
> > 
> > static void vfio_vhost_mdev_release(void *device_data)
> > {
> > 	/* ... */
> > 	module_put(THIS_MODULE);
> > }
> > 
> > static long vfio_vhost_mdev_unlocked_ioctl(void *device_data,
> > 					   unsigned int cmd, unsigned long arg)
> > {
> > 	struct mdev_device *mdev = device_data;
> > 	struct mdev_parent *parent = mdev->parent;
> > 
> > 	/*
> > 	 * Use vhost ioctls.
> > 	 *
> > 	 * We will have a different parent_ops design.
> > 	 * And potentially, we can share the same parent_ops
> > 	 * with virtio_mdev.
> > 	 */
> > 	switch (cmd) {
> > 	case VHOST_GET_FEATURES:
> > 		parent->ops->get_features(mdev, ...);
> > 		break;
> > 	/* ... */
> > 	}
> > 
> > 	return 0;
> > }
> > 
> > static ssize_t vfio_vhost_mdev_read(void *device_data, char __user *buf,
> > 				    size_t count, loff_t *ppos)
> > {
> > 	/* ... */
> > 	return 0;
> > }
> > 
> > static ssize_t vfio_vhost_mdev_write(void *device_data, const char __user *buf,
> > 				     size_t count, loff_t *ppos)
> > {
> > 	/* ... */
> > 	return 0;
> > }
> > 
> > static int vfio_vhost_mdev_mmap(void *device_data, struct vm_area_struct *vma)
> > {
> > 	/* ... */
> > 	return 0;
> > }
> > 
> > static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
> > 	.name		= "vfio-vhost-mdev",
> > 	.open		= vfio_vhost_mdev_open,
> > 	.release	= vfio_vhost_mdev_release,
> > 	.ioctl		= vfio_vhost_mdev_unlocked_ioctl,
> > 	.read		= vfio_vhost_mdev_read,
> > 	.write		= vfio_vhost_mdev_write,
> > 	.mmap		= vfio_vhost_mdev_mmap,
> > };
> > 
> > static int vfio_vhost_mdev_probe(struct device *dev)
> > {
> > 	struct mdev_device *mdev = to_mdev_device(dev);
> > 
> > 	/* ... */
> > 	return vfio_add_group_dev(dev, &vfio_vhost_mdev_dev_ops, mdev);
> > }
> > 
> > static void vfio_vhost_mdev_remove(struct device *dev)
> > {
> > 	/* ... */
> > 	vfio_del_group_dev(dev);
> > }
> > 
> > static struct mdev_driver vfio_vhost_mdev_driver = {
> > 	.name	= "vfio_vhost_mdev",
> > 	.probe	= vfio_vhost_mdev_probe,
> > 	.remove	= vfio_vhost_mdev_remove,
> > };
> > 
> > static int __init vfio_vhost_mdev_init(void)
> > {
> > 	return mdev_register_driver(&vfio_vhost_mdev_driver, THIS_MODULE);
> > }
> > module_init(vfio_vhost_mdev_init)
> > 
> > static void __exit vfio_vhost_mdev_exit(void)
> > {
> > 	mdev_unregister_driver(&vfio_vhost_mdev_driver);
> > }
> > module_exit(vfio_vhost_mdev_exit)
> 
> 
> Yes, something like this basically.
> 
> 
> > > So I have some questions:
> > > 
> > > 1) Compared to method 2, what's the advantage of creating a new vhost char
> > > device? I guess it's for keep the API compatibility?
> > One benefit is that we can avoid doing vhost ioctls on
> > VFIO device fd.
> 
> 
> Yes, but any benefit from doing this?

It does seem a bit more modular, but it's certainly not a big deal.

> > 
> > > 2) For method 2, is there any easy way for user/admin to distinguish e.g
> > > ordinary vfio-mdev for vhost from ordinary vfio-mdev?
> > I think device-api could be a choice.
> 
> 
> Ok.
> 
> 
> > 
> > > I saw you introduce
> > > ops matching helper but it's not friendly to management.
> > The ops matching helper is just to check whether a given
> > vfio-device is based on a mdev device.
> > 
> > > 3) A drawback of 1) and 2) is that it must follow vfio_device_ops that
> > > assumes the parameter comes from userspace, it prevents support kernel
> > > virtio drivers.
> > > 
> > > 4) So comes the idea of method 3, since it register a new vhost-mdev driver,
> > > we can use device specific ops instead of VFIO ones, then we can have a
> > > common API between vDPA parent and vhost-mdev/virtio-mdev drivers.
> > As the above draft shows, this requires introducing a new
> > VFIO device driver. I think Alex's opinion matters here.
> 
> 
> Yes, it is.
> 
> Thanks
> 
> 
> > Thanks,
> > Tiwei
> > 
> > > What's your thoughts?
> > > 
> > > Thanks
> > > 
> > > 
> > > > RFCv3: https://patchwork.kernel.org/patch/11117785/
> > > > 
> > > > [1] https://lkml.org/lkml/2019/9/10/135
> > > > 
> > > > Tiwei Bie (3):
> > > >     vfio: support getting vfio device from device fd
> > > >     vfio: support checking vfio driver by device ops
> > > >     vhost: introduce mdev based hardware backend
> > > > 
> > > >    drivers/vfio/mdev/vfio_mdev.c    |   3 +-
> > > >    drivers/vfio/vfio.c              |  32 +++
> > > >    drivers/vhost/Kconfig            |   9 +
> > > >    drivers/vhost/Makefile           |   3 +
> > > >    drivers/vhost/mdev.c             | 462 +++++++++++++++++++++++++++++++
> > > >    drivers/vhost/vhost.c            |  39 ++-
> > > >    drivers/vhost/vhost.h            |   6 +
> > > >    include/linux/vfio.h             |  11 +
> > > >    include/uapi/linux/vhost.h       |  10 +
> > > >    include/uapi/linux/vhost_types.h |   5 +
> > > >    10 files changed, 573 insertions(+), 7 deletions(-)
> > > >    create mode 100644 drivers/vhost/mdev.c
> > > >
Jason Wang Sept. 19, 2019, 1:08 p.m. UTC | #6
On 2019/9/18 下午10:32, Michael S. Tsirkin wrote:
>>>> So I have some questions:
>>>>
>>>> 1) Compared to method 2, what's the advantage of creating a new vhost char
>>>> device? I guess it's for keep the API compatibility?
>>> One benefit is that we can avoid doing vhost ioctls on
>>> VFIO device fd.
>> Yes, but any benefit from doing this?
> It does seem a bit more modular, but it's certainly not a big deal.


Ok, if we go this way, it could be as simple as provide some callback to 
vhost, then vhost can just forward the ioctl through parent_ops.


>
>>>> 2) For method 2, is there any easy way for user/admin to distinguish e.g
>>>> ordinary vfio-mdev for vhost from ordinary vfio-mdev?
>>> I think device-api could be a choice.
>> Ok.
>>
>>
>>>> I saw you introduce
>>>> ops matching helper but it's not friendly to management.
>>> The ops matching helper is just to check whether a given
>>> vfio-device is based on a mdev device.
>>>
>>>> 3) A drawback of 1) and 2) is that it must follow vfio_device_ops that
>>>> assumes the parameter comes from userspace, it prevents support kernel
>>>> virtio drivers.
>>>>
>>>> 4) So comes the idea of method 3, since it register a new vhost-mdev driver,
>>>> we can use device specific ops instead of VFIO ones, then we can have a
>>>> common API between vDPA parent and vhost-mdev/virtio-mdev drivers.
>>> As the above draft shows, this requires introducing a new
>>> VFIO device driver. I think Alex's opinion matters here.


Just to clarify, a new type of mdev driver but provides dummy 
vfio_device_ops for VFIO to make container DMA ioctl work.

Thanks


>> Yes, it is.
>>
>> Thanks
>>
>>
Tiwei Bie Sept. 19, 2019, 3:45 p.m. UTC | #7
On Thu, Sep 19, 2019 at 09:08:11PM +0800, Jason Wang wrote:
> On 2019/9/18 下午10:32, Michael S. Tsirkin wrote:
> > > > > So I have some questions:
> > > > > 
> > > > > 1) Compared to method 2, what's the advantage of creating a new vhost char
> > > > > device? I guess it's for keep the API compatibility?
> > > > One benefit is that we can avoid doing vhost ioctls on
> > > > VFIO device fd.
> > > Yes, but any benefit from doing this?
> > It does seem a bit more modular, but it's certainly not a big deal.
> 
> Ok, if we go this way, it could be as simple as provide some callback to
> vhost, then vhost can just forward the ioctl through parent_ops.
> 
> > 
> > > > > 2) For method 2, is there any easy way for user/admin to distinguish e.g
> > > > > ordinary vfio-mdev for vhost from ordinary vfio-mdev?
> > > > I think device-api could be a choice.
> > > Ok.
> > > 
> > > 
> > > > > I saw you introduce
> > > > > ops matching helper but it's not friendly to management.
> > > > The ops matching helper is just to check whether a given
> > > > vfio-device is based on a mdev device.
> > > > 
> > > > > 3) A drawback of 1) and 2) is that it must follow vfio_device_ops that
> > > > > assumes the parameter comes from userspace, it prevents support kernel
> > > > > virtio drivers.
> > > > > 
> > > > > 4) So comes the idea of method 3, since it register a new vhost-mdev driver,
> > > > > we can use device specific ops instead of VFIO ones, then we can have a
> > > > > common API between vDPA parent and vhost-mdev/virtio-mdev drivers.
> > > > As the above draft shows, this requires introducing a new
> > > > VFIO device driver. I think Alex's opinion matters here.
> 
> Just to clarify, a new type of mdev driver but provides dummy
> vfio_device_ops for VFIO to make container DMA ioctl work.

I see. Thanks! IIUC, you mean we can provide a very tiny
VFIO device driver in drivers/vhost/mdev.c, e.g.:

static int vfio_vhost_mdev_open(void *device_data)
{
	if (!try_module_get(THIS_MODULE))
		return -ENODEV;
	return 0;
}

static void vfio_vhost_mdev_release(void *device_data)
{
	module_put(THIS_MODULE);
}

static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
	.name		= "vfio-vhost-mdev",
	.open		= vfio_vhost_mdev_open,
	.release	= vfio_vhost_mdev_release,
};

static int vhost_mdev_probe(struct device *dev)
{
	struct mdev_device *mdev = to_mdev_device(dev);

	... Check the mdev device_id proposed in ...
	... https://lkml.org/lkml/2019/9/12/151 ...

	return vfio_add_group_dev(dev, &vfio_vhost_mdev_dev_ops, mdev);
}

static void vhost_mdev_remove(struct device *dev)
{
	vfio_del_group_dev(dev);
}

static struct mdev_driver vhost_mdev_driver = {
	.name	= "vhost_mdev",
	.probe	= vhost_mdev_probe,
	.remove	= vhost_mdev_remove,
};

So we can bind above mdev driver to the virtio-mdev compatible
mdev devices when we want to use vhost-mdev.

After binding above driver to the mdev device, we can setup IOMMU
via VFIO and get VFIO device fd of this mdev device, and pass it
to vhost fd (/dev/vhost-mdev) with a SET_BACKEND ioctl.

Thanks,
Tiwei

> 
> Thanks
> 
> 
> > > Yes, it is.
> > > 
> > > Thanks
> > > 
> > >
Jason Wang Sept. 20, 2019, 12:59 a.m. UTC | #8
On 2019/9/19 下午11:45, Tiwei Bie wrote:
> On Thu, Sep 19, 2019 at 09:08:11PM +0800, Jason Wang wrote:
>> On 2019/9/18 下午10:32, Michael S. Tsirkin wrote:
>>>>>> So I have some questions:
>>>>>>
>>>>>> 1) Compared to method 2, what's the advantage of creating a new vhost char
>>>>>> device? I guess it's for keep the API compatibility?
>>>>> One benefit is that we can avoid doing vhost ioctls on
>>>>> VFIO device fd.
>>>> Yes, but any benefit from doing this?
>>> It does seem a bit more modular, but it's certainly not a big deal.
>> Ok, if we go this way, it could be as simple as provide some callback to
>> vhost, then vhost can just forward the ioctl through parent_ops.
>>
>>>>>> 2) For method 2, is there any easy way for user/admin to distinguish e.g
>>>>>> ordinary vfio-mdev for vhost from ordinary vfio-mdev?
>>>>> I think device-api could be a choice.
>>>> Ok.
>>>>
>>>>
>>>>>> I saw you introduce
>>>>>> ops matching helper but it's not friendly to management.
>>>>> The ops matching helper is just to check whether a given
>>>>> vfio-device is based on a mdev device.
>>>>>
>>>>>> 3) A drawback of 1) and 2) is that it must follow vfio_device_ops that
>>>>>> assumes the parameter comes from userspace, it prevents support kernel
>>>>>> virtio drivers.
>>>>>>
>>>>>> 4) So comes the idea of method 3, since it register a new vhost-mdev driver,
>>>>>> we can use device specific ops instead of VFIO ones, then we can have a
>>>>>> common API between vDPA parent and vhost-mdev/virtio-mdev drivers.
>>>>> As the above draft shows, this requires introducing a new
>>>>> VFIO device driver. I think Alex's opinion matters here.
>> Just to clarify, a new type of mdev driver but provides dummy
>> vfio_device_ops for VFIO to make container DMA ioctl work.
> I see. Thanks! IIUC, you mean we can provide a very tiny
> VFIO device driver in drivers/vhost/mdev.c, e.g.:
>
> static int vfio_vhost_mdev_open(void *device_data)
> {
> 	if (!try_module_get(THIS_MODULE))
> 		return -ENODEV;
> 	return 0;
> }
>
> static void vfio_vhost_mdev_release(void *device_data)
> {
> 	module_put(THIS_MODULE);
> }
>
> static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
> 	.name		= "vfio-vhost-mdev",
> 	.open		= vfio_vhost_mdev_open,
> 	.release	= vfio_vhost_mdev_release,
> };
>
> static int vhost_mdev_probe(struct device *dev)
> {
> 	struct mdev_device *mdev = to_mdev_device(dev);
>
> 	... Check the mdev device_id proposed in ...
> 	... https://lkml.org/lkml/2019/9/12/151 ...
>
> 	return vfio_add_group_dev(dev, &vfio_vhost_mdev_dev_ops, mdev);
> }
>
> static void vhost_mdev_remove(struct device *dev)
> {
> 	vfio_del_group_dev(dev);
> }
>
> static struct mdev_driver vhost_mdev_driver = {
> 	.name	= "vhost_mdev",
> 	.probe	= vhost_mdev_probe,
> 	.remove	= vhost_mdev_remove,
> };
>
> So we can bind above mdev driver to the virtio-mdev compatible
> mdev devices when we want to use vhost-mdev.
>
> After binding above driver to the mdev device, we can setup IOMMU
> via VFIO and get VFIO device fd of this mdev device, and pass it
> to vhost fd (/dev/vhost-mdev) with a SET_BACKEND ioctl.
>
> Thanks,
> Tiwei


Yes, something like this.

Thanks


>> Thanks
>>
>>
>>>> Yes, it is.
>>>>
>>>> Thanks
>>>>
>>>>
Jason Wang Sept. 20, 2019, 1:30 a.m. UTC | #9
On 2019/9/19 下午11:45, Tiwei Bie wrote:
> On Thu, Sep 19, 2019 at 09:08:11PM +0800, Jason Wang wrote:
>> On 2019/9/18 下午10:32, Michael S. Tsirkin wrote:
>>>>>> So I have some questions:
>>>>>>
>>>>>> 1) Compared to method 2, what's the advantage of creating a new vhost char
>>>>>> device? I guess it's for keep the API compatibility?
>>>>> One benefit is that we can avoid doing vhost ioctls on
>>>>> VFIO device fd.
>>>> Yes, but any benefit from doing this?
>>> It does seem a bit more modular, but it's certainly not a big deal.
>> Ok, if we go this way, it could be as simple as provide some callback to
>> vhost, then vhost can just forward the ioctl through parent_ops.
>>
>>>>>> 2) For method 2, is there any easy way for user/admin to distinguish e.g
>>>>>> ordinary vfio-mdev for vhost from ordinary vfio-mdev?
>>>>> I think device-api could be a choice.
>>>> Ok.
>>>>
>>>>
>>>>>> I saw you introduce
>>>>>> ops matching helper but it's not friendly to management.
>>>>> The ops matching helper is just to check whether a given
>>>>> vfio-device is based on a mdev device.
>>>>>
>>>>>> 3) A drawback of 1) and 2) is that it must follow vfio_device_ops that
>>>>>> assumes the parameter comes from userspace, it prevents support kernel
>>>>>> virtio drivers.
>>>>>>
>>>>>> 4) So comes the idea of method 3, since it register a new vhost-mdev driver,
>>>>>> we can use device specific ops instead of VFIO ones, then we can have a
>>>>>> common API between vDPA parent and vhost-mdev/virtio-mdev drivers.
>>>>> As the above draft shows, this requires introducing a new
>>>>> VFIO device driver. I think Alex's opinion matters here.
>> Just to clarify, a new type of mdev driver but provides dummy
>> vfio_device_ops for VFIO to make container DMA ioctl work.
> I see. Thanks! IIUC, you mean we can provide a very tiny
> VFIO device driver in drivers/vhost/mdev.c, e.g.:
>
> static int vfio_vhost_mdev_open(void *device_data)
> {
> 	if (!try_module_get(THIS_MODULE))
> 		return -ENODEV;
> 	return 0;
> }
>
> static void vfio_vhost_mdev_release(void *device_data)
> {
> 	module_put(THIS_MODULE);
> }
>
> static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
> 	.name		= "vfio-vhost-mdev",
> 	.open		= vfio_vhost_mdev_open,
> 	.release	= vfio_vhost_mdev_release,
> };
>
> static int vhost_mdev_probe(struct device *dev)
> {
> 	struct mdev_device *mdev = to_mdev_device(dev);
>
> 	... Check the mdev device_id proposed in ...
> 	... https://lkml.org/lkml/2019/9/12/151 ...


To clarify, this should be done through the id_table fields in 
vhost_mdev_driver, and it should claim it supports virtio-mdev device only:


static struct mdev_class_id id_table[] = {
     { MDEV_ID_VIRTIO },
     { 0 },
};


static struct mdev_driver vhost_mdev_driver = {
     ...
     .id_table = id_table,
}


>
> 	return vfio_add_group_dev(dev, &vfio_vhost_mdev_dev_ops, mdev);


And in vfio_vhost_mdev_ops, all its need is to just implement vhost-net 
ioctl and translate them to virtio-mdev transport (e.g device_ops I 
proposed or ioctls other whatever other method) API. And it could have a 
dummy ops implementation for the other device_ops.


> }
>
> static void vhost_mdev_remove(struct device *dev)
> {
> 	vfio_del_group_dev(dev);
> }
>
> static struct mdev_driver vhost_mdev_driver = {
> 	.name	= "vhost_mdev",
> 	.probe	= vhost_mdev_probe,
> 	.remove	= vhost_mdev_remove,
> };
>
> So we can bind above mdev driver to the virtio-mdev compatible
> mdev devices when we want to use vhost-mdev.
>
> After binding above driver to the mdev device, we can setup IOMMU
> via VFIO and get VFIO device fd of this mdev device, and pass it
> to vhost fd (/dev/vhost-mdev) with a SET_BACKEND ioctl.


Then what vhost-mdev char device did is just forwarding ioctl back to 
this vfio device fd which seems a overkill. It's simpler that just do 
ioctl on the device ops directly.

Thanks


>
> Thanks,
> Tiwei
>
>> Thanks
>>
>>
>>>> Yes, it is.
>>>>
>>>> Thanks
>>>>
>>>>
Tiwei Bie Sept. 20, 2019, 2:16 a.m. UTC | #10
On Fri, Sep 20, 2019 at 09:30:58AM +0800, Jason Wang wrote:
> On 2019/9/19 下午11:45, Tiwei Bie wrote:
> > On Thu, Sep 19, 2019 at 09:08:11PM +0800, Jason Wang wrote:
> > > On 2019/9/18 下午10:32, Michael S. Tsirkin wrote:
> > > > > > > So I have some questions:
> > > > > > > 
> > > > > > > 1) Compared to method 2, what's the advantage of creating a new vhost char
> > > > > > > device? I guess it's for keep the API compatibility?
> > > > > > One benefit is that we can avoid doing vhost ioctls on
> > > > > > VFIO device fd.
> > > > > Yes, but any benefit from doing this?
> > > > It does seem a bit more modular, but it's certainly not a big deal.
> > > Ok, if we go this way, it could be as simple as provide some callback to
> > > vhost, then vhost can just forward the ioctl through parent_ops.
> > > 
> > > > > > > 2) For method 2, is there any easy way for user/admin to distinguish e.g
> > > > > > > ordinary vfio-mdev for vhost from ordinary vfio-mdev?
> > > > > > I think device-api could be a choice.
> > > > > Ok.
> > > > > 
> > > > > 
> > > > > > > I saw you introduce
> > > > > > > ops matching helper but it's not friendly to management.
> > > > > > The ops matching helper is just to check whether a given
> > > > > > vfio-device is based on a mdev device.
> > > > > > 
> > > > > > > 3) A drawback of 1) and 2) is that it must follow vfio_device_ops that
> > > > > > > assumes the parameter comes from userspace, it prevents support kernel
> > > > > > > virtio drivers.
> > > > > > > 
> > > > > > > 4) So comes the idea of method 3, since it register a new vhost-mdev driver,
> > > > > > > we can use device specific ops instead of VFIO ones, then we can have a
> > > > > > > common API between vDPA parent and vhost-mdev/virtio-mdev drivers.
> > > > > > As the above draft shows, this requires introducing a new
> > > > > > VFIO device driver. I think Alex's opinion matters here.
> > > Just to clarify, a new type of mdev driver but provides dummy
> > > vfio_device_ops for VFIO to make container DMA ioctl work.
> > I see. Thanks! IIUC, you mean we can provide a very tiny
> > VFIO device driver in drivers/vhost/mdev.c, e.g.:
> > 
> > static int vfio_vhost_mdev_open(void *device_data)
> > {
> > 	if (!try_module_get(THIS_MODULE))
> > 		return -ENODEV;
> > 	return 0;
> > }
> > 
> > static void vfio_vhost_mdev_release(void *device_data)
> > {
> > 	module_put(THIS_MODULE);
> > }
> > 
> > static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
> > 	.name		= "vfio-vhost-mdev",
> > 	.open		= vfio_vhost_mdev_open,
> > 	.release	= vfio_vhost_mdev_release,
> > };
> > 
> > static int vhost_mdev_probe(struct device *dev)
> > {
> > 	struct mdev_device *mdev = to_mdev_device(dev);
> > 
> > 	... Check the mdev device_id proposed in ...
> > 	... https://lkml.org/lkml/2019/9/12/151 ...
> 
> 
> To clarify, this should be done through the id_table fields in
> vhost_mdev_driver, and it should claim it supports virtio-mdev device only:
> 
> 
> static struct mdev_class_id id_table[] = {
>     { MDEV_ID_VIRTIO },
>     { 0 },
> };
> 
> 
> static struct mdev_driver vhost_mdev_driver = {
>     ...
>     .id_table = id_table,
> }

In this way, both of virtio-mdev and vhost-mdev will try to
take this device. We may want a way to let vhost-mdev take this
device only when users explicitly ask it to do it. Or maybe we
can have a different MDEV_ID for vhost-mdev but share the device
ops with virtio-mdev.

> 
> 
> > 
> > 	return vfio_add_group_dev(dev, &vfio_vhost_mdev_dev_ops, mdev);
> 
> 
> And in vfio_vhost_mdev_ops, all its need is to just implement vhost-net
> ioctl and translate them to virtio-mdev transport (e.g device_ops I proposed
> or ioctls other whatever other method) API.

I see, so my previous understanding is basically correct:

https://lkml.org/lkml/2019/9/17/332

I.e. we won't have a separate vhost fd and we will do all vhost
ioctls on the VFIO device fd backed by this new VFIO driver.

> And it could have a dummy ops
> implementation for the other device_ops.
> 
> 
> > }
> > 
> > static void vhost_mdev_remove(struct device *dev)
> > {
> > 	vfio_del_group_dev(dev);
> > }
> > 
> > static struct mdev_driver vhost_mdev_driver = {
> > 	.name	= "vhost_mdev",
> > 	.probe	= vhost_mdev_probe,
> > 	.remove	= vhost_mdev_remove,
> > };
> > 
> > So we can bind above mdev driver to the virtio-mdev compatible
> > mdev devices when we want to use vhost-mdev.
> > 
> > After binding above driver to the mdev device, we can setup IOMMU
> > via VFIO and get VFIO device fd of this mdev device, and pass it
> > to vhost fd (/dev/vhost-mdev) with a SET_BACKEND ioctl.
> 
> 
> Then what vhost-mdev char device did is just forwarding ioctl back to this
> vfio device fd which seems a overkill. It's simpler that just do ioctl on
> the device ops directly.

Yes.

Thanks,
Tiwei


> 
> Thanks
> 
> 
> > 
> > Thanks,
> > Tiwei
> > 
> > > Thanks
> > > 
> > > 
> > > > > Yes, it is.
> > > > > 
> > > > > Thanks
> > > > > 
> > > > >
Jason Wang Sept. 20, 2019, 2:36 a.m. UTC | #11
On 2019/9/20 上午10:16, Tiwei Bie wrote:
> On Fri, Sep 20, 2019 at 09:30:58AM +0800, Jason Wang wrote:
>> On 2019/9/19 下午11:45, Tiwei Bie wrote:
>>> On Thu, Sep 19, 2019 at 09:08:11PM +0800, Jason Wang wrote:
>>>> On 2019/9/18 下午10:32, Michael S. Tsirkin wrote:
>>>>>>>> So I have some questions:
>>>>>>>>
>>>>>>>> 1) Compared to method 2, what's the advantage of creating a new vhost char
>>>>>>>> device? I guess it's for keep the API compatibility?
>>>>>>> One benefit is that we can avoid doing vhost ioctls on
>>>>>>> VFIO device fd.
>>>>>> Yes, but any benefit from doing this?
>>>>> It does seem a bit more modular, but it's certainly not a big deal.
>>>> Ok, if we go this way, it could be as simple as provide some callback to
>>>> vhost, then vhost can just forward the ioctl through parent_ops.
>>>>
>>>>>>>> 2) For method 2, is there any easy way for user/admin to distinguish e.g
>>>>>>>> ordinary vfio-mdev for vhost from ordinary vfio-mdev?
>>>>>>> I think device-api could be a choice.
>>>>>> Ok.
>>>>>>
>>>>>>
>>>>>>>> I saw you introduce
>>>>>>>> ops matching helper but it's not friendly to management.
>>>>>>> The ops matching helper is just to check whether a given
>>>>>>> vfio-device is based on a mdev device.
>>>>>>>
>>>>>>>> 3) A drawback of 1) and 2) is that it must follow vfio_device_ops that
>>>>>>>> assumes the parameter comes from userspace, it prevents support kernel
>>>>>>>> virtio drivers.
>>>>>>>>
>>>>>>>> 4) So comes the idea of method 3, since it register a new vhost-mdev driver,
>>>>>>>> we can use device specific ops instead of VFIO ones, then we can have a
>>>>>>>> common API between vDPA parent and vhost-mdev/virtio-mdev drivers.
>>>>>>> As the above draft shows, this requires introducing a new
>>>>>>> VFIO device driver. I think Alex's opinion matters here.
>>>> Just to clarify, a new type of mdev driver but provides dummy
>>>> vfio_device_ops for VFIO to make container DMA ioctl work.
>>> I see. Thanks! IIUC, you mean we can provide a very tiny
>>> VFIO device driver in drivers/vhost/mdev.c, e.g.:
>>>
>>> static int vfio_vhost_mdev_open(void *device_data)
>>> {
>>> 	if (!try_module_get(THIS_MODULE))
>>> 		return -ENODEV;
>>> 	return 0;
>>> }
>>>
>>> static void vfio_vhost_mdev_release(void *device_data)
>>> {
>>> 	module_put(THIS_MODULE);
>>> }
>>>
>>> static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
>>> 	.name		= "vfio-vhost-mdev",
>>> 	.open		= vfio_vhost_mdev_open,
>>> 	.release	= vfio_vhost_mdev_release,
>>> };
>>>
>>> static int vhost_mdev_probe(struct device *dev)
>>> {
>>> 	struct mdev_device *mdev = to_mdev_device(dev);
>>>
>>> 	... Check the mdev device_id proposed in ...
>>> 	... https://lkml.org/lkml/2019/9/12/151 ...
>>
>> To clarify, this should be done through the id_table fields in
>> vhost_mdev_driver, and it should claim it supports virtio-mdev device only:
>>
>>
>> static struct mdev_class_id id_table[] = {
>>      { MDEV_ID_VIRTIO },
>>      { 0 },
>> };
>>
>>
>> static struct mdev_driver vhost_mdev_driver = {
>>      ...
>>      .id_table = id_table,
>> }
> In this way, both of virtio-mdev and vhost-mdev will try to
> take this device. We may want a way to let vhost-mdev take this
> device only when users explicitly ask it to do it. Or maybe we
> can have a different MDEV_ID for vhost-mdev but share the device
> ops with virtio-mdev.


I think it's similar to virtio-pci vs vfio-pci. User can choose to 
switch the driver through bind/unbind.


>
>>
>>> 	return vfio_add_group_dev(dev, &vfio_vhost_mdev_dev_ops, mdev);
>>
>> And in vfio_vhost_mdev_ops, all its need is to just implement vhost-net
>> ioctl and translate them to virtio-mdev transport (e.g device_ops I proposed
>> or ioctls other whatever other method) API.
> I see, so my previous understanding is basically correct:
>
> https://lkml.org/lkml/2019/9/17/332
>
> I.e. we won't have a separate vhost fd and we will do all vhost
> ioctls on the VFIO device fd backed by this new VFIO driver.


Yes.

Thanks


>
>> And it could have a dummy ops
>> implementation for the other device_ops.
>>
>>
>>> }
>>>
>>> static void vhost_mdev_remove(struct device *dev)
>>> {
>>> 	vfio_del_group_dev(dev);
>>> }
>>>
>>> static struct mdev_driver vhost_mdev_driver = {
>>> 	.name	= "vhost_mdev",
>>> 	.probe	= vhost_mdev_probe,
>>> 	.remove	= vhost_mdev_remove,
>>> };
>>>
>>> So we can bind above mdev driver to the virtio-mdev compatible
>>> mdev devices when we want to use vhost-mdev.
>>>
>>> After binding above driver to the mdev device, we can setup IOMMU
>>> via VFIO and get VFIO device fd of this mdev device, and pass it
>>> to vhost fd (/dev/vhost-mdev) with a SET_BACKEND ioctl.
>>
>> Then what vhost-mdev char device did is just forwarding ioctl back to this
>> vfio device fd which seems a overkill. It's simpler that just do ioctl on
>> the device ops directly.
> Yes.
>
> Thanks,
> Tiwei
>
>
>> Thanks
>>
>>
>>> Thanks,
>>> Tiwei
>>>
>>>> Thanks
>>>>
>>>>
>>>>>> Yes, it is.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>