Re: [RFC v4 0/3] vhost: introduce mdev based hardware backend

From: Jason Wang <jasowang@redhat.com>
To: Tiwei Bie <tiwei.bie@intel.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	alex.williamson@redhat.com, maxime.coquelin@redhat.com,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, dan.daly@intel.com,
	cunming.liang@intel.com, zhihong.wang@intel.com,
	lingshan.zhu@intel.com
Subject: Re: [RFC v4 0/3] vhost: introduce mdev based hardware backend
Date: Fri, 20 Sep 2019 10:36:57 +0800	[thread overview]
Message-ID: <09dced89-3892-be43-3748-054ce21e37ab@redhat.com> (raw)
In-Reply-To: <20190920021630.GA4108@___>

On 2019/9/20 上午10:16, Tiwei Bie wrote:
> On Fri, Sep 20, 2019 at 09:30:58AM +0800, Jason Wang wrote:
>> On 2019/9/19 下午11:45, Tiwei Bie wrote:
>>> On Thu, Sep 19, 2019 at 09:08:11PM +0800, Jason Wang wrote:
>>>> On 2019/9/18 下午10:32, Michael S. Tsirkin wrote:
>>>>>>>> So I have some questions:
>>>>>>>>
>>>>>>>> 1) Compared to method 2, what's the advantage of creating a new vhost char
>>>>>>>> device? I guess it's for keep the API compatibility?
>>>>>>> One benefit is that we can avoid doing vhost ioctls on
>>>>>>> VFIO device fd.
>>>>>> Yes, but any benefit from doing this?
>>>>> It does seem a bit more modular, but it's certainly not a big deal.
>>>> Ok, if we go this way, it could be as simple as provide some callback to
>>>> vhost, then vhost can just forward the ioctl through parent_ops.
>>>>
>>>>>>>> 2) For method 2, is there any easy way for user/admin to distinguish e.g
>>>>>>>> ordinary vfio-mdev for vhost from ordinary vfio-mdev?
>>>>>>> I think device-api could be a choice.
>>>>>> Ok.
>>>>>>
>>>>>>
>>>>>>>> I saw you introduce
>>>>>>>> ops matching helper but it's not friendly to management.
>>>>>>> The ops matching helper is just to check whether a given
>>>>>>> vfio-device is based on a mdev device.
>>>>>>>
>>>>>>>> 3) A drawback of 1) and 2) is that it must follow vfio_device_ops that
>>>>>>>> assumes the parameter comes from userspace, it prevents support kernel
>>>>>>>> virtio drivers.
>>>>>>>>
>>>>>>>> 4) So comes the idea of method 3, since it register a new vhost-mdev driver,
>>>>>>>> we can use device specific ops instead of VFIO ones, then we can have a
>>>>>>>> common API between vDPA parent and vhost-mdev/virtio-mdev drivers.
>>>>>>> As the above draft shows, this requires introducing a new
>>>>>>> VFIO device driver. I think Alex's opinion matters here.
>>>> Just to clarify, a new type of mdev driver but provides dummy
>>>> vfio_device_ops for VFIO to make container DMA ioctl work.
>>> I see. Thanks! IIUC, you mean we can provide a very tiny
>>> VFIO device driver in drivers/vhost/mdev.c, e.g.:
>>>
>>> static int vfio_vhost_mdev_open(void *device_data)
>>> {
>>> 	if (!try_module_get(THIS_MODULE))
>>> 		return -ENODEV;
>>> 	return 0;
>>> }
>>>
>>> static void vfio_vhost_mdev_release(void *device_data)
>>> {
>>> 	module_put(THIS_MODULE);
>>> }
>>>
>>> static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
>>> 	.name		= "vfio-vhost-mdev",
>>> 	.open		= vfio_vhost_mdev_open,
>>> 	.release	= vfio_vhost_mdev_release,
>>> };
>>>
>>> static int vhost_mdev_probe(struct device *dev)
>>> {
>>> 	struct mdev_device *mdev = to_mdev_device(dev);
>>>
>>> 	... Check the mdev device_id proposed in ...
>>> 	... https://lkml.org/lkml/2019/9/12/151 ...
>>
>> To clarify, this should be done through the id_table fields in
>> vhost_mdev_driver, and it should claim it supports virtio-mdev device only:
>>
>>
>> static struct mdev_class_id id_table[] = {
>>      { MDEV_ID_VIRTIO },
>>      { 0 },
>> };
>>
>>
>> static struct mdev_driver vhost_mdev_driver = {
>>      ...
>>      .id_table = id_table,
>> }
> In this way, both of virtio-mdev and vhost-mdev will try to
> take this device. We may want a way to let vhost-mdev take this
> device only when users explicitly ask it to do it. Or maybe we
> can have a different MDEV_ID for vhost-mdev but share the device
> ops with virtio-mdev.

I think it's similar to virtio-pci vs vfio-pci. User can choose to 
switch the driver through bind/unbind.

>
>>
>>> 	return vfio_add_group_dev(dev, &vfio_vhost_mdev_dev_ops, mdev);
>>
>> And in vfio_vhost_mdev_ops, all its need is to just implement vhost-net
>> ioctl and translate them to virtio-mdev transport (e.g device_ops I proposed
>> or ioctls other whatever other method) API.
> I see, so my previous understanding is basically correct:
>
> https://lkml.org/lkml/2019/9/17/332
>
> I.e. we won't have a separate vhost fd and we will do all vhost
> ioctls on the VFIO device fd backed by this new VFIO driver.

Yes.

Thanks

>
>> And it could have a dummy ops
>> implementation for the other device_ops.
>>
>>
>>> }
>>>
>>> static void vhost_mdev_remove(struct device *dev)
>>> {
>>> 	vfio_del_group_dev(dev);
>>> }
>>>
>>> static struct mdev_driver vhost_mdev_driver = {
>>> 	.name	= "vhost_mdev",
>>> 	.probe	= vhost_mdev_probe,
>>> 	.remove	= vhost_mdev_remove,
>>> };
>>>
>>> So we can bind above mdev driver to the virtio-mdev compatible
>>> mdev devices when we want to use vhost-mdev.
>>>
>>> After binding above driver to the mdev device, we can setup IOMMU
>>> via VFIO and get VFIO device fd of this mdev device, and pass it
>>> to vhost fd (/dev/vhost-mdev) with a SET_BACKEND ioctl.
>>
>> Then what vhost-mdev char device did is just forwarding ioctl back to this
>> vfio device fd which seems a overkill. It's simpler that just do ioctl on
>> the device ops directly.
> Yes.
>
> Thanks,
> Tiwei
>
>
>> Thanks
>>
>>
>>> Thanks,
>>> Tiwei
>>>
>>>> Thanks
>>>>
>>>>
>>>>>> Yes, it is.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>