From: "Tian, Kevin" <kevin.tian@intel.com>
To: Yishai Hadas <yishaih@nvidia.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"jgg@nvidia.com" <jgg@nvidia.com>,
	"mst@redhat.com" <mst@redhat.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"virtualization@lists.linux-foundation.org"
	<virtualization@lists.linux-foundation.org>,
	"parav@nvidia.com" <parav@nvidia.com>,
	"feliu@nvidia.com" <feliu@nvidia.com>,
	"jiri@nvidia.com" <jiri@nvidia.com>,
	"joao.m.martins@oracle.com" <joao.m.martins@oracle.com>,
	"si-wei.liu@oracle.com" <si-wei.liu@oracle.com>,
	"leonro@nvidia.com" <leonro@nvidia.com>,
	"maorg@nvidia.com" <maorg@nvidia.com>,
	"jasowang@redhat.com" <jasowang@redhat.com>
Subject: RE: [PATCH V7 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
Date: Thu, 14 Dec 2023 06:07:55 +0000	[thread overview]
Message-ID: <BN9PR11MB5276C5E5AF53B2DCCD654DEF8C8CA@BN9PR11MB5276.namprd11.prod.outlook.com> (raw)
In-Reply-To: <fc4a3133-0233-4843-a4e4-ad86e5b91b3d@nvidia.com>

> From: Yishai Hadas <yishaih@nvidia.com>
> Sent: Wednesday, December 13, 2023 8:25 PM
> 
> On 13/12/2023 10:23, Tian, Kevin wrote:
> >> From: Yishai Hadas <yishaih@nvidia.com>
> >> Sent: Thursday, December 7, 2023 6:28 PM
> >>
> >> +
> >> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> >> +					char __user *buf, size_t count,
> >> +					loff_t *ppos)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >> +	size_t register_offset;
> >> +	loff_t copy_offset;
> >> +	size_t copy_count;
> >> +	__le32 val32;
> >> +	__le16 val16;
> >> +	u8 val8;
> >> +	int ret;
> >> +
> >> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> >> +	if (ret < 0)
> >> +		return ret;
> >> +
> >> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> >> +				  &copy_offset, &copy_count, &register_offset)) {
> >> +		val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
> >> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset, copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
> >> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
> >> +				  &copy_offset, &copy_count, &register_offset)) {
> >> +		if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
> >> +				   copy_count))
> >> +			return -EFAULT;
> >> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> >> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
> >> +				 copy_count))
> >> +			return -EFAULT;
> >> +	}
> >
> > the write handler calls vfio_pci_core_write() for PCI_COMMAND, so
> > the core vconfig should have the latest copy of the IO bit value, which
> > is copied to the user buffer by vfio_pci_core_read(). it's then not
> > necessary to update it again.
> 
> You assume the 'vconfig' mechanism/flow is always applicable for
> that specific field; this should be double-checked.
> However, as for now the driver doesn't rely on / use the vconfig for
> other fields, since it doesn't match and would need a big refactor, I
> prefer to not rely on it at all and handle the field here.

iiuc this driver does rely on vconfig for other fields. It first calls
vfio_pci_core_read() and then modifies selected fields which need
special tweaks in this driver.

btw what virtio-net specific tweak is applied to PCI_COMMAND? You
basically cache the cmd value and then set PCI_COMMAND_IO if
it's set in the cached value. But this is already covered by the pci
vconfig. If anything is broken there then we already have big
trouble.

The trick for bar0 makes sense: since the BAR doesn't exist physically,
the vconfig mechanism may not give the expected result. But I don't
see the same rationale for PCI_COMMAND.
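
For readers without the full patch at hand: range_intersect_range()
computes the overlap between the user's config-space access and an
emulated register. Its body isn't quoted in this thread, so the
following is only a minimal sketch of the assumed semantics, not the
patch's exact code:

static bool range_intersect_range(loff_t pos, size_t count,
				  loff_t reg_start, size_t reg_size,
				  loff_t *copy_offset, size_t *copy_count,
				  size_t *register_offset)
{
	loff_t end = pos + count;
	loff_t reg_end = reg_start + reg_size;

	/* No byte of the access falls inside the emulated register */
	if (pos >= reg_end || reg_start >= end)
		return false;

	/* Offset into the user buffer where the overlap begins */
	*copy_offset = max(pos, reg_start) - pos;
	/* Offset into the register's backing value */
	*register_offset = max(pos, reg_start) - reg_start;
	/* Number of overlapping bytes to copy */
	*copy_count = min(end, reg_end) - max(pos, reg_start);
	return true;
}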

> >> +
> >> +static ssize_t
> >> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> >> +			size_t count, loff_t *ppos)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> >> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >> +
> >> +	if (!count)
> >> +		return 0;
> >> +
> >> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> >> +		size_t register_offset;
> >> +		loff_t copy_offset;
> >> +		size_t copy_count;
> >> +
> >> +		if (range_intersect_range(pos, count, PCI_COMMAND,
> >> +					  sizeof(virtvdev->pci_cmd),
> >> +					  &copy_offset, &copy_count,
> >> +					  &register_offset)) {
> >> +			if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
> >> +					   buf + copy_offset,
> >> +					   copy_count))
> >> +				return -EFAULT;
> >> +		}
> >> +		}
> >> +
> >> +		if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
> >> +					  sizeof(virtvdev->pci_base_addr_0),
> >> +					  &copy_offset, &copy_count,
> >> +					  &register_offset)) {
> >> +			if (copy_from_user((void *)&virtvdev->pci_base_addr_0 + register_offset,
> >> +					   buf + copy_offset,
> >> +					   copy_count))
> >> +				return -EFAULT;
> >> +		}
> >> +	}
> >
> > wrap above into virtiovf_pci_write_config() to be symmetric with
> > the read path.
> 
> Upon the read path, we do the full flow and return to the user. Here we
> just save some data and continue to call the core, so I'm not sure that
> this is worth a dedicated function.

I don't see a real difference.

for the read path you first call vfio_pci_core_read() then modify some
fields.

for the write path you save some data then call vfio_pci_core_write().

> 
> However, this can be done, do you still suggest it for V8 ?

yes
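
i.e. roughly the following (untested; just the code quoted above moved
into a helper, mirroring virtiovf_pci_read_config(), to be called from
virtiovf_pci_core_write() when index == VFIO_PCI_CONFIG_REGION_INDEX):

static ssize_t virtiovf_pci_write_config(struct vfio_device *core_vdev,
					 const char __user *buf, size_t count,
					 loff_t *ppos)
{
	struct virtiovf_pci_core_device *virtvdev = container_of(
		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
	size_t register_offset;
	loff_t copy_offset;
	size_t copy_count;

	/* Shadow the command word so the read path can check PCI_COMMAND_IO */
	if (range_intersect_range(pos, count, PCI_COMMAND,
				  sizeof(virtvdev->pci_cmd),
				  &copy_offset, &copy_count, &register_offset)) {
		if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
				   buf + copy_offset, copy_count))
			return -EFAULT;
	}

	/* Shadow writes to the emulated BAR0, which has no physical backing */
	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
				  sizeof(virtvdev->pci_base_addr_0),
				  &copy_offset, &copy_count, &register_offset)) {
		if (copy_from_user((void *)&virtvdev->pci_base_addr_0 + register_offset,
				   buf + copy_offset, copy_count))
			return -EFAULT;
	}

	return vfio_pci_core_write(core_vdev, buf, count, ppos);
}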

> >> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	struct pci_dev *pdev;
> >> +	int ret;
> >> +
> >> +	ret = vfio_pci_core_init_dev(core_vdev);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	pdev = virtvdev->core_device.pdev;
> >> +	ret = virtiovf_read_notify_info(virtvdev);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	/* Being ready with a buffer that supports MSIX */
> >> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
> >> +				virtiovf_get_device_config_size(pdev->device);
> >
> > which code is relevant to MSIX?
> 
> The buffer size must include the MSIX part to match the virtio-net
> specification.
> 
> As part of virtiovf_issue_legacy_rw_cmd() we may use the full buffer
> upon read/write.

at least mention that MSIX is in the device config region; otherwise
it's not helpful to people w/o a virtio background.
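
For reference (from include/uapi/linux/virtio_pci.h), the device-specific
config in the legacy layout starts after the common header, which grows by
two 16-bit MSI-X vector registers when MSI-X is enabled:

#define VIRTIO_PCI_CONFIG_OFF(msix_enabled)	((msix_enabled) ? 24 : 20)

so sizing the buffer with VIRTIO_PCI_CONFIG_OFF(true) covers the larger,
MSI-X-enabled layout.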

> >> +
> >> +static const struct vfio_device_ops virtiovf_vfio_pci_ops = {
> >> +	.name = "virtio-vfio-pci",
> >> +	.init = vfio_pci_core_init_dev,
> >> +	.release = vfio_pci_core_release_dev,
> >> +	.open_device = virtiovf_pci_open_device,
> >
> > could be vfio_pci_core_open_device(). Given the virtiovf-specific init
> > func is not called, virtiovf_pci_open_device() is essentially the same
> > as the core func.
> 
> Today we don't have vfio_pci_core_open_device() as an exported function.
> 
> virtiovf_pci_open_device() matches both cases, so I don't see a real
> reason to export one now.
> 
> By the way, it follows other drivers in vfio, see for example here [1].
> 
> [1]
> https://elixir.bootlin.com/linux/v6.7-rc5/source/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c#L1383

ah, yes.

> >> +
> >> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> >> +			      const struct pci_device_id *id)
> >> +{
> >> +	const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
> >> +	struct virtiovf_pci_core_device *virtvdev;
> >> +	int ret;
> >> +
> >> +	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
> >> +	    !virtiovf_bar0_exists(pdev))
> >> +		ops = &virtiovf_vfio_pci_tran_ops;
> >
> > I have a confusion here.
> >
> > why do we want to allow binding this driver to a non-matching VF or
> > even a PF?
> 
> The intention is to allow the binding of any virtio-net device (i.e. a PF,
> or a VF which is not transitional capable), to have a single driver over
> VFIO for all virtio-net devices.
> 
> This enables any user space application to bind and use any virtio-net
> device without needing to care which kind it is.
> 
> In case the device is not transitional capable, it will simply use the
> generic vfio functionality.

Is it useful to print a message here?
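
e.g. something along these lines (just a sketch):

	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
	    !virtiovf_bar0_exists(pdev))
		ops = &virtiovf_vfio_pci_tran_ops;
	else
		/* No legacy IO emulation; fall back to generic vfio-pci behavior */
		pci_info(pdev, "Exposing device without transitional support\n");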

> 
> >
> > if that is the intention then the naming/description should be adjusted
> > to not specific to vf throughout this patch.
> >
> > e.g. don't use "virtiovf_" prefix...
> 
> The main functionality is to supply the transitional device to user
> space for the VF, hence the prefix and the description for this driver
> refer to the VF.
> 
> Let's stay with that.

ok

> 
> >
> > the config option is generic:
> >
> > +config VIRTIO_VFIO_PCI
> > +        tristate "VFIO support for VIRTIO NET PCI devices"
> >
> > but the description is specific to vf:
> >
> > +          This provides support for exposing VIRTIO NET VF devices which support
> > +          legacy IO access, using the VFIO framework that can work with a legacy
> > +          virtio driver in the guest.
> >
> > then the module description is generic again:
> >
> > +MODULE_DESCRIPTION(
> > +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");
> >
> 
> Yes, as the binding allows that, it looks fine to me.
> 

what about revising the kconfig message to make it clear that it's
for all virtio-net devices, with a special trick to expose the VF as a
transitional device?
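
e.g. something along the lines of (just a suggestion):

config VIRTIO_VFIO_PCI
        tristate "VFIO support for VIRTIO NET PCI devices"
        help
          This provides support for exposing VIRTIO NET devices through the
          VFIO framework. VFs which support legacy IO access via the admin
          virtqueue and lack BAR0 are additionally exposed as transitional
          devices, so a legacy virtio driver can work with them in the
          guest. Other virtio-net devices get the generic VFIO
          functionality.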


