RE: [PATCH V7 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

From: "Tian, Kevin" <kevin.tian@intel.com>
To: Yishai Hadas <yishaih@nvidia.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"jasowang@redhat.com" <jasowang@redhat.com>,
	"jgg@nvidia.com" <jgg@nvidia.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"virtualization@lists.linux-foundation.org"
	<virtualization@lists.linux-foundation.org>,
	"parav@nvidia.com" <parav@nvidia.com>,
	"feliu@nvidia.com" <feliu@nvidia.com>,
	"jiri@nvidia.com" <jiri@nvidia.com>,
	"joao.m.martins@oracle.com" <joao.m.martins@oracle.com>,
	"si-wei.liu@oracle.com" <si-wei.liu@oracle.com>,
	"leonro@nvidia.com" <leonro@nvidia.com>,
	"maorg@nvidia.com" <maorg@nvidia.com>
Subject: RE: [PATCH V7 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
Date: Wed, 13 Dec 2023 08:23:50 +0000	[thread overview]
Message-ID: <BN9PR11MB5276C9276E78C66B0C5DA9088C8DA@BN9PR11MB5276.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20231207102820.74820-10-yishaih@nvidia.com>

> From: Yishai Hadas <yishaih@nvidia.com>
> Sent: Thursday, December 7, 2023 6:28 PM
> 
> Any read/write towards the control parts of the BAR will be captured by
> the new driver and will be translated into admin commands towards the
> device.
> 
> Any data path read/write access (i.e. virtio driver notifications) will
> be forwarded to the physical BAR which its properties were supplied by
> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
> probing/init flow.

this is still captured by the new driver. Just the difference between
using admin cmds vs. directly accessing bar when emulating the access.

> +config VIRTIO_VFIO_PCI
> +        tristate "VFIO support for VIRTIO NET PCI devices"
> +        depends on VIRTIO_PCI
> +        select VFIO_PCI_CORE
> +        help
> +          This provides support for exposing VIRTIO NET VF devices which
> support
> +          legacy IO access, using the VFIO framework that can work with a
> legacy
> +          virtio driver in the guest.
> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> +          not indicate I/O Space.

"thus, ..." duplicates with the former part.

> +          As of that this driver emulated I/O BAR in software to let a VF be

s/emulated/emulates/

> +          seen as a transitional device in the guest and let it work with
> +          a legacy driver.

VFIO is not specific to the guest. a native application including a legacy
virtio driver could also benefit. let's not write it in a way specific to virt.

> +
> +static int
> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
> +			    loff_t pos, char __user *buf,
> +			    size_t count, bool read)

this name only talks about the behavior for VIRTIO_PCI_QUEUE_NOTIFY.

for legacy admin cmd it's unclear whether it's actually conveyed to a
mem bar.

is it clearer to call it virtiovf_pci_bar0_rw()?

> +
> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> +					char __user *buf, size_t count,
> +					loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	size_t register_offset;
> +	loff_t copy_offset;
> +	size_t copy_count;
> +	__le32 val32;
> +	__le16 val16;
> +	u8 val8;
> +	int ret;
> +
> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> +				  &copy_offset, &copy_count,
> &register_offset)) {
> +		val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
> +		if (copy_to_user(buf + copy_offset, (void *)&val16 +
> register_offset, copy_count))
> +			return -EFAULT;
> +	}
> +
> +	if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
> +				  &copy_offset, &copy_count,
> &register_offset)) {
> +		if (copy_from_user((void *)&val16 + register_offset, buf +
> copy_offset,
> +				   copy_count))
> +			return -EFAULT;
> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> +		if (copy_to_user(buf + copy_offset, (void *)&val16 +
> register_offset,
> +				 copy_count))
> +			return -EFAULT;
> +	}

the write handler calls vfio_pci_core_write() for PCI_COMMAND so
the core vconfig should have the latest copy of the IO bit value which
is copied to the user buffer by vfio_pci_core_read(). then not necessary
to update it again.

btw the approach in this patch sounds a bit hackish - it modifies the
result before/after vfio pci core emulation instead of directly injecting
its specific emulation logic in vfio vconfig. It's probably being that
vfio vconfig currently has a global permission/handler scheme for
all pci devices. Extending it to support per-device tweak might need
lots of change.

So I'm not advocating that big change at this point, especially when
only this driver imposes such requirement now. But in the future when
more drivers e.g. Ankit's nvgrace-gpu want to do similar tweak we
may consider such possibility.

> +
> +	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
> sizeof(val32),
> +				  &copy_offset, &copy_count,
> &register_offset)) {
> +		u32 bar_mask = ~(virtvdev->bar0_virtual_buf_size - 1);
> +		u32 pci_base_addr_0 = le32_to_cpu(virtvdev-
> >pci_base_addr_0);
> +
> +		val32 = cpu_to_le32((pci_base_addr_0 & bar_mask) |
> PCI_BASE_ADDRESS_SPACE_IO);
> +		if (copy_to_user(buf + copy_offset, (void *)&val32 +
> register_offset, copy_count))
> +			return -EFAULT;
> +	}

Do we care about the initial value of bar0? this patch leaves it as 0,
unlike other real bars initialized with the hw value. In reality this
may not be a problem as software usually writes all 1's to detect
the size as the first step.

raise it just in case others may see a potential issue.

> +
> +static ssize_t
> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user
> *buf,
> +			size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> +		size_t register_offset;
> +		loff_t copy_offset;
> +		size_t copy_count;
> +
> +		if (range_intersect_range(pos, count, PCI_COMMAND,
> sizeof(virtvdev->pci_cmd),
> +					  &copy_offset, &copy_count,
> +					  &register_offset)) {
> +			if (copy_from_user((void *)&virtvdev->pci_cmd +
> register_offset,
> +					   buf + copy_offset,
> +					   copy_count))
> +				return -EFAULT;
> +		}
> +
> +		if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
> +					  sizeof(virtvdev->pci_base_addr_0),
> +					  &copy_offset, &copy_count,
> +					  &register_offset)) {
> +			if (copy_from_user((void *)&virtvdev-
> >pci_base_addr_0 + register_offset,
> +					   buf + copy_offset,
> +					   copy_count))
> +				return -EFAULT;
> +		}
> +	}

wrap above into virtiovf_pci_write_config() to be symmetric with
the read path.

> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev;
> +	int ret;
> +
> +	ret = vfio_pci_core_init_dev(core_vdev);
> +	if (ret)
> +		return ret;
> +
> +	pdev = virtvdev->core_device.pdev;
> +	ret = virtiovf_read_notify_info(virtvdev);
> +	if (ret)
> +		return ret;
> +
> +	/* Being ready with a buffer that supports MSIX */
> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
> +				virtiovf_get_device_config_size(pdev-
> >device);

which code is relevant to MSIX?

> +
> +static const struct vfio_device_ops virtiovf_vfio_pci_ops = {
> +	.name = "virtio-vfio-pci",
> +	.init = vfio_pci_core_init_dev,
> +	.release = vfio_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,

could be vfio_pci_core_open_device(). Given virtiovf specific init func
is not called  virtiovf_pci_open_device() is essentially same as the
core func.

> +
> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> +			      const struct pci_device_id *id)
> +{
> +	const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
> +	struct virtiovf_pci_core_device *virtvdev;
> +	int ret;
> +
> +	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
> +	    !virtiovf_bar0_exists(pdev))
> +		ops = &virtiovf_vfio_pci_tran_ops;

I have a confusion here.

why do we want to allow this driver binding to non-matching VF or
even PF?

if that is the intention then the naming/description should be adjusted
to not specific to vf throughout this patch.

e.g. don't use "virtiovf_" prefix...

the config option is generic:

+config VIRTIO_VFIO_PCI
+        tristate "VFIO support for VIRTIO NET PCI devices"

but the description is specific to vf:

+          This provides support for exposing VIRTIO NET VF devices which support
+          legacy IO access, using the VFIO framework that can work with a legacy
+          virtio driver in the guest.

then the module description is generic again:

+MODULE_DESCRIPTION(
+	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");