All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V4 vfio 0/9] Introduce a vfio driver over virtio devices
@ 2023-11-29 14:37 Yishai Hadas
  2023-11-29 14:37 ` [PATCH V4 vfio 1/9] virtio: Define feature bit for administration virtqueue Yishai Hadas
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: Yishai Hadas @ 2023-11-29 14:37 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

This series introduce a vfio driver over virtio devices to support the
legacy interface functionality for VFs.

Background, from the virtio spec [1].
--------------------------------------------------------------------
In some systems, there is a need to support a virtio legacy driver with
a device that does not directly support the legacy interface. In such
scenarios, a group owner device can provide the legacy interface
functionality for the group member devices. The driver of the owner
device can then access the legacy interface of a member device on behalf
of the legacy member device driver.

For example, with the SR-IOV group type, group members (VFs) can not
present the legacy interface in an I/O BAR in BAR0 as expected by the
legacy pci driver. If the legacy driver is running inside a virtual
machine, the hypervisor executing the virtual machine can present a
virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
legacy driver accesses to this I/O BAR and forwards them to the group
owner device (PF) using group administration commands.
--------------------------------------------------------------------

The first 6 patches are in the virtio area and handle the below:
- Introduce the admin virtqueue infrastcture.
- Expose the layout of the commands that should be used for
  supporting the legacy access.
- Expose APIs to enable upper layers as of vfio, net, etc
  to execute admin commands.

The above follows the virtio spec that was lastly accepted in that area
[1].

The last 3 patches are in the vfio area and handle the below:
- Expose some APIs from vfio/pci to be used by the vfio/virtio driver.
- Introduce a vfio driver over virtio devices to support the legacy
  interface functionality for VFs. 

The series was tested successfully over virtio-net VFs in the host,
while running in the guest both modern and legacy drivers.

[1]
https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c

Changes from V3: https://www.spinics.net/lists/kvm/msg333008.html
Virtio:
- Rebase on top of 6.7 rc3.
Vfio:
- Fix a typo, drop 'acc' from 'virtiovf_acc_vfio_pci_tran_ops'.

Changes from V2: https://lore.kernel.org/all/20231029155952.67686-8-yishaih@nvidia.com/T/
Virtio:
- Rebase on top of 6.7 rc1.
- Add a mutex to serialize admin commands execution and virtqueue
  deletion, as was suggested by Michael.
- Remove the 'ref_count' usage which is not needed any more.
- Reduce the depth of the admin vq to match a single command at a given time.
- Add a supported check upon command execution and move to use a single
  flow of virtqueue_exec_admin_cmd().
- Improve the description of the exported commands to better match the
  specification and the expected usage as was asked by Michael.

Vfio:
- Upon calling to virtio_pci_admin_legacy/common_device_io_read/write()
  supply the 'offset' within the relevant configuration area, following
  the virtio exported APIs.

Changes from V1: https://lore.kernel.org/all/20231023104548.07b3aa19.alex.williamson@redhat.com/T/
Virtio:
- Drop its first patch, it was accepted upstream already.
- Add a new patch (#6) which initializes the supported admin commands
  upon admin queue activation as was suggested by Michael.
- Split the legacy_io_read/write commands per common/device
  configuration as was asked by Michael.
- Don't expose any more the list query/used APIs outside of virtio.
- Instead, expose an API to check whether the legacy io functionality is
  supported as was suggested by Michael.
- Fix some Krobot's note by adding the missing include file.

Vfio:
- Refer specifically to virtio-net as part of the driver/module description
  as Alex asked.
- Change to check MSIX enablement based on the irq type of the given vfio
  core device. In addition, drop its capable checking from the probe flow
  as was asked by Alex.
- Adapt to use the new virtio exposed APIs and clean some code accordingly.
- Adapt to some cleaner style code in some places (if/else) as was suggested
  by Alex.
- Fix the range_intersect_range() function and adapt its usage as was
  pointed by Alex.
- Make struct virtiovf_pci_core_device better packed.
- Overwrite the subsystem vendor ID to be 0x1af4 as was discussed in
  the ML.
- Add support for the 'bar sizing negotiation' as was asked by Alex.
- Drop the 'acc' from the 'ops' as Alex asked.

Changes from V0: https://www.spinics.net/lists/linux-virtualization/msg63802.html

Virtio:
- Fix the common config map size issue that was reported by Michael
  Tsirkin.
- Do not use vp_dev->vqs[] array upon vp_del_vqs() as was asked by
  Michael, instead skip the AQ specifically.
- Move admin vq implementation into virtio_pci_modern.c as was asked by
  Michael.
- Rename structure virtio_avq to virtio_pci_admin_vq and some extra
  corresponding renames.
- Remove exported symbols virtio_pci_vf_get_pf_dev(),
  virtio_admin_cmd_exec() as now callers are local to the module.
- Handle inflight commands as part of the device reset flow.
- Introduce APIs per admin command in virtio-pci as was asked by Michael.

Vfio:
- Change to use EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL for
  vfio_pci_core_setup_barmap() and vfio_pci_iowrite#xxx() as pointed by
  Alex.
- Drop the intermediate patch which prepares the commands and calls the
  generic virtio admin command API (i.e. virtio_admin_cmd_exec()).
- Instead, call directly to the new APIs per admin command that are
  exported from Virtio - based on Michael's request.
- Enable only virtio-net as part of the pci_device_id table to enforce
  upon binding only what is supported as suggested by Alex.
- Add support for byte-wise access (read/write) over the device config
  region as was asked by Alex.
- Consider whether MSIX is practically enabled/disabled to choose the
  right opcode upon issuing read/write admin command, as mentioned
  by Michael.
- Move to use VIRTIO_PCI_CONFIG_OFF instead of adding some new defines
  as was suggested by Michael.
- Set the '.close_device' op to vfio_pci_core_close_device() as was
  pointed by Alex.
- Adapt to Vfio multi-line comment style in a few places.
- Add virtualization@lists.linux-foundation.org in the MAINTAINERS file
  to be CCed for the new driver as was suggested by Jason.

Yishai

Feng Liu (4):
  virtio: Define feature bit for administration virtqueue
  virtio-pci: Introduce admin virtqueue
  virtio-pci: Introduce admin command sending function
  virtio-pci: Introduce admin commands

Yishai Hadas (5):
  virtio-pci: Initialize the supported admin commands
  virtio-pci: Introduce APIs to execute legacy IO admin commands
  vfio/pci: Expose vfio_pci_core_setup_barmap()
  vfio/pci: Expose vfio_pci_iowrite/read##size()
  vfio/virtio: Introduce a vfio driver over virtio devices

 MAINTAINERS                            |   7 +
 drivers/vfio/pci/Kconfig               |   2 +
 drivers/vfio/pci/Makefile              |   2 +
 drivers/vfio/pci/vfio_pci_core.c       |  25 ++
 drivers/vfio/pci/vfio_pci_rdwr.c       |  38 +-
 drivers/vfio/pci/virtio/Kconfig        |  16 +
 drivers/vfio/pci/virtio/Makefile       |   4 +
 drivers/vfio/pci/virtio/main.c         | 554 +++++++++++++++++++++++++
 drivers/virtio/virtio.c                |  37 +-
 drivers/virtio/virtio_pci_common.c     |  14 +
 drivers/virtio/virtio_pci_common.h     |  21 +-
 drivers/virtio/virtio_pci_modern.c     | 503 +++++++++++++++++++++-
 drivers/virtio/virtio_pci_modern_dev.c |  24 +-
 include/linux/vfio_pci_core.h          |  20 +
 include/linux/virtio.h                 |   8 +
 include/linux/virtio_config.h          |   4 +
 include/linux/virtio_pci_admin.h       |  21 +
 include/linux/virtio_pci_modern.h      |   2 +
 include/uapi/linux/virtio_config.h     |   8 +-
 include/uapi/linux/virtio_pci.h        |  71 ++++
 20 files changed, 1342 insertions(+), 39 deletions(-)
 create mode 100644 drivers/vfio/pci/virtio/Kconfig
 create mode 100644 drivers/vfio/pci/virtio/Makefile
 create mode 100644 drivers/vfio/pci/virtio/main.c
 create mode 100644 include/linux/virtio_pci_admin.h

-- 
2.27.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH V4 vfio 1/9] virtio: Define feature bit for administration virtqueue
  2023-11-29 14:37 [PATCH V4 vfio 0/9] Introduce a vfio driver over virtio devices Yishai Hadas
@ 2023-11-29 14:37 ` Yishai Hadas
  2023-11-29 14:37 ` [PATCH V4 vfio 2/9] virtio-pci: Introduce admin virtqueue Yishai Hadas
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Yishai Hadas @ 2023-11-29 14:37 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

From: Feng Liu <feliu@nvidia.com>

Introduce VIRTIO_F_ADMIN_VQ which is used for administration virtqueue
support.

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 include/uapi/linux/virtio_config.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 8881aea60f6f..2445f365bce7 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -52,7 +52,7 @@
  * rest are per-device feature bits.
  */
 #define VIRTIO_TRANSPORT_F_START	28
-#define VIRTIO_TRANSPORT_F_END		41
+#define VIRTIO_TRANSPORT_F_END		42
 
 #ifndef VIRTIO_CONFIG_NO_LEGACY
 /* Do we get callbacks when the ring is completely used, even if we've
@@ -114,4 +114,10 @@
  * This feature indicates that the driver can reset a queue individually.
  */
 #define VIRTIO_F_RING_RESET		40
+
+/*
+ * This feature indicates that the device support administration virtqueues.
+ */
+#define VIRTIO_F_ADMIN_VQ		41
+
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH V4 vfio 2/9] virtio-pci: Introduce admin virtqueue
  2023-11-29 14:37 [PATCH V4 vfio 0/9] Introduce a vfio driver over virtio devices Yishai Hadas
  2023-11-29 14:37 ` [PATCH V4 vfio 1/9] virtio: Define feature bit for administration virtqueue Yishai Hadas
@ 2023-11-29 14:37 ` Yishai Hadas
  2023-11-29 14:37 ` [PATCH V4 vfio 3/9] virtio-pci: Introduce admin command sending function Yishai Hadas
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Yishai Hadas @ 2023-11-29 14:37 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

From: Feng Liu <feliu@nvidia.com>

Introduce support for the admin virtqueue. By negotiating
VIRTIO_F_ADMIN_VQ feature, driver detects capability and creates one
administration virtqueue. Administration virtqueue implementation in
virtio pci generic layer, enables multiple types of upper layer
drivers such as vfio, net, blk to utilize it.

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/virtio/virtio.c                | 37 +++++++++++--
 drivers/virtio/virtio_pci_common.c     |  3 ++
 drivers/virtio/virtio_pci_common.h     | 15 +++++-
 drivers/virtio/virtio_pci_modern.c     | 75 +++++++++++++++++++++++++-
 drivers/virtio/virtio_pci_modern_dev.c | 24 ++++++++-
 include/linux/virtio_config.h          |  4 ++
 include/linux/virtio_pci_modern.h      |  2 +
 include/uapi/linux/virtio_pci.h        |  5 ++
 8 files changed, 157 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 3893dc29eb26..f4080692b351 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -302,9 +302,15 @@ static int virtio_dev_probe(struct device *_d)
 	if (err)
 		goto err;
 
+	if (dev->config->create_avq) {
+		err = dev->config->create_avq(dev);
+		if (err)
+			goto err;
+	}
+
 	err = drv->probe(dev);
 	if (err)
-		goto err;
+		goto err_probe;
 
 	/* If probe didn't do it, mark device DRIVER_OK ourselves. */
 	if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
@@ -316,6 +322,10 @@ static int virtio_dev_probe(struct device *_d)
 	virtio_config_enable(dev);
 
 	return 0;
+
+err_probe:
+	if (dev->config->destroy_avq)
+		dev->config->destroy_avq(dev);
 err:
 	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return err;
@@ -331,6 +341,9 @@ static void virtio_dev_remove(struct device *_d)
 
 	drv->remove(dev);
 
+	if (dev->config->destroy_avq)
+		dev->config->destroy_avq(dev);
+
 	/* Driver should have reset device. */
 	WARN_ON_ONCE(dev->config->get_status(dev));
 
@@ -489,13 +502,20 @@ EXPORT_SYMBOL_GPL(unregister_virtio_device);
 int virtio_device_freeze(struct virtio_device *dev)
 {
 	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
+	int ret;
 
 	virtio_config_disable(dev);
 
 	dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
 
-	if (drv && drv->freeze)
-		return drv->freeze(dev);
+	if (drv && drv->freeze) {
+		ret = drv->freeze(dev);
+		if (ret)
+			return ret;
+	}
+
+	if (dev->config->destroy_avq)
+		dev->config->destroy_avq(dev);
 
 	return 0;
 }
@@ -532,10 +552,16 @@ int virtio_device_restore(struct virtio_device *dev)
 	if (ret)
 		goto err;
 
+	if (dev->config->create_avq) {
+		ret = dev->config->create_avq(dev);
+		if (ret)
+			goto err;
+	}
+
 	if (drv->restore) {
 		ret = drv->restore(dev);
 		if (ret)
-			goto err;
+			goto err_restore;
 	}
 
 	/* If restore didn't do it, mark device DRIVER_OK ourselves. */
@@ -546,6 +572,9 @@ int virtio_device_restore(struct virtio_device *dev)
 
 	return 0;
 
+err_restore:
+	if (dev->config->destroy_avq)
+		dev->config->destroy_avq(dev);
 err:
 	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return ret;
diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 7a5593997e0e..fafd13d0e4d4 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -236,6 +236,9 @@ void vp_del_vqs(struct virtio_device *vdev)
 	int i;
 
 	list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
+		if (vp_dev->is_avq(vdev, vq->index))
+			continue;
+
 		if (vp_dev->per_vq_vectors) {
 			int v = vp_dev->vqs[vq->index]->msix_vector;
 
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index 4b773bd7c58c..7306128e63e9 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -41,6 +41,14 @@ struct virtio_pci_vq_info {
 	unsigned int msix_vector;
 };
 
+struct virtio_pci_admin_vq {
+	/* Virtqueue info associated with this admin queue. */
+	struct virtio_pci_vq_info info;
+	/* Name of the admin queue: avq.$vq_index. */
+	char name[10];
+	u16 vq_index;
+};
+
 /* Our device structure */
 struct virtio_pci_device {
 	struct virtio_device vdev;
@@ -58,9 +66,13 @@ struct virtio_pci_device {
 	spinlock_t lock;
 	struct list_head virtqueues;
 
-	/* array of all queues for house-keeping */
+	/* Array of all virtqueues reported in the
+	 * PCI common config num_queues field
+	 */
 	struct virtio_pci_vq_info **vqs;
 
+	struct virtio_pci_admin_vq admin_vq;
+
 	/* MSI-X support */
 	int msix_enabled;
 	int intx_enabled;
@@ -86,6 +98,7 @@ struct virtio_pci_device {
 	void (*del_vq)(struct virtio_pci_vq_info *info);
 
 	u16 (*config_vector)(struct virtio_pci_device *vp_dev, u16 vector);
+	bool (*is_avq)(struct virtio_device *vdev, unsigned int index);
 };
 
 /* Constants for MSI-X */
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index ee6a386d250b..ce915018b5b0 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -19,6 +19,8 @@
 #define VIRTIO_RING_NO_LEGACY
 #include "virtio_pci_common.h"
 
+#define VIRTIO_AVQ_SGS_MAX	4
+
 static u64 vp_get_features(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -26,6 +28,16 @@ static u64 vp_get_features(struct virtio_device *vdev)
 	return vp_modern_get_features(&vp_dev->mdev);
 }
 
+static bool vp_is_avq(struct virtio_device *vdev, unsigned int index)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return false;
+
+	return index == vp_dev->admin_vq.vq_index;
+}
+
 static void vp_transport_features(struct virtio_device *vdev, u64 features)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -37,6 +49,9 @@ static void vp_transport_features(struct virtio_device *vdev, u64 features)
 
 	if (features & BIT_ULL(VIRTIO_F_RING_RESET))
 		__virtio_set_bit(vdev, VIRTIO_F_RING_RESET);
+
+	if (features & BIT_ULL(VIRTIO_F_ADMIN_VQ))
+		__virtio_set_bit(vdev, VIRTIO_F_ADMIN_VQ);
 }
 
 static int __vp_check_common_size_one_feature(struct virtio_device *vdev, u32 fbit,
@@ -69,6 +84,9 @@ static int vp_check_common_size(struct virtio_device *vdev)
 	if (vp_check_common_size_one_feature(vdev, VIRTIO_F_RING_RESET, queue_reset))
 		return -EINVAL;
 
+	if (vp_check_common_size_one_feature(vdev, VIRTIO_F_ADMIN_VQ, admin_queue_num))
+		return -EINVAL;
+
 	return 0;
 }
 
@@ -345,6 +363,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
 	struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
 	bool (*notify)(struct virtqueue *vq);
 	struct virtqueue *vq;
+	bool is_avq;
 	u16 num;
 	int err;
 
@@ -353,11 +372,13 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
 	else
 		notify = vp_notify;
 
-	if (index >= vp_modern_get_num_queues(mdev))
+	is_avq = vp_is_avq(&vp_dev->vdev, index);
+	if (index >= vp_modern_get_num_queues(mdev) && !is_avq)
 		return ERR_PTR(-EINVAL);
 
+	num = is_avq ?
+		VIRTIO_AVQ_SGS_MAX : vp_modern_get_queue_size(mdev, index);
 	/* Check if queue is either not available or already active. */
-	num = vp_modern_get_queue_size(mdev, index);
 	if (!num || vp_modern_get_queue_enable(mdev, index))
 		return ERR_PTR(-ENOENT);
 
@@ -383,6 +404,9 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
 		goto err;
 	}
 
+	if (is_avq)
+		vp_dev->admin_vq.info.vq = vq;
+
 	return vq;
 
 err:
@@ -418,6 +442,9 @@ static void del_vq(struct virtio_pci_vq_info *info)
 	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
 	struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
 
+	if (vp_is_avq(&vp_dev->vdev, vq->index))
+		vp_dev->admin_vq.info.vq = NULL;
+
 	if (vp_dev->msix_enabled)
 		vp_modern_queue_vector(mdev, vq->index,
 				       VIRTIO_MSI_NO_VECTOR);
@@ -527,6 +554,45 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
 	return true;
 }
 
+static int vp_modern_create_avq(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_admin_vq *avq;
+	struct virtqueue *vq;
+	u16 admin_q_num;
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return 0;
+
+	admin_q_num = vp_modern_avq_num(&vp_dev->mdev);
+	if (!admin_q_num)
+		return -EINVAL;
+
+	avq = &vp_dev->admin_vq;
+	avq->vq_index = vp_modern_avq_index(&vp_dev->mdev);
+	sprintf(avq->name, "avq.%u", avq->vq_index);
+	vq = vp_dev->setup_vq(vp_dev, &vp_dev->admin_vq.info, avq->vq_index, NULL,
+			      avq->name, NULL, VIRTIO_MSI_NO_VECTOR);
+	if (IS_ERR(vq)) {
+		dev_err(&vdev->dev, "failed to setup admin virtqueue, err=%ld",
+			PTR_ERR(vq));
+		return PTR_ERR(vq);
+	}
+
+	vp_modern_set_queue_enable(&vp_dev->mdev, avq->info.vq->index, true);
+	return 0;
+}
+
+static void vp_modern_destroy_avq(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return;
+
+	vp_dev->del_vq(&vp_dev->admin_vq.info);
+}
+
 static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
 	.get		= NULL,
 	.set		= NULL,
@@ -545,6 +611,8 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
 	.get_shm_region  = vp_get_shm_region,
 	.disable_vq_and_reset = vp_modern_disable_vq_and_reset,
 	.enable_vq_after_reset = vp_modern_enable_vq_after_reset,
+	.create_avq = vp_modern_create_avq,
+	.destroy_avq = vp_modern_destroy_avq,
 };
 
 static const struct virtio_config_ops virtio_pci_config_ops = {
@@ -565,6 +633,8 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
 	.get_shm_region  = vp_get_shm_region,
 	.disable_vq_and_reset = vp_modern_disable_vq_and_reset,
 	.enable_vq_after_reset = vp_modern_enable_vq_after_reset,
+	.create_avq = vp_modern_create_avq,
+	.destroy_avq = vp_modern_destroy_avq,
 };
 
 /* the PCI probing function */
@@ -588,6 +658,7 @@ int virtio_pci_modern_probe(struct virtio_pci_device *vp_dev)
 	vp_dev->config_vector = vp_config_vector;
 	vp_dev->setup_vq = setup_vq;
 	vp_dev->del_vq = del_vq;
+	vp_dev->is_avq = vp_is_avq;
 	vp_dev->isr = mdev->isr;
 	vp_dev->vdev.id = mdev->id;
 
diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
index 7de8b1ebabac..0d3dbfaf4b23 100644
--- a/drivers/virtio/virtio_pci_modern_dev.c
+++ b/drivers/virtio/virtio_pci_modern_dev.c
@@ -207,6 +207,10 @@ static inline void check_offsets(void)
 		     offsetof(struct virtio_pci_modern_common_cfg, queue_notify_data));
 	BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_RESET !=
 		     offsetof(struct virtio_pci_modern_common_cfg, queue_reset));
+	BUILD_BUG_ON(VIRTIO_PCI_COMMON_ADM_Q_IDX !=
+		     offsetof(struct virtio_pci_modern_common_cfg, admin_queue_index));
+	BUILD_BUG_ON(VIRTIO_PCI_COMMON_ADM_Q_NUM !=
+		     offsetof(struct virtio_pci_modern_common_cfg, admin_queue_num));
 }
 
 /*
@@ -296,7 +300,7 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev)
 	mdev->common = vp_modern_map_capability(mdev, common,
 			      sizeof(struct virtio_pci_common_cfg), 4, 0,
 			      offsetofend(struct virtio_pci_modern_common_cfg,
-					  queue_reset),
+					  admin_queue_num),
 			      &mdev->common_len, NULL);
 	if (!mdev->common)
 		goto err_map_common;
@@ -719,6 +723,24 @@ void __iomem *vp_modern_map_vq_notify(struct virtio_pci_modern_device *mdev,
 }
 EXPORT_SYMBOL_GPL(vp_modern_map_vq_notify);
 
+u16 vp_modern_avq_num(struct virtio_pci_modern_device *mdev)
+{
+	struct virtio_pci_modern_common_cfg __iomem *cfg;
+
+	cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;
+	return vp_ioread16(&cfg->admin_queue_num);
+}
+EXPORT_SYMBOL_GPL(vp_modern_avq_num);
+
+u16 vp_modern_avq_index(struct virtio_pci_modern_device *mdev)
+{
+	struct virtio_pci_modern_common_cfg __iomem *cfg;
+
+	cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;
+	return vp_ioread16(&cfg->admin_queue_index);
+}
+EXPORT_SYMBOL_GPL(vp_modern_avq_index);
+
 MODULE_VERSION("0.1");
 MODULE_DESCRIPTION("Modern Virtio PCI Device");
 MODULE_AUTHOR("Jason Wang <jasowang@redhat.com>");
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 2b3438de2c4d..da9b271b54db 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -93,6 +93,8 @@ typedef void vq_callback_t(struct virtqueue *);
  *	Returns 0 on success or error status
  *	If disable_vq_and_reset is set, then enable_vq_after_reset must also be
  *	set.
+ * @create_avq: create admin virtqueue resource.
+ * @destroy_avq: destroy admin virtqueue resource.
  */
 struct virtio_config_ops {
 	void (*get)(struct virtio_device *vdev, unsigned offset,
@@ -120,6 +122,8 @@ struct virtio_config_ops {
 			       struct virtio_shm_region *region, u8 id);
 	int (*disable_vq_and_reset)(struct virtqueue *vq);
 	int (*enable_vq_after_reset)(struct virtqueue *vq);
+	int (*create_avq)(struct virtio_device *vdev);
+	void (*destroy_avq)(struct virtio_device *vdev);
 };
 
 /* If driver didn't advertise the feature, it will never appear. */
diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h
index a09e13a577a9..c0b1b1ca1163 100644
--- a/include/linux/virtio_pci_modern.h
+++ b/include/linux/virtio_pci_modern.h
@@ -125,4 +125,6 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev);
 void vp_modern_remove(struct virtio_pci_modern_device *mdev);
 int vp_modern_get_queue_reset(struct virtio_pci_modern_device *mdev, u16 index);
 void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index);
+u16 vp_modern_avq_num(struct virtio_pci_modern_device *mdev);
+u16 vp_modern_avq_index(struct virtio_pci_modern_device *mdev);
 #endif
diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index 44f4dd2add18..240ddeef7eae 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -175,6 +175,9 @@ struct virtio_pci_modern_common_cfg {
 
 	__le16 queue_notify_data;	/* read-write */
 	__le16 queue_reset;		/* read-write */
+
+	__le16 admin_queue_index;	/* read-only */
+	__le16 admin_queue_num;		/* read-only */
 };
 
 /* Fields in VIRTIO_PCI_CAP_PCI_CFG: */
@@ -215,6 +218,8 @@ struct virtio_pci_cfg_cap {
 #define VIRTIO_PCI_COMMON_Q_USEDHI	52
 #define VIRTIO_PCI_COMMON_Q_NDATA	56
 #define VIRTIO_PCI_COMMON_Q_RESET	58
+#define VIRTIO_PCI_COMMON_ADM_Q_IDX	60
+#define VIRTIO_PCI_COMMON_ADM_Q_NUM	62
 
 #endif /* VIRTIO_PCI_NO_MODERN */
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH V4 vfio 3/9] virtio-pci: Introduce admin command sending function
  2023-11-29 14:37 [PATCH V4 vfio 0/9] Introduce a vfio driver over virtio devices Yishai Hadas
  2023-11-29 14:37 ` [PATCH V4 vfio 1/9] virtio: Define feature bit for administration virtqueue Yishai Hadas
  2023-11-29 14:37 ` [PATCH V4 vfio 2/9] virtio-pci: Introduce admin virtqueue Yishai Hadas
@ 2023-11-29 14:37 ` Yishai Hadas
  2023-11-29 14:37 ` [PATCH V4 vfio 4/9] virtio-pci: Introduce admin commands Yishai Hadas
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Yishai Hadas @ 2023-11-29 14:37 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

From: Feng Liu <feliu@nvidia.com>

Add support for sending admin command through admin virtqueue interface.
Abort any inflight admin commands once device reset completes. Activate
admin queue when device becomes ready; deactivate on device reset.

To comply to the below specification statement [1], the admin virtqueue
is activated for upper layer users only after setting DRIVER_OK status.

[1] The driver MUST NOT send any buffer available notifications to the
device before setting DRIVER_OK.

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/virtio/virtio_pci_common.h |   3 +
 drivers/virtio/virtio_pci_modern.c | 143 ++++++++++++++++++++++++++++-
 include/linux/virtio.h             |   8 ++
 include/uapi/linux/virtio_pci.h    |  22 +++++
 4 files changed, 174 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index 7306128e63e9..a50a58014c9f 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -29,6 +29,7 @@
 #include <linux/virtio_pci_modern.h>
 #include <linux/highmem.h>
 #include <linux/spinlock.h>
+#include <linux/mutex.h>
 
 struct virtio_pci_vq_info {
 	/* the actual virtqueue */
@@ -44,6 +45,8 @@ struct virtio_pci_vq_info {
 struct virtio_pci_admin_vq {
 	/* Virtqueue info associated with this admin queue. */
 	struct virtio_pci_vq_info info;
+	/* serializing admin commands execution and virtqueue deletion */
+	struct mutex cmd_lock;
 	/* Name of the admin queue: avq.$vq_index. */
 	char name[10];
 	u16 vq_index;
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index ce915018b5b0..18366a82408c 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -38,6 +38,132 @@ static bool vp_is_avq(struct virtio_device *vdev, unsigned int index)
 	return index == vp_dev->admin_vq.vq_index;
 }
 
+static int virtqueue_exec_admin_cmd(struct virtio_pci_admin_vq *admin_vq,
+				    struct scatterlist **sgs,
+				    unsigned int out_num,
+				    unsigned int in_num,
+				    void *data)
+{
+	struct virtqueue *vq;
+	int ret, len;
+
+	vq = admin_vq->info.vq;
+	if (!vq)
+		return -EIO;
+
+	ret = virtqueue_add_sgs(vq, sgs, out_num, in_num, data, GFP_KERNEL);
+	if (ret < 0)
+		return -EIO;
+
+	if (unlikely(!virtqueue_kick(vq)))
+		return -EIO;
+
+	while (!virtqueue_get_buf(vq, &len) &&
+	       !virtqueue_is_broken(vq))
+		cpu_relax();
+
+	if (virtqueue_is_broken(vq))
+		return -EIO;
+
+	return 0;
+}
+
+static int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
+				    struct virtio_admin_cmd *cmd)
+{
+	struct scatterlist *sgs[VIRTIO_AVQ_SGS_MAX], hdr, stat;
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_admin_cmd_status *va_status;
+	unsigned int out_num = 0, in_num = 0;
+	struct virtio_admin_cmd_hdr *va_hdr;
+	u16 status;
+	int ret;
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return -EOPNOTSUPP;
+
+	va_status = kzalloc(sizeof(*va_status), GFP_KERNEL);
+	if (!va_status)
+		return -ENOMEM;
+
+	va_hdr = kzalloc(sizeof(*va_hdr), GFP_KERNEL);
+	if (!va_hdr) {
+		ret = -ENOMEM;
+		goto err_alloc;
+	}
+
+	va_hdr->opcode = cmd->opcode;
+	va_hdr->group_type = cmd->group_type;
+	va_hdr->group_member_id = cmd->group_member_id;
+
+	/* Add header */
+	sg_init_one(&hdr, va_hdr, sizeof(*va_hdr));
+	sgs[out_num] = &hdr;
+	out_num++;
+
+	if (cmd->data_sg) {
+		sgs[out_num] = cmd->data_sg;
+		out_num++;
+	}
+
+	/* Add return status */
+	sg_init_one(&stat, va_status, sizeof(*va_status));
+	sgs[out_num + in_num] = &stat;
+	in_num++;
+
+	if (cmd->result_sg) {
+		sgs[out_num + in_num] = cmd->result_sg;
+		in_num++;
+	}
+
+	mutex_lock(&vp_dev->admin_vq.cmd_lock);
+	ret = virtqueue_exec_admin_cmd(&vp_dev->admin_vq, sgs,
+				       out_num, in_num, sgs);
+	mutex_unlock(&vp_dev->admin_vq.cmd_lock);
+
+	if (ret) {
+		dev_err(&vdev->dev,
+			"Failed to execute command on admin vq: %d\n.", ret);
+		goto err_cmd_exec;
+	}
+
+	status = le16_to_cpu(va_status->status);
+	if (status != VIRTIO_ADMIN_STATUS_OK) {
+		dev_err(&vdev->dev,
+			"admin command error: status(%#x) qualifier(%#x)\n",
+			status, le16_to_cpu(va_status->status_qualifier));
+		ret = -status;
+	}
+
+err_cmd_exec:
+	kfree(va_hdr);
+err_alloc:
+	kfree(va_status);
+	return ret;
+}
+
+static void vp_modern_avq_activate(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_admin_vq *admin_vq = &vp_dev->admin_vq;
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return;
+
+	__virtqueue_unbreak(admin_vq->info.vq);
+}
+
+static void vp_modern_avq_deactivate(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_admin_vq *admin_vq = &vp_dev->admin_vq;
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return;
+
+	__virtqueue_break(admin_vq->info.vq);
+}
+
 static void vp_transport_features(struct virtio_device *vdev, u64 features)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -213,6 +339,8 @@ static void vp_set_status(struct virtio_device *vdev, u8 status)
 	/* We should never be setting status to 0. */
 	BUG_ON(status == 0);
 	vp_modern_set_status(&vp_dev->mdev, status);
+	if (status & VIRTIO_CONFIG_S_DRIVER_OK)
+		vp_modern_avq_activate(vdev);
 }
 
 static void vp_reset(struct virtio_device *vdev)
@@ -229,6 +357,9 @@ static void vp_reset(struct virtio_device *vdev)
 	 */
 	while (vp_modern_get_status(mdev))
 		msleep(1);
+
+	vp_modern_avq_deactivate(vdev);
+
 	/* Flush pending VQ/configuration callbacks. */
 	vp_synchronize_vectors(vdev);
 }
@@ -404,8 +535,11 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
 		goto err;
 	}
 
-	if (is_avq)
+	if (is_avq) {
+		mutex_lock(&vp_dev->admin_vq.cmd_lock);
 		vp_dev->admin_vq.info.vq = vq;
+		mutex_unlock(&vp_dev->admin_vq.cmd_lock);
+	}
 
 	return vq;
 
@@ -442,8 +576,11 @@ static void del_vq(struct virtio_pci_vq_info *info)
 	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
 	struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
 
-	if (vp_is_avq(&vp_dev->vdev, vq->index))
+	if (vp_is_avq(&vp_dev->vdev, vq->index)) {
+		mutex_lock(&vp_dev->admin_vq.cmd_lock);
 		vp_dev->admin_vq.info.vq = NULL;
+		mutex_unlock(&vp_dev->admin_vq.cmd_lock);
+	}
 
 	if (vp_dev->msix_enabled)
 		vp_modern_queue_vector(mdev, vq->index,
@@ -662,6 +799,7 @@ int virtio_pci_modern_probe(struct virtio_pci_device *vp_dev)
 	vp_dev->isr = mdev->isr;
 	vp_dev->vdev.id = mdev->id;
 
+	mutex_init(&vp_dev->admin_vq.cmd_lock);
 	return 0;
 }
 
@@ -669,5 +807,6 @@ void virtio_pci_modern_remove(struct virtio_pci_device *vp_dev)
 {
 	struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
 
+	mutex_destroy(&vp_dev->admin_vq.cmd_lock);
 	vp_modern_remove(mdev);
 }
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 4cc614a38376..b0201747a263 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -103,6 +103,14 @@ int virtqueue_resize(struct virtqueue *vq, u32 num,
 int virtqueue_reset(struct virtqueue *vq,
 		    void (*recycle)(struct virtqueue *vq, void *buf));
 
+struct virtio_admin_cmd {
+	__le16 opcode;
+	__le16 group_type;
+	__le64 group_member_id;
+	struct scatterlist *data_sg;
+	struct scatterlist *result_sg;
+};
+
 /**
  * struct virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index 240ddeef7eae..187fd9e34a30 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -223,4 +223,26 @@ struct virtio_pci_cfg_cap {
 
 #endif /* VIRTIO_PCI_NO_MODERN */
 
+/* Admin command status. */
+#define VIRTIO_ADMIN_STATUS_OK		0
+
+struct __packed virtio_admin_cmd_hdr {
+	__le16 opcode;
+	/*
+	 * 1 - SR-IOV
+	 * 2-65535 - reserved
+	 */
+	__le16 group_type;
+	/* Unused, reserved for future extensions. */
+	__u8 reserved1[12];
+	__le64 group_member_id;
+};
+
+struct __packed virtio_admin_cmd_status {
+	__le16 status;
+	__le16 status_qualifier;
+	/* Unused, reserved for future extensions. */
+	__u8 reserved2[4];
+};
+
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH V4 vfio 4/9] virtio-pci: Introduce admin commands
  2023-11-29 14:37 [PATCH V4 vfio 0/9] Introduce a vfio driver over virtio devices Yishai Hadas
                   ` (2 preceding siblings ...)
  2023-11-29 14:37 ` [PATCH V4 vfio 3/9] virtio-pci: Introduce admin command sending function Yishai Hadas
@ 2023-11-29 14:37 ` Yishai Hadas
  2023-11-30  9:52   ` Michael S. Tsirkin
  2023-11-29 14:37 ` [PATCH V4 vfio 5/9] virtio-pci: Initialize the supported " Yishai Hadas
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Yishai Hadas @ 2023-11-29 14:37 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

From: Feng Liu <feliu@nvidia.com>

Introduces admin commands, as follow:

The "list query" command can be used by the driver to query the
set of admin commands supported by the virtio device.
The "list use" command is used to inform the virtio device which
admin commands the driver will use.
The "legacy common cfg rd/wr" commands are used to read from/write
into the legacy common configuration structure.
The "legacy dev cfg rd/wr" commands are used to read from/write
into the legacy device configuration structure.
The "notify info" command is used to query the notification region
information.

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 include/uapi/linux/virtio_pci.h | 44 +++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index 187fd9e34a30..f920537f5541 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -226,6 +226,23 @@ struct virtio_pci_cfg_cap {
 /* Admin command status. */
 #define VIRTIO_ADMIN_STATUS_OK		0
 
+/* Admin command opcode. */
+#define VIRTIO_ADMIN_CMD_LIST_QUERY	0x0
+#define VIRTIO_ADMIN_CMD_LIST_USE	0x1
+
+/* Admin command group type. */
+#define VIRTIO_ADMIN_GROUP_TYPE_SRIOV	0x1
+
+/* Transitional device admin command. */
+#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE	0x2
+#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ		0x3
+#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE		0x4
+#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ		0x5
+#define VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO		0x6
+
+/* Increment MAX_OPCODE to next value when new opcode is added */
+#define VIRTIO_ADMIN_MAX_CMD_OPCODE			0x6
+
 struct __packed virtio_admin_cmd_hdr {
 	__le16 opcode;
 	/*
@@ -245,4 +262,31 @@ struct __packed virtio_admin_cmd_status {
 	__u8 reserved2[4];
 };
 
+struct __packed virtio_admin_cmd_legacy_wr_data {
+	__u8 offset; /* Starting offset of the register(s) to write. */
+	__u8 reserved[7];
+	__u8 registers[];
+};
+
+struct __packed virtio_admin_cmd_legacy_rd_data {
+	__u8 offset; /* Starting offset of the register(s) to read. */
+};
+
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END 0
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_DEV 0x1
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM 0x2
+
+#define VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO 4
+
+struct __packed virtio_admin_cmd_notify_info_data {
+	__u8 flags; /* 0 = end of list, 1 = owner device, 2 = member device */
+	__u8 bar; /* BAR of the member or the owner device */
+	__u8 padding[6];
+	__le64 offset; /* Offset within bar. */
+};
+
+struct virtio_admin_cmd_notify_info_result {
+	struct virtio_admin_cmd_notify_info_data entries[VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO];
+};
+
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH V4 vfio 5/9] virtio-pci: Initialize the supported admin commands
  2023-11-29 14:37 [PATCH V4 vfio 0/9] Introduce a vfio driver over virtio devices Yishai Hadas
                   ` (3 preceding siblings ...)
  2023-11-29 14:37 ` [PATCH V4 vfio 4/9] virtio-pci: Introduce admin commands Yishai Hadas
@ 2023-11-29 14:37 ` Yishai Hadas
  2023-11-29 14:37 ` [PATCH V4 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO " Yishai Hadas
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Yishai Hadas @ 2023-11-29 14:37 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

Initialize the supported admin commands upon activating the admin queue.

The supported commands are saved as part of the admin queue context.

Next patches in this series will expose APIs to use them.

Reviewed-by: Feng Liu <feliu@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/virtio/virtio_pci_common.h |  1 +
 drivers/virtio/virtio_pci_modern.c | 48 ++++++++++++++++++++++++++++--
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index a50a58014c9f..2e3ae417519d 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -47,6 +47,7 @@ struct virtio_pci_admin_vq {
 	struct virtio_pci_vq_info info;
 	/* serializing admin commands execution and virtqueue deletion */
 	struct mutex cmd_lock;
+	u64 supported_cmds;
 	/* Name of the admin queue: avq.$vq_index. */
 	char name[10];
 	u16 vq_index;
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 18366a82408c..951014dbb086 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -39,6 +39,7 @@ static bool vp_is_avq(struct virtio_device *vdev, unsigned int index)
 }
 
 static int virtqueue_exec_admin_cmd(struct virtio_pci_admin_vq *admin_vq,
+				    u16 opcode,
 				    struct scatterlist **sgs,
 				    unsigned int out_num,
 				    unsigned int in_num,
@@ -51,6 +52,11 @@ static int virtqueue_exec_admin_cmd(struct virtio_pci_admin_vq *admin_vq,
 	if (!vq)
 		return -EIO;
 
+	if (opcode != VIRTIO_ADMIN_CMD_LIST_QUERY &&
+	    opcode != VIRTIO_ADMIN_CMD_LIST_USE &&
+	    !((1ULL << opcode) & admin_vq->supported_cmds))
+		return -EOPNOTSUPP;
+
 	ret = virtqueue_add_sgs(vq, sgs, out_num, in_num, data, GFP_KERNEL);
 	if (ret < 0)
 		return -EIO;
@@ -117,8 +123,9 @@ static int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
 	}
 
 	mutex_lock(&vp_dev->admin_vq.cmd_lock);
-	ret = virtqueue_exec_admin_cmd(&vp_dev->admin_vq, sgs,
-				       out_num, in_num, sgs);
+	ret = virtqueue_exec_admin_cmd(&vp_dev->admin_vq,
+				       le16_to_cpu(cmd->opcode),
+				       sgs, out_num, in_num, sgs);
 	mutex_unlock(&vp_dev->admin_vq.cmd_lock);
 
 	if (ret) {
@@ -142,6 +149,42 @@ static int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
 	return ret;
 }
 
+static void virtio_pci_admin_cmd_list_init(struct virtio_device *virtio_dev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(virtio_dev);
+	struct virtio_admin_cmd cmd = {};
+	struct scatterlist result_sg;
+	struct scatterlist data_sg;
+	__le64 *data;
+	int ret;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return;
+
+	sg_init_one(&result_sg, data, sizeof(*data));
+	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
+	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
+	cmd.result_sg = &result_sg;
+
+	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
+	if (ret)
+		goto end;
+
+	sg_init_one(&data_sg, data, sizeof(*data));
+	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
+	cmd.data_sg = &data_sg;
+	cmd.result_sg = NULL;
+
+	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
+	if (ret)
+		goto end;
+
+	vp_dev->admin_vq.supported_cmds = le64_to_cpu(*data);
+end:
+	kfree(data);
+}
+
 static void vp_modern_avq_activate(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -151,6 +194,7 @@ static void vp_modern_avq_activate(struct virtio_device *vdev)
 		return;
 
 	__virtqueue_unbreak(admin_vq->info.vq);
+	virtio_pci_admin_cmd_list_init(vdev);
 }
 
 static void vp_modern_avq_deactivate(struct virtio_device *vdev)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH V4 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
  2023-11-29 14:37 [PATCH V4 vfio 0/9] Introduce a vfio driver over virtio devices Yishai Hadas
                   ` (4 preceding siblings ...)
  2023-11-29 14:37 ` [PATCH V4 vfio 5/9] virtio-pci: Initialize the supported " Yishai Hadas
@ 2023-11-29 14:37 ` Yishai Hadas
  2023-11-29 14:37 ` [PATCH V4 vfio 7/9] vfio/pci: Expose vfio_pci_core_setup_barmap() Yishai Hadas
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Yishai Hadas @ 2023-11-29 14:37 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

Introduce APIs to execute legacy IO admin commands.

It includes: io_legacy_read/write for both common and the device
configuration, io_legacy_notify_info.

In addition, exposing an API to check whether the legacy IO commands are
supported. (i.e. virtio_pci_admin_has_legacy_io()).

Those APIs will be used by the next patches from this series.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/virtio/virtio_pci_common.c |  11 ++
 drivers/virtio/virtio_pci_common.h |   2 +
 drivers/virtio/virtio_pci_modern.c | 245 +++++++++++++++++++++++++++++
 include/linux/virtio_pci_admin.h   |  21 +++
 4 files changed, 279 insertions(+)
 create mode 100644 include/linux/virtio_pci_admin.h

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index fafd13d0e4d4..9f93330fc2cc 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -645,6 +645,17 @@ static struct pci_driver virtio_pci_driver = {
 	.sriov_configure = virtio_pci_sriov_configure,
 };
 
+struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
+{
+	struct virtio_pci_device *pf_vp_dev;
+
+	pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
+	if (IS_ERR(pf_vp_dev))
+		return NULL;
+
+	return &pf_vp_dev->vdev;
+}
+
 module_pci_driver(virtio_pci_driver);
 
 MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index 2e3ae417519d..af676b3b9907 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -156,4 +156,6 @@ static inline void virtio_pci_legacy_remove(struct virtio_pci_device *vp_dev)
 int virtio_pci_modern_probe(struct virtio_pci_device *);
 void virtio_pci_modern_remove(struct virtio_pci_device *);
 
+struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
+
 #endif
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 951014dbb086..37a0035f8381 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -15,6 +15,7 @@
  */
 
 #include <linux/delay.h>
+#include <linux/virtio_pci_admin.h>
 #define VIRTIO_PCI_NO_LEGACY
 #define VIRTIO_RING_NO_LEGACY
 #include "virtio_pci_common.h"
@@ -774,6 +775,250 @@ static void vp_modern_destroy_avq(struct virtio_device *vdev)
 	vp_dev->del_vq(&vp_dev->admin_vq.info);
 }
 
+#define VIRTIO_LEGACY_ADMIN_CMD_BITMAP \
+	(BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
+
+/*
+ * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
+ * commands are supported
+ * @dev: VF pci_dev
+ *
+ * Returns true on success.
+ */
+bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
+{
+	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+	struct virtio_pci_device *vp_dev;
+
+	if (!virtio_dev)
+		return false;
+
+	if (!virtio_has_feature(virtio_dev, VIRTIO_F_ADMIN_VQ))
+		return false;
+
+	vp_dev = to_vp_device(virtio_dev);
+
+	if ((vp_dev->admin_vq.supported_cmds & VIRTIO_LEGACY_ADMIN_CMD_BITMAP) ==
+		VIRTIO_LEGACY_ADMIN_CMD_BITMAP)
+		return true;
+	return false;
+}
+EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);
+
+static int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
+					    u8 offset, u8 size, u8 *buf)
+{
+	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+	struct virtio_admin_cmd_legacy_wr_data *data;
+	struct virtio_admin_cmd cmd = {};
+	struct scatterlist data_sg;
+	int vf_id;
+	int ret;
+
+	if (!virtio_dev)
+		return -ENODEV;
+
+	vf_id = pci_iov_vf_id(pdev);
+	if (vf_id < 0)
+		return vf_id;
+
+	data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	data->offset = offset;
+	memcpy(data->registers, buf, size);
+	sg_init_one(&data_sg, data, sizeof(*data) + size);
+	cmd.opcode = cpu_to_le16(opcode);
+	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
+	cmd.group_member_id = cpu_to_le64(vf_id + 1);
+	cmd.data_sg = &data_sg;
+	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
+
+	kfree(data);
+	return ret;
+}
+
+/*
+ * virtio_pci_admin_legacy_io_write_common - Write legacy common configuration
+ * of a member device
+ * @dev: VF pci_dev
+ * @offset: starting byte offset within the common configuration area to write to
+ * @size: size of the data to write
+ * @buf: buffer which holds the data
+ *
+ * Note: caller must serialize access for the given device.
+ * Returns 0 on success, or negative on failure.
+ */
+int virtio_pci_admin_legacy_common_io_write(struct pci_dev *pdev, u8 offset,
+					    u8 size, u8 *buf)
+{
+	return virtio_pci_admin_legacy_io_write(pdev,
+					VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE,
+					offset, size, buf);
+}
+EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_common_io_write);
+
+/*
+ * virtio_pci_admin_legacy_io_write_device - Write legacy device configuration
+ * of a member device
+ * @dev: VF pci_dev
+ * @offset: starting byte offset within the device configuration area to write to
+ * @size: size of the data to write
+ * @buf: buffer which holds the data
+ *
+ * Note: caller must serialize access for the given device.
+ * Returns 0 on success, or negative on failure.
+ */
+int virtio_pci_admin_legacy_device_io_write(struct pci_dev *pdev, u8 offset,
+					    u8 size, u8 *buf)
+{
+	return virtio_pci_admin_legacy_io_write(pdev,
+					VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE,
+					offset, size, buf);
+}
+EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_device_io_write);
+
+static int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
+					   u8 offset, u8 size, u8 *buf)
+{
+	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+	struct virtio_admin_cmd_legacy_rd_data *data;
+	struct scatterlist data_sg, result_sg;
+	struct virtio_admin_cmd cmd = {};
+	int vf_id;
+	int ret;
+
+	if (!virtio_dev)
+		return -ENODEV;
+
+	vf_id = pci_iov_vf_id(pdev);
+	if (vf_id < 0)
+		return vf_id;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	data->offset = offset;
+	sg_init_one(&data_sg, data, sizeof(*data));
+	sg_init_one(&result_sg, buf, size);
+	cmd.opcode = cpu_to_le16(opcode);
+	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
+	cmd.group_member_id = cpu_to_le64(vf_id + 1);
+	cmd.data_sg = &data_sg;
+	cmd.result_sg = &result_sg;
+	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
+
+	kfree(data);
+	return ret;
+}
+
+/*
+ * virtio_pci_admin_legacy_device_io_read - Read legacy device configuration of
+ * a member device
+ * @dev: VF pci_dev
+ * @offset: starting byte offset within the device configuration area to read from
+ * @size: size of the data to be read
+ * @buf: buffer to hold the returned data
+ *
+ * Note: caller must serialize access for the given device.
+ * Returns 0 on success, or negative on failure.
+ */
+int virtio_pci_admin_legacy_device_io_read(struct pci_dev *pdev, u8 offset,
+					   u8 size, u8 *buf)
+{
+	return virtio_pci_admin_legacy_io_read(pdev,
+					VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ,
+					offset, size, buf);
+}
+EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_device_io_read);
+
+/*
+ * virtio_pci_admin_legacy_common_io_read - Read legacy common configuration of
+ * a member device
+ * @dev: VF pci_dev
+ * @offset: starting byte offset within the common configuration area to read from
+ * @size: size of the data to be read
+ * @buf: buffer to hold the returned data
+ *
+ * Note: caller must serialize access for the given device.
+ * Returns 0 on success, or negative on failure.
+ */
+int virtio_pci_admin_legacy_common_io_read(struct pci_dev *pdev, u8 offset,
+					   u8 size, u8 *buf)
+{
+	return virtio_pci_admin_legacy_io_read(pdev,
+					VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ,
+					offset, size, buf);
+}
+EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_common_io_read);
+
+/*
+ * virtio_pci_admin_legacy_io_notify_info - Read the queue notification
+ * information for legacy interface
+ * @dev: VF pci_dev
+ * @req_bar_flags: requested bar flags
+ * @bar: on output the BAR number of the owner or member device
+ * @bar_offset: on output the offset within bar
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
+					   u8 req_bar_flags, u8 *bar,
+					   u64 *bar_offset)
+{
+	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+	struct virtio_admin_cmd_notify_info_result *result;
+	struct virtio_admin_cmd cmd = {};
+	struct scatterlist result_sg;
+	int vf_id;
+	int ret;
+
+	if (!virtio_dev)
+		return -ENODEV;
+
+	vf_id = pci_iov_vf_id(pdev);
+	if (vf_id < 0)
+		return vf_id;
+
+	result = kzalloc(sizeof(*result), GFP_KERNEL);
+	if (!result)
+		return -ENOMEM;
+
+	sg_init_one(&result_sg, result, sizeof(*result));
+	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
+	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
+	cmd.group_member_id = cpu_to_le64(vf_id + 1);
+	cmd.result_sg = &result_sg;
+	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
+	if (!ret) {
+		struct virtio_admin_cmd_notify_info_data *entry;
+		int i;
+
+		ret = -ENOENT;
+		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
+			entry = &result->entries[i];
+			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
+				break;
+			if (entry->flags != req_bar_flags)
+				continue;
+			*bar = entry->bar;
+			*bar_offset = le64_to_cpu(entry->offset);
+			ret = 0;
+			break;
+		}
+	}
+
+	kfree(result);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
+
 static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
 	.get		= NULL,
 	.set		= NULL,
diff --git a/include/linux/virtio_pci_admin.h b/include/linux/virtio_pci_admin.h
new file mode 100644
index 000000000000..446ced8cb050
--- /dev/null
+++ b/include/linux/virtio_pci_admin.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
+#define _LINUX_VIRTIO_PCI_ADMIN_H
+
+#include <linux/types.h>
+#include <linux/pci.h>
+
+bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev);
+int virtio_pci_admin_legacy_common_io_write(struct pci_dev *pdev, u8 offset,
+					    u8 size, u8 *buf);
+int virtio_pci_admin_legacy_common_io_read(struct pci_dev *pdev, u8 offset,
+					   u8 size, u8 *buf);
+int virtio_pci_admin_legacy_device_io_write(struct pci_dev *pdev, u8 offset,
+					    u8 size, u8 *buf);
+int virtio_pci_admin_legacy_device_io_read(struct pci_dev *pdev, u8 offset,
+					   u8 size, u8 *buf);
+int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
+					   u8 req_bar_flags, u8 *bar,
+					   u64 *bar_offset);
+
+#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH V4 vfio 7/9] vfio/pci: Expose vfio_pci_core_setup_barmap()
  2023-11-29 14:37 [PATCH V4 vfio 0/9] Introduce a vfio driver over virtio devices Yishai Hadas
                   ` (5 preceding siblings ...)
  2023-11-29 14:37 ` [PATCH V4 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO " Yishai Hadas
@ 2023-11-29 14:37 ` Yishai Hadas
  2023-11-29 14:37 ` [PATCH V4 vfio 8/9] vfio/pci: Expose vfio_pci_iowrite/read##size() Yishai Hadas
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Yishai Hadas @ 2023-11-29 14:37 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

Expose vfio_pci_core_setup_barmap() to be used by drivers.

This will let drivers to mmap a BAR and re-use it from both vfio and the
driver when it's applicable.

This API will be used in the next patches by the vfio/virtio coming
driver.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 25 +++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_rdwr.c | 28 ++--------------------------
 include/linux/vfio_pci_core.h    |  1 +
 3 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1929103ee59a..ebea39836dd9 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -684,6 +684,31 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_disable);
 
+int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	void __iomem *io;
+	int ret;
+
+	if (vdev->barmap[bar])
+		return 0;
+
+	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
+	if (ret)
+		return ret;
+
+	io = pci_iomap(pdev, bar, 0);
+	if (!io) {
+		pci_release_selected_regions(pdev, 1 << bar);
+		return -ENOMEM;
+	}
+
+	vdev->barmap[bar] = io;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);
+
 void vfio_pci_core_close_device(struct vfio_device *core_vdev)
 {
 	struct vfio_pci_core_device *vdev =
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index e27de61ac9fe..6f08b3ecbb89 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -200,30 +200,6 @@ static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 	return done;
 }
 
-static int vfio_pci_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
-{
-	struct pci_dev *pdev = vdev->pdev;
-	int ret;
-	void __iomem *io;
-
-	if (vdev->barmap[bar])
-		return 0;
-
-	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
-	if (ret)
-		return ret;
-
-	io = pci_iomap(pdev, bar, 0);
-	if (!io) {
-		pci_release_selected_regions(pdev, 1 << bar);
-		return -ENOMEM;
-	}
-
-	vdev->barmap[bar] = io;
-
-	return 0;
-}
-
 ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 			size_t count, loff_t *ppos, bool iswrite)
 {
@@ -262,7 +238,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 		}
 		x_end = end;
 	} else {
-		int ret = vfio_pci_setup_barmap(vdev, bar);
+		int ret = vfio_pci_core_setup_barmap(vdev, bar);
 		if (ret) {
 			done = ret;
 			goto out;
@@ -438,7 +414,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
 		return -EINVAL;
 #endif
 
-	ret = vfio_pci_setup_barmap(vdev, bar);
+	ret = vfio_pci_core_setup_barmap(vdev, bar);
 	if (ret)
 		return ret;
 
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 562e8754869d..67ac58e20e1d 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -127,6 +127,7 @@ int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf);
 int vfio_pci_core_enable(struct vfio_pci_core_device *vdev);
 void vfio_pci_core_disable(struct vfio_pci_core_device *vdev);
 void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
+int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
 pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
 						pci_channel_state_t state);
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH V4 vfio 8/9] vfio/pci: Expose vfio_pci_iowrite/read##size()
  2023-11-29 14:37 [PATCH V4 vfio 0/9] Introduce a vfio driver over virtio devices Yishai Hadas
                   ` (6 preceding siblings ...)
  2023-11-29 14:37 ` [PATCH V4 vfio 7/9] vfio/pci: Expose vfio_pci_core_setup_barmap() Yishai Hadas
@ 2023-11-29 14:37 ` Yishai Hadas
  2023-11-30 19:20   ` Alex Williamson
  2023-11-29 14:37 ` [PATCH V4 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices Yishai Hadas
  2023-11-30 10:07 ` [PATCH V4 vfio 0/9] " Michael S. Tsirkin
  9 siblings, 1 reply; 22+ messages in thread
From: Yishai Hadas @ 2023-11-29 14:37 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

Expose vfio_pci_iowrite/read##size() to let it be used by drivers.

This functionality is needed to enable direct access to some physical
BAR of the device with the proper locks/checks in place.

The next patches from this series will use this functionality on a data
path flow when a direct access to the BAR is needed.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_rdwr.c | 10 ++++++----
 include/linux/vfio_pci_core.h    | 19 +++++++++++++++++++
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 6f08b3ecbb89..817ec9a89123 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -38,7 +38,7 @@
 #define vfio_iowrite8	iowrite8
 
 #define VFIO_IOWRITE(size) \
-static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
+int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
 			bool test_mem, u##size val, void __iomem *io)	\
 {									\
 	if (test_mem) {							\
@@ -55,7 +55,8 @@ static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
 		up_read(&vdev->memory_lock);				\
 									\
 	return 0;							\
-}
+}									\
+EXPORT_SYMBOL_GPL(vfio_pci_iowrite##size);
 
 VFIO_IOWRITE(8)
 VFIO_IOWRITE(16)
@@ -65,7 +66,7 @@ VFIO_IOWRITE(64)
 #endif
 
 #define VFIO_IOREAD(size) \
-static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
+int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
 			bool test_mem, u##size *val, void __iomem *io)	\
 {									\
 	if (test_mem) {							\
@@ -82,7 +83,8 @@ static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
 		up_read(&vdev->memory_lock);				\
 									\
 	return 0;							\
-}
+}									\
+EXPORT_SYMBOL_GPL(vfio_pci_ioread##size);
 
 VFIO_IOREAD(8)
 VFIO_IOREAD(16)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 67ac58e20e1d..22c915317788 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -131,4 +131,23 @@ int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
 pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
 						pci_channel_state_t state);
 
+#define VFIO_IOWRITE_DECLATION(size) \
+int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
+			bool test_mem, u##size val, void __iomem *io);
+
+VFIO_IOWRITE_DECLATION(8)
+VFIO_IOWRITE_DECLATION(16)
+VFIO_IOWRITE_DECLATION(32)
+#ifdef iowrite64
+VFIO_IOWRITE_DECLATION(64)
+#endif
+
+#define VFIO_IOREAD_DECLATION(size) \
+int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
+			bool test_mem, u##size *val, void __iomem *io);
+
+VFIO_IOREAD_DECLATION(8)
+VFIO_IOREAD_DECLATION(16)
+VFIO_IOREAD_DECLATION(32)
+
 #endif /* VFIO_PCI_CORE_H */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH V4 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-11-29 14:37 [PATCH V4 vfio 0/9] Introduce a vfio driver over virtio devices Yishai Hadas
                   ` (7 preceding siblings ...)
  2023-11-29 14:37 ` [PATCH V4 vfio 8/9] vfio/pci: Expose vfio_pci_iowrite/read##size() Yishai Hadas
@ 2023-11-29 14:37 ` Yishai Hadas
  2023-11-30 22:10   ` Alex Williamson
  2023-11-30 10:07 ` [PATCH V4 vfio 0/9] " Michael S. Tsirkin
  9 siblings, 1 reply; 22+ messages in thread
From: Yishai Hadas @ 2023-11-29 14:37 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

Introduce a vfio driver over virtio devices to support the legacy
interface functionality for VFs.

Background, from the virtio spec [1].
--------------------------------------------------------------------
In some systems, there is a need to support a virtio legacy driver with
a device that does not directly support the legacy interface. In such
scenarios, a group owner device can provide the legacy interface
functionality for the group member devices. The driver of the owner
device can then access the legacy interface of a member device on behalf
of the legacy member device driver.

For example, with the SR-IOV group type, group members (VFs) can not
present the legacy interface in an I/O BAR in BAR0 as expected by the
legacy pci driver. If the legacy driver is running inside a virtual
machine, the hypervisor executing the virtual machine can present a
virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
legacy driver accesses to this I/O BAR and forwards them to the group
owner device (PF) using group administration commands.
--------------------------------------------------------------------

Specifically, this driver adds support for a virtio-net VF to be exposed
as a transitional device to a guest driver and allows the legacy IO BAR
functionality on top.

This allows a VM which uses a legacy virtio-net driver in the guest to
work transparently over a VF which its driver in the host is that new
driver.

The driver can be extended easily to support some other types of virtio
devices (e.g virtio-blk), by adding in a few places the specific type
properties as was done for virtio-net.

For now, only the virtio-net use case was tested and as such we introduce
the support only for such a device.

Practically,
Upon probing a VF for a virtio-net device, in case its PF supports
legacy access over the virtio admin commands and the VF doesn't have BAR
0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
transitional device with I/O BAR in BAR 0.

The existence of the simulated I/O bar is reported later on by
overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
exposes itself as a transitional device by overwriting some properties
upon reading its config space.

Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
guest may use it via read/write calls according to the virtio
specification.

Any read/write towards the control parts of the BAR will be captured by
the new driver and will be translated into admin commands towards the
device.

Any data path read/write access (i.e. virtio driver notifications) will
be forwarded to the physical BAR which its properties were supplied by
the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
probing/init flow.

With that code in place a legacy driver in the guest has the look and
feel as if having a transitional device with legacy support for both its
control and data path flows.

[1]
https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 MAINTAINERS                      |   7 +
 drivers/vfio/pci/Kconfig         |   2 +
 drivers/vfio/pci/Makefile        |   2 +
 drivers/vfio/pci/virtio/Kconfig  |  16 +
 drivers/vfio/pci/virtio/Makefile |   4 +
 drivers/vfio/pci/virtio/main.c   | 554 +++++++++++++++++++++++++++++++
 6 files changed, 585 insertions(+)
 create mode 100644 drivers/vfio/pci/virtio/Kconfig
 create mode 100644 drivers/vfio/pci/virtio/Makefile
 create mode 100644 drivers/vfio/pci/virtio/main.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 012df8ccf34e..b246b769092d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22872,6 +22872,13 @@ L:	kvm@vger.kernel.org
 S:	Maintained
 F:	drivers/vfio/pci/mlx5/
 
+VFIO VIRTIO PCI DRIVER
+M:	Yishai Hadas <yishaih@nvidia.com>
+L:	kvm@vger.kernel.org
+L:	virtualization@lists.linux-foundation.org
+S:	Maintained
+F:	drivers/vfio/pci/virtio
+
 VFIO PCI DEVICE SPECIFIC DRIVERS
 R:	Jason Gunthorpe <jgg@nvidia.com>
 R:	Yishai Hadas <yishaih@nvidia.com>
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 8125e5f37832..18c397df566d 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
 
 source "drivers/vfio/pci/pds/Kconfig"
 
+source "drivers/vfio/pci/virtio/Kconfig"
+
 endmenu
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 45167be462d8..046139a4eca5 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
 obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
 
 obj-$(CONFIG_PDS_VFIO_PCI) += pds/
+
+obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
new file mode 100644
index 000000000000..3a6707639220
--- /dev/null
+++ b/drivers/vfio/pci/virtio/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config VIRTIO_VFIO_PCI
+        tristate "VFIO support for VIRTIO NET PCI devices"
+        depends on VIRTIO_PCI
+        select VFIO_PCI_CORE
+        help
+          This provides support for exposing VIRTIO NET VF devices which support
+          legacy IO access, using the VFIO framework that can work with a legacy
+          virtio driver in the guest.
+          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
+          not indicate I/O Space.
+          As of that this driver emulated I/O BAR in software to let a VF be
+          seen as a transitional device in the guest and let it work with
+          a legacy driver.
+
+          If you don't know what to do here, say N.
diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
new file mode 100644
index 000000000000..2039b39fb723
--- /dev/null
+++ b/drivers/vfio/pci/virtio/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
+virtio-vfio-pci-y := main.o
+
diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
new file mode 100644
index 000000000000..3ca19c891673
--- /dev/null
+++ b/drivers/vfio/pci/virtio/main.c
@@ -0,0 +1,554 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/pm_runtime.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/vfio_pci_core.h>
+#include <linux/virtio_pci.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_pci_admin.h>
+
+struct virtiovf_pci_core_device {
+	struct vfio_pci_core_device core_device;
+	u8 bar0_virtual_buf_size;
+	u8 *bar0_virtual_buf;
+	/* synchronize access to the virtual buf */
+	struct mutex bar_mutex;
+	void __iomem *notify_addr;
+	u64 notify_offset;
+	__le32 pci_base_addr_0;
+	__le16 pci_cmd;
+	u8 notify_bar;
+};
+
+static int
+virtiovf_issue_legacy_rw_cmd(struct virtiovf_pci_core_device *virtvdev,
+			     loff_t pos, char __user *buf,
+			     size_t count, bool read)
+{
+	bool msix_enabled =
+		(virtvdev->core_device.irq_type == VFIO_PCI_MSIX_IRQ_INDEX);
+	struct pci_dev *pdev = virtvdev->core_device.pdev;
+	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
+	bool common;
+	u8 offset;
+	int ret;
+
+	common = pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled);
+	/* offset within the relevant configuration area */
+	offset = common ? pos : pos - VIRTIO_PCI_CONFIG_OFF(msix_enabled);
+	mutex_lock(&virtvdev->bar_mutex);
+	if (read) {
+		if (common)
+			ret = virtio_pci_admin_legacy_common_io_read(pdev, offset,
+					count, bar0_buf + pos);
+		else
+			ret = virtio_pci_admin_legacy_device_io_read(pdev, offset,
+					count, bar0_buf + pos);
+		if (ret)
+			goto out;
+		if (copy_to_user(buf, bar0_buf + pos, count))
+			ret = -EFAULT;
+	} else {
+		if (copy_from_user(bar0_buf + pos, buf, count)) {
+			ret = -EFAULT;
+			goto out;
+		}
+
+		if (common)
+			ret = virtio_pci_admin_legacy_common_io_write(pdev, offset,
+					count, bar0_buf + pos);
+		else
+			ret = virtio_pci_admin_legacy_device_io_write(pdev, offset,
+					count, bar0_buf + pos);
+	}
+out:
+	mutex_unlock(&virtvdev->bar_mutex);
+	return ret;
+}
+
+static int
+translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
+			    loff_t pos, char __user *buf,
+			    size_t count, bool read)
+{
+	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
+	u16 queue_notify;
+	int ret;
+
+	if (pos + count > virtvdev->bar0_virtual_buf_size)
+		return -EINVAL;
+
+	switch (pos) {
+	case VIRTIO_PCI_QUEUE_NOTIFY:
+		if (count != sizeof(queue_notify))
+			return -EINVAL;
+		if (read) {
+			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
+						virtvdev->notify_addr);
+			if (ret)
+				return ret;
+			if (copy_to_user(buf, &queue_notify,
+					 sizeof(queue_notify)))
+				return -EFAULT;
+		} else {
+			if (copy_from_user(&queue_notify, buf, count))
+				return -EFAULT;
+			ret = vfio_pci_iowrite16(core_device, true, queue_notify,
+						 virtvdev->notify_addr);
+		}
+		break;
+	default:
+		ret = virtiovf_issue_legacy_rw_cmd(virtvdev, pos, buf, count,
+						   read);
+	}
+
+	return ret ? ret : count;
+}
+
+static bool range_intersect_range(loff_t range1_start, size_t count1,
+				  loff_t range2_start, size_t count2,
+				  loff_t *start_offset,
+				  size_t *intersect_count,
+				  size_t *register_offset)
+{
+	if (range1_start <= range2_start &&
+	    range1_start + count1 > range2_start) {
+		*start_offset = range2_start - range1_start;
+		*intersect_count = min_t(size_t, count2,
+					 range1_start + count1 - range2_start);
+		*register_offset = 0;
+		return true;
+	}
+
+	if (range1_start > range2_start &&
+	    range1_start < range2_start + count2) {
+		*start_offset = 0;
+		*intersect_count = min_t(size_t, count1,
+					 range2_start + count2 - range1_start);
+		*register_offset = range1_start - range2_start;
+		return true;
+	}
+
+	return false;
+}
+
+static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
+					char __user *buf, size_t count,
+					loff_t *ppos)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	size_t register_offset;
+	loff_t copy_offset;
+	size_t copy_count;
+	__le32 val32;
+	__le16 val16;
+	u8 val8;
+	int ret;
+
+	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
+	if (ret < 0)
+		return ret;
+
+	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
+				  &copy_offset, &copy_count, &register_offset)) {
+		val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
+		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset, copy_count))
+			return -EFAULT;
+	}
+
+	if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
+	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
+				  &copy_offset, &copy_count, &register_offset)) {
+		if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
+				   copy_count))
+			return -EFAULT;
+		val16 |= cpu_to_le16(PCI_COMMAND_IO);
+		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
+				 copy_count))
+			return -EFAULT;
+	}
+
+	if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8),
+				  &copy_offset, &copy_count, &register_offset)) {
+		/* Transional needs to have revision 0 */
+		val8 = 0;
+		if (copy_to_user(buf + copy_offset, &val8, copy_count))
+			return -EFAULT;
+	}
+
+	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
+				  &copy_offset, &copy_count, &register_offset)) {
+		u8 log_bar_size = ilog2(roundup_pow_of_two(virtvdev->bar0_virtual_buf_size));
+		u32 mask_size = ~((BIT(log_bar_size) - 1));
+		u32 pci_base_addr_0 = le32_to_cpu(virtvdev->pci_base_addr_0);
+
+		val32 = cpu_to_le32((pci_base_addr_0 & mask_size) | PCI_BASE_ADDRESS_SPACE_IO);
+		if (copy_to_user(buf + copy_offset, (void *)&val32 + register_offset, copy_count))
+			return -EFAULT;
+	}
+
+	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
+				  &copy_offset, &copy_count, &register_offset)) {
+		/*
+		 * Transitional devices use the PCI subsystem device id as
+		 * virtio device id, same as legacy driver always did.
+		 */
+		val16 = cpu_to_le16(VIRTIO_ID_NET);
+		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
+				 copy_count))
+			return -EFAULT;
+	}
+
+	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_VENDOR_ID, sizeof(val16),
+				  &copy_offset, &copy_count, &register_offset)) {
+		val16 = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET);
+		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
+				 copy_count))
+			return -EFAULT;
+	}
+
+	return count;
+}
+
+static ssize_t
+virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
+		       size_t count, loff_t *ppos)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	struct pci_dev *pdev = virtvdev->core_device.pdev;
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	int ret;
+
+	if (!count)
+		return 0;
+
+	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
+		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
+
+	if (index != VFIO_PCI_BAR0_REGION_INDEX)
+		return vfio_pci_core_read(core_vdev, buf, count, ppos);
+
+	ret = pm_runtime_resume_and_get(&pdev->dev);
+	if (ret) {
+		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
+				     ret);
+		return -EIO;
+	}
+
+	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
+	pm_runtime_put(&pdev->dev);
+	return ret;
+}
+
+static ssize_t
+virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
+			size_t count, loff_t *ppos)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	struct pci_dev *pdev = virtvdev->core_device.pdev;
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	int ret;
+
+	if (!count)
+		return 0;
+
+	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
+		size_t register_offset;
+		loff_t copy_offset;
+		size_t copy_count;
+
+		if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd),
+					  &copy_offset, &copy_count,
+					  &register_offset)) {
+			if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
+					   buf + copy_offset,
+					   copy_count))
+				return -EFAULT;
+		}
+
+		if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
+					  sizeof(virtvdev->pci_base_addr_0),
+					  &copy_offset, &copy_count,
+					  &register_offset)) {
+			if (copy_from_user((void *)&virtvdev->pci_base_addr_0 + register_offset,
+					   buf + copy_offset,
+					   copy_count))
+				return -EFAULT;
+		}
+	}
+
+	if (index != VFIO_PCI_BAR0_REGION_INDEX)
+		return vfio_pci_core_write(core_vdev, buf, count, ppos);
+
+	ret = pm_runtime_resume_and_get(&pdev->dev);
+	if (ret) {
+		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
+		return -EIO;
+	}
+
+	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
+	pm_runtime_put(&pdev->dev);
+	return ret;
+}
+
+static int
+virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
+				   unsigned int cmd, unsigned long arg)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
+	void __user *uarg = (void __user *)arg;
+	struct vfio_region_info info = {};
+
+	if (copy_from_user(&info, uarg, minsz))
+		return -EFAULT;
+
+	if (info.argsz < minsz)
+		return -EINVAL;
+
+	switch (info.index) {
+	case VFIO_PCI_BAR0_REGION_INDEX:
+		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+		info.size = virtvdev->bar0_virtual_buf_size;
+		info.flags = VFIO_REGION_INFO_FLAG_READ |
+			     VFIO_REGION_INFO_FLAG_WRITE;
+		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
+	default:
+		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
+	}
+}
+
+static long
+virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
+			     unsigned long arg)
+{
+	switch (cmd) {
+	case VFIO_DEVICE_GET_REGION_INFO:
+		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
+	default:
+		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
+	}
+}
+
+static int
+virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
+{
+	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
+	int ret;
+
+	/*
+	 * Setup the BAR where the 'notify' exists to be used by vfio as well
+	 * This will let us mmap it only once and use it when needed.
+	 */
+	ret = vfio_pci_core_setup_barmap(core_device,
+					 virtvdev->notify_bar);
+	if (ret)
+		return ret;
+
+	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
+			virtvdev->notify_offset;
+	return 0;
+}
+
+static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
+	int ret;
+
+	ret = vfio_pci_core_enable(vdev);
+	if (ret)
+		return ret;
+
+	if (virtvdev->bar0_virtual_buf) {
+		/*
+		 * Upon close_device() the vfio_pci_core_disable() is called
+		 * and will close all the previous mmaps, so it seems that the
+		 * valid life cycle for the 'notify' addr is per open/close.
+		 */
+		ret = virtiovf_set_notify_addr(virtvdev);
+		if (ret) {
+			vfio_pci_core_disable(vdev);
+			return ret;
+		}
+	}
+
+	vfio_pci_core_finish_enable(vdev);
+	return 0;
+}
+
+static int virtiovf_get_device_config_size(unsigned short device)
+{
+	/* Network card */
+	return offsetofend(struct virtio_net_config, status);
+}
+
+static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
+{
+	u64 offset;
+	int ret;
+	u8 bar;
+
+	ret = virtio_pci_admin_legacy_io_notify_info(virtvdev->core_device.pdev,
+				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
+				&bar, &offset);
+	if (ret)
+		return ret;
+
+	virtvdev->notify_bar = bar;
+	virtvdev->notify_offset = offset;
+	return 0;
+}
+
+static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	struct pci_dev *pdev;
+	int ret;
+
+	ret = vfio_pci_core_init_dev(core_vdev);
+	if (ret)
+		return ret;
+
+	pdev = virtvdev->core_device.pdev;
+	ret = virtiovf_read_notify_info(virtvdev);
+	if (ret)
+		return ret;
+
+	/* Being ready with a buffer that supports MSIX */
+	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
+				virtiovf_get_device_config_size(pdev->device);
+	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
+					     GFP_KERNEL);
+	if (!virtvdev->bar0_virtual_buf)
+		return -ENOMEM;
+	mutex_init(&virtvdev->bar_mutex);
+	return 0;
+}
+
+static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+
+	kfree(virtvdev->bar0_virtual_buf);
+	vfio_pci_core_release_dev(core_vdev);
+}
+
+static const struct vfio_device_ops virtiovf_vfio_pci_tran_ops = {
+	.name = "virtio-vfio-pci-trans",
+	.init = virtiovf_pci_init_device,
+	.release = virtiovf_pci_core_release_dev,
+	.open_device = virtiovf_pci_open_device,
+	.close_device = vfio_pci_core_close_device,
+	.ioctl = virtiovf_vfio_pci_core_ioctl,
+	.read = virtiovf_pci_core_read,
+	.write = virtiovf_pci_core_write,
+	.mmap = vfio_pci_core_mmap,
+	.request = vfio_pci_core_request,
+	.match = vfio_pci_core_match,
+	.bind_iommufd = vfio_iommufd_physical_bind,
+	.unbind_iommufd = vfio_iommufd_physical_unbind,
+	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+};
+
+static const struct vfio_device_ops virtiovf_vfio_pci_ops = {
+	.name = "virtio-vfio-pci",
+	.init = vfio_pci_core_init_dev,
+	.release = vfio_pci_core_release_dev,
+	.open_device = virtiovf_pci_open_device,
+	.close_device = vfio_pci_core_close_device,
+	.ioctl = vfio_pci_core_ioctl,
+	.device_feature = vfio_pci_core_ioctl_feature,
+	.read = vfio_pci_core_read,
+	.write = vfio_pci_core_write,
+	.mmap = vfio_pci_core_mmap,
+	.request = vfio_pci_core_request,
+	.match = vfio_pci_core_match,
+	.bind_iommufd = vfio_iommufd_physical_bind,
+	.unbind_iommufd = vfio_iommufd_physical_unbind,
+	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+};
+
+static bool virtiovf_bar0_exists(struct pci_dev *pdev)
+{
+	struct resource *res = pdev->resource;
+
+	return res->flags ? true : false;
+}
+
+static int virtiovf_pci_probe(struct pci_dev *pdev,
+			      const struct pci_device_id *id)
+{
+	const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
+	struct virtiovf_pci_core_device *virtvdev;
+	int ret;
+
+	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
+	    !virtiovf_bar0_exists(pdev))
+		ops = &virtiovf_vfio_pci_tran_ops;
+
+	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
+				     &pdev->dev, ops);
+	if (IS_ERR(virtvdev))
+		return PTR_ERR(virtvdev);
+
+	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
+	ret = vfio_pci_core_register_device(&virtvdev->core_device);
+	if (ret)
+		goto out;
+	return 0;
+out:
+	vfio_put_device(&virtvdev->core_device.vdev);
+	return ret;
+}
+
+static void virtiovf_pci_remove(struct pci_dev *pdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
+
+	vfio_pci_core_unregister_device(&virtvdev->core_device);
+	vfio_put_device(&virtvdev->core_device.vdev);
+}
+
+static const struct pci_device_id virtiovf_pci_table[] = {
+	/* Only virtio-net is supported/tested so far */
+	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
+	{}
+};
+
+MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
+
+static struct pci_driver virtiovf_pci_driver = {
+	.name = KBUILD_MODNAME,
+	.id_table = virtiovf_pci_table,
+	.probe = virtiovf_pci_probe,
+	.remove = virtiovf_pci_remove,
+	.err_handler = &vfio_pci_core_err_handlers,
+	.driver_managed_dma = true,
+};
+
+module_pci_driver(virtiovf_pci_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
+MODULE_DESCRIPTION(
+	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH V4 vfio 4/9] virtio-pci: Introduce admin commands
  2023-11-29 14:37 ` [PATCH V4 vfio 4/9] virtio-pci: Introduce admin commands Yishai Hadas
@ 2023-11-30  9:52   ` Michael S. Tsirkin
  2023-11-30 10:35     ` Yishai Hadas
  0 siblings, 1 reply; 22+ messages in thread
From: Michael S. Tsirkin @ 2023-11-30  9:52 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, si-wei.liu, leonro,
	maorg

On Wed, Nov 29, 2023 at 04:37:41PM +0200, Yishai Hadas wrote:
> +/* Transitional device admin command. */
> +#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE	0x2
> +#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ		0x3
> +#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE		0x4
> +#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ		0x5
> +#define VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO		0x6
> +
> +/* Increment MAX_OPCODE to next value when new opcode is added */
> +#define VIRTIO_ADMIN_MAX_CMD_OPCODE			0x6

Does anything need VIRTIO_ADMIN_MAX_CMD_OPCODE? Not in this
patchset...

-- 
MST


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V4 vfio 0/9] Introduce a vfio driver over virtio devices
  2023-11-29 14:37 [PATCH V4 vfio 0/9] Introduce a vfio driver over virtio devices Yishai Hadas
                   ` (8 preceding siblings ...)
  2023-11-29 14:37 ` [PATCH V4 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices Yishai Hadas
@ 2023-11-30 10:07 ` Michael S. Tsirkin
  2023-11-30 15:47   ` Yishai Hadas
  9 siblings, 1 reply; 22+ messages in thread
From: Michael S. Tsirkin @ 2023-11-30 10:07 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, si-wei.liu, leonro,
	maorg

On Wed, Nov 29, 2023 at 04:37:37PM +0200, Yishai Hadas wrote:
> This series introduce a vfio driver over virtio devices to support the
> legacy interface functionality for VFs.
> 
> Background, from the virtio spec [1].
> --------------------------------------------------------------------
> In some systems, there is a need to support a virtio legacy driver with
> a device that does not directly support the legacy interface. In such
> scenarios, a group owner device can provide the legacy interface
> functionality for the group member devices. The driver of the owner
> device can then access the legacy interface of a member device on behalf
> of the legacy member device driver.
> 
> For example, with the SR-IOV group type, group members (VFs) can not
> present the legacy interface in an I/O BAR in BAR0 as expected by the
> legacy pci driver. If the legacy driver is running inside a virtual
> machine, the hypervisor executing the virtual machine can present a
> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> legacy driver accesses to this I/O BAR and forwards them to the group
> owner device (PF) using group administration commands.
> --------------------------------------------------------------------
> 
> The first 6 patches are in the virtio area and handle the below:
> - Introduce the admin virtqueue infrastcture.
> - Expose the layout of the commands that should be used for
>   supporting the legacy access.
> - Expose APIs to enable upper layers as of vfio, net, etc
>   to execute admin commands.
> 
> The above follows the virtio spec that was lastly accepted in that area
> [1].
> 
> The last 3 patches are in the vfio area and handle the below:
> - Expose some APIs from vfio/pci to be used by the vfio/virtio driver.
> - Introduce a vfio driver over virtio devices to support the legacy
>   interface functionality for VFs. 
> 
> The series was tested successfully over virtio-net VFs in the host,
> while running in the guest both modern and legacy drivers.
> 
> [1]
> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c



I sent a minor question, but besides that, patches 1-6 look ok
Acked-by: Michael S. Tsirkin <mst@redhat.com>

I take no stance on patches 7-9 this is up to Alex.


> Changes from V3: https://www.spinics.net/lists/kvm/msg333008.html
> Virtio:
> - Rebase on top of 6.7 rc3.
> Vfio:
> - Fix a typo, drop 'acc' from 'virtiovf_acc_vfio_pci_tran_ops'.
> 
> Changes from V2: https://lore.kernel.org/all/20231029155952.67686-8-yishaih@nvidia.com/T/
> Virtio:
> - Rebase on top of 6.7 rc1.
> - Add a mutex to serialize admin commands execution and virtqueue
>   deletion, as was suggested by Michael.
> - Remove the 'ref_count' usage which is not needed any more.
> - Reduce the depth of the admin vq to match a single command at a given time.
> - Add a supported check upon command execution and move to use a single
>   flow of virtqueue_exec_admin_cmd().
> - Improve the description of the exported commands to better match the
>   specification and the expected usage as was asked by Michael.
> 
> Vfio:
> - Upon calling to virtio_pci_admin_legacy/common_device_io_read/write()
>   supply the 'offset' within the relevant configuration area, following
>   the virtio exported APIs.
> 
> Changes from V1: https://lore.kernel.org/all/20231023104548.07b3aa19.alex.williamson@redhat.com/T/
> Virtio:
> - Drop its first patch, it was accepted upstream already.
> - Add a new patch (#6) which initializes the supported admin commands
>   upon admin queue activation as was suggested by Michael.
> - Split the legacy_io_read/write commands per common/device
>   configuration as was asked by Michael.
> - Don't expose any more the list query/used APIs outside of virtio.
> - Instead, expose an API to check whether the legacy io functionality is
>   supported as was suggested by Michael.
> - Fix some Krobot's note by adding the missing include file.
> 
> Vfio:
> - Refer specifically to virtio-net as part of the driver/module description
>   as Alex asked.
> - Change to check MSIX enablement based on the irq type of the given vfio
>   core device. In addition, drop its capable checking from the probe flow
>   as was asked by Alex.
> - Adapt to use the new virtio exposed APIs and clean some code accordingly.
> - Adapt to some cleaner style code in some places (if/else) as was suggested
>   by Alex.
> - Fix the range_intersect_range() function and adapt its usage as was
>   pointed by Alex.
> - Make struct virtiovf_pci_core_device better packed.
> - Overwrite the subsystem vendor ID to be 0x1af4 as was discussed in
>   the ML.
> - Add support for the 'bar sizing negotiation' as was asked by Alex.
> - Drop the 'acc' from the 'ops' as Alex asked.
> 
> Changes from V0: https://www.spinics.net/lists/linux-virtualization/msg63802.html
> 
> Virtio:
> - Fix the common config map size issue that was reported by Michael
>   Tsirkin.
> - Do not use vp_dev->vqs[] array upon vp_del_vqs() as was asked by
>   Michael, instead skip the AQ specifically.
> - Move admin vq implementation into virtio_pci_modern.c as was asked by
>   Michael.
> - Rename structure virtio_avq to virtio_pci_admin_vq and some extra
>   corresponding renames.
> - Remove exported symbols virtio_pci_vf_get_pf_dev(),
>   virtio_admin_cmd_exec() as now callers are local to the module.
> - Handle inflight commands as part of the device reset flow.
> - Introduce APIs per admin command in virtio-pci as was asked by Michael.
> 
> Vfio:
> - Change to use EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL for
>   vfio_pci_core_setup_barmap() and vfio_pci_iowrite#xxx() as pointed by
>   Alex.
> - Drop the intermediate patch which prepares the commands and calls the
>   generic virtio admin command API (i.e. virtio_admin_cmd_exec()).
> - Instead, call directly to the new APIs per admin command that are
>   exported from Virtio - based on Michael's request.
> - Enable only virtio-net as part of the pci_device_id table to enforce
>   upon binding only what is supported as suggested by Alex.
> - Add support for byte-wise access (read/write) over the device config
>   region as was asked by Alex.
> - Consider whether MSIX is practically enabled/disabled to choose the
>   right opcode upon issuing read/write admin command, as mentioned
>   by Michael.
> - Move to use VIRTIO_PCI_CONFIG_OFF instead of adding some new defines
>   as was suggested by Michael.
> - Set the '.close_device' op to vfio_pci_core_close_device() as was
>   pointed by Alex.
> - Adapt to Vfio multi-line comment style in a few places.
> - Add virtualization@lists.linux-foundation.org in the MAINTAINERS file
>   to be CCed for the new driver as was suggested by Jason.
> 
> Yishai
> 
> Feng Liu (4):
>   virtio: Define feature bit for administration virtqueue
>   virtio-pci: Introduce admin virtqueue
>   virtio-pci: Introduce admin command sending function
>   virtio-pci: Introduce admin commands
> 
> Yishai Hadas (5):
>   virtio-pci: Initialize the supported admin commands
>   virtio-pci: Introduce APIs to execute legacy IO admin commands
>   vfio/pci: Expose vfio_pci_core_setup_barmap()
>   vfio/pci: Expose vfio_pci_iowrite/read##size()
>   vfio/virtio: Introduce a vfio driver over virtio devices
> 
>  MAINTAINERS                            |   7 +
>  drivers/vfio/pci/Kconfig               |   2 +
>  drivers/vfio/pci/Makefile              |   2 +
>  drivers/vfio/pci/vfio_pci_core.c       |  25 ++
>  drivers/vfio/pci/vfio_pci_rdwr.c       |  38 +-
>  drivers/vfio/pci/virtio/Kconfig        |  16 +
>  drivers/vfio/pci/virtio/Makefile       |   4 +
>  drivers/vfio/pci/virtio/main.c         | 554 +++++++++++++++++++++++++
>  drivers/virtio/virtio.c                |  37 +-
>  drivers/virtio/virtio_pci_common.c     |  14 +
>  drivers/virtio/virtio_pci_common.h     |  21 +-
>  drivers/virtio/virtio_pci_modern.c     | 503 +++++++++++++++++++++-
>  drivers/virtio/virtio_pci_modern_dev.c |  24 +-
>  include/linux/vfio_pci_core.h          |  20 +
>  include/linux/virtio.h                 |   8 +
>  include/linux/virtio_config.h          |   4 +
>  include/linux/virtio_pci_admin.h       |  21 +
>  include/linux/virtio_pci_modern.h      |   2 +
>  include/uapi/linux/virtio_config.h     |   8 +-
>  include/uapi/linux/virtio_pci.h        |  71 ++++
>  20 files changed, 1342 insertions(+), 39 deletions(-)
>  create mode 100644 drivers/vfio/pci/virtio/Kconfig
>  create mode 100644 drivers/vfio/pci/virtio/Makefile
>  create mode 100644 drivers/vfio/pci/virtio/main.c
>  create mode 100644 include/linux/virtio_pci_admin.h
> 
> -- 
> 2.27.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V4 vfio 4/9] virtio-pci: Introduce admin commands
  2023-11-30  9:52   ` Michael S. Tsirkin
@ 2023-11-30 10:35     ` Yishai Hadas
  2023-11-30 10:45       ` Michael S. Tsirkin
  0 siblings, 1 reply; 22+ messages in thread
From: Yishai Hadas @ 2023-11-30 10:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, si-wei.liu, leonro,
	maorg

On 30/11/2023 11:52, Michael S. Tsirkin wrote:
> On Wed, Nov 29, 2023 at 04:37:41PM +0200, Yishai Hadas wrote:
>> +/* Transitional device admin command. */
>> +#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE	0x2
>> +#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ		0x3
>> +#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE		0x4
>> +#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ		0x5
>> +#define VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO		0x6
>> +
>> +/* Increment MAX_OPCODE to next value when new opcode is added */
>> +#define VIRTIO_ADMIN_MAX_CMD_OPCODE			0x6
> 
> Does anything need VIRTIO_ADMIN_MAX_CMD_OPCODE? Not in this
> patchset...
> 

Right, once you suggested to move to 'u64 supported_cmds' it's not any 
more in use.

It still can be used in the future, however I can drop it if it's worth 
a V5 sending.

Yishai

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V4 vfio 4/9] virtio-pci: Introduce admin commands
  2023-11-30 10:35     ` Yishai Hadas
@ 2023-11-30 10:45       ` Michael S. Tsirkin
  0 siblings, 0 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2023-11-30 10:45 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, si-wei.liu, leonro,
	maorg

On Thu, Nov 30, 2023 at 12:35:09PM +0200, Yishai Hadas wrote:
> On 30/11/2023 11:52, Michael S. Tsirkin wrote:
> > On Wed, Nov 29, 2023 at 04:37:41PM +0200, Yishai Hadas wrote:
> > > +/* Transitional device admin command. */
> > > +#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE	0x2
> > > +#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ		0x3
> > > +#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE		0x4
> > > +#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ		0x5
> > > +#define VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO		0x6
> > > +
> > > +/* Increment MAX_OPCODE to next value when new opcode is added */
> > > +#define VIRTIO_ADMIN_MAX_CMD_OPCODE			0x6
> > 
> > Does anything need VIRTIO_ADMIN_MAX_CMD_OPCODE? Not in this
> > patchset...
> > 
> 
> Right, once you suggested to move to 'u64 supported_cmds' it's not any more
> in use.
> 
> It still can be used in the future, however I can drop it if it's worth a V5
> sending.
> 
> Yishai

If you don't need it then yea, we need to be careful about what we put in uapi headers.
wait with v5 until others can review v4 though.

-- 
MST


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V4 vfio 0/9] Introduce a vfio driver over virtio devices
  2023-11-30 10:07 ` [PATCH V4 vfio 0/9] " Michael S. Tsirkin
@ 2023-11-30 15:47   ` Yishai Hadas
  0 siblings, 0 replies; 22+ messages in thread
From: Yishai Hadas @ 2023-11-30 15:47 UTC (permalink / raw)
  To: Michael S. Tsirkin, Alex Williamson
  Cc: jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg

On 30/11/2023 12:07, Michael S. Tsirkin wrote:
> On Wed, Nov 29, 2023 at 04:37:37PM +0200, Yishai Hadas wrote:
>> This series introduce a vfio driver over virtio devices to support the
>> legacy interface functionality for VFs.
>>
>> Background, from the virtio spec [1].
>> --------------------------------------------------------------------
>> In some systems, there is a need to support a virtio legacy driver with
>> a device that does not directly support the legacy interface. In such
>> scenarios, a group owner device can provide the legacy interface
>> functionality for the group member devices. The driver of the owner
>> device can then access the legacy interface of a member device on behalf
>> of the legacy member device driver.
>>
>> For example, with the SR-IOV group type, group members (VFs) can not
>> present the legacy interface in an I/O BAR in BAR0 as expected by the
>> legacy pci driver. If the legacy driver is running inside a virtual
>> machine, the hypervisor executing the virtual machine can present a
>> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
>> legacy driver accesses to this I/O BAR and forwards them to the group
>> owner device (PF) using group administration commands.
>> --------------------------------------------------------------------
>>
>> The first 6 patches are in the virtio area and handle the below:
>> - Introduce the admin virtqueue infrastcture.
>> - Expose the layout of the commands that should be used for
>>    supporting the legacy access.
>> - Expose APIs to enable upper layers as of vfio, net, etc
>>    to execute admin commands.
>>
>> The above follows the virtio spec that was lastly accepted in that area
>> [1].
>>
>> The last 3 patches are in the vfio area and handle the below:
>> - Expose some APIs from vfio/pci to be used by the vfio/virtio driver.
>> - Introduce a vfio driver over virtio devices to support the legacy
>>    interface functionality for VFs.
>>
>> The series was tested successfully over virtio-net VFs in the host,
>> while running in the guest both modern and legacy drivers.
>>
>> [1]
>> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> 
> 
> 
> I sent a minor question, but besides that, patches 1-6 look ok
> Acked-by: Michael S. Tsirkin <mst@redhat.com
Thanks Michael

I'll add your Acked-by for the above patches as part of V5 and will drop 
the unused macro (i.e.VIRTIO_ADMIN_MAX_CMD_OPCODE) as you asked.

> 
> I take no stance on patches 7-9 this is up to Alex.

Alex,
Are we fine with the rest of series from your side ?

Thanks,
Yishai

> 
> 
>> Changes from V3: https://www.spinics.net/lists/kvm/msg333008.html
>> Virtio:
>> - Rebase on top of 6.7 rc3.
>> Vfio:
>> - Fix a typo, drop 'acc' from 'virtiovf_acc_vfio_pci_tran_ops'.
>>
>> Changes from V2: https://lore.kernel.org/all/20231029155952.67686-8-yishaih@nvidia.com/T/
>> Virtio:
>> - Rebase on top of 6.7 rc1.
>> - Add a mutex to serialize admin commands execution and virtqueue
>>    deletion, as was suggested by Michael.
>> - Remove the 'ref_count' usage which is not needed any more.
>> - Reduce the depth of the admin vq to match a single command at a given time.
>> - Add a supported check upon command execution and move to use a single
>>    flow of virtqueue_exec_admin_cmd().
>> - Improve the description of the exported commands to better match the
>>    specification and the expected usage as was asked by Michael.
>>
>> Vfio:
>> - Upon calling to virtio_pci_admin_legacy/common_device_io_read/write()
>>    supply the 'offset' within the relevant configuration area, following
>>    the virtio exported APIs.
>>
>> Changes from V1: https://lore.kernel.org/all/20231023104548.07b3aa19.alex.williamson@redhat.com/T/
>> Virtio:
>> - Drop its first patch, it was accepted upstream already.
>> - Add a new patch (#6) which initializes the supported admin commands
>>    upon admin queue activation as was suggested by Michael.
>> - Split the legacy_io_read/write commands per common/device
>>    configuration as was asked by Michael.
>> - Don't expose any more the list query/used APIs outside of virtio.
>> - Instead, expose an API to check whether the legacy io functionality is
>>    supported as was suggested by Michael.
>> - Fix some Krobot's note by adding the missing include file.
>>
>> Vfio:
>> - Refer specifically to virtio-net as part of the driver/module description
>>    as Alex asked.
>> - Change to check MSIX enablement based on the irq type of the given vfio
>>    core device. In addition, drop its capable checking from the probe flow
>>    as was asked by Alex.
>> - Adapt to use the new virtio exposed APIs and clean some code accordingly.
>> - Adapt to some cleaner style code in some places (if/else) as was suggested
>>    by Alex.
>> - Fix the range_intersect_range() function and adapt its usage as was
>>    pointed by Alex.
>> - Make struct virtiovf_pci_core_device better packed.
>> - Overwrite the subsystem vendor ID to be 0x1af4 as was discussed in
>>    the ML.
>> - Add support for the 'bar sizing negotiation' as was asked by Alex.
>> - Drop the 'acc' from the 'ops' as Alex asked.
>>
>> Changes from V0: https://www.spinics.net/lists/linux-virtualization/msg63802.html
>>
>> Virtio:
>> - Fix the common config map size issue that was reported by Michael
>>    Tsirkin.
>> - Do not use vp_dev->vqs[] array upon vp_del_vqs() as was asked by
>>    Michael, instead skip the AQ specifically.
>> - Move admin vq implementation into virtio_pci_modern.c as was asked by
>>    Michael.
>> - Rename structure virtio_avq to virtio_pci_admin_vq and some extra
>>    corresponding renames.
>> - Remove exported symbols virtio_pci_vf_get_pf_dev(),
>>    virtio_admin_cmd_exec() as now callers are local to the module.
>> - Handle inflight commands as part of the device reset flow.
>> - Introduce APIs per admin command in virtio-pci as was asked by Michael.
>>
>> Vfio:
>> - Change to use EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL for
>>    vfio_pci_core_setup_barmap() and vfio_pci_iowrite#xxx() as pointed by
>>    Alex.
>> - Drop the intermediate patch which prepares the commands and calls the
>>    generic virtio admin command API (i.e. virtio_admin_cmd_exec()).
>> - Instead, call directly to the new APIs per admin command that are
>>    exported from Virtio - based on Michael's request.
>> - Enable only virtio-net as part of the pci_device_id table to enforce
>>    upon binding only what is supported as suggested by Alex.
>> - Add support for byte-wise access (read/write) over the device config
>>    region as was asked by Alex.
>> - Consider whether MSIX is practically enabled/disabled to choose the
>>    right opcode upon issuing read/write admin command, as mentioned
>>    by Michael.
>> - Move to use VIRTIO_PCI_CONFIG_OFF instead of adding some new defines
>>    as was suggested by Michael.
>> - Set the '.close_device' op to vfio_pci_core_close_device() as was
>>    pointed by Alex.
>> - Adapt to Vfio multi-line comment style in a few places.
>> - Add virtualization@lists.linux-foundation.org in the MAINTAINERS file
>>    to be CCed for the new driver as was suggested by Jason.
>>
>> Yishai
>>
>> Feng Liu (4):
>>    virtio: Define feature bit for administration virtqueue
>>    virtio-pci: Introduce admin virtqueue
>>    virtio-pci: Introduce admin command sending function
>>    virtio-pci: Introduce admin commands
>>
>> Yishai Hadas (5):
>>    virtio-pci: Initialize the supported admin commands
>>    virtio-pci: Introduce APIs to execute legacy IO admin commands
>>    vfio/pci: Expose vfio_pci_core_setup_barmap()
>>    vfio/pci: Expose vfio_pci_iowrite/read##size()
>>    vfio/virtio: Introduce a vfio driver over virtio devices
>>
>>   MAINTAINERS                            |   7 +
>>   drivers/vfio/pci/Kconfig               |   2 +
>>   drivers/vfio/pci/Makefile              |   2 +
>>   drivers/vfio/pci/vfio_pci_core.c       |  25 ++
>>   drivers/vfio/pci/vfio_pci_rdwr.c       |  38 +-
>>   drivers/vfio/pci/virtio/Kconfig        |  16 +
>>   drivers/vfio/pci/virtio/Makefile       |   4 +
>>   drivers/vfio/pci/virtio/main.c         | 554 +++++++++++++++++++++++++
>>   drivers/virtio/virtio.c                |  37 +-
>>   drivers/virtio/virtio_pci_common.c     |  14 +
>>   drivers/virtio/virtio_pci_common.h     |  21 +-
>>   drivers/virtio/virtio_pci_modern.c     | 503 +++++++++++++++++++++-
>>   drivers/virtio/virtio_pci_modern_dev.c |  24 +-
>>   include/linux/vfio_pci_core.h          |  20 +
>>   include/linux/virtio.h                 |   8 +
>>   include/linux/virtio_config.h          |   4 +
>>   include/linux/virtio_pci_admin.h       |  21 +
>>   include/linux/virtio_pci_modern.h      |   2 +
>>   include/uapi/linux/virtio_config.h     |   8 +-
>>   include/uapi/linux/virtio_pci.h        |  71 ++++
>>   20 files changed, 1342 insertions(+), 39 deletions(-)
>>   create mode 100644 drivers/vfio/pci/virtio/Kconfig
>>   create mode 100644 drivers/vfio/pci/virtio/Makefile
>>   create mode 100644 drivers/vfio/pci/virtio/main.c
>>   create mode 100644 include/linux/virtio_pci_admin.h
>>
>> -- 
>> 2.27.0
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V4 vfio 8/9] vfio/pci: Expose vfio_pci_iowrite/read##size()
  2023-11-29 14:37 ` [PATCH V4 vfio 8/9] vfio/pci: Expose vfio_pci_iowrite/read##size() Yishai Hadas
@ 2023-11-30 19:20   ` Alex Williamson
  2023-12-03 14:14     ` Yishai Hadas
  0 siblings, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2023-11-30 19:20 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: mst, jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg

On Wed, 29 Nov 2023 16:37:45 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:

> Expose vfio_pci_iowrite/read##size() to let it be used by drivers.
> 
> This functionality is needed to enable direct access to some physical
> BAR of the device with the proper locks/checks in place.
> 
> The next patches from this series will use this functionality on a data
> path flow when a direct access to the BAR is needed.
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  drivers/vfio/pci/vfio_pci_rdwr.c | 10 ++++++----
>  include/linux/vfio_pci_core.h    | 19 +++++++++++++++++++
>  2 files changed, 25 insertions(+), 4 deletions(-)

I don't follow the inconsistency between this and the previous patch.
Why did we move and rename the code to setup the barmap but we export
the ioread/write functions in place?  Thanks,

Alex


> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> index 6f08b3ecbb89..817ec9a89123 100644
> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> @@ -38,7 +38,7 @@
>  #define vfio_iowrite8	iowrite8
>  
>  #define VFIO_IOWRITE(size) \
> -static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
> +int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
>  			bool test_mem, u##size val, void __iomem *io)	\
>  {									\
>  	if (test_mem) {							\
> @@ -55,7 +55,8 @@ static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
>  		up_read(&vdev->memory_lock);				\
>  									\
>  	return 0;							\
> -}
> +}									\
> +EXPORT_SYMBOL_GPL(vfio_pci_iowrite##size);
>  
>  VFIO_IOWRITE(8)
>  VFIO_IOWRITE(16)
> @@ -65,7 +66,7 @@ VFIO_IOWRITE(64)
>  #endif
>  
>  #define VFIO_IOREAD(size) \
> -static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
> +int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
>  			bool test_mem, u##size *val, void __iomem *io)	\
>  {									\
>  	if (test_mem) {							\
> @@ -82,7 +83,8 @@ static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
>  		up_read(&vdev->memory_lock);				\
>  									\
>  	return 0;							\
> -}
> +}									\
> +EXPORT_SYMBOL_GPL(vfio_pci_ioread##size);
>  
>  VFIO_IOREAD(8)
>  VFIO_IOREAD(16)
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index 67ac58e20e1d..22c915317788 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -131,4 +131,23 @@ int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
>  pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
>  						pci_channel_state_t state);
>  
> +#define VFIO_IOWRITE_DECLATION(size) \
> +int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
> +			bool test_mem, u##size val, void __iomem *io);
> +
> +VFIO_IOWRITE_DECLATION(8)
> +VFIO_IOWRITE_DECLATION(16)
> +VFIO_IOWRITE_DECLATION(32)
> +#ifdef iowrite64
> +VFIO_IOWRITE_DECLATION(64)
> +#endif
> +
> +#define VFIO_IOREAD_DECLATION(size) \
> +int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
> +			bool test_mem, u##size *val, void __iomem *io);
> +
> +VFIO_IOREAD_DECLATION(8)
> +VFIO_IOREAD_DECLATION(16)
> +VFIO_IOREAD_DECLATION(32)
> +
>  #endif /* VFIO_PCI_CORE_H */


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V4 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-11-29 14:37 ` [PATCH V4 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices Yishai Hadas
@ 2023-11-30 22:10   ` Alex Williamson
  2023-12-03 14:54     ` Yishai Hadas
  0 siblings, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2023-11-30 22:10 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: mst, jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg

On Wed, 29 Nov 2023 16:37:46 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:

> Introduce a vfio driver over virtio devices to support the legacy
> interface functionality for VFs.
> 
> Background, from the virtio spec [1].
> --------------------------------------------------------------------
> In some systems, there is a need to support a virtio legacy driver with
> a device that does not directly support the legacy interface. In such
> scenarios, a group owner device can provide the legacy interface
> functionality for the group member devices. The driver of the owner
> device can then access the legacy interface of a member device on behalf
> of the legacy member device driver.
> 
> For example, with the SR-IOV group type, group members (VFs) can not
> present the legacy interface in an I/O BAR in BAR0 as expected by the
> legacy pci driver. If the legacy driver is running inside a virtual
> machine, the hypervisor executing the virtual machine can present a
> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> legacy driver accesses to this I/O BAR and forwards them to the group
> owner device (PF) using group administration commands.
> --------------------------------------------------------------------
> 
> Specifically, this driver adds support for a virtio-net VF to be exposed
> as a transitional device to a guest driver and allows the legacy IO BAR
> functionality on top.
> 
> This allows a VM which uses a legacy virtio-net driver in the guest to
> work transparently over a VF which its driver in the host is that new
> driver.
> 
> The driver can be extended easily to support some other types of virtio
> devices (e.g virtio-blk), by adding in a few places the specific type
> properties as was done for virtio-net.
> 
> For now, only the virtio-net use case was tested and as such we introduce
> the support only for such a device.
> 
> Practically,
> Upon probing a VF for a virtio-net device, in case its PF supports
> legacy access over the virtio admin commands and the VF doesn't have BAR
> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> transitional device with I/O BAR in BAR 0.
> 
> The existence of the simulated I/O bar is reported later on by
> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> exposes itself as a transitional device by overwriting some properties
> upon reading its config space.
> 
> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> guest may use it via read/write calls according to the virtio
> specification.
> 
> Any read/write towards the control parts of the BAR will be captured by
> the new driver and will be translated into admin commands towards the
> device.
> 
> Any data path read/write access (i.e. virtio driver notifications) will
> be forwarded to the physical BAR which its properties were supplied by
> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
> probing/init flow.
> 
> With that code in place a legacy driver in the guest has the look and
> feel as if having a transitional device with legacy support for both its
> control and data path flows.
> 
> [1]
> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  MAINTAINERS                      |   7 +
>  drivers/vfio/pci/Kconfig         |   2 +
>  drivers/vfio/pci/Makefile        |   2 +
>  drivers/vfio/pci/virtio/Kconfig  |  16 +
>  drivers/vfio/pci/virtio/Makefile |   4 +
>  drivers/vfio/pci/virtio/main.c   | 554 +++++++++++++++++++++++++++++++
>  6 files changed, 585 insertions(+)
>  create mode 100644 drivers/vfio/pci/virtio/Kconfig
>  create mode 100644 drivers/vfio/pci/virtio/Makefile
>  create mode 100644 drivers/vfio/pci/virtio/main.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 012df8ccf34e..b246b769092d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22872,6 +22872,13 @@ L:	kvm@vger.kernel.org
>  S:	Maintained
>  F:	drivers/vfio/pci/mlx5/
>  
> +VFIO VIRTIO PCI DRIVER
> +M:	Yishai Hadas <yishaih@nvidia.com>
> +L:	kvm@vger.kernel.org
> +L:	virtualization@lists.linux-foundation.org
> +S:	Maintained
> +F:	drivers/vfio/pci/virtio
> +
>  VFIO PCI DEVICE SPECIFIC DRIVERS
>  R:	Jason Gunthorpe <jgg@nvidia.com>
>  R:	Yishai Hadas <yishaih@nvidia.com>
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 8125e5f37832..18c397df566d 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>  
>  source "drivers/vfio/pci/pds/Kconfig"
>  
> +source "drivers/vfio/pci/virtio/Kconfig"
> +
>  endmenu
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 45167be462d8..046139a4eca5 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>  obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>  
>  obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> +
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> new file mode 100644
> index 000000000000..3a6707639220
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Kconfig
> @@ -0,0 +1,16 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config VIRTIO_VFIO_PCI
> +        tristate "VFIO support for VIRTIO NET PCI devices"
> +        depends on VIRTIO_PCI
> +        select VFIO_PCI_CORE
> +        help
> +          This provides support for exposing VIRTIO NET VF devices which support
> +          legacy IO access, using the VFIO framework that can work with a legacy
> +          virtio driver in the guest.
> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> +          not indicate I/O Space.
> +          As of that this driver emulated I/O BAR in software to let a VF be
> +          seen as a transitional device in the guest and let it work with
> +          a legacy driver.
> +
> +          If you don't know what to do here, say N.
> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
> new file mode 100644
> index 000000000000..2039b39fb723
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
> +virtio-vfio-pci-y := main.o
> +
> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
> new file mode 100644
> index 000000000000..3ca19c891673
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/main.c
> @@ -0,0 +1,554 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/pci.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +#include <linux/vfio.h>
> +#include <linux/vfio_pci_core.h>
> +#include <linux/virtio_pci.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_pci_admin.h>
> +
> +struct virtiovf_pci_core_device {
> +	struct vfio_pci_core_device core_device;
> +	u8 bar0_virtual_buf_size;

Poor packing here, move it down with notify_bar.

> +	u8 *bar0_virtual_buf;
> +	/* synchronize access to the virtual buf */
> +	struct mutex bar_mutex;
> +	void __iomem *notify_addr;
> +	u64 notify_offset;
> +	__le32 pci_base_addr_0;
> +	__le16 pci_cmd;
> +	u8 notify_bar;
> +};
> +
> +static int
> +virtiovf_issue_legacy_rw_cmd(struct virtiovf_pci_core_device *virtvdev,
> +			     loff_t pos, char __user *buf,
> +			     size_t count, bool read)
> +{
> +	bool msix_enabled =
> +		(virtvdev->core_device.irq_type == VFIO_PCI_MSIX_IRQ_INDEX);

This is racy, the irq_type could change, do we care or does that fall
into poor driver behavior?

> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
> +	bool common;
> +	u8 offset;
> +	int ret;
> +
> +	common = pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled);

I don't understand virtio, but this looks like it has the effect of the
mac and status fields shifting in IO BAR depending on whether MSIX
is enabled.  Doesn't that give the user access to 4 bytes beyond the
status field?  Is that how it's supposed to work or should we be
dropping accesses to the 4-bytes MSIX support adds when not enabled?

> +	/* offset within the relevant configuration area */
> +	offset = common ? pos : pos - VIRTIO_PCI_CONFIG_OFF(msix_enabled);
> +	mutex_lock(&virtvdev->bar_mutex);
> +	if (read) {
> +		if (common)
> +			ret = virtio_pci_admin_legacy_common_io_read(pdev, offset,
> +					count, bar0_buf + pos);
> +		else
> +			ret = virtio_pci_admin_legacy_device_io_read(pdev, offset,
> +					count, bar0_buf + pos);
> +		if (ret)
> +			goto out;
> +		if (copy_to_user(buf, bar0_buf + pos, count))
> +			ret = -EFAULT;
> +	} else {
> +		if (copy_from_user(bar0_buf + pos, buf, count)) {
> +			ret = -EFAULT;
> +			goto out;
> +		}
> +
> +		if (common)
> +			ret = virtio_pci_admin_legacy_common_io_write(pdev, offset,
> +					count, bar0_buf + pos);
> +		else
> +			ret = virtio_pci_admin_legacy_device_io_write(pdev, offset,
> +					count, bar0_buf + pos);
> +	}
> +out:
> +	mutex_unlock(&virtvdev->bar_mutex);
> +	return ret;
> +}
> +
> +static int
> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
> +			    loff_t pos, char __user *buf,
> +			    size_t count, bool read)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	u16 queue_notify;
> +	int ret;
> +
> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
> +		return -EINVAL;
> +
> +	switch (pos) {
> +	case VIRTIO_PCI_QUEUE_NOTIFY:
> +		if (count != sizeof(queue_notify))
> +			return -EINVAL;
> +		if (read) {
> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
> +						virtvdev->notify_addr);
> +			if (ret)
> +				return ret;
> +			if (copy_to_user(buf, &queue_notify,
> +					 sizeof(queue_notify)))
> +				return -EFAULT;
> +		} else {
> +			if (copy_from_user(&queue_notify, buf, count))
> +				return -EFAULT;
> +			ret = vfio_pci_iowrite16(core_device, true, queue_notify,
> +						 virtvdev->notify_addr);
> +		}
> +		break;
> +	default:
> +		ret = virtiovf_issue_legacy_rw_cmd(virtvdev, pos, buf, count,
> +						   read);
> +	}
> +
> +	return ret ? ret : count;
> +}
> +
> +static bool range_intersect_range(loff_t range1_start, size_t count1,
> +				  loff_t range2_start, size_t count2,
> +				  loff_t *start_offset,
> +				  size_t *intersect_count,
> +				  size_t *register_offset)
> +{
> +	if (range1_start <= range2_start &&
> +	    range1_start + count1 > range2_start) {
> +		*start_offset = range2_start - range1_start;
> +		*intersect_count = min_t(size_t, count2,
> +					 range1_start + count1 - range2_start);
> +		*register_offset = 0;
> +		return true;
> +	}
> +
> +	if (range1_start > range2_start &&
> +	    range1_start < range2_start + count2) {
> +		*start_offset = 0;
> +		*intersect_count = min_t(size_t, count1,
> +					 range2_start + count2 - range1_start);
> +		*register_offset = range1_start - range2_start;
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> +					char __user *buf, size_t count,
> +					loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	size_t register_offset;
> +	loff_t copy_offset;
> +	size_t copy_count;
> +	__le32 val32;
> +	__le16 val16;
> +	u8 val8;
> +	int ret;
> +
> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> +				  &copy_offset, &copy_count, &register_offset)) {
> +		val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset, copy_count))
> +			return -EFAULT;
> +	}
> +
> +	if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
> +				  &copy_offset, &copy_count, &register_offset)) {
> +		if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
> +				   copy_count))
> +			return -EFAULT;
> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
> +				 copy_count))
> +			return -EFAULT;
> +	}

Seems like COMMAND_IO survives across reset, pci_cmd is only ever
modified by user writes.

I also don't see that we honor COMMAND_IO for accesses.  Shouldn't reads
return -1 and writes get dropped if COMMAND_IO is cleared?

> +
> +	if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8),
> +				  &copy_offset, &copy_count, &register_offset)) {
> +		/* Transional needs to have revision 0 */
> +		val8 = 0;
> +		if (copy_to_user(buf + copy_offset, &val8, copy_count))
> +			return -EFAULT;
> +	}
> +
> +	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
> +				  &copy_offset, &copy_count, &register_offset)) {
> +		u8 log_bar_size = ilog2(roundup_pow_of_two(virtvdev->bar0_virtual_buf_size));
> +		u32 mask_size = ~((BIT(log_bar_size) - 1));

bar0_virtual_buf_size already needs to be a power-of-2, the size we
report here needs to match the REGION_INFO size, so we should enforce
the size in virtio_vf_pci_init_device().  I think it's 32bytes, so
we're actually good on the size, but the above makes no sense, ie.
rounding a value that's already a power-of-2 to a power-of-2, getting
the log2 size, shifting that back into the size we started with to get
a mask... why isn't it just:

		u32 bar_mask = ~(virtvdev->bar0_virtual_buf_size - 1);
?


> +		u32 pci_base_addr_0 = le32_to_cpu(virtvdev->pci_base_addr_0);
> +
> +		val32 = cpu_to_le32((pci_base_addr_0 & mask_size) | PCI_BASE_ADDRESS_SPACE_IO);

Looks like what I suggested to Ankit :)

Per our prior variant driver agreement, we're going to need to enlist
another reviewer as well.  Thanks,

Alex

> +		if (copy_to_user(buf + copy_offset, (void *)&val32 + register_offset, copy_count))
> +			return -EFAULT;
> +	}
> +
> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> +				  &copy_offset, &copy_count, &register_offset)) {
> +		/*
> +		 * Transitional devices use the PCI subsystem device id as
> +		 * virtio device id, same as legacy driver always did.
> +		 */
> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
> +				 copy_count))
> +			return -EFAULT;
> +	}
> +
> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_VENDOR_ID, sizeof(val16),
> +				  &copy_offset, &copy_count, &register_offset)) {
> +		val16 = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET);
> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
> +				 copy_count))
> +			return -EFAULT;
> +	}
> +
> +	return count;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
> +		       size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
> +				     ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> +			size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> +		size_t register_offset;
> +		loff_t copy_offset;
> +		size_t copy_count;
> +
> +		if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd),
> +					  &copy_offset, &copy_count,
> +					  &register_offset)) {
> +			if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
> +					   buf + copy_offset,
> +					   copy_count))
> +				return -EFAULT;
> +		}
> +
> +		if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
> +					  sizeof(virtvdev->pci_base_addr_0),
> +					  &copy_offset, &copy_count,
> +					  &register_offset)) {
> +			if (copy_from_user((void *)&virtvdev->pci_base_addr_0 + register_offset,
> +					   buf + copy_offset,
> +					   copy_count))
> +				return -EFAULT;
> +		}
> +	}
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static int
> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
> +				   unsigned int cmd, unsigned long arg)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
> +	void __user *uarg = (void __user *)arg;
> +	struct vfio_region_info info = {};
> +
> +	if (copy_from_user(&info, uarg, minsz))
> +		return -EFAULT;
> +
> +	if (info.argsz < minsz)
> +		return -EINVAL;
> +
> +	switch (info.index) {
> +	case VFIO_PCI_BAR0_REGION_INDEX:
> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
> +		info.size = virtvdev->bar0_virtual_buf_size;
> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
> +			     VFIO_REGION_INFO_FLAG_WRITE;
> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static long
> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> +			     unsigned long arg)
> +{
> +	switch (cmd) {
> +	case VFIO_DEVICE_GET_REGION_INFO:
> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static int
> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	int ret;
> +
> +	/*
> +	 * Setup the BAR where the 'notify' exists to be used by vfio as well
> +	 * This will let us mmap it only once and use it when needed.
> +	 */
> +	ret = vfio_pci_core_setup_barmap(core_device,
> +					 virtvdev->notify_bar);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
> +			virtvdev->notify_offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
> +	int ret;
> +
> +	ret = vfio_pci_core_enable(vdev);
> +	if (ret)
> +		return ret;
> +
> +	if (virtvdev->bar0_virtual_buf) {
> +		/*
> +		 * Upon close_device() the vfio_pci_core_disable() is called
> +		 * and will close all the previous mmaps, so it seems that the
> +		 * valid life cycle for the 'notify' addr is per open/close.
> +		 */
> +		ret = virtiovf_set_notify_addr(virtvdev);
> +		if (ret) {
> +			vfio_pci_core_disable(vdev);
> +			return ret;
> +		}
> +	}
> +
> +	vfio_pci_core_finish_enable(vdev);
> +	return 0;
> +}
> +
> +static int virtiovf_get_device_config_size(unsigned short device)
> +{
> +	/* Network card */
> +	return offsetofend(struct virtio_net_config, status);
> +}
> +
> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	u64 offset;
> +	int ret;
> +	u8 bar;
> +
> +	ret = virtio_pci_admin_legacy_io_notify_info(virtvdev->core_device.pdev,
> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
> +				&bar, &offset);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_bar = bar;
> +	virtvdev->notify_offset = offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev;
> +	int ret;
> +
> +	ret = vfio_pci_core_init_dev(core_vdev);
> +	if (ret)
> +		return ret;
> +
> +	pdev = virtvdev->core_device.pdev;
> +	ret = virtiovf_read_notify_info(virtvdev);
> +	if (ret)
> +		return ret;
> +
> +	/* Being ready with a buffer that supports MSIX */
> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
> +				virtiovf_get_device_config_size(pdev->device);
> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
> +					     GFP_KERNEL);
> +	if (!virtvdev->bar0_virtual_buf)
> +		return -ENOMEM;
> +	mutex_init(&virtvdev->bar_mutex);
> +	return 0;
> +}
> +
> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +
> +	kfree(virtvdev->bar0_virtual_buf);
> +	vfio_pci_core_release_dev(core_vdev);
> +}
> +
> +static const struct vfio_device_ops virtiovf_vfio_pci_tran_ops = {
> +	.name = "virtio-vfio-pci-trans",
> +	.init = virtiovf_pci_init_device,
> +	.release = virtiovf_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = vfio_pci_core_close_device,
> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
> +	.read = virtiovf_pci_core_read,
> +	.write = virtiovf_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};
> +
> +static const struct vfio_device_ops virtiovf_vfio_pci_ops = {
> +	.name = "virtio-vfio-pci",
> +	.init = vfio_pci_core_init_dev,
> +	.release = vfio_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = vfio_pci_core_close_device,
> +	.ioctl = vfio_pci_core_ioctl,
> +	.device_feature = vfio_pci_core_ioctl_feature,
> +	.read = vfio_pci_core_read,
> +	.write = vfio_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};
> +
> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
> +{
> +	struct resource *res = pdev->resource;
> +
> +	return res->flags ? true : false;
> +}
> +
> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> +			      const struct pci_device_id *id)
> +{
> +	const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
> +	struct virtiovf_pci_core_device *virtvdev;
> +	int ret;
> +
> +	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
> +	    !virtiovf_bar0_exists(pdev))
> +		ops = &virtiovf_vfio_pci_tran_ops;
> +
> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> +				     &pdev->dev, ops);
> +	if (IS_ERR(virtvdev))
> +		return PTR_ERR(virtvdev);
> +
> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
> +	if (ret)
> +		goto out;
> +	return 0;
> +out:
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +	return ret;
> +}
> +
> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> +
> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +}
> +
> +static const struct pci_device_id virtiovf_pci_table[] = {
> +	/* Only virtio-net is supported/tested so far */
> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
> +	{}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> +
> +static struct pci_driver virtiovf_pci_driver = {
> +	.name = KBUILD_MODNAME,
> +	.id_table = virtiovf_pci_table,
> +	.probe = virtiovf_pci_probe,
> +	.remove = virtiovf_pci_remove,
> +	.err_handler = &vfio_pci_core_err_handlers,
> +	.driver_managed_dma = true,
> +};
> +
> +module_pci_driver(virtiovf_pci_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
> +MODULE_DESCRIPTION(
> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V4 vfio 8/9] vfio/pci: Expose vfio_pci_iowrite/read##size()
  2023-11-30 19:20   ` Alex Williamson
@ 2023-12-03 14:14     ` Yishai Hadas
  2023-12-04 22:57       ` Alex Williamson
  0 siblings, 1 reply; 22+ messages in thread
From: Yishai Hadas @ 2023-12-03 14:14 UTC (permalink / raw)
  To: Alex Williamson
  Cc: mst, jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg

On 30/11/2023 21:20, Alex Williamson wrote:
> On Wed, 29 Nov 2023 16:37:45 +0200
> Yishai Hadas <yishaih@nvidia.com> wrote:
> 
>> Expose vfio_pci_iowrite/read##size() to let it be used by drivers.
>>
>> This functionality is needed to enable direct access to some physical
>> BAR of the device with the proper locks/checks in place.
>>
>> The next patches from this series will use this functionality on a data
>> path flow when a direct access to the BAR is needed.
>>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> ---
>>   drivers/vfio/pci/vfio_pci_rdwr.c | 10 ++++++----
>>   include/linux/vfio_pci_core.h    | 19 +++++++++++++++++++
>>   2 files changed, 25 insertions(+), 4 deletions(-)
> 
> I don't follow the inconsistency between this and the previous patch.
> Why did we move and rename the code to setup the barmap but we export
> the ioread/write functions in place?  Thanks,
> 
> Alex

The mount of code for barmap setup was quite small compared to the 
ioread/write functions.

However, I agree, we can be consistent here and export in both cases the 
functions in place as part of vfio_pci_rdwr.c which is already part of 
vfio_pci_core.ko

I may also rename in current patch vfio_pci_iowrite/read<xxx> to have 
the 'core' prefix as part of the functions names (i.e. 
vfio_pci_core_iowrite/read<xxx>) to be consistent with other exported 
core functions and adapt the callers to this name.

Makes sense ?

Yishai

> 
> 
>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
>> index 6f08b3ecbb89..817ec9a89123 100644
>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>> @@ -38,7 +38,7 @@
>>   #define vfio_iowrite8	iowrite8
>>   
>>   #define VFIO_IOWRITE(size) \
>> -static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
>> +int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
>>   			bool test_mem, u##size val, void __iomem *io)	\
>>   {									\
>>   	if (test_mem) {							\
>> @@ -55,7 +55,8 @@ static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
>>   		up_read(&vdev->memory_lock);				\
>>   									\
>>   	return 0;							\
>> -}
>> +}									\
>> +EXPORT_SYMBOL_GPL(vfio_pci_iowrite##size);
>>   
>>   VFIO_IOWRITE(8)
>>   VFIO_IOWRITE(16)
>> @@ -65,7 +66,7 @@ VFIO_IOWRITE(64)
>>   #endif
>>   
>>   #define VFIO_IOREAD(size) \
>> -static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
>> +int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
>>   			bool test_mem, u##size *val, void __iomem *io)	\
>>   {									\
>>   	if (test_mem) {							\
>> @@ -82,7 +83,8 @@ static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
>>   		up_read(&vdev->memory_lock);				\
>>   									\
>>   	return 0;							\
>> -}
>> +}									\
>> +EXPORT_SYMBOL_GPL(vfio_pci_ioread##size);
>>   
>>   VFIO_IOREAD(8)
>>   VFIO_IOREAD(16)
>> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
>> index 67ac58e20e1d..22c915317788 100644
>> --- a/include/linux/vfio_pci_core.h
>> +++ b/include/linux/vfio_pci_core.h
>> @@ -131,4 +131,23 @@ int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
>>   pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
>>   						pci_channel_state_t state);
>>   
>> +#define VFIO_IOWRITE_DECLATION(size) \
>> +int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
>> +			bool test_mem, u##size val, void __iomem *io);
>> +
>> +VFIO_IOWRITE_DECLATION(8)
>> +VFIO_IOWRITE_DECLATION(16)
>> +VFIO_IOWRITE_DECLATION(32)
>> +#ifdef iowrite64
>> +VFIO_IOWRITE_DECLATION(64)
>> +#endif
>> +
>> +#define VFIO_IOREAD_DECLATION(size) \
>> +int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
>> +			bool test_mem, u##size *val, void __iomem *io);
>> +
>> +VFIO_IOREAD_DECLATION(8)
>> +VFIO_IOREAD_DECLATION(16)
>> +VFIO_IOREAD_DECLATION(32)
>> +
>>   #endif /* VFIO_PCI_CORE_H */
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V4 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-11-30 22:10   ` Alex Williamson
@ 2023-12-03 14:54     ` Yishai Hadas
  2023-12-04 23:32       ` Alex Williamson
  0 siblings, 1 reply; 22+ messages in thread
From: Yishai Hadas @ 2023-12-03 14:54 UTC (permalink / raw)
  To: Alex Williamson
  Cc: mst, jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg, tzurielk

On 01/12/2023 0:10, Alex Williamson wrote:
> On Wed, 29 Nov 2023 16:37:46 +0200
> Yishai Hadas <yishaih@nvidia.com> wrote:
> 
>> Introduce a vfio driver over virtio devices to support the legacy
>> interface functionality for VFs.
>>
>> Background, from the virtio spec [1].
>> --------------------------------------------------------------------
>> In some systems, there is a need to support a virtio legacy driver with
>> a device that does not directly support the legacy interface. In such
>> scenarios, a group owner device can provide the legacy interface
>> functionality for the group member devices. The driver of the owner
>> device can then access the legacy interface of a member device on behalf
>> of the legacy member device driver.
>>
>> For example, with the SR-IOV group type, group members (VFs) can not
>> present the legacy interface in an I/O BAR in BAR0 as expected by the
>> legacy pci driver. If the legacy driver is running inside a virtual
>> machine, the hypervisor executing the virtual machine can present a
>> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
>> legacy driver accesses to this I/O BAR and forwards them to the group
>> owner device (PF) using group administration commands.
>> --------------------------------------------------------------------
>>
>> Specifically, this driver adds support for a virtio-net VF to be exposed
>> as a transitional device to a guest driver and allows the legacy IO BAR
>> functionality on top.
>>
>> This allows a VM which uses a legacy virtio-net driver in the guest to
>> work transparently over a VF which its driver in the host is that new
>> driver.
>>
>> The driver can be extended easily to support some other types of virtio
>> devices (e.g virtio-blk), by adding in a few places the specific type
>> properties as was done for virtio-net.
>>
>> For now, only the virtio-net use case was tested and as such we introduce
>> the support only for such a device.
>>
>> Practically,
>> Upon probing a VF for a virtio-net device, in case its PF supports
>> legacy access over the virtio admin commands and the VF doesn't have BAR
>> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
>> transitional device with I/O BAR in BAR 0.
>>
>> The existence of the simulated I/O bar is reported later on by
>> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
>> exposes itself as a transitional device by overwriting some properties
>> upon reading its config space.
>>
>> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
>> guest may use it via read/write calls according to the virtio
>> specification.
>>
>> Any read/write towards the control parts of the BAR will be captured by
>> the new driver and will be translated into admin commands towards the
>> device.
>>
>> Any data path read/write access (i.e. virtio driver notifications) will
>> be forwarded to the physical BAR which its properties were supplied by
>> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
>> probing/init flow.
>>
>> With that code in place a legacy driver in the guest has the look and
>> feel as if having a transitional device with legacy support for both its
>> control and data path flows.
>>
>> [1]
>> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
>>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> ---
>>   MAINTAINERS                      |   7 +
>>   drivers/vfio/pci/Kconfig         |   2 +
>>   drivers/vfio/pci/Makefile        |   2 +
>>   drivers/vfio/pci/virtio/Kconfig  |  16 +
>>   drivers/vfio/pci/virtio/Makefile |   4 +
>>   drivers/vfio/pci/virtio/main.c   | 554 +++++++++++++++++++++++++++++++
>>   6 files changed, 585 insertions(+)
>>   create mode 100644 drivers/vfio/pci/virtio/Kconfig
>>   create mode 100644 drivers/vfio/pci/virtio/Makefile
>>   create mode 100644 drivers/vfio/pci/virtio/main.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 012df8ccf34e..b246b769092d 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -22872,6 +22872,13 @@ L:	kvm@vger.kernel.org
>>   S:	Maintained
>>   F:	drivers/vfio/pci/mlx5/
>>   
>> +VFIO VIRTIO PCI DRIVER
>> +M:	Yishai Hadas <yishaih@nvidia.com>
>> +L:	kvm@vger.kernel.org
>> +L:	virtualization@lists.linux-foundation.org
>> +S:	Maintained
>> +F:	drivers/vfio/pci/virtio
>> +
>>   VFIO PCI DEVICE SPECIFIC DRIVERS
>>   R:	Jason Gunthorpe <jgg@nvidia.com>
>>   R:	Yishai Hadas <yishaih@nvidia.com>
>> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
>> index 8125e5f37832..18c397df566d 100644
>> --- a/drivers/vfio/pci/Kconfig
>> +++ b/drivers/vfio/pci/Kconfig
>> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>>   
>>   source "drivers/vfio/pci/pds/Kconfig"
>>   
>> +source "drivers/vfio/pci/virtio/Kconfig"
>> +
>>   endmenu
>> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
>> index 45167be462d8..046139a4eca5 100644
>> --- a/drivers/vfio/pci/Makefile
>> +++ b/drivers/vfio/pci/Makefile
>> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>>   obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>>   
>>   obj-$(CONFIG_PDS_VFIO_PCI) += pds/
>> +
>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
>> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
>> new file mode 100644
>> index 000000000000..3a6707639220
>> --- /dev/null
>> +++ b/drivers/vfio/pci/virtio/Kconfig
>> @@ -0,0 +1,16 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +config VIRTIO_VFIO_PCI
>> +        tristate "VFIO support for VIRTIO NET PCI devices"
>> +        depends on VIRTIO_PCI
>> +        select VFIO_PCI_CORE
>> +        help
>> +          This provides support for exposing VIRTIO NET VF devices which support
>> +          legacy IO access, using the VFIO framework that can work with a legacy
>> +          virtio driver in the guest.
>> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
>> +          not indicate I/O Space.
>> +          As of that this driver emulated I/O BAR in software to let a VF be
>> +          seen as a transitional device in the guest and let it work with
>> +          a legacy driver.
>> +
>> +          If you don't know what to do here, say N.
>> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
>> new file mode 100644
>> index 000000000000..2039b39fb723
>> --- /dev/null
>> +++ b/drivers/vfio/pci/virtio/Makefile
>> @@ -0,0 +1,4 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
>> +virtio-vfio-pci-y := main.o
>> +
>> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
>> new file mode 100644
>> index 000000000000..3ca19c891673
>> --- /dev/null
>> +++ b/drivers/vfio/pci/virtio/main.c
>> @@ -0,0 +1,554 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
>> + */
>> +
>> +#include <linux/device.h>
>> +#include <linux/module.h>
>> +#include <linux/mutex.h>
>> +#include <linux/pci.h>
>> +#include <linux/pm_runtime.h>
>> +#include <linux/types.h>
>> +#include <linux/uaccess.h>
>> +#include <linux/vfio.h>
>> +#include <linux/vfio_pci_core.h>
>> +#include <linux/virtio_pci.h>
>> +#include <linux/virtio_net.h>
>> +#include <linux/virtio_pci_admin.h>
>> +
>> +struct virtiovf_pci_core_device {
>> +	struct vfio_pci_core_device core_device;
>> +	u8 bar0_virtual_buf_size;
> 
> Poor packing here, move it down with notify_bar.

OK

> 
>> +	u8 *bar0_virtual_buf;
>> +	/* synchronize access to the virtual buf */
>> +	struct mutex bar_mutex;
>> +	void __iomem *notify_addr;
>> +	u64 notify_offset;
>> +	__le32 pci_base_addr_0;
>> +	__le16 pci_cmd;
>> +	u8 notify_bar;
>> +};
>> +
>> +static int
>> +virtiovf_issue_legacy_rw_cmd(struct virtiovf_pci_core_device *virtvdev,
>> +			     loff_t pos, char __user *buf,
>> +			     size_t count, bool read)
>> +{
>> +	bool msix_enabled =
>> +		(virtvdev->core_device.irq_type == VFIO_PCI_MSIX_IRQ_INDEX);
> 
> This is racy, the irq_type could change, do we care or does that fall
> into poor driver behavior?

This clearly falls into a poor driver behavior (e.g QEMU reads from a 
given offset considering current irq_type once it changes it in parallel).

> 
>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
>> +	bool common;
>> +	u8 offset;
>> +	int ret;
>> +
>> +	common = pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled);
> 
> I don't understand virtio, but this looks like it has the effect of the
> mac and status fields shifting in IO BAR depending on whether MSIX
> is enabled.  Doesn't that give the user access to 4 bytes beyond the
> status field? 

It is accessing 4 bytes within the device io region that the device 
would support.

  Is that how it's supposed to work or should we be
> dropping accesses to the 4-bytes MSIX support adds when not enabled?

This is how it's supposed to work, accessing that area has no side effects.

> 
>> +	/* offset within the relevant configuration area */
>> +	offset = common ? pos : pos - VIRTIO_PCI_CONFIG_OFF(msix_enabled);
>> +	mutex_lock(&virtvdev->bar_mutex);
>> +	if (read) {
>> +		if (common)
>> +			ret = virtio_pci_admin_legacy_common_io_read(pdev, offset,
>> +					count, bar0_buf + pos);
>> +		else
>> +			ret = virtio_pci_admin_legacy_device_io_read(pdev, offset,
>> +					count, bar0_buf + pos);
>> +		if (ret)
>> +			goto out;
>> +		if (copy_to_user(buf, bar0_buf + pos, count))
>> +			ret = -EFAULT;
>> +	} else {
>> +		if (copy_from_user(bar0_buf + pos, buf, count)) {
>> +			ret = -EFAULT;
>> +			goto out;
>> +		}
>> +
>> +		if (common)
>> +			ret = virtio_pci_admin_legacy_common_io_write(pdev, offset,
>> +					count, bar0_buf + pos);
>> +		else
>> +			ret = virtio_pci_admin_legacy_device_io_write(pdev, offset,
>> +					count, bar0_buf + pos);
>> +	}
>> +out:
>> +	mutex_unlock(&virtvdev->bar_mutex);
>> +	return ret;
>> +}
>> +
>> +static int
>> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
>> +			    loff_t pos, char __user *buf,
>> +			    size_t count, bool read)
>> +{
>> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
>> +	u16 queue_notify;
>> +	int ret;
>> +
>> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
>> +		return -EINVAL;
>> +
>> +	switch (pos) {
>> +	case VIRTIO_PCI_QUEUE_NOTIFY:
>> +		if (count != sizeof(queue_notify))
>> +			return -EINVAL;
>> +		if (read) {
>> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
>> +						virtvdev->notify_addr);
>> +			if (ret)
>> +				return ret;
>> +			if (copy_to_user(buf, &queue_notify,
>> +					 sizeof(queue_notify)))
>> +				return -EFAULT;
>> +		} else {
>> +			if (copy_from_user(&queue_notify, buf, count))
>> +				return -EFAULT;
>> +			ret = vfio_pci_iowrite16(core_device, true, queue_notify,
>> +						 virtvdev->notify_addr);
>> +		}
>> +		break;
>> +	default:
>> +		ret = virtiovf_issue_legacy_rw_cmd(virtvdev, pos, buf, count,
>> +						   read);
>> +	}
>> +
>> +	return ret ? ret : count;
>> +}
>> +
>> +static bool range_intersect_range(loff_t range1_start, size_t count1,
>> +				  loff_t range2_start, size_t count2,
>> +				  loff_t *start_offset,
>> +				  size_t *intersect_count,
>> +				  size_t *register_offset)
>> +{
>> +	if (range1_start <= range2_start &&
>> +	    range1_start + count1 > range2_start) {
>> +		*start_offset = range2_start - range1_start;
>> +		*intersect_count = min_t(size_t, count2,
>> +					 range1_start + count1 - range2_start);
>> +		*register_offset = 0;
>> +		return true;
>> +	}
>> +
>> +	if (range1_start > range2_start &&
>> +	    range1_start < range2_start + count2) {
>> +		*start_offset = 0;
>> +		*intersect_count = min_t(size_t, count1,
>> +					 range2_start + count2 - range1_start);
>> +		*register_offset = range1_start - range2_start;
>> +		return true;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
>> +					char __user *buf, size_t count,
>> +					loff_t *ppos)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>> +	size_t register_offset;
>> +	loff_t copy_offset;
>> +	size_t copy_count;
>> +	__le32 val32;
>> +	__le16 val16;
>> +	u8 val8;
>> +	int ret;
>> +
>> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
>> +				  &copy_offset, &copy_count, &register_offset)) {
>> +		val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset, copy_count))
>> +			return -EFAULT;
>> +	}
>> +
>> +	if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
>> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
>> +				  &copy_offset, &copy_count, &register_offset)) {
>> +		if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
>> +				   copy_count))
>> +			return -EFAULT;
>> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
>> +				 copy_count))
>> +			return -EFAULT;
>> +	}
> 
> Seems like COMMAND_IO survives across reset, pci_cmd is only ever
> modified by user writes.

I see.

If so, we may need to add a callback for pci_error_handlers.reset_done 
to set it back to zero.

I'll add as part of V5.

> 
> I also don't see that we honor COMMAND_IO for accesses.  Shouldn't reads
> return -1 and writes get dropped if COMMAND_IO is cleared?

You mean that if the 'COMMAND_IO' bit was not set, we should not allow 
read/write towards the IO bar, right ?

If so, I would expect to return -EFAULT from 
translate_io_bar_to_mem_bar() in that case for both read/write and let 
the upper SW layers (e.g. QEMU) to consider that as '-1 (0xfff) data' 
for read and ignore it upon writes.

Agree ?


> 
>> +
>> +	if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8),
>> +				  &copy_offset, &copy_count, &register_offset)) {
>> +		/* Transional needs to have revision 0 */
>> +		val8 = 0;
>> +		if (copy_to_user(buf + copy_offset, &val8, copy_count))
>> +			return -EFAULT;
>> +	}
>> +
>> +	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
>> +				  &copy_offset, &copy_count, &register_offset)) {
>> +		u8 log_bar_size = ilog2(roundup_pow_of_two(virtvdev->bar0_virtual_buf_size));
>> +		u32 mask_size = ~((BIT(log_bar_size) - 1));
> 
> bar0_virtual_buf_size already needs to be a power-of-2, the size we
> report here needs to match the REGION_INFO size, so we should enforce
> the size in virtio_vf_pci_init_device().  I think it's 32bytes, so
> we're actually good on the size,

Yes, this is 32 bytes which is power of 2 and we should be fine as you 
wrote.

Do you suggest to enforce the calculated size explicitly as part of 
virtio_vf_pci_init_device() to be power of 2 with some check ?


  but the above makes no sense, ie.
> rounding a value that's already a power-of-2 to a power-of-2, getting
> the log2 size, shifting that back into the size we started with to get
> a mask... why isn't it just:
> 
> 		u32 bar_mask = ~(virtvdev->bar0_virtual_buf_size - 1);
> ?

Yes, it makes sense, will use that formula as part of V5.

> 
> 
>> +		u32 pci_base_addr_0 = le32_to_cpu(virtvdev->pci_base_addr_0);
>> +
>> +		val32 = cpu_to_le32((pci_base_addr_0 & mask_size) | PCI_BASE_ADDRESS_SPACE_IO);
> 
> Looks like what I suggested to Ankit :)
> 
> Per our prior variant driver agreement, we're going to need to enlist
> another reviewer as well.  Thanks,
> 

OK

Hopefully V5 will address the latest notes from you and be ready as a 
candidate for other people to review.

Thanks,
Yishai

> Alex
> 
>> +		if (copy_to_user(buf + copy_offset, (void *)&val32 + register_offset, copy_count))
>> +			return -EFAULT;
>> +	}
>> +
>> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
>> +				  &copy_offset, &copy_count, &register_offset)) {
>> +		/*
>> +		 * Transitional devices use the PCI subsystem device id as
>> +		 * virtio device id, same as legacy driver always did.
>> +		 */
>> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
>> +				 copy_count))
>> +			return -EFAULT;
>> +	}
>> +
>> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_VENDOR_ID, sizeof(val16),
>> +				  &copy_offset, &copy_count, &register_offset)) {
>> +		val16 = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET);
>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
>> +				 copy_count))
>> +			return -EFAULT;
>> +	}
>> +
>> +	return count;
>> +}
>> +
>> +static ssize_t
>> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
>> +		       size_t count, loff_t *ppos)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>> +	int ret;
>> +
>> +	if (!count)
>> +		return 0;
>> +
>> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
>> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
>> +
>> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
>> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
>> +
>> +	ret = pm_runtime_resume_and_get(&pdev->dev);
>> +	if (ret) {
>> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
>> +				     ret);
>> +		return -EIO;
>> +	}
>> +
>> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
>> +	pm_runtime_put(&pdev->dev);
>> +	return ret;
>> +}
>> +
>> +static ssize_t
>> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
>> +			size_t count, loff_t *ppos)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>> +	int ret;
>> +
>> +	if (!count)
>> +		return 0;
>> +
>> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
>> +		size_t register_offset;
>> +		loff_t copy_offset;
>> +		size_t copy_count;
>> +
>> +		if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd),
>> +					  &copy_offset, &copy_count,
>> +					  &register_offset)) {
>> +			if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
>> +					   buf + copy_offset,
>> +					   copy_count))
>> +				return -EFAULT;
>> +		}
>> +
>> +		if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
>> +					  sizeof(virtvdev->pci_base_addr_0),
>> +					  &copy_offset, &copy_count,
>> +					  &register_offset)) {
>> +			if (copy_from_user((void *)&virtvdev->pci_base_addr_0 + register_offset,
>> +					   buf + copy_offset,
>> +					   copy_count))
>> +				return -EFAULT;
>> +		}
>> +	}
>> +
>> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
>> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
>> +
>> +	ret = pm_runtime_resume_and_get(&pdev->dev);
>> +	if (ret) {
>> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
>> +		return -EIO;
>> +	}
>> +
>> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
>> +	pm_runtime_put(&pdev->dev);
>> +	return ret;
>> +}
>> +
>> +static int
>> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
>> +				   unsigned int cmd, unsigned long arg)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
>> +	void __user *uarg = (void __user *)arg;
>> +	struct vfio_region_info info = {};
>> +
>> +	if (copy_from_user(&info, uarg, minsz))
>> +		return -EFAULT;
>> +
>> +	if (info.argsz < minsz)
>> +		return -EINVAL;
>> +
>> +	switch (info.index) {
>> +	case VFIO_PCI_BAR0_REGION_INDEX:
>> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
>> +		info.size = virtvdev->bar0_virtual_buf_size;
>> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
>> +			     VFIO_REGION_INFO_FLAG_WRITE;
>> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
>> +	default:
>> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
>> +	}
>> +}
>> +
>> +static long
>> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
>> +			     unsigned long arg)
>> +{
>> +	switch (cmd) {
>> +	case VFIO_DEVICE_GET_REGION_INFO:
>> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
>> +	default:
>> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
>> +	}
>> +}
>> +
>> +static int
>> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
>> +{
>> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
>> +	int ret;
>> +
>> +	/*
>> +	 * Setup the BAR where the 'notify' exists to be used by vfio as well
>> +	 * This will let us mmap it only once and use it when needed.
>> +	 */
>> +	ret = vfio_pci_core_setup_barmap(core_device,
>> +					 virtvdev->notify_bar);
>> +	if (ret)
>> +		return ret;
>> +
>> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
>> +			virtvdev->notify_offset;
>> +	return 0;
>> +}
>> +
>> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
>> +	int ret;
>> +
>> +	ret = vfio_pci_core_enable(vdev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (virtvdev->bar0_virtual_buf) {
>> +		/*
>> +		 * Upon close_device() the vfio_pci_core_disable() is called
>> +		 * and will close all the previous mmaps, so it seems that the
>> +		 * valid life cycle for the 'notify' addr is per open/close.
>> +		 */
>> +		ret = virtiovf_set_notify_addr(virtvdev);
>> +		if (ret) {
>> +			vfio_pci_core_disable(vdev);
>> +			return ret;
>> +		}
>> +	}
>> +
>> +	vfio_pci_core_finish_enable(vdev);
>> +	return 0;
>> +}
>> +
>> +static int virtiovf_get_device_config_size(unsigned short device)
>> +{
>> +	/* Network card */
>> +	return offsetofend(struct virtio_net_config, status);
>> +}
>> +
>> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
>> +{
>> +	u64 offset;
>> +	int ret;
>> +	u8 bar;
>> +
>> +	ret = virtio_pci_admin_legacy_io_notify_info(virtvdev->core_device.pdev,
>> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
>> +				&bar, &offset);
>> +	if (ret)
>> +		return ret;
>> +
>> +	virtvdev->notify_bar = bar;
>> +	virtvdev->notify_offset = offset;
>> +	return 0;
>> +}
>> +
>> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	struct pci_dev *pdev;
>> +	int ret;
>> +
>> +	ret = vfio_pci_core_init_dev(core_vdev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	pdev = virtvdev->core_device.pdev;
>> +	ret = virtiovf_read_notify_info(virtvdev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	/* Being ready with a buffer that supports MSIX */
>> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
>> +				virtiovf_get_device_config_size(pdev->device);
>> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
>> +					     GFP_KERNEL);
>> +	if (!virtvdev->bar0_virtual_buf)
>> +		return -ENOMEM;
>> +	mutex_init(&virtvdev->bar_mutex);
>> +	return 0;
>> +}
>> +
>> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +
>> +	kfree(virtvdev->bar0_virtual_buf);
>> +	vfio_pci_core_release_dev(core_vdev);
>> +}
>> +
>> +static const struct vfio_device_ops virtiovf_vfio_pci_tran_ops = {
>> +	.name = "virtio-vfio-pci-trans",
>> +	.init = virtiovf_pci_init_device,
>> +	.release = virtiovf_pci_core_release_dev,
>> +	.open_device = virtiovf_pci_open_device,
>> +	.close_device = vfio_pci_core_close_device,
>> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
>> +	.read = virtiovf_pci_core_read,
>> +	.write = virtiovf_pci_core_write,
>> +	.mmap = vfio_pci_core_mmap,
>> +	.request = vfio_pci_core_request,
>> +	.match = vfio_pci_core_match,
>> +	.bind_iommufd = vfio_iommufd_physical_bind,
>> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
>> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
>> +};
>> +
>> +static const struct vfio_device_ops virtiovf_vfio_pci_ops = {
>> +	.name = "virtio-vfio-pci",
>> +	.init = vfio_pci_core_init_dev,
>> +	.release = vfio_pci_core_release_dev,
>> +	.open_device = virtiovf_pci_open_device,
>> +	.close_device = vfio_pci_core_close_device,
>> +	.ioctl = vfio_pci_core_ioctl,
>> +	.device_feature = vfio_pci_core_ioctl_feature,
>> +	.read = vfio_pci_core_read,
>> +	.write = vfio_pci_core_write,
>> +	.mmap = vfio_pci_core_mmap,
>> +	.request = vfio_pci_core_request,
>> +	.match = vfio_pci_core_match,
>> +	.bind_iommufd = vfio_iommufd_physical_bind,
>> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
>> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
>> +};
>> +
>> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
>> +{
>> +	struct resource *res = pdev->resource;
>> +
>> +	return res->flags ? true : false;
>> +}
>> +
>> +static int virtiovf_pci_probe(struct pci_dev *pdev,
>> +			      const struct pci_device_id *id)
>> +{
>> +	const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
>> +	struct virtiovf_pci_core_device *virtvdev;
>> +	int ret;
>> +
>> +	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
>> +	    !virtiovf_bar0_exists(pdev))
>> +		ops = &virtiovf_vfio_pci_tran_ops;
>> +
>> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
>> +				     &pdev->dev, ops);
>> +	if (IS_ERR(virtvdev))
>> +		return PTR_ERR(virtvdev);
>> +
>> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
>> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
>> +	if (ret)
>> +		goto out;
>> +	return 0;
>> +out:
>> +	vfio_put_device(&virtvdev->core_device.vdev);
>> +	return ret;
>> +}
>> +
>> +static void virtiovf_pci_remove(struct pci_dev *pdev)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
>> +
>> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
>> +	vfio_put_device(&virtvdev->core_device.vdev);
>> +}
>> +
>> +static const struct pci_device_id virtiovf_pci_table[] = {
>> +	/* Only virtio-net is supported/tested so far */
>> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
>> +	{}
>> +};
>> +
>> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
>> +
>> +static struct pci_driver virtiovf_pci_driver = {
>> +	.name = KBUILD_MODNAME,
>> +	.id_table = virtiovf_pci_table,
>> +	.probe = virtiovf_pci_probe,
>> +	.remove = virtiovf_pci_remove,
>> +	.err_handler = &vfio_pci_core_err_handlers,
>> +	.driver_managed_dma = true,
>> +};
>> +
>> +module_pci_driver(virtiovf_pci_driver);
>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
>> +MODULE_DESCRIPTION(
>> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V4 vfio 8/9] vfio/pci: Expose vfio_pci_iowrite/read##size()
  2023-12-03 14:14     ` Yishai Hadas
@ 2023-12-04 22:57       ` Alex Williamson
  0 siblings, 0 replies; 22+ messages in thread
From: Alex Williamson @ 2023-12-04 22:57 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: mst, jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg

On Sun, 3 Dec 2023 16:14:15 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:

> On 30/11/2023 21:20, Alex Williamson wrote:
> > On Wed, 29 Nov 2023 16:37:45 +0200
> > Yishai Hadas <yishaih@nvidia.com> wrote:
> >   
> >> Expose vfio_pci_iowrite/read##size() to let it be used by drivers.
> >>
> >> This functionality is needed to enable direct access to some physical
> >> BAR of the device with the proper locks/checks in place.
> >>
> >> The next patches from this series will use this functionality on a data
> >> path flow when a direct access to the BAR is needed.
> >>
> >> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> >> ---
> >>   drivers/vfio/pci/vfio_pci_rdwr.c | 10 ++++++----
> >>   include/linux/vfio_pci_core.h    | 19 +++++++++++++++++++
> >>   2 files changed, 25 insertions(+), 4 deletions(-)  
> > 
> > I don't follow the inconsistency between this and the previous patch.
> > Why did we move and rename the code to setup the barmap but we export
> > the ioread/write functions in place?  Thanks,
> > 
> > Alex  
> 
> The mount of code for barmap setup was quite small compared to the 
> ioread/write functions.
> 
> However, I agree, we can be consistent here and export in both cases the 
> functions in place as part of vfio_pci_rdwr.c which is already part of 
> vfio_pci_core.ko
> 
> I may also rename in current patch vfio_pci_iowrite/read<xxx> to have 
> the 'core' prefix as part of the functions names (i.e. 
> vfio_pci_core_iowrite/read<xxx>) to be consistent with other exported 
> core functions and adapt the callers to this name.
> 
> Makes sense ?

Yep.  Thanks,

Alex

> >> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> >> index 6f08b3ecbb89..817ec9a89123 100644
> >> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> >> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> >> @@ -38,7 +38,7 @@
> >>   #define vfio_iowrite8	iowrite8
> >>   
> >>   #define VFIO_IOWRITE(size) \
> >> -static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
> >> +int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
> >>   			bool test_mem, u##size val, void __iomem *io)	\
> >>   {									\
> >>   	if (test_mem) {							\
> >> @@ -55,7 +55,8 @@ static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
> >>   		up_read(&vdev->memory_lock);				\
> >>   									\
> >>   	return 0;							\
> >> -}
> >> +}									\
> >> +EXPORT_SYMBOL_GPL(vfio_pci_iowrite##size);
> >>   
> >>   VFIO_IOWRITE(8)
> >>   VFIO_IOWRITE(16)
> >> @@ -65,7 +66,7 @@ VFIO_IOWRITE(64)
> >>   #endif
> >>   
> >>   #define VFIO_IOREAD(size) \
> >> -static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
> >> +int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
> >>   			bool test_mem, u##size *val, void __iomem *io)	\
> >>   {									\
> >>   	if (test_mem) {							\
> >> @@ -82,7 +83,8 @@ static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
> >>   		up_read(&vdev->memory_lock);				\
> >>   									\
> >>   	return 0;							\
> >> -}
> >> +}									\
> >> +EXPORT_SYMBOL_GPL(vfio_pci_ioread##size);
> >>   
> >>   VFIO_IOREAD(8)
> >>   VFIO_IOREAD(16)
> >> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> >> index 67ac58e20e1d..22c915317788 100644
> >> --- a/include/linux/vfio_pci_core.h
> >> +++ b/include/linux/vfio_pci_core.h
> >> @@ -131,4 +131,23 @@ int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
> >>   pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
> >>   						pci_channel_state_t state);
> >>   
> >> +#define VFIO_IOWRITE_DECLATION(size) \
> >> +int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
> >> +			bool test_mem, u##size val, void __iomem *io);
> >> +
> >> +VFIO_IOWRITE_DECLATION(8)
> >> +VFIO_IOWRITE_DECLATION(16)
> >> +VFIO_IOWRITE_DECLATION(32)
> >> +#ifdef iowrite64
> >> +VFIO_IOWRITE_DECLATION(64)
> >> +#endif
> >> +
> >> +#define VFIO_IOREAD_DECLATION(size) \
> >> +int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
> >> +			bool test_mem, u##size *val, void __iomem *io);
> >> +
> >> +VFIO_IOREAD_DECLATION(8)
> >> +VFIO_IOREAD_DECLATION(16)
> >> +VFIO_IOREAD_DECLATION(32)
> >> +
> >>   #endif /* VFIO_PCI_CORE_H */  
> >   
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V4 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-12-03 14:54     ` Yishai Hadas
@ 2023-12-04 23:32       ` Alex Williamson
  2023-12-05 14:20         ` Yishai Hadas
  0 siblings, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2023-12-04 23:32 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: mst, jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg, tzurielk

On Sun, 3 Dec 2023 16:54:41 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:

> On 01/12/2023 0:10, Alex Williamson wrote:
> > On Wed, 29 Nov 2023 16:37:46 +0200
> > Yishai Hadas <yishaih@nvidia.com> wrote:
> >   
> >> Introduce a vfio driver over virtio devices to support the legacy
> >> interface functionality for VFs.
> >>
> >> Background, from the virtio spec [1].
> >> --------------------------------------------------------------------
> >> In some systems, there is a need to support a virtio legacy driver with
> >> a device that does not directly support the legacy interface. In such
> >> scenarios, a group owner device can provide the legacy interface
> >> functionality for the group member devices. The driver of the owner
> >> device can then access the legacy interface of a member device on behalf
> >> of the legacy member device driver.
> >>
> >> For example, with the SR-IOV group type, group members (VFs) can not
> >> present the legacy interface in an I/O BAR in BAR0 as expected by the
> >> legacy pci driver. If the legacy driver is running inside a virtual
> >> machine, the hypervisor executing the virtual machine can present a
> >> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> >> legacy driver accesses to this I/O BAR and forwards them to the group
> >> owner device (PF) using group administration commands.
> >> --------------------------------------------------------------------
> >>
> >> Specifically, this driver adds support for a virtio-net VF to be exposed
> >> as a transitional device to a guest driver and allows the legacy IO BAR
> >> functionality on top.
> >>
> >> This allows a VM which uses a legacy virtio-net driver in the guest to
> >> work transparently over a VF which its driver in the host is that new
> >> driver.
> >>
> >> The driver can be extended easily to support some other types of virtio
> >> devices (e.g virtio-blk), by adding in a few places the specific type
> >> properties as was done for virtio-net.
> >>
> >> For now, only the virtio-net use case was tested and as such we introduce
> >> the support only for such a device.
> >>
> >> Practically,
> >> Upon probing a VF for a virtio-net device, in case its PF supports
> >> legacy access over the virtio admin commands and the VF doesn't have BAR
> >> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> >> transitional device with I/O BAR in BAR 0.
> >>
> >> The existence of the simulated I/O bar is reported later on by
> >> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> >> exposes itself as a transitional device by overwriting some properties
> >> upon reading its config space.
> >>
> >> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> >> guest may use it via read/write calls according to the virtio
> >> specification.
> >>
> >> Any read/write towards the control parts of the BAR will be captured by
> >> the new driver and will be translated into admin commands towards the
> >> device.
> >>
> >> Any data path read/write access (i.e. virtio driver notifications) will
> >> be forwarded to the physical BAR which its properties were supplied by
> >> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
> >> probing/init flow.
> >>
> >> With that code in place a legacy driver in the guest has the look and
> >> feel as if having a transitional device with legacy support for both its
> >> control and data path flows.
> >>
> >> [1]
> >> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> >>
> >> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> >> ---
> >>   MAINTAINERS                      |   7 +
> >>   drivers/vfio/pci/Kconfig         |   2 +
> >>   drivers/vfio/pci/Makefile        |   2 +
> >>   drivers/vfio/pci/virtio/Kconfig  |  16 +
> >>   drivers/vfio/pci/virtio/Makefile |   4 +
> >>   drivers/vfio/pci/virtio/main.c   | 554 +++++++++++++++++++++++++++++++
> >>   6 files changed, 585 insertions(+)
> >>   create mode 100644 drivers/vfio/pci/virtio/Kconfig
> >>   create mode 100644 drivers/vfio/pci/virtio/Makefile
> >>   create mode 100644 drivers/vfio/pci/virtio/main.c
> >>
> >> diff --git a/MAINTAINERS b/MAINTAINERS
> >> index 012df8ccf34e..b246b769092d 100644
> >> --- a/MAINTAINERS
> >> +++ b/MAINTAINERS
> >> @@ -22872,6 +22872,13 @@ L:	kvm@vger.kernel.org
> >>   S:	Maintained
> >>   F:	drivers/vfio/pci/mlx5/
> >>   
> >> +VFIO VIRTIO PCI DRIVER
> >> +M:	Yishai Hadas <yishaih@nvidia.com>
> >> +L:	kvm@vger.kernel.org
> >> +L:	virtualization@lists.linux-foundation.org
> >> +S:	Maintained
> >> +F:	drivers/vfio/pci/virtio
> >> +
> >>   VFIO PCI DEVICE SPECIFIC DRIVERS
> >>   R:	Jason Gunthorpe <jgg@nvidia.com>
> >>   R:	Yishai Hadas <yishaih@nvidia.com>
> >> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> >> index 8125e5f37832..18c397df566d 100644
> >> --- a/drivers/vfio/pci/Kconfig
> >> +++ b/drivers/vfio/pci/Kconfig
> >> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
> >>   
> >>   source "drivers/vfio/pci/pds/Kconfig"
> >>   
> >> +source "drivers/vfio/pci/virtio/Kconfig"
> >> +
> >>   endmenu
> >> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> >> index 45167be462d8..046139a4eca5 100644
> >> --- a/drivers/vfio/pci/Makefile
> >> +++ b/drivers/vfio/pci/Makefile
> >> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
> >>   obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
> >>   
> >>   obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> >> +
> >> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> >> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> >> new file mode 100644
> >> index 000000000000..3a6707639220
> >> --- /dev/null
> >> +++ b/drivers/vfio/pci/virtio/Kconfig
> >> @@ -0,0 +1,16 @@
> >> +# SPDX-License-Identifier: GPL-2.0-only
> >> +config VIRTIO_VFIO_PCI
> >> +        tristate "VFIO support for VIRTIO NET PCI devices"
> >> +        depends on VIRTIO_PCI
> >> +        select VFIO_PCI_CORE
> >> +        help
> >> +          This provides support for exposing VIRTIO NET VF devices which support
> >> +          legacy IO access, using the VFIO framework that can work with a legacy
> >> +          virtio driver in the guest.
> >> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> >> +          not indicate I/O Space.
> >> +          As of that this driver emulated I/O BAR in software to let a VF be
> >> +          seen as a transitional device in the guest and let it work with
> >> +          a legacy driver.
> >> +
> >> +          If you don't know what to do here, say N.
> >> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
> >> new file mode 100644
> >> index 000000000000..2039b39fb723
> >> --- /dev/null
> >> +++ b/drivers/vfio/pci/virtio/Makefile
> >> @@ -0,0 +1,4 @@
> >> +# SPDX-License-Identifier: GPL-2.0-only
> >> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
> >> +virtio-vfio-pci-y := main.o
> >> +
> >> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
> >> new file mode 100644
> >> index 000000000000..3ca19c891673
> >> --- /dev/null
> >> +++ b/drivers/vfio/pci/virtio/main.c
> >> @@ -0,0 +1,554 @@
> >> +// SPDX-License-Identifier: GPL-2.0-only
> >> +/*
> >> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> >> + */
> >> +
> >> +#include <linux/device.h>
> >> +#include <linux/module.h>
> >> +#include <linux/mutex.h>
> >> +#include <linux/pci.h>
> >> +#include <linux/pm_runtime.h>
> >> +#include <linux/types.h>
> >> +#include <linux/uaccess.h>
> >> +#include <linux/vfio.h>
> >> +#include <linux/vfio_pci_core.h>
> >> +#include <linux/virtio_pci.h>
> >> +#include <linux/virtio_net.h>
> >> +#include <linux/virtio_pci_admin.h>
> >> +
> >> +struct virtiovf_pci_core_device {
> >> +	struct vfio_pci_core_device core_device;
> >> +	u8 bar0_virtual_buf_size;  
> > 
> > Poor packing here, move it down with notify_bar.  
> 
> OK
> 
> >   
> >> +	u8 *bar0_virtual_buf;
> >> +	/* synchronize access to the virtual buf */
> >> +	struct mutex bar_mutex;
> >> +	void __iomem *notify_addr;
> >> +	u64 notify_offset;
> >> +	__le32 pci_base_addr_0;
> >> +	__le16 pci_cmd;
> >> +	u8 notify_bar;
> >> +};
> >> +
> >> +static int
> >> +virtiovf_issue_legacy_rw_cmd(struct virtiovf_pci_core_device *virtvdev,
> >> +			     loff_t pos, char __user *buf,
> >> +			     size_t count, bool read)
> >> +{
> >> +	bool msix_enabled =
> >> +		(virtvdev->core_device.irq_type == VFIO_PCI_MSIX_IRQ_INDEX);  
> > 
> > This is racy, the irq_type could change, do we care or does that fall
> > into poor driver behavior?  
> 
> This clearly falls into a poor driver behavior (e.g QEMU reads from a 
> given offset considering current irq_type once it changes it in parallel).

Ok
  
> >> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> >> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
> >> +	bool common;
> >> +	u8 offset;
> >> +	int ret;
> >> +
> >> +	common = pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled);  
> > 
> > I don't understand virtio, but this looks like it has the effect of the
> > mac and status fields shifting in IO BAR depending on whether MSIX
> > is enabled.  Doesn't that give the user access to 4 bytes beyond the
> > status field?   
> 
> It is accessing 4 bytes within the device io region that the device 
> would support.
> 
>   Is that how it's supposed to work or should we be
> > dropping accesses to the 4-bytes MSIX support adds when not enabled?  
> 
> This is how it's supposed to work, accessing that area has no side effects.

Ok
 
> >> +	/* offset within the relevant configuration area */
> >> +	offset = common ? pos : pos - VIRTIO_PCI_CONFIG_OFF(msix_enabled);
> >> +	mutex_lock(&virtvdev->bar_mutex);
> >> +	if (read) {
> >> +		if (common)
> >> +			ret = virtio_pci_admin_legacy_common_io_read(pdev, offset,
> >> +					count, bar0_buf + pos);
> >> +		else
> >> +			ret = virtio_pci_admin_legacy_device_io_read(pdev, offset,
> >> +					count, bar0_buf + pos);
> >> +		if (ret)
> >> +			goto out;
> >> +		if (copy_to_user(buf, bar0_buf + pos, count))
> >> +			ret = -EFAULT;
> >> +	} else {
> >> +		if (copy_from_user(bar0_buf + pos, buf, count)) {
> >> +			ret = -EFAULT;
> >> +			goto out;
> >> +		}
> >> +
> >> +		if (common)
> >> +			ret = virtio_pci_admin_legacy_common_io_write(pdev, offset,
> >> +					count, bar0_buf + pos);
> >> +		else
> >> +			ret = virtio_pci_admin_legacy_device_io_write(pdev, offset,
> >> +					count, bar0_buf + pos);
> >> +	}
> >> +out:
> >> +	mutex_unlock(&virtvdev->bar_mutex);
> >> +	return ret;
> >> +}
> >> +
> >> +static int
> >> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
> >> +			    loff_t pos, char __user *buf,
> >> +			    size_t count, bool read)
> >> +{
> >> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> >> +	u16 queue_notify;
> >> +	int ret;
> >> +
> >> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
> >> +		return -EINVAL;
> >> +
> >> +	switch (pos) {
> >> +	case VIRTIO_PCI_QUEUE_NOTIFY:
> >> +		if (count != sizeof(queue_notify))
> >> +			return -EINVAL;
> >> +		if (read) {
> >> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
> >> +						virtvdev->notify_addr);
> >> +			if (ret)
> >> +				return ret;
> >> +			if (copy_to_user(buf, &queue_notify,
> >> +					 sizeof(queue_notify)))
> >> +				return -EFAULT;
> >> +		} else {
> >> +			if (copy_from_user(&queue_notify, buf, count))
> >> +				return -EFAULT;
> >> +			ret = vfio_pci_iowrite16(core_device, true, queue_notify,
> >> +						 virtvdev->notify_addr);
> >> +		}
> >> +		break;
> >> +	default:
> >> +		ret = virtiovf_issue_legacy_rw_cmd(virtvdev, pos, buf, count,
> >> +						   read);
> >> +	}
> >> +
> >> +	return ret ? ret : count;
> >> +}
> >> +
> >> +static bool range_intersect_range(loff_t range1_start, size_t count1,
> >> +				  loff_t range2_start, size_t count2,
> >> +				  loff_t *start_offset,
> >> +				  size_t *intersect_count,
> >> +				  size_t *register_offset)
> >> +{
> >> +	if (range1_start <= range2_start &&
> >> +	    range1_start + count1 > range2_start) {
> >> +		*start_offset = range2_start - range1_start;
> >> +		*intersect_count = min_t(size_t, count2,
> >> +					 range1_start + count1 - range2_start);
> >> +		*register_offset = 0;
> >> +		return true;
> >> +	}
> >> +
> >> +	if (range1_start > range2_start &&
> >> +	    range1_start < range2_start + count2) {
> >> +		*start_offset = 0;
> >> +		*intersect_count = min_t(size_t, count1,
> >> +					 range2_start + count2 - range1_start);
> >> +		*register_offset = range1_start - range2_start;
> >> +		return true;
> >> +	}
> >> +
> >> +	return false;
> >> +}
> >> +
> >> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> >> +					char __user *buf, size_t count,
> >> +					loff_t *ppos)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >> +	size_t register_offset;
> >> +	loff_t copy_offset;
> >> +	size_t copy_count;
> >> +	__le32 val32;
> >> +	__le16 val16;
> >> +	u8 val8;
> >> +	int ret;
> >> +
> >> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> >> +	if (ret < 0)
> >> +		return ret;
> >> +
> >> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> >> +				  &copy_offset, &copy_count, &register_offset)) {
> >> +		val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
> >> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset, copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
> >> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
> >> +				  &copy_offset, &copy_count, &register_offset)) {
> >> +		if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
> >> +				   copy_count))
> >> +			return -EFAULT;
> >> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> >> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
> >> +				 copy_count))
> >> +			return -EFAULT;
> >> +	}  
> > 
> > Seems like COMMAND_IO survives across reset, pci_cmd is only ever
> > modified by user writes.  
> 
> I see.
> 
> If so, we may need to add a callback for pci_error_handlers.reset_done 
> to set it back to zero.
> 
> I'll add as part of V5.
> 
> > 
> > I also don't see that we honor COMMAND_IO for accesses.  Shouldn't reads
> > return -1 and writes get dropped if COMMAND_IO is cleared?  
> 
> You mean that if the 'COMMAND_IO' bit was not set, we should not allow 
> read/write towards the IO bar, right ?

Yes

> If so, I would expect to return -EFAULT from 
> translate_io_bar_to_mem_bar() in that case for both read/write and let 
> the upper SW layers (e.g. QEMU) to consider that as '-1 (0xfff) data' 
> for read and ignore it upon writes.
> 
> Agree ?

vfio_pci_io{read,write}##size returns -EIO for MMIO accesses when
COMMAND_MEMORY is disabled, might as well follow that precedent rather
than -EFAULT.

Note that this will be a little different from a physical IO BAR
though.  It's not been necessary to check COMMAND_IO for such accesses
since IO space will soft fail and we can therefore let the IO system
fabricate the read value and drop writes.  QEMU might be overly verbose
getting an errno, but it should do the write thing relative to the VM.

> >> +
> >> +	if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8),
> >> +				  &copy_offset, &copy_count, &register_offset)) {
> >> +		/* Transional needs to have revision 0 */
> >> +		val8 = 0;
> >> +		if (copy_to_user(buf + copy_offset, &val8, copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
> >> +				  &copy_offset, &copy_count, &register_offset)) {
> >> +		u8 log_bar_size = ilog2(roundup_pow_of_two(virtvdev->bar0_virtual_buf_size));
> >> +		u32 mask_size = ~((BIT(log_bar_size) - 1));  
> > 
> > bar0_virtual_buf_size already needs to be a power-of-2, the size we
> > report here needs to match the REGION_INFO size, so we should enforce
> > the size in virtio_vf_pci_init_device().  I think it's 32bytes, so
> > we're actually good on the size,  
> 
> Yes, this is 32 bytes which is power of 2 and we should be fine as you 
> wrote.
> 
> Do you suggest to enforce the calculated size explicitly as part of 
> virtio_vf_pci_init_device() to be power of 2 with some check ?

Yes, I'd go ahead and do the roundup there or maybe assert that it's a
power of 2, potentially even a build bug check since it's sized based
on uAPI structures.  Essentially just something to prove we're not
creating a bogus size and reinforce that we can do things like below
without all the unnecessary operations above.  Thanks,

Alex

>   but the above makes no sense, ie.
> > rounding a value that's already a power-of-2 to a power-of-2, getting
> > the log2 size, shifting that back into the size we started with to get
> > a mask... why isn't it just:
> > 
> > 		u32 bar_mask = ~(virtvdev->bar0_virtual_buf_size - 1);
> > ?  
> 
> Yes, it makes sense, will use that formula as part of V5.
> 
> > 
> >   
> >> +		u32 pci_base_addr_0 = le32_to_cpu(virtvdev->pci_base_addr_0);
> >> +
> >> +		val32 = cpu_to_le32((pci_base_addr_0 & mask_size) | PCI_BASE_ADDRESS_SPACE_IO);  
> > 
> > Looks like what I suggested to Ankit :)
> > 
> > Per our prior variant driver agreement, we're going to need to enlist
> > another reviewer as well.  Thanks,
> >   
> 
> OK
> 
> Hopefully V5 will address the latest notes from you and be ready as a 
> candidate for other people to review.
> 
> Thanks,
> Yishai
> 
> > Alex
> >   
> >> +		if (copy_to_user(buf + copy_offset, (void *)&val32 + register_offset, copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> >> +				  &copy_offset, &copy_count, &register_offset)) {
> >> +		/*
> >> +		 * Transitional devices use the PCI subsystem device id as
> >> +		 * virtio device id, same as legacy driver always did.
> >> +		 */
> >> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
> >> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
> >> +				 copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_VENDOR_ID, sizeof(val16),
> >> +				  &copy_offset, &copy_count, &register_offset)) {
> >> +		val16 = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET);
> >> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
> >> +				 copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	return count;
> >> +}
> >> +
> >> +static ssize_t
> >> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
> >> +		       size_t count, loff_t *ppos)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> >> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> >> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >> +	int ret;
> >> +
> >> +	if (!count)
> >> +		return 0;
> >> +
> >> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
> >> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
> >> +
> >> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> >> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
> >> +
> >> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> >> +	if (ret) {
> >> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
> >> +				     ret);
> >> +		return -EIO;
> >> +	}
> >> +
> >> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
> >> +	pm_runtime_put(&pdev->dev);
> >> +	return ret;
> >> +}
> >> +
> >> +static ssize_t
> >> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> >> +			size_t count, loff_t *ppos)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> >> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> >> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >> +	int ret;
> >> +
> >> +	if (!count)
> >> +		return 0;
> >> +
> >> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> >> +		size_t register_offset;
> >> +		loff_t copy_offset;
> >> +		size_t copy_count;
> >> +
> >> +		if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd),
> >> +					  &copy_offset, &copy_count,
> >> +					  &register_offset)) {
> >> +			if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
> >> +					   buf + copy_offset,
> >> +					   copy_count))
> >> +				return -EFAULT;
> >> +		}
> >> +
> >> +		if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
> >> +					  sizeof(virtvdev->pci_base_addr_0),
> >> +					  &copy_offset, &copy_count,
> >> +					  &register_offset)) {
> >> +			if (copy_from_user((void *)&virtvdev->pci_base_addr_0 + register_offset,
> >> +					   buf + copy_offset,
> >> +					   copy_count))
> >> +				return -EFAULT;
> >> +		}
> >> +	}
> >> +
> >> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> >> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
> >> +
> >> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> >> +	if (ret) {
> >> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
> >> +		return -EIO;
> >> +	}
> >> +
> >> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
> >> +	pm_runtime_put(&pdev->dev);
> >> +	return ret;
> >> +}
> >> +
> >> +static int
> >> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
> >> +				   unsigned int cmd, unsigned long arg)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
> >> +	void __user *uarg = (void __user *)arg;
> >> +	struct vfio_region_info info = {};
> >> +
> >> +	if (copy_from_user(&info, uarg, minsz))
> >> +		return -EFAULT;
> >> +
> >> +	if (info.argsz < minsz)
> >> +		return -EINVAL;
> >> +
> >> +	switch (info.index) {
> >> +	case VFIO_PCI_BAR0_REGION_INDEX:
> >> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
> >> +		info.size = virtvdev->bar0_virtual_buf_size;
> >> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
> >> +			     VFIO_REGION_INFO_FLAG_WRITE;
> >> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
> >> +	default:
> >> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> >> +	}
> >> +}
> >> +
> >> +static long
> >> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> >> +			     unsigned long arg)
> >> +{
> >> +	switch (cmd) {
> >> +	case VFIO_DEVICE_GET_REGION_INFO:
> >> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
> >> +	default:
> >> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> >> +	}
> >> +}
> >> +
> >> +static int
> >> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
> >> +{
> >> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> >> +	int ret;
> >> +
> >> +	/*
> >> +	 * Setup the BAR where the 'notify' exists to be used by vfio as well
> >> +	 * This will let us mmap it only once and use it when needed.
> >> +	 */
> >> +	ret = vfio_pci_core_setup_barmap(core_device,
> >> +					 virtvdev->notify_bar);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
> >> +			virtvdev->notify_offset;
> >> +	return 0;
> >> +}
> >> +
> >> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
> >> +	int ret;
> >> +
> >> +	ret = vfio_pci_core_enable(vdev);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	if (virtvdev->bar0_virtual_buf) {
> >> +		/*
> >> +		 * Upon close_device() the vfio_pci_core_disable() is called
> >> +		 * and will close all the previous mmaps, so it seems that the
> >> +		 * valid life cycle for the 'notify' addr is per open/close.
> >> +		 */
> >> +		ret = virtiovf_set_notify_addr(virtvdev);
> >> +		if (ret) {
> >> +			vfio_pci_core_disable(vdev);
> >> +			return ret;
> >> +		}
> >> +	}
> >> +
> >> +	vfio_pci_core_finish_enable(vdev);
> >> +	return 0;
> >> +}
> >> +
> >> +static int virtiovf_get_device_config_size(unsigned short device)
> >> +{
> >> +	/* Network card */
> >> +	return offsetofend(struct virtio_net_config, status);
> >> +}
> >> +
> >> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
> >> +{
> >> +	u64 offset;
> >> +	int ret;
> >> +	u8 bar;
> >> +
> >> +	ret = virtio_pci_admin_legacy_io_notify_info(virtvdev->core_device.pdev,
> >> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
> >> +				&bar, &offset);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	virtvdev->notify_bar = bar;
> >> +	virtvdev->notify_offset = offset;
> >> +	return 0;
> >> +}
> >> +
> >> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	struct pci_dev *pdev;
> >> +	int ret;
> >> +
> >> +	ret = vfio_pci_core_init_dev(core_vdev);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	pdev = virtvdev->core_device.pdev;
> >> +	ret = virtiovf_read_notify_info(virtvdev);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	/* Being ready with a buffer that supports MSIX */
> >> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
> >> +				virtiovf_get_device_config_size(pdev->device);
> >> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
> >> +					     GFP_KERNEL);
> >> +	if (!virtvdev->bar0_virtual_buf)
> >> +		return -ENOMEM;
> >> +	mutex_init(&virtvdev->bar_mutex);
> >> +	return 0;
> >> +}
> >> +
> >> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +
> >> +	kfree(virtvdev->bar0_virtual_buf);
> >> +	vfio_pci_core_release_dev(core_vdev);
> >> +}
> >> +
> >> +static const struct vfio_device_ops virtiovf_vfio_pci_tran_ops = {
> >> +	.name = "virtio-vfio-pci-trans",
> >> +	.init = virtiovf_pci_init_device,
> >> +	.release = virtiovf_pci_core_release_dev,
> >> +	.open_device = virtiovf_pci_open_device,
> >> +	.close_device = vfio_pci_core_close_device,
> >> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
> >> +	.read = virtiovf_pci_core_read,
> >> +	.write = virtiovf_pci_core_write,
> >> +	.mmap = vfio_pci_core_mmap,
> >> +	.request = vfio_pci_core_request,
> >> +	.match = vfio_pci_core_match,
> >> +	.bind_iommufd = vfio_iommufd_physical_bind,
> >> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> >> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> >> +};
> >> +
> >> +static const struct vfio_device_ops virtiovf_vfio_pci_ops = {
> >> +	.name = "virtio-vfio-pci",
> >> +	.init = vfio_pci_core_init_dev,
> >> +	.release = vfio_pci_core_release_dev,
> >> +	.open_device = virtiovf_pci_open_device,
> >> +	.close_device = vfio_pci_core_close_device,
> >> +	.ioctl = vfio_pci_core_ioctl,
> >> +	.device_feature = vfio_pci_core_ioctl_feature,
> >> +	.read = vfio_pci_core_read,
> >> +	.write = vfio_pci_core_write,
> >> +	.mmap = vfio_pci_core_mmap,
> >> +	.request = vfio_pci_core_request,
> >> +	.match = vfio_pci_core_match,
> >> +	.bind_iommufd = vfio_iommufd_physical_bind,
> >> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> >> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> >> +};
> >> +
> >> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
> >> +{
> >> +	struct resource *res = pdev->resource;
> >> +
> >> +	return res->flags ? true : false;
> >> +}
> >> +
> >> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> >> +			      const struct pci_device_id *id)
> >> +{
> >> +	const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
> >> +	struct virtiovf_pci_core_device *virtvdev;
> >> +	int ret;
> >> +
> >> +	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
> >> +	    !virtiovf_bar0_exists(pdev))
> >> +		ops = &virtiovf_vfio_pci_tran_ops;
> >> +
> >> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> >> +				     &pdev->dev, ops);
> >> +	if (IS_ERR(virtvdev))
> >> +		return PTR_ERR(virtvdev);
> >> +
> >> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> >> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
> >> +	if (ret)
> >> +		goto out;
> >> +	return 0;
> >> +out:
> >> +	vfio_put_device(&virtvdev->core_device.vdev);
> >> +	return ret;
> >> +}
> >> +
> >> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> >> +
> >> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> >> +	vfio_put_device(&virtvdev->core_device.vdev);
> >> +}
> >> +
> >> +static const struct pci_device_id virtiovf_pci_table[] = {
> >> +	/* Only virtio-net is supported/tested so far */
> >> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
> >> +	{}
> >> +};
> >> +
> >> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> >> +
> >> +static struct pci_driver virtiovf_pci_driver = {
> >> +	.name = KBUILD_MODNAME,
> >> +	.id_table = virtiovf_pci_table,
> >> +	.probe = virtiovf_pci_probe,
> >> +	.remove = virtiovf_pci_remove,
> >> +	.err_handler = &vfio_pci_core_err_handlers,
> >> +	.driver_managed_dma = true,
> >> +};
> >> +
> >> +module_pci_driver(virtiovf_pci_driver);
> >> +
> >> +MODULE_LICENSE("GPL");
> >> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
> >> +MODULE_DESCRIPTION(
> >> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");  
> >   
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V4 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-12-04 23:32       ` Alex Williamson
@ 2023-12-05 14:20         ` Yishai Hadas
  0 siblings, 0 replies; 22+ messages in thread
From: Yishai Hadas @ 2023-12-05 14:20 UTC (permalink / raw)
  To: Alex Williamson
  Cc: mst, jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg, tzurielk

On 05/12/2023 1:32, Alex Williamson wrote:
> On Sun, 3 Dec 2023 16:54:41 +0200
> Yishai Hadas <yishaih@nvidia.com> wrote:
> 
>> On 01/12/2023 0:10, Alex Williamson wrote:
>>> On Wed, 29 Nov 2023 16:37:46 +0200
>>> Yishai Hadas <yishaih@nvidia.com> wrote:
>>>    
>>>> Introduce a vfio driver over virtio devices to support the legacy
>>>> interface functionality for VFs.
>>>>
>>>> Background, from the virtio spec [1].
>>>> --------------------------------------------------------------------
>>>> In some systems, there is a need to support a virtio legacy driver with
>>>> a device that does not directly support the legacy interface. In such
>>>> scenarios, a group owner device can provide the legacy interface
>>>> functionality for the group member devices. The driver of the owner
>>>> device can then access the legacy interface of a member device on behalf
>>>> of the legacy member device driver.
>>>>
>>>> For example, with the SR-IOV group type, group members (VFs) can not
>>>> present the legacy interface in an I/O BAR in BAR0 as expected by the
>>>> legacy pci driver. If the legacy driver is running inside a virtual
>>>> machine, the hypervisor executing the virtual machine can present a
>>>> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
>>>> legacy driver accesses to this I/O BAR and forwards them to the group
>>>> owner device (PF) using group administration commands.
>>>> --------------------------------------------------------------------
>>>>
>>>> Specifically, this driver adds support for a virtio-net VF to be exposed
>>>> as a transitional device to a guest driver and allows the legacy IO BAR
>>>> functionality on top.
>>>>
>>>> This allows a VM which uses a legacy virtio-net driver in the guest to
>>>> work transparently over a VF which its driver in the host is that new
>>>> driver.
>>>>
>>>> The driver can be extended easily to support some other types of virtio
>>>> devices (e.g virtio-blk), by adding in a few places the specific type
>>>> properties as was done for virtio-net.
>>>>
>>>> For now, only the virtio-net use case was tested and as such we introduce
>>>> the support only for such a device.
>>>>
>>>> Practically,
>>>> Upon probing a VF for a virtio-net device, in case its PF supports
>>>> legacy access over the virtio admin commands and the VF doesn't have BAR
>>>> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
>>>> transitional device with I/O BAR in BAR 0.
>>>>
>>>> The existence of the simulated I/O bar is reported later on by
>>>> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
>>>> exposes itself as a transitional device by overwriting some properties
>>>> upon reading its config space.
>>>>
>>>> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
>>>> guest may use it via read/write calls according to the virtio
>>>> specification.
>>>>
>>>> Any read/write towards the control parts of the BAR will be captured by
>>>> the new driver and will be translated into admin commands towards the
>>>> device.
>>>>
>>>> Any data path read/write access (i.e. virtio driver notifications) will
>>>> be forwarded to the physical BAR which its properties were supplied by
>>>> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
>>>> probing/init flow.
>>>>
>>>> With that code in place a legacy driver in the guest has the look and
>>>> feel as if having a transitional device with legacy support for both its
>>>> control and data path flows.
>>>>
>>>> [1]
>>>> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
>>>>
>>>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>>>> ---
>>>>    MAINTAINERS                      |   7 +
>>>>    drivers/vfio/pci/Kconfig         |   2 +
>>>>    drivers/vfio/pci/Makefile        |   2 +
>>>>    drivers/vfio/pci/virtio/Kconfig  |  16 +
>>>>    drivers/vfio/pci/virtio/Makefile |   4 +
>>>>    drivers/vfio/pci/virtio/main.c   | 554 +++++++++++++++++++++++++++++++
>>>>    6 files changed, 585 insertions(+)
>>>>    create mode 100644 drivers/vfio/pci/virtio/Kconfig
>>>>    create mode 100644 drivers/vfio/pci/virtio/Makefile
>>>>    create mode 100644 drivers/vfio/pci/virtio/main.c
>>>>
>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> index 012df8ccf34e..b246b769092d 100644
>>>> --- a/MAINTAINERS
>>>> +++ b/MAINTAINERS
>>>> @@ -22872,6 +22872,13 @@ L:	kvm@vger.kernel.org
>>>>    S:	Maintained
>>>>    F:	drivers/vfio/pci/mlx5/
>>>>    
>>>> +VFIO VIRTIO PCI DRIVER
>>>> +M:	Yishai Hadas <yishaih@nvidia.com>
>>>> +L:	kvm@vger.kernel.org
>>>> +L:	virtualization@lists.linux-foundation.org
>>>> +S:	Maintained
>>>> +F:	drivers/vfio/pci/virtio
>>>> +
>>>>    VFIO PCI DEVICE SPECIFIC DRIVERS
>>>>    R:	Jason Gunthorpe <jgg@nvidia.com>
>>>>    R:	Yishai Hadas <yishaih@nvidia.com>
>>>> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
>>>> index 8125e5f37832..18c397df566d 100644
>>>> --- a/drivers/vfio/pci/Kconfig
>>>> +++ b/drivers/vfio/pci/Kconfig
>>>> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>>>>    
>>>>    source "drivers/vfio/pci/pds/Kconfig"
>>>>    
>>>> +source "drivers/vfio/pci/virtio/Kconfig"
>>>> +
>>>>    endmenu
>>>> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
>>>> index 45167be462d8..046139a4eca5 100644
>>>> --- a/drivers/vfio/pci/Makefile
>>>> +++ b/drivers/vfio/pci/Makefile
>>>> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>>>>    obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>>>>    
>>>>    obj-$(CONFIG_PDS_VFIO_PCI) += pds/
>>>> +
>>>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
>>>> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
>>>> new file mode 100644
>>>> index 000000000000..3a6707639220
>>>> --- /dev/null
>>>> +++ b/drivers/vfio/pci/virtio/Kconfig
>>>> @@ -0,0 +1,16 @@
>>>> +# SPDX-License-Identifier: GPL-2.0-only
>>>> +config VIRTIO_VFIO_PCI
>>>> +        tristate "VFIO support for VIRTIO NET PCI devices"
>>>> +        depends on VIRTIO_PCI
>>>> +        select VFIO_PCI_CORE
>>>> +        help
>>>> +          This provides support for exposing VIRTIO NET VF devices which support
>>>> +          legacy IO access, using the VFIO framework that can work with a legacy
>>>> +          virtio driver in the guest.
>>>> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
>>>> +          not indicate I/O Space.
>>>> +          As of that this driver emulated I/O BAR in software to let a VF be
>>>> +          seen as a transitional device in the guest and let it work with
>>>> +          a legacy driver.
>>>> +
>>>> +          If you don't know what to do here, say N.
>>>> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
>>>> new file mode 100644
>>>> index 000000000000..2039b39fb723
>>>> --- /dev/null
>>>> +++ b/drivers/vfio/pci/virtio/Makefile
>>>> @@ -0,0 +1,4 @@
>>>> +# SPDX-License-Identifier: GPL-2.0-only
>>>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
>>>> +virtio-vfio-pci-y := main.o
>>>> +
>>>> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
>>>> new file mode 100644
>>>> index 000000000000..3ca19c891673
>>>> --- /dev/null
>>>> +++ b/drivers/vfio/pci/virtio/main.c
>>>> @@ -0,0 +1,554 @@
>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>> +/*
>>>> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
>>>> + */
>>>> +
>>>> +#include <linux/device.h>
>>>> +#include <linux/module.h>
>>>> +#include <linux/mutex.h>
>>>> +#include <linux/pci.h>
>>>> +#include <linux/pm_runtime.h>
>>>> +#include <linux/types.h>
>>>> +#include <linux/uaccess.h>
>>>> +#include <linux/vfio.h>
>>>> +#include <linux/vfio_pci_core.h>
>>>> +#include <linux/virtio_pci.h>
>>>> +#include <linux/virtio_net.h>
>>>> +#include <linux/virtio_pci_admin.h>
>>>> +
>>>> +struct virtiovf_pci_core_device {
>>>> +	struct vfio_pci_core_device core_device;
>>>> +	u8 bar0_virtual_buf_size;
>>>
>>> Poor packing here, move it down with notify_bar.
>>
>> OK
>>
>>>    
>>>> +	u8 *bar0_virtual_buf;
>>>> +	/* synchronize access to the virtual buf */
>>>> +	struct mutex bar_mutex;
>>>> +	void __iomem *notify_addr;
>>>> +	u64 notify_offset;
>>>> +	__le32 pci_base_addr_0;
>>>> +	__le16 pci_cmd;
>>>> +	u8 notify_bar;
>>>> +};
>>>> +
>>>> +static int
>>>> +virtiovf_issue_legacy_rw_cmd(struct virtiovf_pci_core_device *virtvdev,
>>>> +			     loff_t pos, char __user *buf,
>>>> +			     size_t count, bool read)
>>>> +{
>>>> +	bool msix_enabled =
>>>> +		(virtvdev->core_device.irq_type == VFIO_PCI_MSIX_IRQ_INDEX);
>>>
>>> This is racy, the irq_type could change, do we care or does that fall
>>> into poor driver behavior?
>>
>> This clearly falls into a poor driver behavior (e.g QEMU reads from a
>> given offset considering current irq_type once it changes it in parallel).
> 
> Ok
>    
>>>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>>>> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
>>>> +	bool common;
>>>> +	u8 offset;
>>>> +	int ret;
>>>> +
>>>> +	common = pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled);
>>>
>>> I don't understand virtio, but this looks like it has the effect of the
>>> mac and status fields shifting in IO BAR depending on whether MSIX
>>> is enabled.  Doesn't that give the user access to 4 bytes beyond the
>>> status field?
>>
>> It is accessing 4 bytes within the device io region that the device
>> would support.
>>
>>    Is that how it's supposed to work or should we be
>>> dropping accesses to the 4-bytes MSIX support adds when not enabled?
>>
>> This is how it's supposed to work, accessing that area has no side effects.
> 
> Ok
>   
>>>> +	/* offset within the relevant configuration area */
>>>> +	offset = common ? pos : pos - VIRTIO_PCI_CONFIG_OFF(msix_enabled);
>>>> +	mutex_lock(&virtvdev->bar_mutex);
>>>> +	if (read) {
>>>> +		if (common)
>>>> +			ret = virtio_pci_admin_legacy_common_io_read(pdev, offset,
>>>> +					count, bar0_buf + pos);
>>>> +		else
>>>> +			ret = virtio_pci_admin_legacy_device_io_read(pdev, offset,
>>>> +					count, bar0_buf + pos);
>>>> +		if (ret)
>>>> +			goto out;
>>>> +		if (copy_to_user(buf, bar0_buf + pos, count))
>>>> +			ret = -EFAULT;
>>>> +	} else {
>>>> +		if (copy_from_user(bar0_buf + pos, buf, count)) {
>>>> +			ret = -EFAULT;
>>>> +			goto out;
>>>> +		}
>>>> +
>>>> +		if (common)
>>>> +			ret = virtio_pci_admin_legacy_common_io_write(pdev, offset,
>>>> +					count, bar0_buf + pos);
>>>> +		else
>>>> +			ret = virtio_pci_admin_legacy_device_io_write(pdev, offset,
>>>> +					count, bar0_buf + pos);
>>>> +	}
>>>> +out:
>>>> +	mutex_unlock(&virtvdev->bar_mutex);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int
>>>> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
>>>> +			    loff_t pos, char __user *buf,
>>>> +			    size_t count, bool read)
>>>> +{
>>>> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
>>>> +	u16 queue_notify;
>>>> +	int ret;
>>>> +
>>>> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
>>>> +		return -EINVAL;
>>>> +
>>>> +	switch (pos) {
>>>> +	case VIRTIO_PCI_QUEUE_NOTIFY:
>>>> +		if (count != sizeof(queue_notify))
>>>> +			return -EINVAL;
>>>> +		if (read) {
>>>> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
>>>> +						virtvdev->notify_addr);
>>>> +			if (ret)
>>>> +				return ret;
>>>> +			if (copy_to_user(buf, &queue_notify,
>>>> +					 sizeof(queue_notify)))
>>>> +				return -EFAULT;
>>>> +		} else {
>>>> +			if (copy_from_user(&queue_notify, buf, count))
>>>> +				return -EFAULT;
>>>> +			ret = vfio_pci_iowrite16(core_device, true, queue_notify,
>>>> +						 virtvdev->notify_addr);
>>>> +		}
>>>> +		break;
>>>> +	default:
>>>> +		ret = virtiovf_issue_legacy_rw_cmd(virtvdev, pos, buf, count,
>>>> +						   read);
>>>> +	}
>>>> +
>>>> +	return ret ? ret : count;
>>>> +}
>>>> +
>>>> +static bool range_intersect_range(loff_t range1_start, size_t count1,
>>>> +				  loff_t range2_start, size_t count2,
>>>> +				  loff_t *start_offset,
>>>> +				  size_t *intersect_count,
>>>> +				  size_t *register_offset)
>>>> +{
>>>> +	if (range1_start <= range2_start &&
>>>> +	    range1_start + count1 > range2_start) {
>>>> +		*start_offset = range2_start - range1_start;
>>>> +		*intersect_count = min_t(size_t, count2,
>>>> +					 range1_start + count1 - range2_start);
>>>> +		*register_offset = 0;
>>>> +		return true;
>>>> +	}
>>>> +
>>>> +	if (range1_start > range2_start &&
>>>> +	    range1_start < range2_start + count2) {
>>>> +		*start_offset = 0;
>>>> +		*intersect_count = min_t(size_t, count1,
>>>> +					 range2_start + count2 - range1_start);
>>>> +		*register_offset = range1_start - range2_start;
>>>> +		return true;
>>>> +	}
>>>> +
>>>> +	return false;
>>>> +}
>>>> +
>>>> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
>>>> +					char __user *buf, size_t count,
>>>> +					loff_t *ppos)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>>>> +	size_t register_offset;
>>>> +	loff_t copy_offset;
>>>> +	size_t copy_count;
>>>> +	__le32 val32;
>>>> +	__le16 val16;
>>>> +	u8 val8;
>>>> +	int ret;
>>>> +
>>>> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
>>>> +	if (ret < 0)
>>>> +		return ret;
>>>> +
>>>> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
>>>> +				  &copy_offset, &copy_count, &register_offset)) {
>>>> +		val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
>>>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset, copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
>>>> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
>>>> +				  &copy_offset, &copy_count, &register_offset)) {
>>>> +		if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
>>>> +				   copy_count))
>>>> +			return -EFAULT;
>>>> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
>>>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
>>>> +				 copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>
>>> Seems like COMMAND_IO survives across reset, pci_cmd is only ever
>>> modified by user writes.
>>
>> I see.
>>
>> If so, we may need to add a callback for pci_error_handlers.reset_done
>> to set it back to zero.
>>
>> I'll add as part of V5.
>>
>>>
>>> I also don't see that we honor COMMAND_IO for accesses.  Shouldn't reads
>>> return -1 and writes get dropped if COMMAND_IO is cleared?
>>
>> You mean that if the 'COMMAND_IO' bit was not set, we should not allow
>> read/write towards the IO bar, right ?
> 
> Yes
> 
>> If so, I would expect to return -EFAULT from
>> translate_io_bar_to_mem_bar() in that case for both read/write and let
>> the upper SW layers (e.g. QEMU) to consider that as '-1 (0xfff) data'
>> for read and ignore it upon writes.
>>
>> Agree ?
> 
> vfio_pci_io{read,write}##size returns -EIO for MMIO accesses when
> COMMAND_MEMORY is disabled, might as well follow that precedent rather
> than -EFAULT.
> 

Makes sense, I'll return -EIO in that case.

> Note that this will be a little different from a physical IO BAR
> though.  It's not been necessary to check COMMAND_IO for such accesses
> since IO space will soft fail and we can therefore let the IO system
> fabricate the read value and drop writes.  QEMU might be overly verbose
> getting an errno, but it should do the write thing relative to the VM.
> 
>>>> +
>>>> +	if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8),
>>>> +				  &copy_offset, &copy_count, &register_offset)) {
>>>> +		/* Transional needs to have revision 0 */
>>>> +		val8 = 0;
>>>> +		if (copy_to_user(buf + copy_offset, &val8, copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
>>>> +				  &copy_offset, &copy_count, &register_offset)) {
>>>> +		u8 log_bar_size = ilog2(roundup_pow_of_two(virtvdev->bar0_virtual_buf_size));
>>>> +		u32 mask_size = ~((BIT(log_bar_size) - 1));
>>>
>>> bar0_virtual_buf_size already needs to be a power-of-2, the size we
>>> report here needs to match the REGION_INFO size, so we should enforce
>>> the size in virtio_vf_pci_init_device().  I think it's 32bytes, so
>>> we're actually good on the size,
>>
>> Yes, this is 32 bytes which is power of 2 and we should be fine as you
>> wrote.
>>
>> Do you suggest to enforce the calculated size explicitly as part of
>> virtio_vf_pci_init_device() to be power of 2 with some check ?
> 
> Yes, I'd go ahead and do the roundup there or maybe assert that it's a
> power of 2, potentially even a build bug check since it's sized based
> on uAPI structures.  Essentially just something to prove we're not
> creating a bogus size and reinforce that we can do things like below
> without all the unnecessary operations above.  Thanks,

OK, the below will do the work as you suggested.
BUILD_BUG_ON(!is_power_of_2(virtvdev->bar0_virtual_buf_size));

Thanks,
Yishai

> 
> Alex
> 
>>    but the above makes no sense, ie.
>>> rounding a value that's already a power-of-2 to a power-of-2, getting
>>> the log2 size, shifting that back into the size we started with to get
>>> a mask... why isn't it just:
>>>
>>> 		u32 bar_mask = ~(virtvdev->bar0_virtual_buf_size - 1);
>>> ?
>>
>> Yes, it makes sense, will use that formula as part of V5.
>>
>>>
>>>    
>>>> +		u32 pci_base_addr_0 = le32_to_cpu(virtvdev->pci_base_addr_0);
>>>> +
>>>> +		val32 = cpu_to_le32((pci_base_addr_0 & mask_size) | PCI_BASE_ADDRESS_SPACE_IO);
>>>
>>> Looks like what I suggested to Ankit :)
>>>
>>> Per our prior variant driver agreement, we're going to need to enlist
>>> another reviewer as well.  Thanks,
>>>    
>>
>> OK
>>
>> Hopefully V5 will address the latest notes from you and be ready as a
>> candidate for other people to review.
>>
>> Thanks,
>> Yishai
>>
>>> Alex
>>>    
>>>> +		if (copy_to_user(buf + copy_offset, (void *)&val32 + register_offset, copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
>>>> +				  &copy_offset, &copy_count, &register_offset)) {
>>>> +		/*
>>>> +		 * Transitional devices use the PCI subsystem device id as
>>>> +		 * virtio device id, same as legacy driver always did.
>>>> +		 */
>>>> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
>>>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
>>>> +				 copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_VENDOR_ID, sizeof(val16),
>>>> +				  &copy_offset, &copy_count, &register_offset)) {
>>>> +		val16 = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET);
>>>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
>>>> +				 copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	return count;
>>>> +}
>>>> +
>>>> +static ssize_t
>>>> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
>>>> +		       size_t count, loff_t *ppos)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>>>> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>>>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>>>> +	int ret;
>>>> +
>>>> +	if (!count)
>>>> +		return 0;
>>>> +
>>>> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
>>>> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
>>>> +
>>>> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
>>>> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
>>>> +
>>>> +	ret = pm_runtime_resume_and_get(&pdev->dev);
>>>> +	if (ret) {
>>>> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
>>>> +				     ret);
>>>> +		return -EIO;
>>>> +	}
>>>> +
>>>> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
>>>> +	pm_runtime_put(&pdev->dev);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static ssize_t
>>>> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
>>>> +			size_t count, loff_t *ppos)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>>>> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>>>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>>>> +	int ret;
>>>> +
>>>> +	if (!count)
>>>> +		return 0;
>>>> +
>>>> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
>>>> +		size_t register_offset;
>>>> +		loff_t copy_offset;
>>>> +		size_t copy_count;
>>>> +
>>>> +		if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd),
>>>> +					  &copy_offset, &copy_count,
>>>> +					  &register_offset)) {
>>>> +			if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
>>>> +					   buf + copy_offset,
>>>> +					   copy_count))
>>>> +				return -EFAULT;
>>>> +		}
>>>> +
>>>> +		if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
>>>> +					  sizeof(virtvdev->pci_base_addr_0),
>>>> +					  &copy_offset, &copy_count,
>>>> +					  &register_offset)) {
>>>> +			if (copy_from_user((void *)&virtvdev->pci_base_addr_0 + register_offset,
>>>> +					   buf + copy_offset,
>>>> +					   copy_count))
>>>> +				return -EFAULT;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
>>>> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
>>>> +
>>>> +	ret = pm_runtime_resume_and_get(&pdev->dev);
>>>> +	if (ret) {
>>>> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
>>>> +		return -EIO;
>>>> +	}
>>>> +
>>>> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
>>>> +	pm_runtime_put(&pdev->dev);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int
>>>> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
>>>> +				   unsigned int cmd, unsigned long arg)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
>>>> +	void __user *uarg = (void __user *)arg;
>>>> +	struct vfio_region_info info = {};
>>>> +
>>>> +	if (copy_from_user(&info, uarg, minsz))
>>>> +		return -EFAULT;
>>>> +
>>>> +	if (info.argsz < minsz)
>>>> +		return -EINVAL;
>>>> +
>>>> +	switch (info.index) {
>>>> +	case VFIO_PCI_BAR0_REGION_INDEX:
>>>> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
>>>> +		info.size = virtvdev->bar0_virtual_buf_size;
>>>> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
>>>> +			     VFIO_REGION_INFO_FLAG_WRITE;
>>>> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
>>>> +	default:
>>>> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
>>>> +	}
>>>> +}
>>>> +
>>>> +static long
>>>> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
>>>> +			     unsigned long arg)
>>>> +{
>>>> +	switch (cmd) {
>>>> +	case VFIO_DEVICE_GET_REGION_INFO:
>>>> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
>>>> +	default:
>>>> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
>>>> +	}
>>>> +}
>>>> +
>>>> +static int
>>>> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
>>>> +{
>>>> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
>>>> +	int ret;
>>>> +
>>>> +	/*
>>>> +	 * Setup the BAR where the 'notify' exists to be used by vfio as well
>>>> +	 * This will let us mmap it only once and use it when needed.
>>>> +	 */
>>>> +	ret = vfio_pci_core_setup_barmap(core_device,
>>>> +					 virtvdev->notify_bar);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
>>>> +			virtvdev->notify_offset;
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
>>>> +	int ret;
>>>> +
>>>> +	ret = vfio_pci_core_enable(vdev);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	if (virtvdev->bar0_virtual_buf) {
>>>> +		/*
>>>> +		 * Upon close_device() the vfio_pci_core_disable() is called
>>>> +		 * and will close all the previous mmaps, so it seems that the
>>>> +		 * valid life cycle for the 'notify' addr is per open/close.
>>>> +		 */
>>>> +		ret = virtiovf_set_notify_addr(virtvdev);
>>>> +		if (ret) {
>>>> +			vfio_pci_core_disable(vdev);
>>>> +			return ret;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	vfio_pci_core_finish_enable(vdev);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int virtiovf_get_device_config_size(unsigned short device)
>>>> +{
>>>> +	/* Network card */
>>>> +	return offsetofend(struct virtio_net_config, status);
>>>> +}
>>>> +
>>>> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
>>>> +{
>>>> +	u64 offset;
>>>> +	int ret;
>>>> +	u8 bar;
>>>> +
>>>> +	ret = virtio_pci_admin_legacy_io_notify_info(virtvdev->core_device.pdev,
>>>> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
>>>> +				&bar, &offset);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	virtvdev->notify_bar = bar;
>>>> +	virtvdev->notify_offset = offset;
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	struct pci_dev *pdev;
>>>> +	int ret;
>>>> +
>>>> +	ret = vfio_pci_core_init_dev(core_vdev);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	pdev = virtvdev->core_device.pdev;
>>>> +	ret = virtiovf_read_notify_info(virtvdev);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	/* Being ready with a buffer that supports MSIX */
>>>> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
>>>> +				virtiovf_get_device_config_size(pdev->device);
>>>> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
>>>> +					     GFP_KERNEL);
>>>> +	if (!virtvdev->bar0_virtual_buf)
>>>> +		return -ENOMEM;
>>>> +	mutex_init(&virtvdev->bar_mutex);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +
>>>> +	kfree(virtvdev->bar0_virtual_buf);
>>>> +	vfio_pci_core_release_dev(core_vdev);
>>>> +}
>>>> +
>>>> +static const struct vfio_device_ops virtiovf_vfio_pci_tran_ops = {
>>>> +	.name = "virtio-vfio-pci-trans",
>>>> +	.init = virtiovf_pci_init_device,
>>>> +	.release = virtiovf_pci_core_release_dev,
>>>> +	.open_device = virtiovf_pci_open_device,
>>>> +	.close_device = vfio_pci_core_close_device,
>>>> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
>>>> +	.read = virtiovf_pci_core_read,
>>>> +	.write = virtiovf_pci_core_write,
>>>> +	.mmap = vfio_pci_core_mmap,
>>>> +	.request = vfio_pci_core_request,
>>>> +	.match = vfio_pci_core_match,
>>>> +	.bind_iommufd = vfio_iommufd_physical_bind,
>>>> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
>>>> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
>>>> +};
>>>> +
>>>> +static const struct vfio_device_ops virtiovf_vfio_pci_ops = {
>>>> +	.name = "virtio-vfio-pci",
>>>> +	.init = vfio_pci_core_init_dev,
>>>> +	.release = vfio_pci_core_release_dev,
>>>> +	.open_device = virtiovf_pci_open_device,
>>>> +	.close_device = vfio_pci_core_close_device,
>>>> +	.ioctl = vfio_pci_core_ioctl,
>>>> +	.device_feature = vfio_pci_core_ioctl_feature,
>>>> +	.read = vfio_pci_core_read,
>>>> +	.write = vfio_pci_core_write,
>>>> +	.mmap = vfio_pci_core_mmap,
>>>> +	.request = vfio_pci_core_request,
>>>> +	.match = vfio_pci_core_match,
>>>> +	.bind_iommufd = vfio_iommufd_physical_bind,
>>>> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
>>>> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
>>>> +};
>>>> +
>>>> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
>>>> +{
>>>> +	struct resource *res = pdev->resource;
>>>> +
>>>> +	return res->flags ? true : false;
>>>> +}
>>>> +
>>>> +static int virtiovf_pci_probe(struct pci_dev *pdev,
>>>> +			      const struct pci_device_id *id)
>>>> +{
>>>> +	const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
>>>> +	struct virtiovf_pci_core_device *virtvdev;
>>>> +	int ret;
>>>> +
>>>> +	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
>>>> +	    !virtiovf_bar0_exists(pdev))
>>>> +		ops = &virtiovf_vfio_pci_tran_ops;
>>>> +
>>>> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
>>>> +				     &pdev->dev, ops);
>>>> +	if (IS_ERR(virtvdev))
>>>> +		return PTR_ERR(virtvdev);
>>>> +
>>>> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
>>>> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
>>>> +	if (ret)
>>>> +		goto out;
>>>> +	return 0;
>>>> +out:
>>>> +	vfio_put_device(&virtvdev->core_device.vdev);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static void virtiovf_pci_remove(struct pci_dev *pdev)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
>>>> +
>>>> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
>>>> +	vfio_put_device(&virtvdev->core_device.vdev);
>>>> +}
>>>> +
>>>> +static const struct pci_device_id virtiovf_pci_table[] = {
>>>> +	/* Only virtio-net is supported/tested so far */
>>>> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
>>>> +	{}
>>>> +};
>>>> +
>>>> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
>>>> +
>>>> +static struct pci_driver virtiovf_pci_driver = {
>>>> +	.name = KBUILD_MODNAME,
>>>> +	.id_table = virtiovf_pci_table,
>>>> +	.probe = virtiovf_pci_probe,
>>>> +	.remove = virtiovf_pci_remove,
>>>> +	.err_handler = &vfio_pci_core_err_handlers,
>>>> +	.driver_managed_dma = true,
>>>> +};
>>>> +
>>>> +module_pci_driver(virtiovf_pci_driver);
>>>> +
>>>> +MODULE_LICENSE("GPL");
>>>> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
>>>> +MODULE_DESCRIPTION(
>>>> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");
>>>    
>>
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-12-05 14:21 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-29 14:37 [PATCH V4 vfio 0/9] Introduce a vfio driver over virtio devices Yishai Hadas
2023-11-29 14:37 ` [PATCH V4 vfio 1/9] virtio: Define feature bit for administration virtqueue Yishai Hadas
2023-11-29 14:37 ` [PATCH V4 vfio 2/9] virtio-pci: Introduce admin virtqueue Yishai Hadas
2023-11-29 14:37 ` [PATCH V4 vfio 3/9] virtio-pci: Introduce admin command sending function Yishai Hadas
2023-11-29 14:37 ` [PATCH V4 vfio 4/9] virtio-pci: Introduce admin commands Yishai Hadas
2023-11-30  9:52   ` Michael S. Tsirkin
2023-11-30 10:35     ` Yishai Hadas
2023-11-30 10:45       ` Michael S. Tsirkin
2023-11-29 14:37 ` [PATCH V4 vfio 5/9] virtio-pci: Initialize the supported " Yishai Hadas
2023-11-29 14:37 ` [PATCH V4 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO " Yishai Hadas
2023-11-29 14:37 ` [PATCH V4 vfio 7/9] vfio/pci: Expose vfio_pci_core_setup_barmap() Yishai Hadas
2023-11-29 14:37 ` [PATCH V4 vfio 8/9] vfio/pci: Expose vfio_pci_iowrite/read##size() Yishai Hadas
2023-11-30 19:20   ` Alex Williamson
2023-12-03 14:14     ` Yishai Hadas
2023-12-04 22:57       ` Alex Williamson
2023-11-29 14:37 ` [PATCH V4 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices Yishai Hadas
2023-11-30 22:10   ` Alex Williamson
2023-12-03 14:54     ` Yishai Hadas
2023-12-04 23:32       ` Alex Williamson
2023-12-05 14:20         ` Yishai Hadas
2023-11-30 10:07 ` [PATCH V4 vfio 0/9] " Michael S. Tsirkin
2023-11-30 15:47   ` Yishai Hadas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.