* [PATCH vfio 00/11] Introduce a vfio driver over virtio devices
From: Yishai Hadas via Virtualization @ 2023-09-21 12:40 UTC
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, maorg, virtualization, jiri, leonro

This series introduces a vfio driver over virtio devices to support the
legacy interface functionality for VFs.

Background, from the virtio spec [1].
--------------------------------------------------------------------
In some systems, there is a need to support a virtio legacy driver with
a device that does not directly support the legacy interface. In such
scenarios, a group owner device can provide the legacy interface
functionality for the group member devices. The driver of the owner
device can then access the legacy interface of a member device on behalf
of the legacy member device driver.

For example, with the SR-IOV group type, group members (VFs) can not
present the legacy interface in an I/O BAR in BAR0 as expected by the
legacy pci driver. If the legacy driver is running inside a virtual
machine, the hypervisor executing the virtual machine can present a
virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
legacy driver accesses to this I/O BAR and forwards them to the group
owner device (PF) using group administration commands.
--------------------------------------------------------------------
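
For illustration only, below is a rough sketch (not part of this
series) of how a hypervisor-side driver could forward a trapped legacy
I/O BAR write from a VF (group member) to the PF (group owner) using
the struct virtio_admin_cmd interface introduced later in the series.
The opcode name and the helper itself are hypothetical placeholders:

/* Sketch; needs <linux/virtio.h> and <linux/scatterlist.h>. */
static int forward_legacy_cfg_write(struct virtio_device *pf_vdev,
				    u64 vf_id, const void *buf, u32 len)
{
	struct virtio_admin_cmd cmd = {};
	struct scatterlist data_sg;

	/* Command payload: the bytes the legacy guest driver wrote. */
	sg_init_one(&data_sg, buf, len);

	cmd.opcode = cpu_to_le16(LEGACY_CFG_WRITE_OPCODE);	/* hypothetical */
	cmd.group_type = cpu_to_le16(1);	/* 1 = SR-IOV group type */
	cmd.group_member_id = cpu_to_le64(vf_id);
	cmd.data_sg = &data_sg;

	/* Executed synchronously on the owner device's admin virtqueue. */
	return virtio_admin_cmd_exec(pf_vdev, &cmd);
}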

The first 7 patches are in the virtio area and cover the following:
- Introduce the admin virtqueue infrastructure.
- Expose APIs to enable upper layers such as vfio, net, etc.
  to execute admin commands.
- Expose the layout of the commands that should be used for
  supporting the legacy access.

The above follows the virtio spec that was recently accepted in that
area [1].

The last 4 patches are in the vfio area and cover the following:
- Expose some APIs from vfio/pci to be used by the vfio/virtio driver.
- Expose admin commands over virtio device.
- Introduce a vfio driver over virtio devices to support the legacy
  interface functionality for VFs. 

The series was tested successfully with virtio-net VFs in the host,
while running both modern and legacy drivers in the guest.

[1]
https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c

Yishai

Feng Liu (7):
  virtio-pci: Use virtio pci device layer vq info instead of generic one
  virtio: Define feature bit for administration virtqueue
  virtio-pci: Introduce admin virtqueue
  virtio: Expose the synchronous command helper function
  virtio-pci: Introduce admin command sending function
  virtio-pci: Introduce API to get PF virtio device from VF PCI device
  virtio-pci: Introduce admin commands

Yishai Hadas (4):
  vfio/pci: Expose vfio_pci_core_setup_barmap()
  vfio/pci: Expose vfio_pci_iowrite/read##size()
  vfio/virtio: Expose admin commands over virtio device
  vfio/virtio: Introduce a vfio driver over virtio devices

 MAINTAINERS                            |   6 +
 drivers/net/virtio_net.c               |  21 +-
 drivers/vfio/pci/Kconfig               |   2 +
 drivers/vfio/pci/Makefile              |   2 +
 drivers/vfio/pci/vfio_pci_core.c       |  25 ++
 drivers/vfio/pci/vfio_pci_rdwr.c       |  38 +-
 drivers/vfio/pci/virtio/Kconfig        |  15 +
 drivers/vfio/pci/virtio/Makefile       |   4 +
 drivers/vfio/pci/virtio/cmd.c          | 146 +++++++
 drivers/vfio/pci/virtio/cmd.h          |  35 ++
 drivers/vfio/pci/virtio/main.c         | 546 +++++++++++++++++++++++++
 drivers/virtio/Makefile                |   2 +-
 drivers/virtio/virtio.c                |  44 +-
 drivers/virtio/virtio_pci_common.c     |  24 +-
 drivers/virtio/virtio_pci_common.h     |  17 +-
 drivers/virtio/virtio_pci_modern.c     |  12 +-
 drivers/virtio/virtio_pci_modern_avq.c | 138 +++++++
 drivers/virtio/virtio_ring.c           |  27 ++
 include/linux/vfio_pci_core.h          |  20 +
 include/linux/virtio.h                 |  19 +
 include/linux/virtio_config.h          |   7 +
 include/linux/virtio_pci_modern.h      |   3 +
 include/uapi/linux/virtio_config.h     |   8 +-
 include/uapi/linux/virtio_pci.h        |  66 +++
 24 files changed, 1171 insertions(+), 56 deletions(-)
 create mode 100644 drivers/vfio/pci/virtio/Kconfig
 create mode 100644 drivers/vfio/pci/virtio/Makefile
 create mode 100644 drivers/vfio/pci/virtio/cmd.c
 create mode 100644 drivers/vfio/pci/virtio/cmd.h
 create mode 100644 drivers/vfio/pci/virtio/main.c
 create mode 100644 drivers/virtio/virtio_pci_modern_avq.c

-- 
2.27.0


* [PATCH vfio 01/11] virtio-pci: Use virtio pci device layer vq info instead of generic one
From: Yishai Hadas via Virtualization @ 2023-09-21 12:40 UTC
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, maorg, virtualization, jiri, leonro

From: Feng Liu <feliu@nvidia.com>

Currently the VQ deletion callback vp_del_vqs() processes the generic
virtio_device level VQ list instead of the VQ information available at
the PCI layer.

To adhere to the layering, use the PCI device level VQ information
stored in the vqs array.

This also prepares the code for the admin vq life cycle to be managed
within the PCI layer, thereby avoiding undesired deletion of the admin
vq by upper layer drivers (net, console, vfio) in the del_vqs()
callback.

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/virtio/virtio_pci_common.c | 12 +++++++++---
 drivers/virtio/virtio_pci_common.h |  1 +
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index c2524a7207cf..7a3e6edc4dd6 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -232,12 +232,16 @@ static void vp_del_vq(struct virtqueue *vq)
 void vp_del_vqs(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	struct virtqueue *vq, *n;
+	struct virtqueue *vq;
 	int i;
 
-	list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
+	for (i = 0; i < vp_dev->nvqs; i++) {
+		if (!vp_dev->vqs[i])
+			continue;
+
+		vq = vp_dev->vqs[i]->vq;
 		if (vp_dev->per_vq_vectors) {
-			int v = vp_dev->vqs[vq->index]->msix_vector;
+			int v = vp_dev->vqs[i]->msix_vector;
 
 			if (v != VIRTIO_MSI_NO_VECTOR) {
 				int irq = pci_irq_vector(vp_dev->pci_dev, v);
@@ -294,6 +298,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned int nvqs,
 	vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
 	if (!vp_dev->vqs)
 		return -ENOMEM;
+	vp_dev->nvqs = nvqs;
 
 	if (per_vq_vectors) {
 		/* Best option: one for change interrupt, one per vq. */
@@ -365,6 +370,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs,
 	vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
 	if (!vp_dev->vqs)
 		return -ENOMEM;
+	vp_dev->nvqs = nvqs;
 
 	err = request_irq(vp_dev->pci_dev->irq, vp_interrupt, IRQF_SHARED,
 			dev_name(&vdev->dev), vp_dev);
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index 4b773bd7c58c..602021967aaa 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -60,6 +60,7 @@ struct virtio_pci_device {
 
 	/* array of all queues for house-keeping */
 	struct virtio_pci_vq_info **vqs;
+	u32 nvqs;
 
 	/* MSI-X support */
 	int msix_enabled;
-- 
2.27.0


* [PATCH vfio 02/11] virtio: Define feature bit for administration virtqueue
From: Yishai Hadas via Virtualization @ 2023-09-21 12:40 UTC
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, maorg, virtualization, jiri, leonro

From: Feng Liu <feliu@nvidia.com>

Introduce VIRTIO_F_ADMIN_VQ which is used for administration virtqueue
support.
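
As a minimal sketch of how the negotiated bit can be consumed (the
setup helper named below is hypothetical, not part of this patch):

	if (virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
		vp_setup_admin_vq(vdev);	/* hypothetical helper */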

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 include/uapi/linux/virtio_config.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 2c712c654165..09d694968b14 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -52,7 +52,7 @@
  * rest are per-device feature bits.
  */
 #define VIRTIO_TRANSPORT_F_START	28
-#define VIRTIO_TRANSPORT_F_END		41
+#define VIRTIO_TRANSPORT_F_END		42
 
 #ifndef VIRTIO_CONFIG_NO_LEGACY
 /* Do we get callbacks when the ring is completely used, even if we've
@@ -109,4 +109,10 @@
  * This feature indicates that the driver can reset a queue individually.
  */
 #define VIRTIO_F_RING_RESET		40
+
+/*
+ * This feature indicates that the device supports administration virtqueues.
+ */
+#define VIRTIO_F_ADMIN_VQ		41
+
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
-- 
2.27.0


* [PATCH vfio 03/11] virtio-pci: Introduce admin virtqueue
From: Yishai Hadas via Virtualization @ 2023-09-21 12:40 UTC
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, maorg, virtualization, jiri, leonro

From: Feng Liu <feliu@nvidia.com>

Introduce support for the admin virtqueue. By negotiating the
VIRTIO_F_ADMIN_VQ feature, the driver detects the capability and
creates one administration virtqueue. Implementing the administration
virtqueue in the generic virtio-pci layer enables multiple types of
upper layer drivers, such as vfio, net and blk, to utilize it.

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/virtio/Makefile                |  2 +-
 drivers/virtio/virtio.c                | 37 +++++++++++++--
 drivers/virtio/virtio_pci_common.h     | 15 +++++-
 drivers/virtio/virtio_pci_modern.c     | 10 +++-
 drivers/virtio/virtio_pci_modern_avq.c | 65 ++++++++++++++++++++++++++
 include/linux/virtio_config.h          |  4 ++
 include/linux/virtio_pci_modern.h      |  3 ++
 7 files changed, 129 insertions(+), 7 deletions(-)
 create mode 100644 drivers/virtio/virtio_pci_modern_avq.c

diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 8e98d24917cc..dcc535b5b4d9 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -5,7 +5,7 @@ obj-$(CONFIG_VIRTIO_PCI_LIB) += virtio_pci_modern_dev.o
 obj-$(CONFIG_VIRTIO_PCI_LIB_LEGACY) += virtio_pci_legacy_dev.o
 obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
-virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
+virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o virtio_pci_modern_avq.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 3893dc29eb26..f4080692b351 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -302,9 +302,15 @@ static int virtio_dev_probe(struct device *_d)
 	if (err)
 		goto err;
 
+	if (dev->config->create_avq) {
+		err = dev->config->create_avq(dev);
+		if (err)
+			goto err;
+	}
+
 	err = drv->probe(dev);
 	if (err)
-		goto err;
+		goto err_probe;
 
 	/* If probe didn't do it, mark device DRIVER_OK ourselves. */
 	if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
@@ -316,6 +322,10 @@ static int virtio_dev_probe(struct device *_d)
 	virtio_config_enable(dev);
 
 	return 0;
+
+err_probe:
+	if (dev->config->destroy_avq)
+		dev->config->destroy_avq(dev);
 err:
 	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return err;
@@ -331,6 +341,9 @@ static void virtio_dev_remove(struct device *_d)
 
 	drv->remove(dev);
 
+	if (dev->config->destroy_avq)
+		dev->config->destroy_avq(dev);
+
 	/* Driver should have reset device. */
 	WARN_ON_ONCE(dev->config->get_status(dev));
 
@@ -489,13 +502,20 @@ EXPORT_SYMBOL_GPL(unregister_virtio_device);
 int virtio_device_freeze(struct virtio_device *dev)
 {
 	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
+	int ret;
 
 	virtio_config_disable(dev);
 
 	dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
 
-	if (drv && drv->freeze)
-		return drv->freeze(dev);
+	if (drv && drv->freeze) {
+		ret = drv->freeze(dev);
+		if (ret)
+			return ret;
+	}
+
+	if (dev->config->destroy_avq)
+		dev->config->destroy_avq(dev);
 
 	return 0;
 }
@@ -532,10 +552,16 @@ int virtio_device_restore(struct virtio_device *dev)
 	if (ret)
 		goto err;
 
+	if (dev->config->create_avq) {
+		ret = dev->config->create_avq(dev);
+		if (ret)
+			goto err;
+	}
+
 	if (drv->restore) {
 		ret = drv->restore(dev);
 		if (ret)
-			goto err;
+			goto err_restore;
 	}
 
 	/* If restore didn't do it, mark device DRIVER_OK ourselves. */
@@ -546,6 +572,9 @@ int virtio_device_restore(struct virtio_device *dev)
 
 	return 0;
 
+err_restore:
+	if (dev->config->destroy_avq)
+		dev->config->destroy_avq(dev);
 err:
 	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return ret;
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index 602021967aaa..9bffa95274b6 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -41,6 +41,14 @@ struct virtio_pci_vq_info {
 	unsigned int msix_vector;
 };
 
+struct virtio_avq {
+	/* Virtqueue info associated with this admin queue. */
+	struct virtio_pci_vq_info info;
+	/* Name of the admin queue: avq.$index. */
+	char name[10];
+	u16 vq_index;
+};
+
 /* Our device structure */
 struct virtio_pci_device {
 	struct virtio_device vdev;
@@ -58,10 +66,13 @@ struct virtio_pci_device {
 	spinlock_t lock;
 	struct list_head virtqueues;
 
-	/* array of all queues for house-keeping */
+	/* Array of all virtqueues reported in the
+	 * PCI common config num_queues field
+	 */
 	struct virtio_pci_vq_info **vqs;
 	u32 nvqs;
 
+	struct virtio_avq *admin;
 	/* MSI-X support */
 	int msix_enabled;
 	int intx_enabled;
@@ -115,6 +126,8 @@ int vp_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
 		const char * const names[], const bool *ctx,
 		struct irq_affinity *desc);
 const char *vp_bus_name(struct virtio_device *vdev);
+void vp_destroy_avq(struct virtio_device *vdev);
+int vp_create_avq(struct virtio_device *vdev);
 
 /* Setup the affinity for a virtqueue:
  * - force the affinity for per vq vector
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index d6bb68ba84e5..a72c87687196 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -37,6 +37,9 @@ static void vp_transport_features(struct virtio_device *vdev, u64 features)
 
 	if (features & BIT_ULL(VIRTIO_F_RING_RESET))
 		__virtio_set_bit(vdev, VIRTIO_F_RING_RESET);
+
+	if (features & BIT_ULL(VIRTIO_F_ADMIN_VQ))
+		__virtio_set_bit(vdev, VIRTIO_F_ADMIN_VQ);
 }
 
 /* virtio config->finalize_features() implementation */
@@ -317,7 +320,8 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
 	else
 		notify = vp_notify;
 
-	if (index >= vp_modern_get_num_queues(mdev))
+	if (!((index < vp_modern_get_num_queues(mdev) ||
+	      (vp_dev->admin && vp_dev->admin->vq_index == index))))
 		return ERR_PTR(-EINVAL);
 
 	/* Check if queue is either not available or already active. */
@@ -509,6 +513,8 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
 	.get_shm_region  = vp_get_shm_region,
 	.disable_vq_and_reset = vp_modern_disable_vq_and_reset,
 	.enable_vq_after_reset = vp_modern_enable_vq_after_reset,
+	.create_avq = vp_create_avq,
+	.destroy_avq = vp_destroy_avq,
 };
 
 static const struct virtio_config_ops virtio_pci_config_ops = {
@@ -529,6 +535,8 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
 	.get_shm_region  = vp_get_shm_region,
 	.disable_vq_and_reset = vp_modern_disable_vq_and_reset,
 	.enable_vq_after_reset = vp_modern_enable_vq_after_reset,
+	.create_avq = vp_create_avq,
+	.destroy_avq = vp_destroy_avq,
 };
 
 /* the PCI probing function */
diff --git a/drivers/virtio/virtio_pci_modern_avq.c b/drivers/virtio/virtio_pci_modern_avq.c
new file mode 100644
index 000000000000..114579ad788f
--- /dev/null
+++ b/drivers/virtio/virtio_pci_modern_avq.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/virtio.h>
+#include "virtio_pci_common.h"
+
+static u16 vp_modern_avq_num(struct virtio_pci_modern_device *mdev)
+{
+	struct virtio_pci_modern_common_cfg __iomem *cfg;
+
+	cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;
+	return vp_ioread16(&cfg->admin_queue_num);
+}
+
+static u16 vp_modern_avq_index(struct virtio_pci_modern_device *mdev)
+{
+	struct virtio_pci_modern_common_cfg __iomem *cfg;
+
+	cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;
+	return vp_ioread16(&cfg->admin_queue_index);
+}
+
+int vp_create_avq(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_avq *avq;
+	struct virtqueue *vq;
+	u16 admin_q_num;
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return 0;
+
+	admin_q_num = vp_modern_avq_num(&vp_dev->mdev);
+	if (!admin_q_num)
+		return -EINVAL;
+
+	vp_dev->admin = kzalloc(sizeof(*vp_dev->admin), GFP_KERNEL);
+	if (!vp_dev->admin)
+		return -ENOMEM;
+
+	avq = vp_dev->admin;
+	avq->vq_index = vp_modern_avq_index(&vp_dev->mdev);
+	sprintf(avq->name, "avq.%u", avq->vq_index);
+	vq = vp_dev->setup_vq(vp_dev, &vp_dev->admin->info, avq->vq_index, NULL,
+			      avq->name, NULL, VIRTIO_MSI_NO_VECTOR);
+	if (IS_ERR(vq)) {
+		dev_err(&vdev->dev, "failed to setup admin virtqueue");
+		kfree(vp_dev->admin);
+		return PTR_ERR(vq);
+	}
+
+	vp_dev->admin->info.vq = vq;
+	vp_modern_set_queue_enable(&vp_dev->mdev, avq->info.vq->index, true);
+	return 0;
+}
+
+void vp_destroy_avq(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	if (!vp_dev->admin)
+		return;
+
+	vp_dev->del_vq(&vp_dev->admin->info);
+	kfree(vp_dev->admin);
+}
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 2b3438de2c4d..028c51ea90ee 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -93,6 +93,8 @@ typedef void vq_callback_t(struct virtqueue *);
  *	Returns 0 on success or error status
  *	If disable_vq_and_reset is set, then enable_vq_after_reset must also be
  *	set.
+ * @create_avq: initialize admin virtqueue resource.
+ * @destroy_avq: destroy admin virtqueue resource.
  */
 struct virtio_config_ops {
 	void (*get)(struct virtio_device *vdev, unsigned offset,
@@ -120,6 +122,8 @@ struct virtio_config_ops {
 			       struct virtio_shm_region *region, u8 id);
 	int (*disable_vq_and_reset)(struct virtqueue *vq);
 	int (*enable_vq_after_reset)(struct virtqueue *vq);
+	int (*create_avq)(struct virtio_device *vdev);
+	void (*destroy_avq)(struct virtio_device *vdev);
 };
 
 /* If driver didn't advertise the feature, it will never appear. */
diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h
index 067ac1d789bc..f6cb13d858fd 100644
--- a/include/linux/virtio_pci_modern.h
+++ b/include/linux/virtio_pci_modern.h
@@ -10,6 +10,9 @@ struct virtio_pci_modern_common_cfg {
 
 	__le16 queue_notify_data;	/* read-write */
 	__le16 queue_reset;		/* read-write */
+
+	__le16 admin_queue_index;	/* read-only */
+	__le16 admin_queue_num;		/* read-only */
 };
 
 struct virtio_pci_modern_device {
-- 
2.27.0


* [PATCH vfio 04/11] virtio: Expose the synchronous command helper function
From: Yishai Hadas via Virtualization @ 2023-09-21 12:40 UTC
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, maorg, virtualization, jiri, leonro

From: Feng Liu <feliu@nvidia.com>

Expose the synchronous command helper function at the virtio layer, so
that the control virtqueue and admin virtqueues can reuse it to send
synchronous commands.
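
A minimal usage sketch of the helper, assuming one request buffer and
one response buffer (the buffer and cookie names are illustrative
only):

	struct scatterlist out_sg, in_sg, *sgs[2];
	unsigned int out_num = 0, in_num = 0;
	int err;

	sg_init_one(&out_sg, req_buf, req_len);
	sgs[out_num++] = &out_sg;

	sg_init_one(&in_sg, resp_buf, resp_len);
	sgs[out_num + in_num++] = &in_sg;

	/* Adds the buffers, kicks the device and spins for completion. */
	err = virtqueue_exec_cmd(vq, sgs, out_num, in_num, cookie, GFP_KERNEL);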

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/net/virtio_net.c     | 21 ++++++---------------
 drivers/virtio/virtio_ring.c | 27 +++++++++++++++++++++++++++
 include/linux/virtio.h       |  7 +++++++
 3 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index fe7f314d65c9..65c210b0fb9e 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2451,7 +2451,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 				 struct scatterlist *out)
 {
 	struct scatterlist *sgs[4], hdr, stat;
-	unsigned out_num = 0, tmp;
+	unsigned int out_num = 0;
 	int ret;
 
 	/* Caller should know better */
@@ -2472,23 +2472,14 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 	sgs[out_num] = &stat;
 
 	BUG_ON(out_num + 1 > ARRAY_SIZE(sgs));
-	ret = virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC);
-	if (ret < 0) {
-		dev_warn(&vi->vdev->dev,
-			 "Failed to add sgs for command vq: %d\n.", ret);
+	ret = virtqueue_exec_cmd(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC);
+	if (ret) {
+		dev_err(&vi->vdev->dev,
+			"Failed to exec command vq(%s,%d): %d\n",
+			vi->cvq->name, vi->cvq->index, ret);
 		return false;
 	}
 
-	if (unlikely(!virtqueue_kick(vi->cvq)))
-		return vi->ctrl->status == VIRTIO_NET_OK;
-
-	/* Spin for a response, the kick causes an ioport write, trapping
-	 * into the hypervisor, so the request should be handled immediately.
-	 */
-	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
-	       !virtqueue_is_broken(vi->cvq))
-		cpu_relax();
-
 	return vi->ctrl->status == VIRTIO_NET_OK;
 }
 
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 51d8f3299c10..253905c0b008 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -3251,4 +3251,31 @@ void virtqueue_dma_sync_single_range_for_device(struct virtqueue *_vq,
 }
 EXPORT_SYMBOL_GPL(virtqueue_dma_sync_single_range_for_device);
 
+int virtqueue_exec_cmd(struct virtqueue *vq,
+		       struct scatterlist **sgs,
+		       unsigned int out_num,
+		       unsigned int in_num,
+		       void *data,
+		       gfp_t gfp)
+{
+	int ret, len;
+
+	ret = virtqueue_add_sgs(vq, sgs, out_num, in_num, data, gfp);
+	if (ret < 0)
+		return ret;
+
+	if (unlikely(!virtqueue_kick(vq)))
+		return -EIO;
+
+	/* Spin for a response, the kick causes an ioport write, trapping
+	 * into the hypervisor, so the request should be handled immediately.
+	 */
+	while (!virtqueue_get_buf(vq, &len) &&
+	       !virtqueue_is_broken(vq))
+		cpu_relax();
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(virtqueue_exec_cmd);
+
 MODULE_LICENSE("GPL");
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 4cc614a38376..9d39706bed10 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -103,6 +103,13 @@ int virtqueue_resize(struct virtqueue *vq, u32 num,
 int virtqueue_reset(struct virtqueue *vq,
 		    void (*recycle)(struct virtqueue *vq, void *buf));
 
+int virtqueue_exec_cmd(struct virtqueue *vq,
+		       struct scatterlist **sgs,
+		       unsigned int out_num,
+		       unsigned int in_num,
+		       void *data,
+		       gfp_t gfp);
+
 /**
  * struct virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
-- 
2.27.0


* [PATCH vfio 05/11] virtio-pci: Introduce admin command sending function
From: Yishai Hadas via Virtualization @ 2023-09-21 12:40 UTC
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, maorg, virtualization, jiri, leonro

From: Feng Liu <feliu@nvidia.com>

Add support for sending admin commands through the admin virtqueue
interface, and expose a generic API to execute virtio admin commands.
Reuse the synchronous command helper function at the virtio transport
layer. In addition, add the new admin command result status and admin
command range definitions.
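
From a caller's perspective, a rough sketch of the exposed API (the
opcode value and result buffer are placeholders; group type 1 is
SR-IOV, as documented in the new header):

	struct virtio_admin_cmd cmd = {};
	struct scatterlist result_sg;
	int ret;

	sg_init_one(&result_sg, result_buf, result_len);
	cmd.opcode = cpu_to_le16(opcode);	/* placeholder */
	cmd.group_type = cpu_to_le16(1);	/* SR-IOV */
	cmd.group_member_id = cpu_to_le64(vf_id);
	cmd.result_sg = &result_sg;

	ret = virtio_admin_cmd_exec(vdev, &cmd);
	if (ret)
		/* Transport error or a non-OK device status. */
		return ret;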

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/virtio/virtio.c                |  7 +++
 drivers/virtio/virtio_pci_common.h     |  1 +
 drivers/virtio/virtio_pci_modern.c     |  2 +
 drivers/virtio/virtio_pci_modern_avq.c | 73 ++++++++++++++++++++++++++
 include/linux/virtio.h                 | 11 ++++
 include/linux/virtio_config.h          |  3 ++
 include/uapi/linux/virtio_pci.h        | 22 ++++++++
 7 files changed, 119 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index f4080692b351..dd71f584a1bd 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -582,6 +582,13 @@ int virtio_device_restore(struct virtio_device *dev)
 EXPORT_SYMBOL_GPL(virtio_device_restore);
 #endif
 
+int virtio_admin_cmd_exec(struct virtio_device *vdev,
+			  struct virtio_admin_cmd *cmd)
+{
+	return vdev->config->exec_admin_cmd(vdev, cmd);
+}
+EXPORT_SYMBOL_GPL(virtio_admin_cmd_exec);
+
 static int virtio_init(void)
 {
 	if (bus_register(&virtio_bus) != 0)
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index 9bffa95274b6..a579f1338263 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -128,6 +128,7 @@ int vp_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
 const char *vp_bus_name(struct virtio_device *vdev);
 void vp_destroy_avq(struct virtio_device *vdev);
 int vp_create_avq(struct virtio_device *vdev);
+int vp_avq_cmd_exec(struct virtio_device *vdev, struct virtio_admin_cmd *cmd);
 
 /* Setup the affinity for a virtqueue:
  * - force the affinity for per vq vector
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index a72c87687196..cac18872b088 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -515,6 +515,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
 	.enable_vq_after_reset = vp_modern_enable_vq_after_reset,
 	.create_avq = vp_create_avq,
 	.destroy_avq = vp_destroy_avq,
+	.exec_admin_cmd = vp_avq_cmd_exec,
 };
 
 static const struct virtio_config_ops virtio_pci_config_ops = {
@@ -537,6 +538,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
 	.enable_vq_after_reset = vp_modern_enable_vq_after_reset,
 	.create_avq = vp_create_avq,
 	.destroy_avq = vp_destroy_avq,
+	.exec_admin_cmd = vp_avq_cmd_exec,
 };
 
 /* the PCI probing function */
diff --git a/drivers/virtio/virtio_pci_modern_avq.c b/drivers/virtio/virtio_pci_modern_avq.c
index 114579ad788f..ca3fe10f616d 100644
--- a/drivers/virtio/virtio_pci_modern_avq.c
+++ b/drivers/virtio/virtio_pci_modern_avq.c
@@ -19,6 +19,79 @@ static u16 vp_modern_avq_index(struct virtio_pci_modern_device *mdev)
 	return vp_ioread16(&cfg->admin_queue_index);
 }
 
+#define VIRTIO_AVQ_SGS_MAX	4
+
+int vp_avq_cmd_exec(struct virtio_device *vdev, struct virtio_admin_cmd *cmd)
+{
+	struct scatterlist *sgs[VIRTIO_AVQ_SGS_MAX], hdr, stat;
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_admin_cmd_status *va_status;
+	unsigned int out_num = 0, in_num = 0;
+	struct virtio_admin_cmd_hdr *va_hdr;
+	struct virtqueue *avq;
+	u16 status;
+	int ret;
+
+	avq = vp_dev->admin ? vp_dev->admin->info.vq : NULL;
+	if (!avq)
+		return -EOPNOTSUPP;
+
+	va_status = kzalloc(sizeof(*va_status), GFP_KERNEL);
+	if (!va_status)
+		return -ENOMEM;
+
+	va_hdr = kzalloc(sizeof(*va_hdr), GFP_KERNEL);
+	if (!va_hdr) {
+		ret = -ENOMEM;
+		goto err_alloc;
+	}
+
+	va_hdr->opcode = cmd->opcode;
+	va_hdr->group_type = cmd->group_type;
+	va_hdr->group_member_id = cmd->group_member_id;
+
+	/* Add header */
+	sg_init_one(&hdr, va_hdr, sizeof(*va_hdr));
+	sgs[out_num] = &hdr;
+	out_num++;
+
+	if (cmd->data_sg) {
+		sgs[out_num] = cmd->data_sg;
+		out_num++;
+	}
+
+	/* Add return status */
+	sg_init_one(&stat, va_status, sizeof(*va_status));
+	sgs[out_num + in_num] = &stat;
+	in_num++;
+
+	if (cmd->result_sg) {
+		sgs[out_num + in_num] = cmd->result_sg;
+		in_num++;
+	}
+
+	ret = virtqueue_exec_cmd(avq, sgs, out_num, in_num, sgs, GFP_KERNEL);
+	if (ret) {
+		dev_err(&vdev->dev,
+			"Failed to execute command on admin vq: %d\n.", ret);
+		goto err_cmd_exec;
+	}
+
+	status = le16_to_cpu(va_status->status);
+	if (status != VIRTIO_ADMIN_STATUS_OK) {
+		dev_err(&vdev->dev,
+			"admin command error: status(%#x) qualifier(%#x)\n",
+			status, le16_to_cpu(va_status->status_qualifier));
+		ret = -status;
+	}
+
+err_cmd_exec:
+	kfree(va_hdr);
+err_alloc:
+	kfree(va_status);
+	return ret;
+}
+
 int vp_create_avq(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 9d39706bed10..094a2ef1c8b8 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -110,6 +110,14 @@ int virtqueue_exec_cmd(struct virtqueue *vq,
 		       void *data,
 		       gfp_t gfp);
 
+struct virtio_admin_cmd {
+	__le16 opcode;
+	__le16 group_type;
+	__le64 group_member_id;
+	struct scatterlist *data_sg;
+	struct scatterlist *result_sg;
+};
+
 /**
  * struct virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
@@ -207,6 +215,9 @@ static inline struct virtio_driver *drv_to_virtio(struct device_driver *drv)
 	return container_of(drv, struct virtio_driver, driver);
 }
 
+int virtio_admin_cmd_exec(struct virtio_device *vdev,
+			  struct virtio_admin_cmd *cmd);
+
 int register_virtio_driver(struct virtio_driver *drv);
 void unregister_virtio_driver(struct virtio_driver *drv);
 
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 028c51ea90ee..e213173e1291 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -95,6 +95,7 @@ typedef void vq_callback_t(struct virtqueue *);
  *	set.
  * @create_avq: initialize admin virtqueue resource.
  * @destroy_avq: destroy admin virtqueue resource.
+ * @exec_admin_cmd: Send admin command and get result.
  */
 struct virtio_config_ops {
 	void (*get)(struct virtio_device *vdev, unsigned offset,
@@ -124,6 +125,8 @@ struct virtio_config_ops {
 	int (*enable_vq_after_reset)(struct virtqueue *vq);
 	int (*create_avq)(struct virtio_device *vdev);
 	void (*destroy_avq)(struct virtio_device *vdev);
+	int (*exec_admin_cmd)(struct virtio_device *vdev,
+			      struct virtio_admin_cmd *cmd);
 };
 
 /* If driver didn't advertise the feature, it will never appear. */
diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index f703afc7ad31..1f1ac6ac07df 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -207,4 +207,26 @@ struct virtio_pci_cfg_cap {
 
 #endif /* VIRTIO_PCI_NO_MODERN */
 
+/* Admin command status. */
+#define VIRTIO_ADMIN_STATUS_OK		0
+
+struct virtio_admin_cmd_hdr {
+	__le16 opcode;
+	/*
+	 * 1 - SR-IOV
+	 * 2-65535 - reserved
+	 */
+	__le16 group_type;
+	/* Unused, reserved for future extensions. */
+	__u8 reserved1[12];
+	__le64 group_member_id;
+} __packed;
+
+struct virtio_admin_cmd_status {
+	__le16 status;
+	__le16 status_qualifier;
+	/* Unused, reserved for future extensions. */
+	__u8 reserved2[4];
+} __packed;
+
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 321+ messages in thread

* [PATCH vfio 05/11] virtio-pci: Introduce admin command sending function
@ 2023-09-21 12:40   ` Yishai Hadas
  0 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas @ 2023-09-21 12:40 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, leonro, yishaih, maorg

From: Feng Liu <feliu@nvidia.com>

Add support for sending admin commands through the admin virtqueue
interface, and expose a generic API to execute virtio admin commands.
Reuse the synchronous command send helper at the virtio transport
layer. In addition, add a new result state for admin commands and the
admin command range definitions.
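
For illustration only (not part of this patch), an upper layer that
holds the owner's virtio_device could drive the new API roughly as in
the sketch below; the opcode and output buffer are placeholders, and
group_type/group_member_id are left for the caller to fill per the
target group:

	/*
	 * Sketch: run an admin command that only returns data into "out".
	 * Buffers handed to the admin queue must be DMA-able, i.e. not on
	 * the caller's stack.
	 */
	static int example_admin_query(struct virtio_device *vdev, __le16 opcode,
				       void *out, size_t out_len)
	{
		struct scatterlist result_sg;
		struct virtio_admin_cmd cmd = {};

		sg_init_one(&result_sg, out, out_len);
		cmd.opcode = opcode;
		cmd.result_sg = &result_sg;

		return virtio_admin_cmd_exec(vdev, &cmd);
	}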

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/virtio/virtio.c                |  7 +++
 drivers/virtio/virtio_pci_common.h     |  1 +
 drivers/virtio/virtio_pci_modern.c     |  2 +
 drivers/virtio/virtio_pci_modern_avq.c | 73 ++++++++++++++++++++++++++
 include/linux/virtio.h                 | 11 ++++
 include/linux/virtio_config.h          |  3 ++
 include/uapi/linux/virtio_pci.h        | 22 ++++++++
 7 files changed, 119 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index f4080692b351..dd71f584a1bd 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -582,6 +582,13 @@ int virtio_device_restore(struct virtio_device *dev)
 EXPORT_SYMBOL_GPL(virtio_device_restore);
 #endif
 
+int virtio_admin_cmd_exec(struct virtio_device *vdev,
+			  struct virtio_admin_cmd *cmd)
+{
+	return vdev->config->exec_admin_cmd(vdev, cmd);
+}
+EXPORT_SYMBOL_GPL(virtio_admin_cmd_exec);
+
 static int virtio_init(void)
 {
 	if (bus_register(&virtio_bus) != 0)
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index 9bffa95274b6..a579f1338263 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -128,6 +128,7 @@ int vp_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
 const char *vp_bus_name(struct virtio_device *vdev);
 void vp_destroy_avq(struct virtio_device *vdev);
 int vp_create_avq(struct virtio_device *vdev);
+int vp_avq_cmd_exec(struct virtio_device *vdev, struct virtio_admin_cmd *cmd);
 
 /* Setup the affinity for a virtqueue:
  * - force the affinity for per vq vector
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index a72c87687196..cac18872b088 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -515,6 +515,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
 	.enable_vq_after_reset = vp_modern_enable_vq_after_reset,
 	.create_avq = vp_create_avq,
 	.destroy_avq = vp_destroy_avq,
+	.exec_admin_cmd = vp_avq_cmd_exec,
 };
 
 static const struct virtio_config_ops virtio_pci_config_ops = {
@@ -537,6 +538,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
 	.enable_vq_after_reset = vp_modern_enable_vq_after_reset,
 	.create_avq = vp_create_avq,
 	.destroy_avq = vp_destroy_avq,
+	.exec_admin_cmd = vp_avq_cmd_exec,
 };
 
 /* the PCI probing function */
diff --git a/drivers/virtio/virtio_pci_modern_avq.c b/drivers/virtio/virtio_pci_modern_avq.c
index 114579ad788f..ca3fe10f616d 100644
--- a/drivers/virtio/virtio_pci_modern_avq.c
+++ b/drivers/virtio/virtio_pci_modern_avq.c
@@ -19,6 +19,79 @@ static u16 vp_modern_avq_index(struct virtio_pci_modern_device *mdev)
 	return vp_ioread16(&cfg->admin_queue_index);
 }
 
+#define VIRTIO_AVQ_SGS_MAX	4
+
+int vp_avq_cmd_exec(struct virtio_device *vdev, struct virtio_admin_cmd *cmd)
+{
+	struct scatterlist *sgs[VIRTIO_AVQ_SGS_MAX], hdr, stat;
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_admin_cmd_status *va_status;
+	unsigned int out_num = 0, in_num = 0;
+	struct virtio_admin_cmd_hdr *va_hdr;
+	struct virtqueue *avq;
+	u16 status;
+	int ret;
+
+	avq = vp_dev->admin ? vp_dev->admin->info.vq : NULL;
+	if (!avq)
+		return -EOPNOTSUPP;
+
+	va_status = kzalloc(sizeof(*va_status), GFP_KERNEL);
+	if (!va_status)
+		return -ENOMEM;
+
+	va_hdr = kzalloc(sizeof(*va_hdr), GFP_KERNEL);
+	if (!va_hdr) {
+		ret = -ENOMEM;
+		goto err_alloc;
+	}
+
+	va_hdr->opcode = cmd->opcode;
+	va_hdr->group_type = cmd->group_type;
+	va_hdr->group_member_id = cmd->group_member_id;
+
+	/* Add header */
+	sg_init_one(&hdr, va_hdr, sizeof(*va_hdr));
+	sgs[out_num] = &hdr;
+	out_num++;
+
+	if (cmd->data_sg) {
+		sgs[out_num] = cmd->data_sg;
+		out_num++;
+	}
+
+	/* Add return status */
+	sg_init_one(&stat, va_status, sizeof(*va_status));
+	sgs[out_num + in_num] = &stat;
+	in_num++;
+
+	if (cmd->result_sg) {
+		sgs[out_num + in_num] = cmd->result_sg;
+		in_num++;
+	}
+
+	ret = virtqueue_exec_cmd(avq, sgs, out_num, in_num, sgs, GFP_KERNEL);
+	if (ret) {
+		dev_err(&vdev->dev,
+			"Failed to execute command on admin vq: %d.\n", ret);
+		goto err_cmd_exec;
+	}
+
+	status = le16_to_cpu(va_status->status);
+	if (status != VIRTIO_ADMIN_STATUS_OK) {
+		dev_err(&vdev->dev,
+			"admin command error: status(%#x) qualifier(%#x)\n",
+			status, le16_to_cpu(va_status->status_qualifier));
+		ret = -status;
+	}
+
+err_cmd_exec:
+	kfree(va_hdr);
+err_alloc:
+	kfree(va_status);
+	return ret;
+}
+
 int vp_create_avq(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 9d39706bed10..094a2ef1c8b8 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -110,6 +110,14 @@ int virtqueue_exec_cmd(struct virtqueue *vq,
 		       void *data,
 		       gfp_t gfp);
 
+struct virtio_admin_cmd {
+	__le16 opcode;
+	__le16 group_type;
+	__le64 group_member_id;
+	struct scatterlist *data_sg;
+	struct scatterlist *result_sg;
+};
+
 /**
  * struct virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
@@ -207,6 +215,9 @@ static inline struct virtio_driver *drv_to_virtio(struct device_driver *drv)
 	return container_of(drv, struct virtio_driver, driver);
 }
 
+int virtio_admin_cmd_exec(struct virtio_device *vdev,
+			  struct virtio_admin_cmd *cmd);
+
 int register_virtio_driver(struct virtio_driver *drv);
 void unregister_virtio_driver(struct virtio_driver *drv);
 
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 028c51ea90ee..e213173e1291 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -95,6 +95,7 @@ typedef void vq_callback_t(struct virtqueue *);
  *	set.
  * @create_avq: initialize admin virtqueue resource.
  * @destroy_avq: destroy admin virtqueue resource.
+ * @exec_admin_cmd: Send admin command and get result.
  */
 struct virtio_config_ops {
 	void (*get)(struct virtio_device *vdev, unsigned offset,
@@ -124,6 +125,8 @@ struct virtio_config_ops {
 	int (*enable_vq_after_reset)(struct virtqueue *vq);
 	int (*create_avq)(struct virtio_device *vdev);
 	void (*destroy_avq)(struct virtio_device *vdev);
+	int (*exec_admin_cmd)(struct virtio_device *vdev,
+			      struct virtio_admin_cmd *cmd);
 };
 
 /* If driver didn't advertise the feature, it will never appear. */
diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index f703afc7ad31..1f1ac6ac07df 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -207,4 +207,26 @@ struct virtio_pci_cfg_cap {
 
 #endif /* VIRTIO_PCI_NO_MODERN */
 
+/* Admin command status. */
+#define VIRTIO_ADMIN_STATUS_OK		0
+
+struct virtio_admin_cmd_hdr {
+	__le16 opcode;
+	/*
+	 * 1 - SR-IOV
+	 * 2-65535 - reserved
+	 */
+	__le16 group_type;
+	/* Unused, reserved for future extensions. */
+	__u8 reserved1[12];
+	__le64 group_member_id;
+} __packed;
+
+struct virtio_admin_cmd_status {
+	__le16 status;
+	__le16 status_qualifier;
+	/* Unused, reserved for future extensions. */
+	__u8 reserved2[4];
+} __packed;
+
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 321+ messages in thread

* [PATCH vfio 06/11] virtio-pci: Introduce API to get PF virtio device from VF PCI device
  2023-09-21 12:40 ` Yishai Hadas
@ 2023-09-21 12:40   ` Yishai Hadas
  -1 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas via Virtualization @ 2023-09-21 12:40 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, maorg, virtualization, jiri, leonro

From: Feng Liu <feliu@nvidia.com>

Introduce an API to get the PF virtio device from a given VF PCI device
so that other modules, such as vfio in a subsequent patch, can use it.
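
As a minimal sketch (not part of this patch), a driver bound to the VF
could use the new helper together with the admin command API from the
previous patch; the function name below is illustrative only:

	/* Sketch: from a VF driver, reach the PF's virtio device. */
	static int example_use_pf(struct pci_dev *vf_pdev)
	{
		struct virtio_device *pf_vdev = virtio_pci_vf_get_pf_dev(vf_pdev);

		if (!pf_vdev)
			return -ENOTCONN;	/* PF is not bound to virtio-pci */

		/* pf_vdev can now be passed to virtio_admin_cmd_exec() */
		return 0;
	}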

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/virtio/virtio_pci_common.c | 12 ++++++++++++
 include/linux/virtio.h             |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 7a3e6edc4dd6..c64484cd5b13 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -648,6 +648,18 @@ static struct pci_driver virtio_pci_driver = {
 	.sriov_configure = virtio_pci_sriov_configure,
 };
 
+struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
+{
+	struct virtio_pci_device *pf_vp_dev;
+
+	pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
+	if (IS_ERR(pf_vp_dev))
+		return NULL;
+
+	return &pf_vp_dev->vdev;
+}
+EXPORT_SYMBOL_GPL(virtio_pci_vf_get_pf_dev);
+
 module_pci_driver(virtio_pci_driver);
 
 MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 094a2ef1c8b8..4ae088ea9299 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -217,6 +217,7 @@ static inline struct virtio_driver *drv_to_virtio(struct device_driver *drv)
 
 int virtio_admin_cmd_exec(struct virtio_device *vdev,
 			  struct virtio_admin_cmd *cmd);
+struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
 
 int register_virtio_driver(struct virtio_driver *drv);
 void unregister_virtio_driver(struct virtio_driver *drv);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 321+ messages in thread

* [PATCH vfio 06/11] virtio-pci: Introduce API to get PF virtio device from VF PCI device
@ 2023-09-21 12:40   ` Yishai Hadas
  0 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas @ 2023-09-21 12:40 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, leonro, yishaih, maorg

From: Feng Liu <feliu@nvidia.com>

Introduce an API to get the PF virtio device from a given VF PCI device
so that other modules, such as vfio in a subsequent patch, can use it.

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/virtio/virtio_pci_common.c | 12 ++++++++++++
 include/linux/virtio.h             |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 7a3e6edc4dd6..c64484cd5b13 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -648,6 +648,18 @@ static struct pci_driver virtio_pci_driver = {
 	.sriov_configure = virtio_pci_sriov_configure,
 };
 
+struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
+{
+	struct virtio_pci_device *pf_vp_dev;
+
+	pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
+	if (IS_ERR(pf_vp_dev))
+		return NULL;
+
+	return &pf_vp_dev->vdev;
+}
+EXPORT_SYMBOL_GPL(virtio_pci_vf_get_pf_dev);
+
 module_pci_driver(virtio_pci_driver);
 
 MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 094a2ef1c8b8..4ae088ea9299 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -217,6 +217,7 @@ static inline struct virtio_driver *drv_to_virtio(struct device_driver *drv)
 
 int virtio_admin_cmd_exec(struct virtio_device *vdev,
 			  struct virtio_admin_cmd *cmd);
+struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
 
 int register_virtio_driver(struct virtio_driver *drv);
 void unregister_virtio_driver(struct virtio_driver *drv);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 321+ messages in thread

* [PATCH vfio 07/11] virtio-pci: Introduce admin commands
  2023-09-21 12:40 ` Yishai Hadas
@ 2023-09-21 12:40   ` Yishai Hadas
  -1 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas via Virtualization @ 2023-09-21 12:40 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, maorg, virtualization, jiri, leonro

From: Feng Liu <feliu@nvidia.com>

Introduce admin commands, as follows:

The "list query" command can be used by the driver to query the
set of admin commands supported by the virtio device.
The "list use" command is used to inform the virtio device which
admin commands the driver will use.
The "legacy common cfg rd/wr" commands are used to read from/write
into the legacy common configuration structure.
The "legacy dev cfg rd/wr" commands are used to read from/write
into the legacy device configuration structure.
The "notify info" command is used to query the notification region
information.
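
For illustration only (not part of this patch), a driver could opt in
to these opcodes with the "list use" command roughly as sketched below;
the single 64-bit little-endian bitmap covering opcodes 0-63 is an
assumption, and the helper name is made up:

	/* Sketch: opt in to the legacy access opcodes via "list use". */
	static int example_list_use(struct virtio_device *pf_vdev)
	{
		struct scatterlist data_sg;
		struct virtio_admin_cmd cmd = {};
		__le64 *bmap;
		int ret;

		bmap = kzalloc(sizeof(*bmap), GFP_KERNEL);
		if (!bmap)
			return -ENOMEM;

		*bmap = cpu_to_le64(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) |
				    BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) |
				    BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) |
				    BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE));
		sg_init_one(&data_sg, bmap, sizeof(*bmap));

		cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
		cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
		cmd.data_sg = &data_sg;
		ret = virtio_admin_cmd_exec(pf_vdev, &cmd);

		kfree(bmap);
		return ret;
	}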

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 include/uapi/linux/virtio_pci.h | 44 +++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index 1f1ac6ac07df..2bf275ad0f20 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -210,6 +210,23 @@ struct virtio_pci_cfg_cap {
 /* Admin command status. */
 #define VIRTIO_ADMIN_STATUS_OK		0
 
+/* Admin command opcode. */
+#define VIRTIO_ADMIN_CMD_LIST_QUERY	0x0
+#define VIRTIO_ADMIN_CMD_LIST_USE	0x1
+
+/* Admin command group type. */
+#define VIRTIO_ADMIN_GROUP_TYPE_SRIOV	0x1
+
+/* Transitional device admin command. */
+#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE	0x2
+#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ		0x3
+#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE		0x4
+#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ		0x5
+#define VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO		0x6
+
+/* Increment MAX_OPCODE to next value when new opcode is added */
+#define VIRTIO_ADMIN_MAX_CMD_OPCODE			0x6
+
 struct virtio_admin_cmd_hdr {
 	__le16 opcode;
 	/*
@@ -229,4 +246,31 @@ struct virtio_admin_cmd_status {
 	__u8 reserved2[4];
 } __packed;
 
+struct virtio_admin_cmd_legacy_wr_data {
+	__u8 offset; /* Starting offset of the register(s) to write. */
+	__u8 reserved[7];
+	__u8 registers[];
+} __packed;
+
+struct virtio_admin_cmd_legacy_rd_data {
+	__u8 offset; /* Starting offset of the register(s) to read. */
+} __packed;
+
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END 0
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_DEV 0x1
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM 0x2
+
+#define VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO 4
+
+struct virtio_admin_cmd_notify_info_data {
+	__u8 flags; /* 0 = end of list, 1 = owner device, 2 = member device */
+	__u8 bar; /* BAR of the member or the owner device */
+	__u8 padding[6];
+	__le64 offset; /* Offset within bar. */
+} __packed;
+
+struct virtio_admin_cmd_notify_info_result {
+	struct virtio_admin_cmd_notify_info_data entries[VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO];
+};
+
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 321+ messages in thread

* [PATCH vfio 07/11] virtio-pci: Introduce admin commands
@ 2023-09-21 12:40   ` Yishai Hadas
  0 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas @ 2023-09-21 12:40 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, leonro, yishaih, maorg

From: Feng Liu <feliu@nvidia.com>

Introduce admin commands, as follows:

The "list query" command can be used by the driver to query the
set of admin commands supported by the virtio device.
The "list use" command is used to inform the virtio device which
admin commands the driver will use.
The "legacy common cfg rd/wr" commands are used to read from/write
into the legacy common configuration structure.
The "legacy dev cfg rd/wr" commands are used to read from/write
into the legacy device configuration structure.
The "notify info" command is used to query the notification region
information.

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 include/uapi/linux/virtio_pci.h | 44 +++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index 1f1ac6ac07df..2bf275ad0f20 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -210,6 +210,23 @@ struct virtio_pci_cfg_cap {
 /* Admin command status. */
 #define VIRTIO_ADMIN_STATUS_OK		0
 
+/* Admin command opcode. */
+#define VIRTIO_ADMIN_CMD_LIST_QUERY	0x0
+#define VIRTIO_ADMIN_CMD_LIST_USE	0x1
+
+/* Admin command group type. */
+#define VIRTIO_ADMIN_GROUP_TYPE_SRIOV	0x1
+
+/* Transitional device admin command. */
+#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE	0x2
+#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ		0x3
+#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE		0x4
+#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ		0x5
+#define VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO		0x6
+
+/* Increment MAX_OPCODE to next value when new opcode is added */
+#define VIRTIO_ADMIN_MAX_CMD_OPCODE			0x6
+
 struct virtio_admin_cmd_hdr {
 	__le16 opcode;
 	/*
@@ -229,4 +246,31 @@ struct virtio_admin_cmd_status {
 	__u8 reserved2[4];
 } __packed;
 
+struct virtio_admin_cmd_legacy_wr_data {
+	__u8 offset; /* Starting offset of the register(s) to write. */
+	__u8 reserved[7];
+	__u8 registers[];
+} __packed;
+
+struct virtio_admin_cmd_legacy_rd_data {
+	__u8 offset; /* Starting offset of the register(s) to read. */
+} __packed;
+
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END 0
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_DEV 0x1
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM 0x2
+
+#define VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO 4
+
+struct virtio_admin_cmd_notify_info_data {
+	__u8 flags; /* 0 = end of list, 1 = owner device, 2 = member device */
+	__u8 bar; /* BAR of the member or the owner device */
+	__u8 padding[6];
+	__le64 offset; /* Offset within bar. */
+} __packed;
+
+struct virtio_admin_cmd_notify_info_result {
+	struct virtio_admin_cmd_notify_info_data entries[VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO];
+};
+
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 321+ messages in thread

* [PATCH vfio 08/11] vfio/pci: Expose vfio_pci_core_setup_barmap()
  2023-09-21 12:40 ` Yishai Hadas
@ 2023-09-21 12:40   ` Yishai Hadas
  -1 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas via Virtualization @ 2023-09-21 12:40 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, maorg, virtualization, jiri, leonro

Expose vfio_pci_core_setup_barmap() to be used by drivers.

This will let drivers mmap a BAR and re-use the mapping from both vfio
and the driver when applicable.

This API will be used in the next patches by the upcoming vfio/virtio
driver.
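
As a minimal usage sketch (not part of this patch), a variant driver
could map a BAR once and then rely on the cached mapping; the BAR
number and helper name below are assumptions:

	/* Sketch: map BAR 1 once; the core keeps and releases the mapping. */
	static int example_map_bar(struct vfio_pci_core_device *core)
	{
		int ret = vfio_pci_core_setup_barmap(core, 1);

		if (ret)
			return ret;

		/* core->barmap[1] now holds a valid __iomem pointer */
		return 0;
	}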

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 25 +++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_rdwr.c | 28 ++--------------------------
 include/linux/vfio_pci_core.h    |  1 +
 3 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1929103ee59a..b56111ed8a8c 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -684,6 +684,31 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_disable);
 
+int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	void __iomem *io;
+	int ret;
+
+	if (vdev->barmap[bar])
+		return 0;
+
+	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
+	if (ret)
+		return ret;
+
+	io = pci_iomap(pdev, bar, 0);
+	if (!io) {
+		pci_release_selected_regions(pdev, 1 << bar);
+		return -ENOMEM;
+	}
+
+	vdev->barmap[bar] = io;
+
+	return 0;
+}
+EXPORT_SYMBOL(vfio_pci_core_setup_barmap);
+
 void vfio_pci_core_close_device(struct vfio_device *core_vdev)
 {
 	struct vfio_pci_core_device *vdev =
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index e27de61ac9fe..6f08b3ecbb89 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -200,30 +200,6 @@ static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 	return done;
 }
 
-static int vfio_pci_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
-{
-	struct pci_dev *pdev = vdev->pdev;
-	int ret;
-	void __iomem *io;
-
-	if (vdev->barmap[bar])
-		return 0;
-
-	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
-	if (ret)
-		return ret;
-
-	io = pci_iomap(pdev, bar, 0);
-	if (!io) {
-		pci_release_selected_regions(pdev, 1 << bar);
-		return -ENOMEM;
-	}
-
-	vdev->barmap[bar] = io;
-
-	return 0;
-}
-
 ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 			size_t count, loff_t *ppos, bool iswrite)
 {
@@ -262,7 +238,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 		}
 		x_end = end;
 	} else {
-		int ret = vfio_pci_setup_barmap(vdev, bar);
+		int ret = vfio_pci_core_setup_barmap(vdev, bar);
 		if (ret) {
 			done = ret;
 			goto out;
@@ -438,7 +414,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
 		return -EINVAL;
 #endif
 
-	ret = vfio_pci_setup_barmap(vdev, bar);
+	ret = vfio_pci_core_setup_barmap(vdev, bar);
 	if (ret)
 		return ret;
 
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 562e8754869d..67ac58e20e1d 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -127,6 +127,7 @@ int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf);
 int vfio_pci_core_enable(struct vfio_pci_core_device *vdev);
 void vfio_pci_core_disable(struct vfio_pci_core_device *vdev);
 void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
+int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
 pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
 						pci_channel_state_t state);
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 321+ messages in thread

* [PATCH vfio 08/11] vfio/pci: Expose vfio_pci_core_setup_barmap()
@ 2023-09-21 12:40   ` Yishai Hadas
  0 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas @ 2023-09-21 12:40 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, leonro, yishaih, maorg

Expose vfio_pci_core_setup_barmap() to be used by drivers.

This will let drivers mmap a BAR and re-use the mapping from both vfio
and the driver when applicable.

This API will be used in the next patches by the upcoming vfio/virtio
driver.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 25 +++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_rdwr.c | 28 ++--------------------------
 include/linux/vfio_pci_core.h    |  1 +
 3 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1929103ee59a..b56111ed8a8c 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -684,6 +684,31 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_disable);
 
+int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	void __iomem *io;
+	int ret;
+
+	if (vdev->barmap[bar])
+		return 0;
+
+	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
+	if (ret)
+		return ret;
+
+	io = pci_iomap(pdev, bar, 0);
+	if (!io) {
+		pci_release_selected_regions(pdev, 1 << bar);
+		return -ENOMEM;
+	}
+
+	vdev->barmap[bar] = io;
+
+	return 0;
+}
+EXPORT_SYMBOL(vfio_pci_core_setup_barmap);
+
 void vfio_pci_core_close_device(struct vfio_device *core_vdev)
 {
 	struct vfio_pci_core_device *vdev =
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index e27de61ac9fe..6f08b3ecbb89 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -200,30 +200,6 @@ static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 	return done;
 }
 
-static int vfio_pci_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
-{
-	struct pci_dev *pdev = vdev->pdev;
-	int ret;
-	void __iomem *io;
-
-	if (vdev->barmap[bar])
-		return 0;
-
-	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
-	if (ret)
-		return ret;
-
-	io = pci_iomap(pdev, bar, 0);
-	if (!io) {
-		pci_release_selected_regions(pdev, 1 << bar);
-		return -ENOMEM;
-	}
-
-	vdev->barmap[bar] = io;
-
-	return 0;
-}
-
 ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 			size_t count, loff_t *ppos, bool iswrite)
 {
@@ -262,7 +238,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 		}
 		x_end = end;
 	} else {
-		int ret = vfio_pci_setup_barmap(vdev, bar);
+		int ret = vfio_pci_core_setup_barmap(vdev, bar);
 		if (ret) {
 			done = ret;
 			goto out;
@@ -438,7 +414,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
 		return -EINVAL;
 #endif
 
-	ret = vfio_pci_setup_barmap(vdev, bar);
+	ret = vfio_pci_core_setup_barmap(vdev, bar);
 	if (ret)
 		return ret;
 
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 562e8754869d..67ac58e20e1d 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -127,6 +127,7 @@ int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf);
 int vfio_pci_core_enable(struct vfio_pci_core_device *vdev);
 void vfio_pci_core_disable(struct vfio_pci_core_device *vdev);
 void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
+int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
 pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
 						pci_channel_state_t state);
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 321+ messages in thread

* [PATCH vfio 09/11] vfio/pci: Expose vfio_pci_iowrite/read##size()
  2023-09-21 12:40 ` Yishai Hadas
@ 2023-09-21 12:40   ` Yishai Hadas
  -1 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas via Virtualization @ 2023-09-21 12:40 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, maorg, virtualization, jiri, leonro

Expose vfio_pci_iowrite/read##size() to let them be used by drivers.

This functionality is needed to enable direct access to some physical
BAR of the device with the proper locks/checks in place.

The next patches in this series will use this functionality on a data
path flow when direct access to the BAR is needed.
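
For illustration only (not part of this patch), a data-path user could
look roughly like the sketch below; passing test_mem=true asks the
helper to verify that memory decoding is enabled, under the core
device's memory_lock, before touching the BAR:

	/* Sketch: write a 16-bit doorbell through a mapped memory BAR. */
	static int example_kick(struct vfio_pci_core_device *core,
				void __iomem *db, u16 val)
	{
		return vfio_pci_iowrite16(core, true /* test_mem */, val, db);
	}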

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_rdwr.c | 10 ++++++----
 include/linux/vfio_pci_core.h    | 19 +++++++++++++++++++
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 6f08b3ecbb89..5d84bad7d30c 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -38,7 +38,7 @@
 #define vfio_iowrite8	iowrite8
 
 #define VFIO_IOWRITE(size) \
-static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
+int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
 			bool test_mem, u##size val, void __iomem *io)	\
 {									\
 	if (test_mem) {							\
@@ -55,7 +55,8 @@ static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
 		up_read(&vdev->memory_lock);				\
 									\
 	return 0;							\
-}
+}									\
+EXPORT_SYMBOL(vfio_pci_iowrite##size);
 
 VFIO_IOWRITE(8)
 VFIO_IOWRITE(16)
@@ -65,7 +66,7 @@ VFIO_IOWRITE(64)
 #endif
 
 #define VFIO_IOREAD(size) \
-static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
+int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
 			bool test_mem, u##size *val, void __iomem *io)	\
 {									\
 	if (test_mem) {							\
@@ -82,7 +83,8 @@ static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
 		up_read(&vdev->memory_lock);				\
 									\
 	return 0;							\
-}
+}									\
+EXPORT_SYMBOL(vfio_pci_ioread##size);
 
 VFIO_IOREAD(8)
 VFIO_IOREAD(16)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 67ac58e20e1d..22c915317788 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -131,4 +131,23 @@ int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
 pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
 						pci_channel_state_t state);
 
+#define VFIO_IOWRITE_DECLARATION(size) \
+int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
+			bool test_mem, u##size val, void __iomem *io);
+
+VFIO_IOWRITE_DECLARATION(8)
+VFIO_IOWRITE_DECLARATION(16)
+VFIO_IOWRITE_DECLARATION(32)
+#ifdef iowrite64
+VFIO_IOWRITE_DECLARATION(64)
+#endif
+
+#define VFIO_IOREAD_DECLARATION(size) \
+int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
+			bool test_mem, u##size *val, void __iomem *io);
+
+VFIO_IOREAD_DECLARATION(8)
+VFIO_IOREAD_DECLARATION(16)
+VFIO_IOREAD_DECLARATION(32)
+
 #endif /* VFIO_PCI_CORE_H */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 321+ messages in thread

* [PATCH vfio 09/11] vfio/pci: Expose vfio_pci_iowrite/read##size()
@ 2023-09-21 12:40   ` Yishai Hadas
  0 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas @ 2023-09-21 12:40 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, leonro, yishaih, maorg

Expose vfio_pci_iowrite/read##size() to let them be used by drivers.

This functionality is needed to enable direct access to some physical
BAR of the device with the proper locks/checks in place.

The next patches in this series will use this functionality on a data
path flow when direct access to the BAR is needed.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_rdwr.c | 10 ++++++----
 include/linux/vfio_pci_core.h    | 19 +++++++++++++++++++
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 6f08b3ecbb89..5d84bad7d30c 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -38,7 +38,7 @@
 #define vfio_iowrite8	iowrite8
 
 #define VFIO_IOWRITE(size) \
-static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
+int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
 			bool test_mem, u##size val, void __iomem *io)	\
 {									\
 	if (test_mem) {							\
@@ -55,7 +55,8 @@ static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
 		up_read(&vdev->memory_lock);				\
 									\
 	return 0;							\
-}
+}									\
+EXPORT_SYMBOL(vfio_pci_iowrite##size);
 
 VFIO_IOWRITE(8)
 VFIO_IOWRITE(16)
@@ -65,7 +66,7 @@ VFIO_IOWRITE(64)
 #endif
 
 #define VFIO_IOREAD(size) \
-static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
+int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
 			bool test_mem, u##size *val, void __iomem *io)	\
 {									\
 	if (test_mem) {							\
@@ -82,7 +83,8 @@ static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
 		up_read(&vdev->memory_lock);				\
 									\
 	return 0;							\
-}
+}									\
+EXPORT_SYMBOL(vfio_pci_ioread##size);
 
 VFIO_IOREAD(8)
 VFIO_IOREAD(16)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 67ac58e20e1d..22c915317788 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -131,4 +131,23 @@ int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
 pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
 						pci_channel_state_t state);
 
+#define VFIO_IOWRITE_DECLARATION(size) \
+int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
+			bool test_mem, u##size val, void __iomem *io);
+
+VFIO_IOWRITE_DECLARATION(8)
+VFIO_IOWRITE_DECLARATION(16)
+VFIO_IOWRITE_DECLARATION(32)
+#ifdef iowrite64
+VFIO_IOWRITE_DECLARATION(64)
+#endif
+
+#define VFIO_IOREAD_DECLARATION(size) \
+int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
+			bool test_mem, u##size *val, void __iomem *io);
+
+VFIO_IOREAD_DECLARATION(8)
+VFIO_IOREAD_DECLARATION(16)
+VFIO_IOREAD_DECLARATION(32)
+
 #endif /* VFIO_PCI_CORE_H */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 321+ messages in thread

* [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-09-21 12:40 ` Yishai Hadas
@ 2023-09-21 12:40   ` Yishai Hadas
  -1 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas via Virtualization @ 2023-09-21 12:40 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, maorg, virtualization, jiri, leonro

Expose admin commands over the virtio device, to be used by the
vfio-virtio driver in the next patches.

This includes: list query/use, legacy write/read, and read notify_info.
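
As a rough sketch (not part of this patch), the vfio driver added in
the next patch could wire these wrappers together during init; the
8-byte opcode bitmap, the ordering, and passing the queried set back to
"list use" are simplifying assumptions:

	/* Sketch: minimal init flow built from the new wrappers. */
	static int example_init(struct virtiovf_pci_core_device *virtvdev)
	{
		struct pci_dev *pdev = virtvdev->core_device.pdev;
		u64 notify_offset;
		u8 notify_bar;
		u8 *bmap;
		int ret;

		bmap = kzalloc(8, GFP_KERNEL);	/* opcodes 0..63 */
		if (!bmap)
			return -ENOMEM;

		ret = virtiovf_cmd_list_query(pdev, bmap, 8);
		if (!ret) {
			/* ...check the legacy opcodes are present in bmap... */
			ret = virtiovf_cmd_list_use(pdev, bmap, 8);
		}
		kfree(bmap);
		if (ret)
			return ret;

		/* Learn where queue notifications must be written. */
		return virtiovf_cmd_lq_read_notify(virtvdev,
				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
				&notify_bar, &notify_offset);
	}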

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/virtio/cmd.c | 146 ++++++++++++++++++++++++++++++++++
 drivers/vfio/pci/virtio/cmd.h |  27 +++++++
 2 files changed, 173 insertions(+)
 create mode 100644 drivers/vfio/pci/virtio/cmd.c
 create mode 100644 drivers/vfio/pci/virtio/cmd.h

diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
new file mode 100644
index 000000000000..f068239cdbb0
--- /dev/null
+++ b/drivers/vfio/pci/virtio/cmd.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include "cmd.h"
+
+int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
+{
+	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+	struct scatterlist out_sg;
+	struct virtio_admin_cmd cmd = {};
+
+	if (!virtio_dev)
+		return -ENOTCONN;
+
+	sg_init_one(&out_sg, buf, buf_size);
+	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
+	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+	cmd.result_sg = &out_sg;
+
+	return virtio_admin_cmd_exec(virtio_dev, &cmd);
+}
+
+int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
+{
+	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+	struct scatterlist in_sg;
+	struct virtio_admin_cmd cmd = {};
+
+	if (!virtio_dev)
+		return -ENOTCONN;
+
+	sg_init_one(&in_sg, buf, buf_size);
+	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE;
+	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+	cmd.data_sg = &in_sg;
+
+	return virtio_admin_cmd_exec(virtio_dev, &cmd);
+}
+
+int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
+			  u8 offset, u8 size, u8 *buf)
+{
+	struct virtio_device *virtio_dev =
+		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
+	struct virtio_admin_cmd_data_lr_write *in;
+	struct scatterlist in_sg;
+	struct virtio_admin_cmd cmd = {};
+	int ret;
+
+	if (!virtio_dev)
+		return -ENOTCONN;
+
+	in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	in->offset = offset;
+	memcpy(in->registers, buf, size);
+	sg_init_one(&in_sg, in, sizeof(*in) + size);
+	cmd.opcode = opcode;
+	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+	cmd.group_member_id = virtvdev->vf_id + 1;
+	cmd.data_sg = &in_sg;
+	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
+
+	kfree(in);
+	return ret;
+}
+
+int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
+			 u8 offset, u8 size, u8 *buf)
+{
+	struct virtio_device *virtio_dev =
+		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
+	struct virtio_admin_cmd_data_lr_read *in;
+	struct scatterlist in_sg, out_sg;
+	struct virtio_admin_cmd cmd = {};
+	int ret;
+
+	if (!virtio_dev)
+		return -ENOTCONN;
+
+	in = kzalloc(sizeof(*in), GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	in->offset = offset;
+	sg_init_one(&in_sg, in, sizeof(*in));
+	sg_init_one(&out_sg, buf, size);
+	cmd.opcode = opcode;
+	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+	cmd.data_sg = &in_sg;
+	cmd.result_sg = &out_sg;
+	cmd.group_member_id = virtvdev->vf_id + 1;
+	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
+
+	kfree(in);
+	return ret;
+}
+
+int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
+				u8 req_bar_flags, u8 *bar, u64 *bar_offset)
+{
+	struct virtio_device *virtio_dev =
+		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
+	struct virtio_admin_cmd_notify_info_result *out;
+	struct scatterlist out_sg;
+	struct virtio_admin_cmd cmd = {};
+	int ret;
+
+	if (!virtio_dev)
+		return -ENOTCONN;
+
+	out = kzalloc(sizeof(*out), GFP_KERNEL);
+	if (!out)
+		return -ENOMEM;
+
+	sg_init_one(&out_sg, out, sizeof(*out));
+	cmd.opcode = VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO;
+	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+	cmd.result_sg = &out_sg;
+	cmd.group_member_id = virtvdev->vf_id + 1;
+	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
+	if (!ret) {
+		struct virtio_admin_cmd_notify_info_data *entry;
+		int i;
+
+		ret = -ENOENT;
+		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
+			entry = &out->entries[i];
+			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
+				break;
+			if (entry->flags != req_bar_flags)
+				continue;
+			*bar = entry->bar;
+			*bar_offset = le64_to_cpu(entry->offset);
+			ret = 0;
+			break;
+		}
+	}
+
+	kfree(out);
+	return ret;
+}
diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
new file mode 100644
index 000000000000..c2a3645f4b90
--- /dev/null
+++ b/drivers/vfio/pci/virtio/cmd.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/*
+ * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ */
+
+#ifndef VIRTIO_VFIO_CMD_H
+#define VIRTIO_VFIO_CMD_H
+
+#include <linux/kernel.h>
+#include <linux/virtio.h>
+#include <linux/vfio_pci_core.h>
+#include <linux/virtio_pci.h>
+
+struct virtiovf_pci_core_device {
+	struct vfio_pci_core_device core_device;
+	int vf_id;
+};
+
+int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
+int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
+int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
+			  u8 offset, u8 size, u8 *buf);
+int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
+			 u8 offset, u8 size, u8 *buf);
+int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
+				u8 req_bar_flags, u8 *bar, u64 *bar_offset);
+#endif /* VIRTIO_VFIO_CMD_H */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 321+ messages in thread

* [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
@ 2023-09-21 12:40   ` Yishai Hadas
  0 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas @ 2023-09-21 12:40 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, leonro, yishaih, maorg

Expose admin commands over the virtio device, to be used by the
vfio-virtio driver in the next patches.

This includes: list query/use, legacy write/read, and read notify_info.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/virtio/cmd.c | 146 ++++++++++++++++++++++++++++++++++
 drivers/vfio/pci/virtio/cmd.h |  27 +++++++
 2 files changed, 173 insertions(+)
 create mode 100644 drivers/vfio/pci/virtio/cmd.c
 create mode 100644 drivers/vfio/pci/virtio/cmd.h

diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
new file mode 100644
index 000000000000..f068239cdbb0
--- /dev/null
+++ b/drivers/vfio/pci/virtio/cmd.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include "cmd.h"
+
+int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
+{
+	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+	struct scatterlist out_sg;
+	struct virtio_admin_cmd cmd = {};
+
+	if (!virtio_dev)
+		return -ENOTCONN;
+
+	sg_init_one(&out_sg, buf, buf_size);
+	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
+	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+	cmd.result_sg = &out_sg;
+
+	return virtio_admin_cmd_exec(virtio_dev, &cmd);
+}
+
+int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
+{
+	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+	struct scatterlist in_sg;
+	struct virtio_admin_cmd cmd = {};
+
+	if (!virtio_dev)
+		return -ENOTCONN;
+
+	sg_init_one(&in_sg, buf, buf_size);
+	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE;
+	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+	cmd.data_sg = &in_sg;
+
+	return virtio_admin_cmd_exec(virtio_dev, &cmd);
+}
+
+int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
+			  u8 offset, u8 size, u8 *buf)
+{
+	struct virtio_device *virtio_dev =
+		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
+	struct virtio_admin_cmd_data_lr_write *in;
+	struct scatterlist in_sg;
+	struct virtio_admin_cmd cmd = {};
+	int ret;
+
+	if (!virtio_dev)
+		return -ENOTCONN;
+
+	in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	in->offset = offset;
+	memcpy(in->registers, buf, size);
+	sg_init_one(&in_sg, in, sizeof(*in) + size);
+	cmd.opcode = opcode;
+	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+	cmd.group_member_id = virtvdev->vf_id + 1;
+	cmd.data_sg = &in_sg;
+	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
+
+	kfree(in);
+	return ret;
+}
+
+int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
+			 u8 offset, u8 size, u8 *buf)
+{
+	struct virtio_device *virtio_dev =
+		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
+	struct virtio_admin_cmd_data_lr_read *in;
+	struct scatterlist in_sg, out_sg;
+	struct virtio_admin_cmd cmd = {};
+	int ret;
+
+	if (!virtio_dev)
+		return -ENOTCONN;
+
+	in = kzalloc(sizeof(*in), GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	in->offset = offset;
+	sg_init_one(&in_sg, in, sizeof(*in));
+	sg_init_one(&out_sg, buf, size);
+	cmd.opcode = opcode;
+	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+	cmd.data_sg = &in_sg;
+	cmd.result_sg = &out_sg;
+	cmd.group_member_id = virtvdev->vf_id + 1;
+	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
+
+	kfree(in);
+	return ret;
+}
+
+int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
+				u8 req_bar_flags, u8 *bar, u64 *bar_offset)
+{
+	struct virtio_device *virtio_dev =
+		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
+	struct virtio_admin_cmd_notify_info_result *out;
+	struct scatterlist out_sg;
+	struct virtio_admin_cmd cmd = {};
+	int ret;
+
+	if (!virtio_dev)
+		return -ENOTCONN;
+
+	out = kzalloc(sizeof(*out), GFP_KERNEL);
+	if (!out)
+		return -ENOMEM;
+
+	sg_init_one(&out_sg, out, sizeof(*out));
+	cmd.opcode = VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO;
+	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+	cmd.result_sg = &out_sg;
+	cmd.group_member_id = virtvdev->vf_id + 1;
+	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
+	if (!ret) {
+		struct virtio_admin_cmd_notify_info_data *entry;
+		int i;
+
+		ret = -ENOENT;
+		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
+			entry = &out->entries[i];
+			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
+				break;
+			if (entry->flags != req_bar_flags)
+				continue;
+			*bar = entry->bar;
+			*bar_offset = le64_to_cpu(entry->offset);
+			ret = 0;
+			break;
+		}
+	}
+
+	kfree(out);
+	return ret;
+}
diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
new file mode 100644
index 000000000000..c2a3645f4b90
--- /dev/null
+++ b/drivers/vfio/pci/virtio/cmd.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/*
+ * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ */
+
+#ifndef VIRTIO_VFIO_CMD_H
+#define VIRTIO_VFIO_CMD_H
+
+#include <linux/kernel.h>
+#include <linux/virtio.h>
+#include <linux/vfio_pci_core.h>
+#include <linux/virtio_pci.h>
+
+struct virtiovf_pci_core_device {
+	struct vfio_pci_core_device core_device;
+	int vf_id;
+};
+
+int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
+int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
+int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
+			  u8 offset, u8 size, u8 *buf);
+int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
+			 u8 offset, u8 size, u8 *buf);
+int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
+				u8 req_bar_flags, u8 *bar, u64 *bar_offset);
+#endif /* VIRTIO_VFIO_CMD_H */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 321+ messages in thread

* [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 12:40 ` Yishai Hadas
@ 2023-09-21 12:40   ` Yishai Hadas
  -1 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas via Virtualization @ 2023-09-21 12:40 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, maorg, virtualization, jiri, leonro

Introduce a vfio driver over virtio devices to support the legacy
interface functionality for VFs.

Background, from the virtio spec [1].
--------------------------------------------------------------------
In some systems, there is a need to support a virtio legacy driver with
a device that does not directly support the legacy interface. In such
scenarios, a group owner device can provide the legacy interface
functionality for the group member devices. The driver of the owner
device can then access the legacy interface of a member device on behalf
of the legacy member device driver.

For example, with the SR-IOV group type, group members (VFs) can not
present the legacy interface in an I/O BAR in BAR0 as expected by the
legacy pci driver. If the legacy driver is running inside a virtual
machine, the hypervisor executing the virtual machine can present a
virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
legacy driver accesses to this I/O BAR and forwards them to the group
owner device (PF) using group administration commands.
--------------------------------------------------------------------

Specifically, this driver adds support for a virtio-net VF to be exposed
as a transitional device to a guest driver and allows the legacy IO BAR
functionality on top.

This allows a VM which uses a legacy virtio-net driver in the guest to
work transparently over a VF whose host-side driver is this new
driver.

The driver can easily be extended to support other types of virtio
devices (e.g. virtio-blk) by adding the type-specific properties in a
few places, as was done for virtio-net.

For now, only the virtio-net use case has been tested, and so we
introduce the support only for such a device.

Practically, upon probing a VF of a virtio-net device, in case its PF
supports legacy access over the virtio admin commands and the VF
doesn't have BAR 0, we set specific 'vfio_device_ops' to be able to
simulate in software a transitional device with an I/O BAR in BAR 0.
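
For orientation only, the probe-time decision described above could be
sketched as below; the helper name and the bitmap interpretation
(little-endian, first byte covering opcodes 0-7) are assumptions, and
the real logic lives in main.c further down:

	/* Sketch: does this VF need the software-emulated I/O BAR? */
	static bool example_needs_bar0_emulation(struct virtiovf_pci_core_device *virtvdev)
	{
		struct pci_dev *pdev = virtvdev->core_device.pdev;
		bool supported = false;
		u8 *bmap;

		/* A VF that already exposes BAR 0 needs no emulation. */
		if (pci_resource_len(pdev, 0))
			return false;

		bmap = kzalloc(8, GFP_KERNEL);
		if (!bmap)
			return false;
		if (!virtiovf_cmd_list_query(pdev, bmap, 8))
			supported = bmap[0] & BIT(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ);
		kfree(bmap);

		return supported;
	}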

The existence of the simulated I/O BAR is reported later on by
overriding the VFIO_DEVICE_GET_REGION_INFO command, and the device
exposes itself as a transitional device by overriding some properties
when its config space is read.

Once we report the existence of the I/O BAR as BAR 0, a legacy driver
in the guest may use it via read/write calls according to the virtio
specification.

Any read/write towards the control parts of the BAR will be captured by
the new driver and will be translated into admin commands towards the
device.

Any data path read/write access (i.e. virtio driver notifications) will
be forwarded to the physical BAR, whose properties were supplied by the
VIRTIO_PCI_QUEUE_NOTIFY command upon the probing/init flow.

With that code in place a legacy driver in the guest has the look and
feel as if having a transitional device with legacy support for both its
control and data path flows.

[1]
https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 MAINTAINERS                      |   6 +
 drivers/vfio/pci/Kconfig         |   2 +
 drivers/vfio/pci/Makefile        |   2 +
 drivers/vfio/pci/virtio/Kconfig  |  15 +
 drivers/vfio/pci/virtio/Makefile |   4 +
 drivers/vfio/pci/virtio/cmd.c    |   4 +-
 drivers/vfio/pci/virtio/cmd.h    |   8 +
 drivers/vfio/pci/virtio/main.c   | 546 +++++++++++++++++++++++++++++++
 8 files changed, 585 insertions(+), 2 deletions(-)
 create mode 100644 drivers/vfio/pci/virtio/Kconfig
 create mode 100644 drivers/vfio/pci/virtio/Makefile
 create mode 100644 drivers/vfio/pci/virtio/main.c

diff --git a/MAINTAINERS b/MAINTAINERS
index bf0f54c24f81..5098418c8389 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22624,6 +22624,12 @@ L:	kvm@vger.kernel.org
 S:	Maintained
 F:	drivers/vfio/pci/mlx5/
 
+VFIO VIRTIO PCI DRIVER
+M:	Yishai Hadas <yishaih@nvidia.com>
+L:	kvm@vger.kernel.org
+S:	Maintained
+F:	drivers/vfio/pci/virtio
+
 VFIO PCI DEVICE SPECIFIC DRIVERS
 R:	Jason Gunthorpe <jgg@nvidia.com>
 R:	Yishai Hadas <yishaih@nvidia.com>
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 8125e5f37832..18c397df566d 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
 
 source "drivers/vfio/pci/pds/Kconfig"
 
+source "drivers/vfio/pci/virtio/Kconfig"
+
 endmenu
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 45167be462d8..046139a4eca5 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
 obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
 
 obj-$(CONFIG_PDS_VFIO_PCI) += pds/
+
+obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
new file mode 100644
index 000000000000..89eddce8b1bd
--- /dev/null
+++ b/drivers/vfio/pci/virtio/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config VIRTIO_VFIO_PCI
+        tristate "VFIO support for VIRTIO PCI devices"
+        depends on VIRTIO_PCI
+        select VFIO_PCI_CORE
+        help
+          This provides support for exposing VIRTIO VF devices using the VFIO
+          framework that can work with a legacy virtio driver in the guest.
+          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
+          not indicate I/O Space.
+          As such, this driver emulates an I/O BAR in software to let a VF be
+          seen as a transitional device by the guest and let it work with
+          a legacy driver.
+
+          If you don't know what to do here, say N.
diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
new file mode 100644
index 000000000000..584372648a03
--- /dev/null
+++ b/drivers/vfio/pci/virtio/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
+virtio-vfio-pci-y := main.o cmd.o
+
diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
index f068239cdbb0..aea9d25fbf1d 100644
--- a/drivers/vfio/pci/virtio/cmd.c
+++ b/drivers/vfio/pci/virtio/cmd.c
@@ -44,7 +44,7 @@ int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
 {
 	struct virtio_device *virtio_dev =
 		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
-	struct virtio_admin_cmd_data_lr_write *in;
+	struct virtio_admin_cmd_legacy_wr_data *in;
 	struct scatterlist in_sg;
 	struct virtio_admin_cmd cmd = {};
 	int ret;
@@ -74,7 +74,7 @@ int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
 {
 	struct virtio_device *virtio_dev =
 		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
-	struct virtio_admin_cmd_data_lr_read *in;
+	struct virtio_admin_cmd_legacy_rd_data *in;
 	struct scatterlist in_sg, out_sg;
 	struct virtio_admin_cmd cmd = {};
 	int ret;
diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
index c2a3645f4b90..347b1dc85570 100644
--- a/drivers/vfio/pci/virtio/cmd.h
+++ b/drivers/vfio/pci/virtio/cmd.h
@@ -13,7 +13,15 @@
 
 struct virtiovf_pci_core_device {
 	struct vfio_pci_core_device core_device;
+	u8 bar0_virtual_buf_size;
+	u8 *bar0_virtual_buf;
+	/* synchronize access to the virtual buf */
+	struct mutex bar_mutex;
 	int vf_id;
+	void __iomem *notify_addr;
+	u32 notify_offset;
+	u8 notify_bar;
+	u8 pci_cmd_io :1;
 };
 
 int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
new file mode 100644
index 000000000000..2486991c49f3
--- /dev/null
+++ b/drivers/vfio/pci/virtio/main.c
@@ -0,0 +1,546 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/pm_runtime.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/vfio_pci_core.h>
+#include <linux/virtio_pci.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_pci_modern.h>
+
+#include "cmd.h"
+
+#define VIRTIO_LEGACY_IO_BAR_HEADER_LEN 20
+#define VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN 4
+
+static int virtiovf_issue_lr_cmd(struct virtiovf_pci_core_device *virtvdev,
+				 loff_t pos, char __user *buf,
+				 size_t count, bool read)
+{
+	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
+	u16 opcode;
+	int ret;
+
+	mutex_lock(&virtvdev->bar_mutex);
+	if (read) {
+		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
+			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
+			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
+		ret = virtiovf_cmd_lr_read(virtvdev, opcode, pos,
+					   count, bar0_buf + pos);
+		if (ret)
+			goto out;
+		if (copy_to_user(buf, bar0_buf + pos, count))
+			ret = -EFAULT;
+		goto out;
+	}
+
+	if (copy_from_user(bar0_buf + pos, buf, count)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
+			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
+			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
+	ret = virtiovf_cmd_lr_write(virtvdev, opcode, pos, count,
+				    bar0_buf + pos);
+out:
+	mutex_unlock(&virtvdev->bar_mutex);
+	return ret;
+}
+
+static int
+translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
+			    loff_t pos, char __user *buf,
+			    size_t count, bool read)
+{
+	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
+	u16 queue_notify;
+	int ret;
+
+	if (pos + count > virtvdev->bar0_virtual_buf_size)
+		return -EINVAL;
+
+	switch (pos) {
+	case VIRTIO_PCI_QUEUE_NOTIFY:
+		if (count != sizeof(queue_notify))
+			return -EINVAL;
+		if (read) {
+			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
+						virtvdev->notify_addr);
+			if (ret)
+				return ret;
+			if (copy_to_user(buf, &queue_notify,
+					 sizeof(queue_notify)))
+				return -EFAULT;
+			break;
+		}
+
+		if (copy_from_user(&queue_notify, buf, count))
+			return -EFAULT;
+
+		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
+					 virtvdev->notify_addr);
+		break;
+	default:
+		ret = virtiovf_issue_lr_cmd(virtvdev, pos, buf, count, read);
+	}
+
+	return ret ? ret : count;
+}
+
+static bool range_contains_range(loff_t range1_start, size_t count1,
+				 loff_t range2_start, size_t count2,
+				 loff_t *start_offset)
+{
+	if (range1_start <= range2_start &&
+	    range1_start + count1 >= range2_start + count2) {
+		*start_offset = range2_start - range1_start;
+		return true;
+	}
+	return false;
+}
+
+static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
+					char __user *buf, size_t count,
+					loff_t *ppos)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	loff_t copy_offset;
+	__le32 val32;
+	__le16 val16;
+	u8 val8;
+	int ret;
+
+	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
+	if (ret < 0)
+		return ret;
+
+	if (range_contains_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
+				 &copy_offset)) {
+		val16 = cpu_to_le16(0x1000);
+		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
+			return -EFAULT;
+	}
+
+	if (virtvdev->pci_cmd_io &&
+	    range_contains_range(pos, count, PCI_COMMAND, sizeof(val16),
+				 &copy_offset)) {
+		if (copy_from_user(&val16, buf + copy_offset, sizeof(val16)))
+			return -EFAULT;
+		val16 |= cpu_to_le16(PCI_COMMAND_IO);
+		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
+			return -EFAULT;
+	}
+
+	if (range_contains_range(pos, count, PCI_REVISION_ID, sizeof(val8),
+				 &copy_offset)) {
+		/* Transitional devices need to have revision 0 */
+		val8 = 0;
+		if (copy_to_user(buf + copy_offset, &val8, sizeof(val8)))
+			return -EFAULT;
+	}
+
+	if (range_contains_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
+				 &copy_offset)) {
+		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);
+		if (copy_to_user(buf + copy_offset, &val32, sizeof(val32)))
+			return -EFAULT;
+	}
+
+	if (range_contains_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
+				 &copy_offset)) {
+		/* Transitional devices use the PCI subsystem device id as
+		 * the virtio device id, just as the legacy driver has always done.
+		 */
+		val16 = cpu_to_le16(VIRTIO_ID_NET);
+		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
+			return -EFAULT;
+	}
+
+	return count;
+}
+
+static ssize_t
+virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
+		       size_t count, loff_t *ppos)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	struct pci_dev *pdev = virtvdev->core_device.pdev;
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	int ret;
+
+	if (!count)
+		return 0;
+
+	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
+		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
+
+	if (index != VFIO_PCI_BAR0_REGION_INDEX)
+		return vfio_pci_core_read(core_vdev, buf, count, ppos);
+
+	ret = pm_runtime_resume_and_get(&pdev->dev);
+	if (ret) {
+		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
+				     ret);
+		return -EIO;
+	}
+
+	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
+	pm_runtime_put(&pdev->dev);
+	return ret;
+}
+
+static ssize_t
+virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
+			size_t count, loff_t *ppos)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	struct pci_dev *pdev = virtvdev->core_device.pdev;
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	int ret;
+
+	if (!count)
+		return 0;
+
+	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
+		loff_t copy_offset;
+		u16 cmd;
+
+		if (range_contains_range(pos, count, PCI_COMMAND, sizeof(cmd),
+					 &copy_offset)) {
+			if (copy_from_user(&cmd, buf + copy_offset, sizeof(cmd)))
+				return -EFAULT;
+			virtvdev->pci_cmd_io = (cmd & PCI_COMMAND_IO);
+		}
+	}
+
+	if (index != VFIO_PCI_BAR0_REGION_INDEX)
+		return vfio_pci_core_write(core_vdev, buf, count, ppos);
+
+	ret = pm_runtime_resume_and_get(&pdev->dev);
+	if (ret) {
+		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
+		return -EIO;
+	}
+
+	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
+	pm_runtime_put(&pdev->dev);
+	return ret;
+}
+
+static int
+virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
+				   unsigned int cmd, unsigned long arg)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
+	void __user *uarg = (void __user *)arg;
+	struct vfio_region_info info = {};
+
+	if (copy_from_user(&info, uarg, minsz))
+		return -EFAULT;
+
+	if (info.argsz < minsz)
+		return -EINVAL;
+
+	switch (info.index) {
+	case VFIO_PCI_BAR0_REGION_INDEX:
+		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+		info.size = virtvdev->bar0_virtual_buf_size;
+		info.flags = VFIO_REGION_INFO_FLAG_READ |
+			     VFIO_REGION_INFO_FLAG_WRITE;
+		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
+	default:
+		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
+	}
+}
+
+static long
+virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
+			     unsigned long arg)
+{
+	switch (cmd) {
+	case VFIO_DEVICE_GET_REGION_INFO:
+		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
+	default:
+		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
+	}
+}
+
+static int
+virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
+{
+	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
+	int ret;
+
+	/* Set up the BAR where the 'notify' area lives to be used by vfio as
+	 * well. This lets us mmap it only once and use it whenever needed.
+	 */
+	ret = vfio_pci_core_setup_barmap(core_device,
+					 virtvdev->notify_bar);
+	if (ret)
+		return ret;
+
+	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
+			virtvdev->notify_offset;
+	return 0;
+}
+
+static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
+	int ret;
+
+	ret = vfio_pci_core_enable(vdev);
+	if (ret)
+		return ret;
+
+	if (virtvdev->bar0_virtual_buf) {
+		/* Upon close_device(), vfio_pci_core_disable() is called and
+		 * will tear down all the previous mmaps, so the valid life
+		 * cycle for the 'notify' address is per open/close.
+		 */
+		ret = virtiovf_set_notify_addr(virtvdev);
+		if (ret) {
+			vfio_pci_core_disable(vdev);
+			return ret;
+		}
+	}
+
+	vfio_pci_core_finish_enable(vdev);
+	return 0;
+}
+
+static void virtiovf_pci_close_device(struct vfio_device *core_vdev)
+{
+	vfio_pci_core_close_device(core_vdev);
+}
+
+static int virtiovf_get_device_config_size(unsigned short device)
+{
+	switch (device) {
+	case 0x1041:
+		/* network card */
+		return offsetofend(struct virtio_net_config, status);
+	default:
+		return 0;
+	}
+}
+
+static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
+{
+	u64 offset;
+	int ret;
+	u8 bar;
+
+	ret = virtiovf_cmd_lq_read_notify(virtvdev,
+				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
+				&bar, &offset);
+	if (ret)
+		return ret;
+
+	virtvdev->notify_bar = bar;
+	virtvdev->notify_offset = offset;
+	return 0;
+}
+
+static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	struct pci_dev *pdev;
+	int ret;
+
+	ret = vfio_pci_core_init_dev(core_vdev);
+	if (ret)
+		return ret;
+
+	pdev = virtvdev->core_device.pdev;
+	virtvdev->vf_id = pci_iov_vf_id(pdev);
+	if (virtvdev->vf_id < 0)
+		return -EINVAL;
+
+	ret = virtiovf_read_notify_info(virtvdev);
+	if (ret)
+		return ret;
+
+	virtvdev->bar0_virtual_buf_size = VIRTIO_LEGACY_IO_BAR_HEADER_LEN +
+		VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN +
+		virtiovf_get_device_config_size(pdev->device);
+	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
+					     GFP_KERNEL);
+	if (!virtvdev->bar0_virtual_buf)
+		return -ENOMEM;
+	mutex_init(&virtvdev->bar_mutex);
+	return 0;
+}
+
+static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+
+	kfree(virtvdev->bar0_virtual_buf);
+	vfio_pci_core_release_dev(core_vdev);
+}
+
+static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
+	.name = "virtio-transitional-vfio-pci",
+	.init = virtiovf_pci_init_device,
+	.release = virtiovf_pci_core_release_dev,
+	.open_device = virtiovf_pci_open_device,
+	.close_device = virtiovf_pci_close_device,
+	.ioctl = virtiovf_vfio_pci_core_ioctl,
+	.read = virtiovf_pci_core_read,
+	.write = virtiovf_pci_core_write,
+	.mmap = vfio_pci_core_mmap,
+	.request = vfio_pci_core_request,
+	.match = vfio_pci_core_match,
+	.bind_iommufd = vfio_iommufd_physical_bind,
+	.unbind_iommufd = vfio_iommufd_physical_unbind,
+	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+};
+
+static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
+	.name = "virtio-acc-vfio-pci",
+	.init = vfio_pci_core_init_dev,
+	.release = vfio_pci_core_release_dev,
+	.open_device = virtiovf_pci_open_device,
+	.close_device = virtiovf_pci_close_device,
+	.ioctl = vfio_pci_core_ioctl,
+	.device_feature = vfio_pci_core_ioctl_feature,
+	.read = vfio_pci_core_read,
+	.write = vfio_pci_core_write,
+	.mmap = vfio_pci_core_mmap,
+	.request = vfio_pci_core_request,
+	.match = vfio_pci_core_match,
+	.bind_iommufd = vfio_iommufd_physical_bind,
+	.unbind_iommufd = vfio_iommufd_physical_unbind,
+	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+};
+
+static bool virtiovf_bar0_exists(struct pci_dev *pdev)
+{
+	struct resource *res = pdev->resource;
+
+	return res->flags ? true : false;
+}
+
+#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
+	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
+
+static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
+{
+	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
+	u8 *buf;
+	int ret;
+
+	/* Only virtio-net is supported/tested so far */
+	if (pdev->device != 0x1041)
+		return false;
+
+	buf = kzalloc(buf_size, GFP_KERNEL);
+	if (!buf)
+		return false;
+
+	ret = virtiovf_cmd_list_query(pdev, buf, buf_size);
+	if (ret)
+		goto end;
+
+	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
+		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
+		ret = -EOPNOTSUPP;
+		goto end;
+	}
+
+	/* confirm the used commands */
+	memset(buf, 0, buf_size);
+	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
+	ret = virtiovf_cmd_list_use(pdev, buf, buf_size);
+
+end:
+	kfree(buf);
+	return ret ? false : true;
+}
+
+static int virtiovf_pci_probe(struct pci_dev *pdev,
+			      const struct pci_device_id *id)
+{
+	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
+	struct virtiovf_pci_core_device *virtvdev;
+	int ret;
+
+	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
+	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
+		ops = &virtiovf_acc_vfio_pci_tran_ops;
+
+	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
+				     &pdev->dev, ops);
+	if (IS_ERR(virtvdev))
+		return PTR_ERR(virtvdev);
+
+	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
+	ret = vfio_pci_core_register_device(&virtvdev->core_device);
+	if (ret)
+		goto out;
+	return 0;
+out:
+	vfio_put_device(&virtvdev->core_device.vdev);
+	return ret;
+}
+
+static void virtiovf_pci_remove(struct pci_dev *pdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
+
+	vfio_pci_core_unregister_device(&virtvdev->core_device);
+	vfio_put_device(&virtvdev->core_device.vdev);
+}
+
+static const struct pci_device_id virtiovf_pci_table[] = {
+	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
+	{}
+};
+
+MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
+
+static struct pci_driver virtiovf_pci_driver = {
+	.name = KBUILD_MODNAME,
+	.id_table = virtiovf_pci_table,
+	.probe = virtiovf_pci_probe,
+	.remove = virtiovf_pci_remove,
+	.err_handler = &vfio_pci_core_err_handlers,
+	.driver_managed_dma = true,
+};
+
+module_pci_driver(virtiovf_pci_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
+MODULE_DESCRIPTION(
+	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");
-- 
2.27.0
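
Since the id table uses PCI_DRIVER_OVERRIDE_DEVICE_VFIO, a VF only binds to
this driver when userspace explicitly requests it through the PCI
driver_override mechanism. A minimal sketch of that flow, assuming a
hypothetical VF at 0000:3b:00.2 and assuming the registered driver name
resolves to "virtio_vfio_pci" (KBUILD_MODNAME), could look like this:

/*
 * Illustrative only, not part of the patch: bind a VF to the driver via
 * driver_override.  The BDF and the driver name below are assumptions.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int sysfs_write(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, val, strlen(val));
	close(fd);
	return n < 0 ? -1 : 0;
}

int main(void)
{
	const char *bdf = "0000:3b:00.2";	/* hypothetical VF address */
	char path[128];

	/* Detach the VF from its current driver, if it has one. */
	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/driver/unbind", bdf);
	sysfs_write(path, bdf);

	/* Restrict matching to this driver, as the override-only id requires. */
	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/driver_override", bdf);
	if (sysfs_write(path, "virtio_vfio_pci"))
		return 1;

	/* Let the PCI core probe the device again with the override in place. */
	return sysfs_write("/sys/bus/pci/drivers_probe", bdf) ? 1 : 0;
}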


^ permalink raw reply related	[flat|nested] 321+ messages in thread
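
Once bound, the guest-facing side described above boils down to a plain VFIO
region: BAR0 is reported with size bar0_virtual_buf_size, a 2-byte access at
VIRTIO_PCI_QUEUE_NOTIFY reaches the device's notify area, and everything else
is translated into legacy admin commands. A rough, hypothetical userspace
sketch follows; device_fd is assumed to be an already opened VFIO device fd
for the VF, and error handling is trimmed:

/*
 * Illustrative only: query the emulated BAR0 region and ring queue 0.
 */
#include <linux/vfio.h>
#include <linux/virtio_pci.h>
#include <sys/ioctl.h>
#include <unistd.h>

static int kick_queue0(int device_fd)
{
	struct vfio_region_info info = {
		.argsz = sizeof(info),
		.index = VFIO_PCI_BAR0_REGION_INDEX,
	};
	unsigned short queue_notify = 0;	/* queue index 0 */

	if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &info))
		return -1;

	/*
	 * A 2-byte write at VIRTIO_PCI_QUEUE_NOTIFY goes straight to the
	 * device's notify area; any other offset in this region is turned
	 * into a legacy admin command by translate_io_bar_to_mem_bar().
	 */
	if (pwrite(device_fd, &queue_notify, sizeof(queue_notify),
		   info.offset + VIRTIO_PCI_QUEUE_NOTIFY) != sizeof(queue_notify))
		return -1;

	return 0;
}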

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-09-21 12:40   ` Yishai Hadas
@ 2023-09-21 13:08     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 13:08 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote:
> Expose admin commands over the virtio device, to be used by the
> vfio-virtio driver in the next patches.
> 
> It includes: list query/use, legacy write/read, read notify_info.
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>


I don't get the motivation for this and the next patch.
We already have vdpa that seems to do exactly this:
drive virtio from userspace. Why do we need these extra 1000
lines of code in vfio - just because we can?
Not to mention the user confusion all this will cause.


> ---
>  drivers/vfio/pci/virtio/cmd.c | 146 ++++++++++++++++++++++++++++++++++
>  drivers/vfio/pci/virtio/cmd.h |  27 +++++++
>  2 files changed, 173 insertions(+)
>  create mode 100644 drivers/vfio/pci/virtio/cmd.c
>  create mode 100644 drivers/vfio/pci/virtio/cmd.h
> 
> diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> new file mode 100644
> index 000000000000..f068239cdbb0
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/cmd.c
> @@ -0,0 +1,146 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include "cmd.h"
> +
> +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct scatterlist out_sg;
> +	struct virtio_admin_cmd cmd = {};
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	sg_init_one(&out_sg, buf, buf_size);
> +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.result_sg = &out_sg;
> +
> +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
> +}
> +
> +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct scatterlist in_sg;
> +	struct virtio_admin_cmd cmd = {};
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	sg_init_one(&in_sg, buf, buf_size);
> +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.data_sg = &in_sg;
> +
> +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
> +}
> +
> +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> +			  u8 offset, u8 size, u8 *buf)
> +{
> +	struct virtio_device *virtio_dev =
> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> +	struct virtio_admin_cmd_data_lr_write *in;
> +	struct scatterlist in_sg;
> +	struct virtio_admin_cmd cmd = {};
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
> +	if (!in)
> +		return -ENOMEM;
> +
> +	in->offset = offset;
> +	memcpy(in->registers, buf, size);
> +	sg_init_one(&in_sg, in, sizeof(*in) + size);
> +	cmd.opcode = opcode;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.group_member_id = virtvdev->vf_id + 1;
> +	cmd.data_sg = &in_sg;
> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> +
> +	kfree(in);
> +	return ret;
> +}
> +
> +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> +			 u8 offset, u8 size, u8 *buf)
> +{
> +	struct virtio_device *virtio_dev =
> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> +	struct virtio_admin_cmd_data_lr_read *in;
> +	struct scatterlist in_sg, out_sg;
> +	struct virtio_admin_cmd cmd = {};
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	in = kzalloc(sizeof(*in), GFP_KERNEL);
> +	if (!in)
> +		return -ENOMEM;
> +
> +	in->offset = offset;
> +	sg_init_one(&in_sg, in, sizeof(*in));
> +	sg_init_one(&out_sg, buf, size);
> +	cmd.opcode = opcode;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.data_sg = &in_sg;
> +	cmd.result_sg = &out_sg;
> +	cmd.group_member_id = virtvdev->vf_id + 1;
> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> +
> +	kfree(in);
> +	return ret;
> +}
> +
> +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
> +				u8 req_bar_flags, u8 *bar, u64 *bar_offset)
> +{
> +	struct virtio_device *virtio_dev =
> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> +	struct virtio_admin_cmd_notify_info_result *out;
> +	struct scatterlist out_sg;
> +	struct virtio_admin_cmd cmd = {};
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	out = kzalloc(sizeof(*out), GFP_KERNEL);
> +	if (!out)
> +		return -ENOMEM;
> +
> +	sg_init_one(&out_sg, out, sizeof(*out));
> +	cmd.opcode = VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.result_sg = &out_sg;
> +	cmd.group_member_id = virtvdev->vf_id + 1;
> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> +	if (!ret) {
> +		struct virtio_admin_cmd_notify_info_data *entry;
> +		int i;
> +
> +		ret = -ENOENT;
> +		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
> +			entry = &out->entries[i];
> +			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
> +				break;
> +			if (entry->flags != req_bar_flags)
> +				continue;
> +			*bar = entry->bar;
> +			*bar_offset = le64_to_cpu(entry->offset);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +
> +	kfree(out);
> +	return ret;
> +}
> diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
> new file mode 100644
> index 000000000000..c2a3645f4b90
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/cmd.h
> @@ -0,0 +1,27 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> + */
> +
> +#ifndef VIRTIO_VFIO_CMD_H
> +#define VIRTIO_VFIO_CMD_H
> +
> +#include <linux/kernel.h>
> +#include <linux/virtio.h>
> +#include <linux/vfio_pci_core.h>
> +#include <linux/virtio_pci.h>
> +
> +struct virtiovf_pci_core_device {
> +	struct vfio_pci_core_device core_device;
> +	int vf_id;
> +};
> +
> +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
> +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> +			  u8 offset, u8 size, u8 *buf);
> +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> +			 u8 offset, u8 size, u8 *buf);
> +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
> +				u8 req_bar_flags, u8 *bar, u64 *bar_offset);
> +#endif /* VIRTIO_VFIO_CMD_H */
> -- 
> 2.27.0


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 12:40   ` Yishai Hadas
@ 2023-09-21 13:16     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 13:16 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro


>  MAINTAINERS                      |   6 +
>  drivers/vfio/pci/Kconfig         |   2 +
>  drivers/vfio/pci/Makefile        |   2 +
>  drivers/vfio/pci/virtio/Kconfig  |  15 +
>  drivers/vfio/pci/virtio/Makefile |   4 +
>  drivers/vfio/pci/virtio/cmd.c    |   4 +-
>  drivers/vfio/pci/virtio/cmd.h    |   8 +
>  drivers/vfio/pci/virtio/main.c   | 546 +++++++++++++++++++++++++++++++
>  8 files changed, 585 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/vfio/pci/virtio/Kconfig
>  create mode 100644 drivers/vfio/pci/virtio/Makefile
>  create mode 100644 drivers/vfio/pci/virtio/main.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index bf0f54c24f81..5098418c8389 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22624,6 +22624,12 @@ L:	kvm@vger.kernel.org
>  S:	Maintained
>  F:	drivers/vfio/pci/mlx5/
>  
> +VFIO VIRTIO PCI DRIVER
> +M:	Yishai Hadas <yishaih@nvidia.com>
> +L:	kvm@vger.kernel.org
> +S:	Maintained
> +F:	drivers/vfio/pci/virtio
> +
>  VFIO PCI DEVICE SPECIFIC DRIVERS
>  R:	Jason Gunthorpe <jgg@nvidia.com>
>  R:	Yishai Hadas <yishaih@nvidia.com>

Tying two subsystems together like this is going to cause pain when
merging. God forbid there's something e.g. virtio net specific
(and there's going to be for sure) - now we are talking 3 subsystems.

Case in point: all other virtio drivers are nicely grouped and have a common
mailing list, etc.  This one is completely separate, to the point
where people won't even remember to copy the virtio mailing list.


diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
new file mode 100644
index 000000000000..89eddce8b1bd
--- /dev/null
+++ b/drivers/vfio/pci/virtio/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config VIRTIO_VFIO_PCI
+        tristate "VFIO support for VIRTIO PCI devices"
+        depends on VIRTIO_PCI
+        select VFIO_PCI_CORE
+        help
+          This provides support for exposing VIRTIO VF devices using the VFIO
+          framework that can work with a legacy virtio driver in the guest.
+          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
+          not indicate I/O Space.
+          Therefore, this driver emulates an I/O BAR in software to let a VF be
+          seen as a transitional device in the guest so that it can work with
+          a legacy driver.
+
+          If you don't know what to do here, say N.

I don't promise we'll remember to poke at vfio if we tweak something
in the virtio kconfig.

-- 
MST


^ permalink raw reply related	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 12:40   ` Yishai Hadas
@ 2023-09-21 13:33     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 13:33 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Thu, Sep 21, 2023 at 03:40:40PM +0300, Yishai Hadas wrote:
> +#define VIRTIO_LEGACY_IO_BAR_HEADER_LEN 20
> +#define VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN 4

This just duplicates part of what VIRTIO_PCI_CONFIG_OFF already encodes.
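
For reference, the existing macro already encodes exactly these lengths
(from include/uapi/linux/virtio_pci.h):

	/* Device-specific config starts after 20 bytes of legacy header,
	 * plus 4 more bytes when MSI-X is enabled. */
	#define VIRTIO_PCI_CONFIG_OFF(msix_enabled)	((msix_enabled) ? 24 : 20)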

-- 
MST

* Re: [PATCH vfio 01/11] virtio-pci: Use virtio pci device layer vq info instead of generic one
  2023-09-21 12:40   ` Yishai Hadas
@ 2023-09-21 13:46     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 13:46 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Thu, Sep 21, 2023 at 03:40:30PM +0300, Yishai Hadas wrote:
> From: Feng Liu <feliu@nvidia.com>
> 
> Currently VQ deletion callback vp_del_vqs() processes generic
> virtio_device level VQ list instead of VQ information available at PCI
> layer.
> 
> To adhere to the layering, use the pci device level VQ information
> stored in the virtqueues or vqs.
> 
> This also prepares the code to handle PCI layer admin vq life cycle to
> be managed within the pci layer and thereby avoid undesired deletion of
> admin vq by upper layer drivers (net, console, vfio), in the del_vqs()
> callback.

> Signed-off-by: Feng Liu <feliu@nvidia.com>
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  drivers/virtio/virtio_pci_common.c | 12 +++++++++---
>  drivers/virtio/virtio_pci_common.h |  1 +
>  2 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index c2524a7207cf..7a3e6edc4dd6 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -232,12 +232,16 @@ static void vp_del_vq(struct virtqueue *vq)
>  void vp_del_vqs(struct virtio_device *vdev)
>  {
>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> -	struct virtqueue *vq, *n;
> +	struct virtqueue *vq;
>  	int i;
>  
> -	list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
> +	for (i = 0; i < vp_dev->nvqs; i++) {
> +		if (!vp_dev->vqs[i])
> +			continue;
> +
> +		vq = vp_dev->vqs[i]->vq;
>  		if (vp_dev->per_vq_vectors) {
> -			int v = vp_dev->vqs[vq->index]->msix_vector;
> +			int v = vp_dev->vqs[i]->msix_vector;
>  
>  			if (v != VIRTIO_MSI_NO_VECTOR) {
>  				int irq = pci_irq_vector(vp_dev->pci_dev, v);
> @@ -294,6 +298,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned int nvqs,
>  	vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
>  	if (!vp_dev->vqs)
>  		return -ENOMEM;
> +	vp_dev->nvqs = nvqs;
>  
>  	if (per_vq_vectors) {
>  		/* Best option: one for change interrupt, one per vq. */
> @@ -365,6 +370,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs,
>  	vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
>  	if (!vp_dev->vqs)
>  		return -ENOMEM;
> +	vp_dev->nvqs = nvqs;
>  
>  	err = request_irq(vp_dev->pci_dev->irq, vp_interrupt, IRQF_SHARED,
>  			dev_name(&vdev->dev), vp_dev);
> diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> index 4b773bd7c58c..602021967aaa 100644
> --- a/drivers/virtio/virtio_pci_common.h
> +++ b/drivers/virtio/virtio_pci_common.h
> @@ -60,6 +60,7 @@ struct virtio_pci_device {
>  
>  	/* array of all queues for house-keeping */
>  	struct virtio_pci_vq_info **vqs;
> +	u32 nvqs;

I don't much like that we are adding more duplicated info here.
In fact, we tried removing the vqs array in
5c34d002dcc7a6dd665a19d098b4f4cd5501ba1a - there was some bug in that
patch and the author didn't have the time to debug it, so I reverted,
but I don't really think we need to add to that.

>  
>  	/* MSI-X support */
>  	int msix_enabled;
> -- 
> 2.27.0

* Re: [PATCH vfio 03/11] virtio-pci: Introduce admin virtqueue
  2023-09-21 12:40   ` Yishai Hadas
@ 2023-09-21 13:57     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 13:57 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Thu, Sep 21, 2023 at 03:40:32PM +0300, Yishai Hadas wrote:
> From: Feng Liu <feliu@nvidia.com>
> 
> Introduce support for the admin virtqueue. By negotiating
> VIRTIO_F_ADMIN_VQ feature, driver detects capability and creates one
> administration virtqueue. Administration virtqueue implementation in
> virtio pci generic layer, enables multiple types of upper layer
> drivers such as vfio, net, blk to utilize it.
> 
> Signed-off-by: Feng Liu <feliu@nvidia.com>
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  drivers/virtio/Makefile                |  2 +-
>  drivers/virtio/virtio.c                | 37 +++++++++++++--
>  drivers/virtio/virtio_pci_common.h     | 15 +++++-
>  drivers/virtio/virtio_pci_modern.c     | 10 +++-
>  drivers/virtio/virtio_pci_modern_avq.c | 65 ++++++++++++++++++++++++++

If you have a .c file without a .h file, you know there's something
fishy. Just add this inside drivers/virtio/virtio_pci_modern.c?

>  include/linux/virtio_config.h          |  4 ++
>  include/linux/virtio_pci_modern.h      |  3 ++
>  7 files changed, 129 insertions(+), 7 deletions(-)
>  create mode 100644 drivers/virtio/virtio_pci_modern_avq.c
> 
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 8e98d24917cc..dcc535b5b4d9 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -5,7 +5,7 @@ obj-$(CONFIG_VIRTIO_PCI_LIB) += virtio_pci_modern_dev.o
>  obj-$(CONFIG_VIRTIO_PCI_LIB_LEGACY) += virtio_pci_legacy_dev.o
>  obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
>  obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
> -virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> +virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o virtio_pci_modern_avq.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 3893dc29eb26..f4080692b351 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -302,9 +302,15 @@ static int virtio_dev_probe(struct device *_d)
>  	if (err)
>  		goto err;
>  
> +	if (dev->config->create_avq) {
> +		err = dev->config->create_avq(dev);
> +		if (err)
> +			goto err;
> +	}
> +
>  	err = drv->probe(dev);
>  	if (err)
> -		goto err;
> +		goto err_probe;
>  
>  	/* If probe didn't do it, mark device DRIVER_OK ourselves. */
>  	if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
> @@ -316,6 +322,10 @@ static int virtio_dev_probe(struct device *_d)
>  	virtio_config_enable(dev);
>  
>  	return 0;
> +
> +err_probe:
> +	if (dev->config->destroy_avq)
> +		dev->config->destroy_avq(dev);
>  err:
>  	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>  	return err;
> @@ -331,6 +341,9 @@ static void virtio_dev_remove(struct device *_d)
>  
>  	drv->remove(dev);
>  
> +	if (dev->config->destroy_avq)
> +		dev->config->destroy_avq(dev);
> +
>  	/* Driver should have reset device. */
>  	WARN_ON_ONCE(dev->config->get_status(dev));
>  
> @@ -489,13 +502,20 @@ EXPORT_SYMBOL_GPL(unregister_virtio_device);
>  int virtio_device_freeze(struct virtio_device *dev)
>  {
>  	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> +	int ret;
>  
>  	virtio_config_disable(dev);
>  
>  	dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
>  
> -	if (drv && drv->freeze)
> -		return drv->freeze(dev);
> +	if (drv && drv->freeze) {
> +		ret = drv->freeze(dev);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (dev->config->destroy_avq)
> +		dev->config->destroy_avq(dev);
>  
>  	return 0;
>  }
> @@ -532,10 +552,16 @@ int virtio_device_restore(struct virtio_device *dev)
>  	if (ret)
>  		goto err;
>  
> +	if (dev->config->create_avq) {
> +		ret = dev->config->create_avq(dev);
> +		if (ret)
> +			goto err;
> +	}
> +
>  	if (drv->restore) {
>  		ret = drv->restore(dev);
>  		if (ret)
> -			goto err;
> +			goto err_restore;
>  	}
>  
>  	/* If restore didn't do it, mark device DRIVER_OK ourselves. */
> @@ -546,6 +572,9 @@ int virtio_device_restore(struct virtio_device *dev)
>  
>  	return 0;
>  
> +err_restore:
> +	if (dev->config->destroy_avq)
> +		dev->config->destroy_avq(dev);
>  err:
>  	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>  	return ret;
> diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> index 602021967aaa..9bffa95274b6 100644
> --- a/drivers/virtio/virtio_pci_common.h
> +++ b/drivers/virtio/virtio_pci_common.h
> @@ -41,6 +41,14 @@ struct virtio_pci_vq_info {
>  	unsigned int msix_vector;
>  };
>  
> +struct virtio_avq {

admin_vq would be better. and this is pci specific yes? so virtio_pci_
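
Something along these lines, sketching only the naming being suggested
(same fields as in the patch):

	struct virtio_pci_admin_vq {
		/* Virtqueue info associated with this admin queue. */
		struct virtio_pci_vq_info info;
		/* Name of the admin queue: avq.$index. */
		char name[10];
		u16 vq_index;
	};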

> +	/* Virtqueue info associated with this admin queue. */
> +	struct virtio_pci_vq_info info;
> +	/* Name of the admin queue: avq.$index. */
> +	char name[10];
> +	u16 vq_index;
> +};
> +
>  /* Our device structure */
>  struct virtio_pci_device {
>  	struct virtio_device vdev;
> @@ -58,10 +66,13 @@ struct virtio_pci_device {
>  	spinlock_t lock;
>  	struct list_head virtqueues;
>  
> -	/* array of all queues for house-keeping */
> +	/* Array of all virtqueues reported in the
> +	 * PCI common config num_queues field
> +	 */
>  	struct virtio_pci_vq_info **vqs;
>  	u32 nvqs;
>  
> +	struct virtio_avq *admin;

And this could conceivably be admin_vq as well.

>  	/* MSI-X support */
>  	int msix_enabled;
>  	int intx_enabled;
> @@ -115,6 +126,8 @@ int vp_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>  		const char * const names[], const bool *ctx,
>  		struct irq_affinity *desc);
>  const char *vp_bus_name(struct virtio_device *vdev);
> +void vp_destroy_avq(struct virtio_device *vdev);
> +int vp_create_avq(struct virtio_device *vdev);
>  
>  /* Setup the affinity for a virtqueue:
>   * - force the affinity for per vq vector
> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> index d6bb68ba84e5..a72c87687196 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -37,6 +37,9 @@ static void vp_transport_features(struct virtio_device *vdev, u64 features)
>  
>  	if (features & BIT_ULL(VIRTIO_F_RING_RESET))
>  		__virtio_set_bit(vdev, VIRTIO_F_RING_RESET);
> +
> +	if (features & BIT_ULL(VIRTIO_F_ADMIN_VQ))
> +		__virtio_set_bit(vdev, VIRTIO_F_ADMIN_VQ);
>  }
>  
>  /* virtio config->finalize_features() implementation */
> @@ -317,7 +320,8 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>  	else
>  		notify = vp_notify;
>  
> -	if (index >= vp_modern_get_num_queues(mdev))
> +	if (!((index < vp_modern_get_num_queues(mdev) ||
> +	      (vp_dev->admin && vp_dev->admin->vq_index == index))))
>  		return ERR_PTR(-EINVAL);
>  
>  	/* Check if queue is either not available or already active. */
> @@ -509,6 +513,8 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
>  	.get_shm_region  = vp_get_shm_region,
>  	.disable_vq_and_reset = vp_modern_disable_vq_and_reset,
>  	.enable_vq_after_reset = vp_modern_enable_vq_after_reset,
> +	.create_avq = vp_create_avq,
> +	.destroy_avq = vp_destroy_avq,
>  };
>  
>  static const struct virtio_config_ops virtio_pci_config_ops = {
> @@ -529,6 +535,8 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
>  	.get_shm_region  = vp_get_shm_region,
>  	.disable_vq_and_reset = vp_modern_disable_vq_and_reset,
>  	.enable_vq_after_reset = vp_modern_enable_vq_after_reset,
> +	.create_avq = vp_create_avq,
> +	.destroy_avq = vp_destroy_avq,
>  };
>  
>  /* the PCI probing function */
> diff --git a/drivers/virtio/virtio_pci_modern_avq.c b/drivers/virtio/virtio_pci_modern_avq.c
> new file mode 100644
> index 000000000000..114579ad788f
> --- /dev/null
> +++ b/drivers/virtio/virtio_pci_modern_avq.c
> @@ -0,0 +1,65 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +#include <linux/virtio.h>
> +#include "virtio_pci_common.h"
> +
> +static u16 vp_modern_avq_num(struct virtio_pci_modern_device *mdev)
> +{
> +	struct virtio_pci_modern_common_cfg __iomem *cfg;
> +
> +	cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;
> +	return vp_ioread16(&cfg->admin_queue_num);
> +}
> +
> +static u16 vp_modern_avq_index(struct virtio_pci_modern_device *mdev)
> +{
> +	struct virtio_pci_modern_common_cfg __iomem *cfg;
> +
> +	cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;
> +	return vp_ioread16(&cfg->admin_queue_index);
> +}
> +
> +int vp_create_avq(struct virtio_device *vdev)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	struct virtio_avq *avq;
> +	struct virtqueue *vq;
> +	u16 admin_q_num;
> +
> +	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
> +		return 0;
> +
> +	admin_q_num = vp_modern_avq_num(&vp_dev->mdev);
> +	if (!admin_q_num)
> +		return -EINVAL;
> +
> +	vp_dev->admin = kzalloc(sizeof(*vp_dev->admin), GFP_KERNEL);
> +	if (!vp_dev->admin)
> +		return -ENOMEM;
> +
> +	avq = vp_dev->admin;
> +	avq->vq_index = vp_modern_avq_index(&vp_dev->mdev);
> +	sprintf(avq->name, "avq.%u", avq->vq_index);
> +	vq = vp_dev->setup_vq(vp_dev, &vp_dev->admin->info, avq->vq_index, NULL,
> +			      avq->name, NULL, VIRTIO_MSI_NO_VECTOR);
> +	if (IS_ERR(vq)) {
> +		dev_err(&vdev->dev, "failed to setup admin virtqueue");
> +		kfree(vp_dev->admin);
> +		return PTR_ERR(vq);
> +	}
> +
> +	vp_dev->admin->info.vq = vq;
> +	vp_modern_set_queue_enable(&vp_dev->mdev, avq->info.vq->index, true);
> +	return 0;
> +}
> +
> +void vp_destroy_avq(struct virtio_device *vdev)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +
> +	if (!vp_dev->admin)
> +		return;
> +
> +	vp_dev->del_vq(&vp_dev->admin->info);
> +	kfree(vp_dev->admin);
> +}
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 2b3438de2c4d..028c51ea90ee 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -93,6 +93,8 @@ typedef void vq_callback_t(struct virtqueue *);
>   *	Returns 0 on success or error status
>   *	If disable_vq_and_reset is set, then enable_vq_after_reset must also be
>   *	set.
> + * @create_avq: initialize admin virtqueue resource.
> + * @destroy_avq: destroy admin virtqueue resource.
>   */
>  struct virtio_config_ops {
>  	void (*get)(struct virtio_device *vdev, unsigned offset,
> @@ -120,6 +122,8 @@ struct virtio_config_ops {
>  			       struct virtio_shm_region *region, u8 id);
>  	int (*disable_vq_and_reset)(struct virtqueue *vq);
>  	int (*enable_vq_after_reset)(struct virtqueue *vq);
> +	int (*create_avq)(struct virtio_device *vdev);
> +	void (*destroy_avq)(struct virtio_device *vdev);
>  };
>  
>  /* If driver didn't advertise the feature, it will never appear. */
> diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h
> index 067ac1d789bc..f6cb13d858fd 100644
> --- a/include/linux/virtio_pci_modern.h
> +++ b/include/linux/virtio_pci_modern.h
> @@ -10,6 +10,9 @@ struct virtio_pci_modern_common_cfg {
>  
>  	__le16 queue_notify_data;	/* read-write */
>  	__le16 queue_reset;		/* read-write */
> +
> +	__le16 admin_queue_index;	/* read-only */
> +	__le16 admin_queue_num;		/* read-only */
>  };


Ouch.
Actually, there's a problem here:

        mdev->common = vp_modern_map_capability(mdev, common,
                                      sizeof(struct virtio_pci_common_cfg), 4,
                                      0, sizeof(struct virtio_pci_common_cfg),
                                      NULL, NULL);

Extending this structure means some calls will start failing on
existing devices.

Even more of an ouch: when we added queue_notify_data and queue_reset
we also possibly broke some devices. Hopefully not, since no one has
reported failures, but we really need to fix that.
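
One possible direction, purely as a sketch (the extra length variable
and the error handling are illustrative, not a tested change): keep the
required minimum at the base virtio_pci_common_cfg, request the larger
size opportunistically, and record how much was actually mapped so the
new fields are only touched when the device exposes them:

	size_t common_len;

	mdev->common = vp_modern_map_capability(mdev, common,
				      sizeof(struct virtio_pci_common_cfg), 4,
				      0, sizeof(struct virtio_pci_modern_common_cfg),
				      &common_len, NULL);

	/* ... later, before reading admin_queue_index/admin_queue_num: */
	if (common_len < sizeof(struct virtio_pci_modern_common_cfg))
		return -EOPNOTSUPP;	/* only the base layout is present */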


>  
>  struct virtio_pci_modern_device {
> -- 
> 2.27.0

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 13:16     ` Michael S. Tsirkin
@ 2023-09-21 14:11     ` Jason Gunthorpe
  2023-09-21 14:16         ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-21 14:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yishai Hadas, alex.williamson, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 09:16:21AM -0400, Michael S. Tsirkin wrote:

> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index bf0f54c24f81..5098418c8389 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -22624,6 +22624,12 @@ L:	kvm@vger.kernel.org
> >  S:	Maintained
> >  F:	drivers/vfio/pci/mlx5/
> >  
> > +VFIO VIRTIO PCI DRIVER
> > +M:	Yishai Hadas <yishaih@nvidia.com>
> > +L:	kvm@vger.kernel.org
> > +S:	Maintained
> > +F:	drivers/vfio/pci/virtio
> > +
> >  VFIO PCI DEVICE SPECIFIC DRIVERS
> >  R:	Jason Gunthorpe <jgg@nvidia.com>
> >  R:	Yishai Hadas <yishaih@nvidia.com>
> 
> Tying two subsystems together like this is going to cause pain when
> merging. God forbid there's something e.g. virtio net specific
> (and there's going to be for sure) - now we are talking 3
> subsystems.

Cross subsystem stuff is normal in the kernel. Drivers should be
placed in their most logical spot - this driver exposes a VFIO
interface so it belongs here.

Your exact argument works the same from the VFIO perspective, someone
has to have code that belongs to them outside their little sphere
here.

> Case in point all other virtio drivers are nicely grouped, have a common
> mailing list etc etc.  This one is completely separate to the point
> where people won't even remember to copy the virtio mailing list.

The virtio mailing list should probably be added to the maintainers
entry.
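
E.g. something like this, reusing the entry proposed in this patch:

	VFIO VIRTIO PCI DRIVER
	M:	Yishai Hadas <yishaih@nvidia.com>
	L:	kvm@vger.kernel.org
	L:	virtualization@lists.linux-foundation.org
	S:	Maintained
	F:	drivers/vfio/pci/virtio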

Jason

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 14:11     ` Jason Gunthorpe
@ 2023-09-21 14:16         ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 14:16 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Thu, Sep 21, 2023 at 11:11:25AM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 09:16:21AM -0400, Michael S. Tsirkin wrote:
> 
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index bf0f54c24f81..5098418c8389 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -22624,6 +22624,12 @@ L:	kvm@vger.kernel.org
> > >  S:	Maintained
> > >  F:	drivers/vfio/pci/mlx5/
> > >  
> > > +VFIO VIRTIO PCI DRIVER
> > > +M:	Yishai Hadas <yishaih@nvidia.com>
> > > +L:	kvm@vger.kernel.org
> > > +S:	Maintained
> > > +F:	drivers/vfio/pci/virtio
> > > +
> > >  VFIO PCI DEVICE SPECIFIC DRIVERS
> > >  R:	Jason Gunthorpe <jgg@nvidia.com>
> > >  R:	Yishai Hadas <yishaih@nvidia.com>
> > 
> > Tying two subsystems together like this is going to cause pain when
> > merging. God forbid there's something e.g. virtio net specific
> > (and there's going to be for sure) - now we are talking 3
> > subsystems.
> 
> Cross subsystem stuff is normal in the kernel.

Yea. But it's completely spurious here - virtio has its own way
to work with userspace which is vdpa and let's just use that.
Keeps things nice and contained.

> Drivers should be
> placed in their most logical spot - this driver exposes a VFIO
> interface so it belongs here.
> 
> Your exact argument works the same from the VFIO perspective, someone
> has to have code that belongs to them outside their little sphere
> here.
> 
> > Case in point all other virtio drivers are nicely grouped, have a common
> > mailing list etc etc.  This one is completely separate to the point
> > where people won't even remember to copy the virtio mailing list.
> 
> The virtio mailing list should probably be added to the maintainers
> enry
> 
> Jason

* Re: [PATCH vfio 08/11] vfio/pci: Expose vfio_pci_core_setup_barmap()
  2023-09-21 12:40   ` Yishai Hadas
@ 2023-09-21 16:35     ` Alex Williamson
  -1 siblings, 0 replies; 321+ messages in thread
From: Alex Williamson @ 2023-09-21 16:35 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, mst, maorg, virtualization, jgg, jiri, leonro

On Thu, 21 Sep 2023 15:40:37 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> Expose vfio_pci_core_setup_barmap() to be used by drivers.
> 
> This will let drivers to mmap a BAR and re-use it from both vfio and the
> driver when it's applicable.
> 
> This API will be used in the next patches by the vfio/virtio coming
> driver.
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 25 +++++++++++++++++++++++++
>  drivers/vfio/pci/vfio_pci_rdwr.c | 28 ++--------------------------
>  include/linux/vfio_pci_core.h    |  1 +
>  3 files changed, 28 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 1929103ee59a..b56111ed8a8c 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -684,6 +684,31 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
>  }
>  EXPORT_SYMBOL_GPL(vfio_pci_core_disable);
>  
> +int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
> +{
> +	struct pci_dev *pdev = vdev->pdev;
> +	void __iomem *io;
> +	int ret;
> +
> +	if (vdev->barmap[bar])
> +		return 0;
> +
> +	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
> +	if (ret)
> +		return ret;
> +
> +	io = pci_iomap(pdev, bar, 0);
> +	if (!io) {
> +		pci_release_selected_regions(pdev, 1 << bar);
> +		return -ENOMEM;
> +	}
> +
> +	vdev->barmap[bar] = io;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(vfio_pci_core_setup_barmap);

Not to endorse the rest of this yet, but minimally _GPL, same for the
following patch.  Thanks,
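
I.e., just the export macro changes:

	EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);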

Alex

> +
>  void vfio_pci_core_close_device(struct vfio_device *core_vdev)
>  {
>  	struct vfio_pci_core_device *vdev =
> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> index e27de61ac9fe..6f08b3ecbb89 100644
> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> @@ -200,30 +200,6 @@ static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
>  	return done;
>  }
>  
> -static int vfio_pci_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
> -{
> -	struct pci_dev *pdev = vdev->pdev;
> -	int ret;
> -	void __iomem *io;
> -
> -	if (vdev->barmap[bar])
> -		return 0;
> -
> -	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
> -	if (ret)
> -		return ret;
> -
> -	io = pci_iomap(pdev, bar, 0);
> -	if (!io) {
> -		pci_release_selected_regions(pdev, 1 << bar);
> -		return -ENOMEM;
> -	}
> -
> -	vdev->barmap[bar] = io;
> -
> -	return 0;
> -}
> -
>  ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>  			size_t count, loff_t *ppos, bool iswrite)
>  {
> @@ -262,7 +238,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>  		}
>  		x_end = end;
>  	} else {
> -		int ret = vfio_pci_setup_barmap(vdev, bar);
> +		int ret = vfio_pci_core_setup_barmap(vdev, bar);
>  		if (ret) {
>  			done = ret;
>  			goto out;
> @@ -438,7 +414,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
>  		return -EINVAL;
>  #endif
>  
> -	ret = vfio_pci_setup_barmap(vdev, bar);
> +	ret = vfio_pci_core_setup_barmap(vdev, bar);
>  	if (ret)
>  		return ret;
>  
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index 562e8754869d..67ac58e20e1d 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -127,6 +127,7 @@ int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf);
>  int vfio_pci_core_enable(struct vfio_pci_core_device *vdev);
>  void vfio_pci_core_disable(struct vfio_pci_core_device *vdev);
>  void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
> +int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
>  pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
>  						pci_channel_state_t state);
>  

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 14:16         ` Michael S. Tsirkin
@ 2023-09-21 16:41         ` Jason Gunthorpe
  2023-09-21 16:53             ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-21 16:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yishai Hadas, alex.williamson, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 10:16:04AM -0400, Michael S. Tsirkin wrote:
> On Thu, Sep 21, 2023 at 11:11:25AM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 21, 2023 at 09:16:21AM -0400, Michael S. Tsirkin wrote:
> > 
> > > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > > index bf0f54c24f81..5098418c8389 100644
> > > > --- a/MAINTAINERS
> > > > +++ b/MAINTAINERS
> > > > @@ -22624,6 +22624,12 @@ L:	kvm@vger.kernel.org
> > > >  S:	Maintained
> > > >  F:	drivers/vfio/pci/mlx5/
> > > >  
> > > > +VFIO VIRTIO PCI DRIVER
> > > > +M:	Yishai Hadas <yishaih@nvidia.com>
> > > > +L:	kvm@vger.kernel.org
> > > > +S:	Maintained
> > > > +F:	drivers/vfio/pci/virtio
> > > > +
> > > >  VFIO PCI DEVICE SPECIFIC DRIVERS
> > > >  R:	Jason Gunthorpe <jgg@nvidia.com>
> > > >  R:	Yishai Hadas <yishaih@nvidia.com>
> > > 
> > > Tying two subsystems together like this is going to cause pain when
> > > merging. God forbid there's something e.g. virtio net specific
> > > (and there's going to be for sure) - now we are talking 3
> > > subsystems.
> > 
> > Cross subsystem stuff is normal in the kernel.
> 
> Yea. But it's completely spurious here - virtio has its own way
> to work with userspace which is vdpa and let's just use that.
> Keeps things nice and contained.

vdpa is not vfio; I don't know how you can suggest vdpa is a
replacement for a vfio driver. They are completely different
things.

Each side has its own strengths, and vfio especially is accelerating
in its capability in a way that vdpa is not. E.g. if an iommufd
conversion had been done by now for vdpa I might be more sympathetic.
Asking someone else to do a huge amount of pointless work to improve
vdpa just to get to the level this vfio driver is already at is
ridiculous.

vdpa is great for certain kinds of HW, let it focus on that, don't try
to paint it as an alternative to vfio. It isn't.

Jason

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 12:40   ` Yishai Hadas
@ 2023-09-21 16:43     ` Alex Williamson
  -1 siblings, 0 replies; 321+ messages in thread
From: Alex Williamson @ 2023-09-21 16:43 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, mst, maorg, virtualization, jgg, jiri, leonro

On Thu, 21 Sep 2023 15:40:40 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> Introduce a vfio driver over virtio devices to support the legacy
> interface functionality for VFs.
> 
> Background, from the virtio spec [1].
> --------------------------------------------------------------------
> In some systems, there is a need to support a virtio legacy driver with
> a device that does not directly support the legacy interface. In such
> scenarios, a group owner device can provide the legacy interface
> functionality for the group member devices. The driver of the owner
> device can then access the legacy interface of a member device on behalf
> of the legacy member device driver.
> 
> For example, with the SR-IOV group type, group members (VFs) can not
> present the legacy interface in an I/O BAR in BAR0 as expected by the
> legacy pci driver. If the legacy driver is running inside a virtual
> machine, the hypervisor executing the virtual machine can present a
> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> legacy driver accesses to this I/O BAR and forwards them to the group
> owner device (PF) using group administration commands.
> --------------------------------------------------------------------
> 
> Specifically, this driver adds support for a virtio-net VF to be exposed
> as a transitional device to a guest driver and allows the legacy IO BAR
> functionality on top.
> 
> This allows a VM which uses a legacy virtio-net driver in the guest to
> work transparently over a VF which its driver in the host is that new
> driver.
> 
> The driver can be extended easily to support some other types of virtio
> devices (e.g virtio-blk), by adding in a few places the specific type
> properties as was done for virtio-net.
> 
> For now, only the virtio-net use case was tested and as such we introduce
> the support only for such a device.
> 
> Practically,
> Upon probing a VF for a virtio-net device, in case its PF supports
> legacy access over the virtio admin commands and the VF doesn't have BAR
> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> transitional device with I/O BAR in BAR 0.
> 
> The existence of the simulated I/O bar is reported later on by
> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> exposes itself as a transitional device by overwriting some properties
> upon reading its config space.
> 
> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> guest may use it via read/write calls according to the virtio
> specification.
> 
> Any read/write towards the control parts of the BAR will be captured by
> the new driver and will be translated into admin commands towards the
> device.
> 
> Any data path read/write access (i.e. virtio driver notifications) will
> be forwarded to the physical BAR which its properties were supplied by
> the command VIRTIO_PCI_QUEUE_NOTIFY upon the probing/init flow.
> 
> With that code in place a legacy driver in the guest has the look and
> feel as if having a transitional device with legacy support for both its
> control and data path flows.

Why do we need to enable a "legacy" driver in the guest?  The very name
suggests there's an alternative driver that perhaps doesn't require
this I/O BAR.  Why don't we just require the non-legacy driver in the
guest rather than increase our maintenance burden?  Thanks,

Alex

> 
> [1]
> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  MAINTAINERS                      |   6 +
>  drivers/vfio/pci/Kconfig         |   2 +
>  drivers/vfio/pci/Makefile        |   2 +
>  drivers/vfio/pci/virtio/Kconfig  |  15 +
>  drivers/vfio/pci/virtio/Makefile |   4 +
>  drivers/vfio/pci/virtio/cmd.c    |   4 +-
>  drivers/vfio/pci/virtio/cmd.h    |   8 +
>  drivers/vfio/pci/virtio/main.c   | 546 +++++++++++++++++++++++++++++++
>  8 files changed, 585 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/vfio/pci/virtio/Kconfig
>  create mode 100644 drivers/vfio/pci/virtio/Makefile
>  create mode 100644 drivers/vfio/pci/virtio/main.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index bf0f54c24f81..5098418c8389 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22624,6 +22624,12 @@ L:	kvm@vger.kernel.org
>  S:	Maintained
>  F:	drivers/vfio/pci/mlx5/
>  
> +VFIO VIRTIO PCI DRIVER
> +M:	Yishai Hadas <yishaih@nvidia.com>
> +L:	kvm@vger.kernel.org
> +S:	Maintained
> +F:	drivers/vfio/pci/virtio
> +
>  VFIO PCI DEVICE SPECIFIC DRIVERS
>  R:	Jason Gunthorpe <jgg@nvidia.com>
>  R:	Yishai Hadas <yishaih@nvidia.com>
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 8125e5f37832..18c397df566d 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>  
>  source "drivers/vfio/pci/pds/Kconfig"
>  
> +source "drivers/vfio/pci/virtio/Kconfig"
> +
>  endmenu
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 45167be462d8..046139a4eca5 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>  obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>  
>  obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> +
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> new file mode 100644
> index 000000000000..89eddce8b1bd
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Kconfig
> @@ -0,0 +1,15 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config VIRTIO_VFIO_PCI
> +        tristate "VFIO support for VIRTIO PCI devices"
> +        depends on VIRTIO_PCI
> +        select VFIO_PCI_CORE
> +        help
> +          This provides support for exposing VIRTIO VF devices using the VFIO
> +          framework so that they can work with a legacy virtio driver in the guest.
> +          Per the PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> +          not indicate I/O Space.
> +          Therefore, this driver emulates an I/O BAR in software to let a VF be
> +          seen as a transitional device in the guest and let it work with
> +          a legacy driver.
> +
> +          If you don't know what to do here, say N.
> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
> new file mode 100644
> index 000000000000..584372648a03
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
> +virtio-vfio-pci-y := main.o cmd.o
> +
> diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> index f068239cdbb0..aea9d25fbf1d 100644
> --- a/drivers/vfio/pci/virtio/cmd.c
> +++ b/drivers/vfio/pci/virtio/cmd.c
> @@ -44,7 +44,7 @@ int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>  {
>  	struct virtio_device *virtio_dev =
>  		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> -	struct virtio_admin_cmd_data_lr_write *in;
> +	struct virtio_admin_cmd_legacy_wr_data *in;
>  	struct scatterlist in_sg;
>  	struct virtio_admin_cmd cmd = {};
>  	int ret;
> @@ -74,7 +74,7 @@ int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>  {
>  	struct virtio_device *virtio_dev =
>  		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> -	struct virtio_admin_cmd_data_lr_read *in;
> +	struct virtio_admin_cmd_legacy_rd_data *in;
>  	struct scatterlist in_sg, out_sg;
>  	struct virtio_admin_cmd cmd = {};
>  	int ret;
> diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
> index c2a3645f4b90..347b1dc85570 100644
> --- a/drivers/vfio/pci/virtio/cmd.h
> +++ b/drivers/vfio/pci/virtio/cmd.h
> @@ -13,7 +13,15 @@
>  
>  struct virtiovf_pci_core_device {
>  	struct vfio_pci_core_device core_device;
> +	u8 bar0_virtual_buf_size;
> +	u8 *bar0_virtual_buf;
> +	/* synchronize access to the virtual buf */
> +	struct mutex bar_mutex;
>  	int vf_id;
> +	void __iomem *notify_addr;
> +	u32 notify_offset;
> +	u8 notify_bar;
> +	u8 pci_cmd_io :1;
>  };
>  
>  int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
> new file mode 100644
> index 000000000000..2486991c49f3
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/main.c
> @@ -0,0 +1,546 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/pci.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +#include <linux/vfio.h>
> +#include <linux/vfio_pci_core.h>
> +#include <linux/virtio_pci.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_pci_modern.h>
> +
> +#include "cmd.h"
> +
> +#define VIRTIO_LEGACY_IO_BAR_HEADER_LEN 20
> +#define VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN 4
> +
> +static int virtiovf_issue_lr_cmd(struct virtiovf_pci_core_device *virtvdev,
> +				 loff_t pos, char __user *buf,
> +				 size_t count, bool read)
> +{
> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
> +	u16 opcode;
> +	int ret;
> +
> +	mutex_lock(&virtvdev->bar_mutex);
> +	if (read) {
> +		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
> +		ret = virtiovf_cmd_lr_read(virtvdev, opcode, pos,
> +					   count, bar0_buf + pos);
> +		if (ret)
> +			goto out;
> +		if (copy_to_user(buf, bar0_buf + pos, count))
> +			ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	if (copy_from_user(bar0_buf + pos, buf, count)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
> +	ret = virtiovf_cmd_lr_write(virtvdev, opcode, pos, count,
> +				    bar0_buf + pos);
> +out:
> +	mutex_unlock(&virtvdev->bar_mutex);
> +	return ret;
> +}
> +
> +static int
> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
> +			    loff_t pos, char __user *buf,
> +			    size_t count, bool read)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	u16 queue_notify;
> +	int ret;
> +
> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
> +		return -EINVAL;
> +
> +	switch (pos) {
> +	case VIRTIO_PCI_QUEUE_NOTIFY:
> +		if (count != sizeof(queue_notify))
> +			return -EINVAL;
> +		if (read) {
> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
> +						virtvdev->notify_addr);
> +			if (ret)
> +				return ret;
> +			if (copy_to_user(buf, &queue_notify,
> +					 sizeof(queue_notify)))
> +				return -EFAULT;
> +			break;
> +		}
> +
> +		if (copy_from_user(&queue_notify, buf, count))
> +			return -EFAULT;
> +
> +		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
> +					 virtvdev->notify_addr);
> +		break;
> +	default:
> +		ret = virtiovf_issue_lr_cmd(virtvdev, pos, buf, count, read);
> +	}
> +
> +	return ret ? ret : count;
> +}
> +
> +static bool range_contains_range(loff_t range1_start, size_t count1,
> +				 loff_t range2_start, size_t count2,
> +				 loff_t *start_offset)
> +{
> +	if (range1_start <= range2_start &&
> +	    range1_start + count1 >= range2_start + count2) {
> +		*start_offset = range2_start - range1_start;
> +		return true;
> +	}
> +	return false;
> +}
> +
> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> +					char __user *buf, size_t count,
> +					loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	loff_t copy_offset;
> +	__le32 val32;
> +	__le16 val16;
> +	u8 val8;
> +	int ret;
> +
> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (range_contains_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> +				 &copy_offset)) {
> +		val16 = cpu_to_le16(0x1000);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}
> +
> +	if (virtvdev->pci_cmd_io &&
> +	    range_contains_range(pos, count, PCI_COMMAND, sizeof(val16),
> +				 &copy_offset)) {
> +		if (copy_from_user(&val16, buf + copy_offset, sizeof(val16)))
> +			return -EFAULT;
> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}
> +
> +	if (range_contains_range(pos, count, PCI_REVISION_ID, sizeof(val8),
> +				 &copy_offset)) {
> +		/* Transitional needs to have revision 0 */
> +		val8 = 0;
> +		if (copy_to_user(buf + copy_offset, &val8, sizeof(val8)))
> +			return -EFAULT;
> +	}
> +
> +	if (range_contains_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
> +				 &copy_offset)) {
> +		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);
> +		if (copy_to_user(buf + copy_offset, &val32, sizeof(val32)))
> +			return -EFAULT;
> +	}
> +
> +	if (range_contains_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> +				 &copy_offset)) {
> +		/* Transitional devices use the PCI subsystem device id as
> +		 * virtio device id, same as legacy driver always did.
> +		 */
> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}
> +
> +	return count;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
> +		       size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
> +				     ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> +			size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> +		loff_t copy_offset;
> +		u16 cmd;
> +
> +		if (range_contains_range(pos, count, PCI_COMMAND, sizeof(cmd),
> +					 &copy_offset)) {
> +			if (copy_from_user(&cmd, buf + copy_offset, sizeof(cmd)))
> +				return -EFAULT;
> +			virtvdev->pci_cmd_io = (cmd & PCI_COMMAND_IO);
> +		}
> +	}
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static int
> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
> +				   unsigned int cmd, unsigned long arg)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
> +	void __user *uarg = (void __user *)arg;
> +	struct vfio_region_info info = {};
> +
> +	if (copy_from_user(&info, uarg, minsz))
> +		return -EFAULT;
> +
> +	if (info.argsz < minsz)
> +		return -EINVAL;
> +
> +	switch (info.index) {
> +	case VFIO_PCI_BAR0_REGION_INDEX:
> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
> +		info.size = virtvdev->bar0_virtual_buf_size;
> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
> +			     VFIO_REGION_INFO_FLAG_WRITE;
> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static long
> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> +			     unsigned long arg)
> +{
> +	switch (cmd) {
> +	case VFIO_DEVICE_GET_REGION_INFO:
> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static int
> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	int ret;
> +
> +	/* Set up the BAR where 'notify' lives so that vfio can use it as well.
> +	 * This will let us mmap it only once and use it when needed.
> +	 */
> +	ret = vfio_pci_core_setup_barmap(core_device,
> +					 virtvdev->notify_bar);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
> +			virtvdev->notify_offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
> +	int ret;
> +
> +	ret = vfio_pci_core_enable(vdev);
> +	if (ret)
> +		return ret;
> +
> +	if (virtvdev->bar0_virtual_buf) {
> +		/* upon close_device() the vfio_pci_core_disable() is called
> +		 * and will close all the previous mmaps, so it seems that the
> +		 * valid life cycle for the 'notify' addr is per open/close.
> +		 */
> +		ret = virtiovf_set_notify_addr(virtvdev);
> +		if (ret) {
> +			vfio_pci_core_disable(vdev);
> +			return ret;
> +		}
> +	}
> +
> +	vfio_pci_core_finish_enable(vdev);
> +	return 0;
> +}
> +
> +static void virtiovf_pci_close_device(struct vfio_device *core_vdev)
> +{
> +	vfio_pci_core_close_device(core_vdev);
> +}
> +
> +static int virtiovf_get_device_config_size(unsigned short device)
> +{
> +	switch (device) {
> +	case 0x1041:
> +		/* network card */
> +		return offsetofend(struct virtio_net_config, status);
> +	default:
> +		return 0;
> +	}
> +}
> +
> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	u64 offset;
> +	int ret;
> +	u8 bar;
> +
> +	ret = virtiovf_cmd_lq_read_notify(virtvdev,
> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
> +				&bar, &offset);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_bar = bar;
> +	virtvdev->notify_offset = offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev;
> +	int ret;
> +
> +	ret = vfio_pci_core_init_dev(core_vdev);
> +	if (ret)
> +		return ret;
> +
> +	pdev = virtvdev->core_device.pdev;
> +	virtvdev->vf_id = pci_iov_vf_id(pdev);
> +	if (virtvdev->vf_id < 0)
> +		return -EINVAL;
> +
> +	ret = virtiovf_read_notify_info(virtvdev);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->bar0_virtual_buf_size = VIRTIO_LEGACY_IO_BAR_HEADER_LEN +
> +		VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN +
> +		virtiovf_get_device_config_size(pdev->device);
> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
> +					     GFP_KERNEL);
> +	if (!virtvdev->bar0_virtual_buf)
> +		return -ENOMEM;
> +	mutex_init(&virtvdev->bar_mutex);
> +	return 0;
> +}
> +
> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +
> +	kfree(virtvdev->bar0_virtual_buf);
> +	vfio_pci_core_release_dev(core_vdev);
> +}
> +
> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
> +	.name = "virtio-transitional-vfio-pci",
> +	.init = virtiovf_pci_init_device,
> +	.release = virtiovf_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = virtiovf_pci_close_device,
> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
> +	.read = virtiovf_pci_core_read,
> +	.write = virtiovf_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};
> +
> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
> +	.name = "virtio-acc-vfio-pci",
> +	.init = vfio_pci_core_init_dev,
> +	.release = vfio_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = virtiovf_pci_close_device,
> +	.ioctl = vfio_pci_core_ioctl,
> +	.device_feature = vfio_pci_core_ioctl_feature,
> +	.read = vfio_pci_core_read,
> +	.write = vfio_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};
> +
> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
> +{
> +	struct resource *res = pdev->resource;
> +
> +	return res->flags ? true : false;
> +}
> +
> +#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
> +	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> +
> +static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
> +{
> +	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
> +	u8 *buf;
> +	int ret;
> +
> +	/* Only virtio-net is supported/tested so far */
> +	if (pdev->device != 0x1041)
> +		return false;
> +
> +	buf = kzalloc(buf_size, GFP_KERNEL);
> +	if (!buf)
> +		return false;
> +
> +	ret = virtiovf_cmd_list_query(pdev, buf, buf_size);
> +	if (ret)
> +		goto end;
> +
> +	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
> +		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
> +		ret = -EOPNOTSUPP;
> +		goto end;
> +	}
> +
> +	/* confirm the used commands */
> +	memset(buf, 0, buf_size);
> +	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
> +	ret = virtiovf_cmd_list_use(pdev, buf, buf_size);
> +
> +end:
> +	kfree(buf);
> +	return ret ? false : true;
> +}
> +
> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> +			      const struct pci_device_id *id)
> +{
> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> +	struct virtiovf_pci_core_device *virtvdev;
> +	int ret;
> +
> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> +		ops = &virtiovf_acc_vfio_pci_tran_ops;
> +
> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> +				     &pdev->dev, ops);
> +	if (IS_ERR(virtvdev))
> +		return PTR_ERR(virtvdev);
> +
> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
> +	if (ret)
> +		goto out;
> +	return 0;
> +out:
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +	return ret;
> +}
> +
> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> +
> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +}
> +
> +static const struct pci_device_id virtiovf_pci_table[] = {
> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
> +	{}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> +
> +static struct pci_driver virtiovf_pci_driver = {
> +	.name = KBUILD_MODNAME,
> +	.id_table = virtiovf_pci_table,
> +	.probe = virtiovf_pci_probe,
> +	.remove = virtiovf_pci_remove,
> +	.err_handler = &vfio_pci_core_err_handlers,
> +	.driver_managed_dma = true,
> +};
> +
> +module_pci_driver(virtiovf_pci_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
> +MODULE_DESCRIPTION(
> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 16:43     ` Alex Williamson
  (?)
@ 2023-09-21 16:52     ` Jason Gunthorpe
  2023-09-21 17:01         ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-21 16:52 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yishai Hadas, mst, jasowang, kvm, virtualization, parav, feliu,
	jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 10:43:50AM -0600, Alex Williamson wrote:

> > With that code in place a legacy driver in the guest has the look and
> > feel as if having a transitional device with legacy support for both its
> > control and data path flows.
> 
> Why do we need to enable a "legacy" driver in the guest?  The very name
> suggests there's an alternative driver that perhaps doesn't require
> this I/O BAR.  Why don't we just require the non-legacy driver in the
> guest rather than increase our maintenance burden?  Thanks,

It was my reaction also.

Apparently there is a big deployed base of people using old guest VMs
with old drivers and they do not want to update their VMs. It is the
same basic reason why qemu supports all those weird old machine types
and HW emulations. The desire is to support these old devices so that
old VMs can work unchanged.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 16:41         ` Jason Gunthorpe
@ 2023-09-21 16:53             ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 16:53 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Thu, Sep 21, 2023 at 01:41:39PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 10:16:04AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 11:11:25AM -0300, Jason Gunthorpe wrote:
> > > On Thu, Sep 21, 2023 at 09:16:21AM -0400, Michael S. Tsirkin wrote:
> > > 
> > > > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > > > index bf0f54c24f81..5098418c8389 100644
> > > > > --- a/MAINTAINERS
> > > > > +++ b/MAINTAINERS
> > > > > @@ -22624,6 +22624,12 @@ L:	kvm@vger.kernel.org
> > > > >  S:	Maintained
> > > > >  F:	drivers/vfio/pci/mlx5/
> > > > >  
> > > > > +VFIO VIRTIO PCI DRIVER
> > > > > +M:	Yishai Hadas <yishaih@nvidia.com>
> > > > > +L:	kvm@vger.kernel.org
> > > > > +S:	Maintained
> > > > > +F:	drivers/vfio/pci/virtio
> > > > > +
> > > > >  VFIO PCI DEVICE SPECIFIC DRIVERS
> > > > >  R:	Jason Gunthorpe <jgg@nvidia.com>
> > > > >  R:	Yishai Hadas <yishaih@nvidia.com>
> > > > 
> > > > Tying two subsystems together like this is going to cause pain when
> > > > merging. God forbid there's something e.g. virtio net specific
> > > > (and there's going to be for sure) - now we are talking 3
> > > > subsystems.
> > > 
> > > Cross subsystem stuff is normal in the kernel.
> > 
> > Yea. But it's completely spurious here - virtio has its own way
> > to work with userspace which is vdpa and let's just use that.
> > Keeps things nice and contained.
> 
> vdpa is not vfio, I don't know how you can suggest vdpa is a
> replacement for a vfio driver. They are completely different
> things.
> Each side has its own strengths, and vfio especially is accelerating
> in its capability in a way that vdpa is not. E.g. if an iommufd conversion
> had been done by now for vdpa I might be more sympathetic.

Yea, I agree iommufd is a big problem with vdpa right now. Cindy was
sick and I didn't know and kept assuming she's working on this. I don't
think it's a huge amount of work though.  I'll take a look.
Is there anything else though? Do tell.

> Asking for
> someone else to do a huge amount of pointless work to improve vdpa
> just to get to the level this vfio driver is already at is ridiculous.
> 
> vdpa is great for certain kinds of HW, let it focus on that, don't try
> to paint it as an alternative to vfio. It isn't.
> 
> Jason

There are a bunch of things that I think are important for virtio
that are completely out of scope for vfio, such as migrating
cross-vendor. What is the huge amount of work I am asking anyone to do?



-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 16:52     ` Jason Gunthorpe
@ 2023-09-21 17:01         ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 17:01 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Thu, Sep 21, 2023 at 01:52:24PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 10:43:50AM -0600, Alex Williamson wrote:
> 
> > > With that code in place a legacy driver in the guest has the look and
> > > feel as if having a transitional device with legacy support for both its
> > > control and data path flows.
> > 
> > Why do we need to enable a "legacy" driver in the guest?  The very name
> > suggests there's an alternative driver that perhaps doesn't require
> > this I/O BAR.  Why don't we just require the non-legacy driver in the
> > guest rather than increase our maintenance burden?  Thanks,
> 
> It was my reaction also.
> 
> Apparently there is a big deployed base of people using old guest VMs
> with old drivers and they do not want to update their VMs. It is the
> same basic reason why qemu supports all those weird old machine types
> and HW emulations. The desire is to support these old devices so that
> old VMs can work unchanged.
> 
> Jason

And you are saying all these very old VMs use such a large number of
legacy devices that over-counting of locked memory due to vdpa not
correctly using iommufd is a problem that urgently needs to be solved
otherwise the solution has no value?

Another question I'm interested in is whether there's actually a
performance benefit to using this as compared to just software
vhost. I note there's a VM exit on each IO access, so ... perhaps?
Would be nice to see some numbers.


-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 17:01         ` Michael S. Tsirkin
  (?)
@ 2023-09-21 17:07         ` Jason Gunthorpe
  2023-09-21 17:21             ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-21 17:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Yishai Hadas, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 01:01:12PM -0400, Michael S. Tsirkin wrote:
> On Thu, Sep 21, 2023 at 01:52:24PM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 21, 2023 at 10:43:50AM -0600, Alex Williamson wrote:
> > 
> > > > With that code in place a legacy driver in the guest has the look and
> > > > feel as if having a transitional device with legacy support for both its
> > > > control and data path flows.
> > > 
> > > Why do we need to enable a "legacy" driver in the guest?  The very name
> > > suggests there's an alternative driver that perhaps doesn't require
> > > this I/O BAR.  Why don't we just require the non-legacy driver in the
> > > guest rather than increase our maintenance burden?  Thanks,
> > 
> > It was my reaction also.
> > 
> > Apparently there is a big deployed base of people using old guest VMs
> > with old drivers and they do not want to update their VMs. It is the
> > same basic reason why qemu supports all those weird old machine types
> > and HW emulations. The desire is to support these old devices so that
> > old VMs can work unchanged.
> > 
> > Jason
> 
> And you are saying all these very old VMs use such a large number of
> legacy devices that over-counting of locked memory due to vdpa not
> correctly using iommufd is a problem that urgently needs to be solved
> otherwise the solution has no value?

No one has said that.

iommufd is gaining a lot more functions than just pinned memory
accounting.

> Another question I'm interested in is whether there's actually a
> performance benefit to using this as compared to just software
> vhost. I note there's a VM exit on each IO access, so ... perhaps?
> Would be nice to see some numbers.

At least a single trap compared with an entire per-packet SW flow
undoubtedly uses a lot less CPU power in the hypervisor.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 17:01         ` Michael S. Tsirkin
@ 2023-09-21 17:09           ` Parav Pandit
  -1 siblings, 0 replies; 321+ messages in thread
From: Parav Pandit via Virtualization @ 2023-09-21 17:09 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Gunthorpe
  Cc: kvm, Maor Gottlieb, virtualization, Jiri Pirko, Leon Romanovsky



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, September 21, 2023 10:31 PM

> Another question I'm interested in is whether there's actually a performance
> benefit to using this as compared to just software vhost. I note there's a VM exit
> on each IO access, so ... perhaps?
> Would be nice to see some numbers.

Packet rate and bandwidth are close: only about 10% lower than a modern device, due to the batching of driver notifications.
Bw tested with iperf with one and multiple queues. 
Packet rate tested with testpmd.

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 17:07         ` Jason Gunthorpe
@ 2023-09-21 17:21             ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 17:21 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Thu, Sep 21, 2023 at 02:07:09PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 01:01:12PM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 01:52:24PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Sep 21, 2023 at 10:43:50AM -0600, Alex Williamson wrote:
> > > 
> > > > > With that code in place a legacy driver in the guest has the look and
> > > > > feel as if having a transitional device with legacy support for both its
> > > > > control and data path flows.
> > > > 
> > > > Why do we need to enable a "legacy" driver in the guest?  The very name
> > > > suggests there's an alternative driver that perhaps doesn't require
> > > > this I/O BAR.  Why don't we just require the non-legacy driver in the
> > > > guest rather than increase our maintenance burden?  Thanks,
> > > 
> > > It was my reaction also.
> > > 
> > > Apparently there is a big deployed base of people using old guest VMs
> > > with old drivers and they do not want to update their VMs. It is the
> > > same basic reason why qemu supports all those weird old machine types
> > > and HW emulations. The desire is to support these old devices so that
> > > old VMs can work unchanged.
> > > 
> > > Jason
> > 
> > And you are saying all these very old VMs use such a large number of
> > legacy devices that over-counting of locked memory due to vdpa not
> > correctly using iommufd is a problem that urgently needs to be solved
> > otherwise the solution has no value?
> 
> No one has said that.
> 
> iommufd is gaining a lot more functions than just pinned memory
> accounting.

Yea it's very useful - it's also useful for vdpa whether this patchset
goes in or not.  At some level, if vdpa can't keep up then maybe going
the vfio route is justified. I'm not sure why nobody has fixed iommufd
yet - looks like a small amount of work. I'll see if I can address it
quickly because we already have virtio accelerators under vdpa and it
seems confusing to people to use vdpa for some and vfio for others, with
overlapping but slightly incompatible functionality.  I'll get back next
week, in either case. I am however genuinely curious whether all the new
functionality is actually useful for these legacy guests.

> > Another question I'm interested in is whether there's actually a
> > performance benefit to using this as compared to just software
> > vhost. I note there's a VM exit on each IO access, so ... perhaps?
> > Would be nice to see some numbers.
> 
> At least a single trap compared with an entire per-packet SW flow
> undoubtedly uses a lot less CPU power in the hypervisor.
> 
> Jason

Something like the shadow vq thing will be more or less equivalent then?
That's upstream in qemu and needs no hardware support. Worth comparing
against.  Anyway, there's presumably actual hardware this was tested
with, so why guess? Just test and post numbers.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 17:09           ` Parav Pandit
@ 2023-09-21 17:24             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 17:24 UTC (permalink / raw)
  To: Parav Pandit
  Cc: kvm, Maor Gottlieb, virtualization, Jason Gunthorpe, Jiri Pirko,
	Leon Romanovsky

On Thu, Sep 21, 2023 at 05:09:04PM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, September 21, 2023 10:31 PM
> 
> > Another question I'm interested in is whether there's actually a performance
> > benefit to using this as compared to just software vhost. I note there's a VM exit
> > on each IO access, so ... perhaps?
> > Would be nice to see some numbers.
> 
> Packet rate and bandwidth are close: only about 10% lower than a modern device, due to the batching of driver notifications.
> Bw tested with iperf with one and multiple queues. 
> Packet rate tested with testpmd.

Nice, good to know.  Could you compare this with vdpa with shadow vq
enabled?  That's probably the closest equivalent that needs
no kernel or hardware work.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 17:21             ` Michael S. Tsirkin
  (?)
@ 2023-09-21 17:44             ` Jason Gunthorpe
  2023-09-21 17:55                 ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-21 17:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Yishai Hadas, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 01:21:26PM -0400, Michael S. Tsirkin wrote:
> Yea it's very useful - it's also useful for vdpa whether this patchset
> goes in or not.  At some level, if vdpa can't keep up then maybe going
> the vfio route is justified. I'm not sure why nobody has fixed iommufd
> yet - looks like a small amount of work. I'll see if I can address it
> quickly because we already have virtio accelerators under vdpa and it
> seems confusing to people to use vdpa for some and vfio for others, with
> overlapping but slightly incompatible functionality.  I'll get back next
> week, in either case. I am however genuinely curious whether all the new
> functionality is actually useful for these legacy guests.

It doesn't have much to do with the guests - this is new hypervisor
functionality to make the hypervisor do more things. This stuff can
still work with old VMs.

> > > Another question I'm interested in is whether there's actually a
> > > performance benefit to using this as compared to just software
> > > vhost. I note there's a VM exit on each IO access, so ... perhaps?
> > > Would be nice to see some numbers.
> > 
> > At least a single trap compared with an entire per-packet SW flow
> > undoubtedly uses a lot less CPU power in the hypervisor.
>
> Something like the shadow vq thing will be more or less equivalent
> then?

Huh? It still has the entire netdev stack to go through on every
packet before it reaches the real virtio device.

> That's upstream in qemu and needs no hardware support. Worth comparing
> against.  Anyway, there's presumably actual hardware this was tested
> with, so why guess? Just test and post numbers.

Our prior benchmarking put our VDPA/VFIO solutions at something like
2x-3x improvement over the qemu SW path it replaces.

Parav said 10% is lost, so 10% of 3x is still 3x better :)

I thought we all agreed on this when vdpa was created in the first
place, that the all-SW path was hopeless to get high performance out of?

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 17:44             ` Jason Gunthorpe
@ 2023-09-21 17:55                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 17:55 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Thu, Sep 21, 2023 at 02:44:50PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 01:21:26PM -0400, Michael S. Tsirkin wrote:
> > Yea it's very useful - it's also useful for vdpa whether this patchset
> > goes in or not.  At some level, if vdpa can't keep up then maybe going
> > the vfio route is justified. I'm not sure why nobody has fixed iommufd
> > yet - looks like a small amount of work. I'll see if I can address it
> > quickly because we already have virtio accelerators under vdpa and it
> > seems confusing to people to use vdpa for some and vfio for others, with
> > overlapping but slightly incompatible functionality.  I'll get back next
> > week, in either case. I am however genuinely curious whether all the new
> > functionality is actually useful for these legacy guests.
> 
> It doesn't have much to do with the guests - this is new hypervisor
> functionality to make the hypervisor do more things. This stuff can
> still work with old VMs.
> 
> > > > Another question I'm interested in is whether there's actually a
> > > > performance benefit to using this as compared to just software
> > > > vhost. I note there's a VM exit on each IO access, so ... perhaps?
> > > > Would be nice to see some numbers.
> > > 
> > > At least a single trap compared with an entire per-packet SW flow
> > > undoubtedly uses a lot less CPU power in the hypervisor.
> >
> > Something like the shadow vq thing will be more or less equivalent
> > then?
> 
> Huh? It still has the entire netdev stack to go through on every
> packet before it reaches the real virtio device.

No - shadow vq just tweaks the descriptor and forwards it to
the modern vdpa hardware. No net stack involved.

> > That's upstream in qemu and needs no hardware support. Worth comparing
> > against.  Anyway, there's presumably actual hardware this was tested
> > with, so why guess? Just test and post numbers.
> 
> Our prior benchmarking put our VDPA/VFIO solutions at something like
> 2x-3x improvement over the qemu SW path it replaces.
> Parav said 10% is lost, so 10% of 3x is still 3x better :)
> 
> I thought we all agreed on this when vdpa was created in the first
> place, that the all-SW path was hopeless to get high performance out of?
> 
> Jason

That's not what I'm asking about though - not what shadow vq does,
shadow vq is a vdpa feature.


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 17:55                 ` Michael S. Tsirkin
  (?)
@ 2023-09-21 18:16                 ` Jason Gunthorpe
  2023-09-21 19:34                     ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-21 18:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Yishai Hadas, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 01:55:42PM -0400, Michael S. Tsirkin wrote:

> That's not what I'm asking about though - not what shadow vq does,
> shadow vq is a vdpa feature.

That's just VDPA then. We already talked about why VDPA is not a
replacement for VFIO.

I agree you can probably get decent performance out of choosing VDPA
over VFIO. That doesn't justify VDPA as a replacement for VFIO.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 16:53             ` Michael S. Tsirkin
  (?)
@ 2023-09-21 18:39             ` Jason Gunthorpe
  2023-09-21 19:13                 ` Michael S. Tsirkin
                                 ` (2 more replies)
  -1 siblings, 3 replies; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-21 18:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yishai Hadas, alex.williamson, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 12:53:04PM -0400, Michael S. Tsirkin wrote:
> > vdpa is not vfio, I don't know how you can suggest vdpa is a
> > replacement for a vfio driver. They are completely different
> > things.
> > Each side has its own strengths, and vfio especially is accelerating
> > in its capability in way that vpda is not. eg if an iommufd conversion
> > had been done by now for vdpa I might be more sympathetic.
> 
> Yea, I agree iommufd is a big problem with vdpa right now. Cindy was
> sick and I didn't know and kept assuming she's working on this. I don't
> think it's a huge amount of work though.  I'll take a look.
> Is there anything else though? Do tell.

Confidential compute will never work with VDPA's approach.

> There are a bunch of things that I think are important for virtio
> that are completely out of scope for vfio, such as migrating
> cross-vendor. 

VFIO supports migration, if you want to have cross-vendor migration
then make a standard that describes the VFIO migration data format for
virtio devices.

> What is the huge amount of work I am asking to do?

You are asking us to invest in the complexity of VDPA throughout
(keep it working, keep it secure, invest time in deploying and
debugging in the field)

When it doesn't provide *ANY* value to the solution.

The starting point is a completely working vfio PCI function and the
end goal is to put that function into a VM. That is VFIO, not VDPA.

VDPA is fine for what it does, but it is not a reasonable replacement
for VFIO.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 18:39             ` Jason Gunthorpe
@ 2023-09-21 19:13                 ` Michael S. Tsirkin
  2023-09-21 19:17                 ` Michael S. Tsirkin
  2023-09-22  3:45                 ` Zhu, Lingshan
  2 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 19:13 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Thu, Sep 21, 2023 at 03:39:26PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 12:53:04PM -0400, Michael S. Tsirkin wrote:
> > > vdpa is not vfio, I don't know how you can suggest vdpa is a
> > > replacement for a vfio driver. They are completely different
> > > things.
> > > Each side has its own strengths, and vfio especially is accelerating
> > > in its capability in way that vpda is not. eg if an iommufd conversion
> > > had been done by now for vdpa I might be more sympathetic.
> > 
> > Yea, I agree iommufd is a big problem with vdpa right now. Cindy was
> > sick and I didn't know and kept assuming she's working on this. I don't
> > think it's a huge amount of work though.  I'll take a look.
> > Is there anything else though? Do tell.
> 
> Confidential compute will never work with VDPA's approach.

I don't see how what this patchset is doing is different
wrt Confidential compute - you trap IO accesses and emulate.
Care to elaborate?


> > There are a bunch of things that I think are important for virtio
> > that are completely out of scope for vfio, such as migrating
> > cross-vendor. 
> 
> VFIO supports migration, if you want to have cross-vendor migration
> then make a standard that describes the VFIO migration data format for
> virtio devices.

This has nothing to do with data formats - you need two devices to
behave identically. Which is what VDPA is about really.

> > What is the huge amount of work I am asking to do?
> 
> You are asking us to invest in the complexity of VDPA throughout
> (keep it working, keep it secure, invest time in deploying and
> debugging in the field)
> 
> When it doesn't provide *ANY* value to the solution.

There's no "the solution" - this sounds like a vendor only caring about
solutions that involve that vendor's hardware exclusively, a little.

> The starting point is a completely working vfio PCI function and the
> end goal is to put that function into a VM. That is VFIO, not VDPA.
> 
> VDPA is fine for what it does, but it is not a reasonable replacement
> for VFIO.
> 
> Jason

VDPA basically should be a kind of "VFIO for virtio".

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 18:39             ` Jason Gunthorpe
@ 2023-09-21 19:17                 ` Michael S. Tsirkin
  2023-09-21 19:17                 ` Michael S. Tsirkin
  2023-09-22  3:45                 ` Zhu, Lingshan
  2 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 19:17 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Thu, Sep 21, 2023 at 03:39:26PM -0300, Jason Gunthorpe wrote:
> > What is the huge amount of work I am asking to do?
> 
> You are asking us to invest in the complexity of VDPA throughout
> (keep it working, keep it secure, invest time in deploying and
> debugging in the field)

I'm asking you to do nothing of the kind - I am saying that this code
will have to be duplicated in vdpa, and so I am asking what exactly is
missing to just keep it all there. So far you said iommufd and
note I didn't ask you to add iommufd to vdpa though that would be nice ;)
I just said I'll look into it in the next several days.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 18:16                 ` Jason Gunthorpe
@ 2023-09-21 19:34                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 19:34 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Thu, Sep 21, 2023 at 03:16:37PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 01:55:42PM -0400, Michael S. Tsirkin wrote:
> 
> > That's not what I'm asking about though - not what shadow vq does,
> > shadow vq is a vdpa feature.
> 
> That's just VDPA then. We already talked about why VDPA is not a
> replacement for VFIO.

It does however work universally, by software, without any special
hardware support. Which is kind of why I am curious - if VDPA needs this
proxy code because shadow vq is slower then that's an argument for not
having it in two places, and trying to improve vdpa to use iommufd if
that's easy/practical.  If instead VDPA gives the same speed with just
shadow vq then keeping this hack in vfio seems like less of a problem.
Finally if VDPA is faster then maybe you will reconsider using it ;)

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 19:13                 ` Michael S. Tsirkin
  (?)
@ 2023-09-21 19:49                 ` Jason Gunthorpe
  2023-09-21 20:45                     ` Michael S. Tsirkin
  2023-09-22  3:01                     ` Jason Wang
  -1 siblings, 2 replies; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-21 19:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yishai Hadas, alex.williamson, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 03:13:10PM -0400, Michael S. Tsirkin wrote:
> On Thu, Sep 21, 2023 at 03:39:26PM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 21, 2023 at 12:53:04PM -0400, Michael S. Tsirkin wrote:
> > > > vdpa is not vfio, I don't know how you can suggest vdpa is a
> > > > replacement for a vfio driver. They are completely different
> > > > things.
> > > > Each side has its own strengths, and vfio especially is accelerating
> > > > in its capability in way that vpda is not. eg if an iommufd conversion
> > > > had been done by now for vdpa I might be more sympathetic.
> > > 
> > > Yea, I agree iommufd is a big problem with vdpa right now. Cindy was
> > > sick and I didn't know and kept assuming she's working on this. I don't
> > > think it's a huge amount of work though.  I'll take a look.
> > > Is there anything else though? Do tell.
> > 
> > Confidential compute will never work with VDPA's approach.
> 
> I don't see how what this patchset is doing is different
> wrt Confidential compute - you trap IO accesses and emulate.
> Care to elaborate?

This patch series isn't about confidential compute, you asked about
the future. VFIO will support confidential compute in the future, VDPA
will not.

> > > There are a bunch of things that I think are important for virtio
> > > that are completely out of scope for vfio, such as migrating
> > > cross-vendor. 
> > 
> > VFIO supports migration, if you want to have cross-vendor migration
> > then make a standard that describes the VFIO migration data format for
> > virtio devices.
> 
> This has nothing to do with data formats - you need two devices to
> behave identically. Which is what VDPA is about really.

We've been looking at VFIO live migration extensively. Device
mediation, like VDPA does, is one legitimate approach for live
migration. It suits a certain type of heterogeneous environment well.

But, it is equally legitimate to make the devices behave the same and
have them process a common migration data.

This can happen in public with standards, or it can happen in private
within a cloud operator's "private-standard" environment.

To date, in most of my discussions, I have not seen a strong appetite
for such public standards. In part due to the complexity.

Regardless, it is not the kernel community's job to insist on one
approach or the other.

> > You are asking us to invest in the complexity of VDPA through out
> > (keep it working, keep it secure, invest time in deploying and
> > debugging in the field)
> > 
> > When it doesn't provide *ANY* value to the solution.
> 
> There's no "the solution"

Nonsense.

> this sounds like a vendor only caring about solutions that involve
> that vendor's hardware exclusively, a little.

Not really.

Understand the DPU provider is not the vendor here. The DPU provider
gives a cloud operator an SDK to build these things. The operator is
the vendor from your perspective.

In many cases live migration never leaves the operator's confines in
the first place.

Even when it does, there is no real use case to live migrate a
virtio-net function from, say, AWS to GCP.

You are pushing for a lot of complexity and software that solves a
problem people in this space don't actually have.

As I said, VDPA is fine for the scenarios it addresses. It is an
alternative, not a replacement, for VFIO.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 19:17                 ` Michael S. Tsirkin
  (?)
@ 2023-09-21 19:51                 ` Jason Gunthorpe
  2023-09-21 20:55                     ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-21 19:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yishai Hadas, alex.williamson, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 03:17:25PM -0400, Michael S. Tsirkin wrote:
> On Thu, Sep 21, 2023 at 03:39:26PM -0300, Jason Gunthorpe wrote:
> > > What is the huge amount of work I am asking to do?
> > 
> > You are asking us to invest in the complexity of VDPA throughout
> > (keep it working, keep it secure, invest time in deploying and
> > debugging in the field)
> 
> I'm asking you to do nothing of the kind - I am saying that this code
> will have to be duplicated in vdpa,

Why would that be needed?

> and so I am asking what exactly is missing to just keep it all
> there.

VFIO. Seriously, we don't want unnecessary mediation in this path at
all.

> note I didn't ask you to add iommufd to vdpa though that would be
> nice ;)

I did once send someone to look... It didn't succeed :(

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 19:34                     ` Michael S. Tsirkin
  (?)
@ 2023-09-21 19:53                     ` Jason Gunthorpe
  2023-09-21 20:16                         ` Michael S. Tsirkin
  2023-09-22  3:02                         ` Jason Wang
  -1 siblings, 2 replies; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-21 19:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Yishai Hadas, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 03:34:03PM -0400, Michael S. Tsirkin wrote:

> that's easy/practical.  If instead VDPA gives the same speed with just
> shadow vq then keeping this hack in vfio seems like less of a problem.
> Finally if VDPA is faster then maybe you will reconsider using it ;)

It is not all about the speed.

VDPA presents another large and complex software stack in the
hypervisor that can be eliminated by simply using VFIO. VFIO is
already required for other scenarios.

This is about reducing complexity, reducing attack surface and
increasing maintainability of the hypervisor environment.

Jason
 

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 12:40   ` Yishai Hadas
@ 2023-09-21 19:58     ` Alex Williamson
  -1 siblings, 0 replies; 321+ messages in thread
From: Alex Williamson @ 2023-09-21 19:58 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, mst, maorg, virtualization, jgg, jiri, leonro

On Thu, 21 Sep 2023 15:40:40 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> Introduce a vfio driver over virtio devices to support the legacy
> interface functionality for VFs.
> 
> Background, from the virtio spec [1].
> --------------------------------------------------------------------
> In some systems, there is a need to support a virtio legacy driver with
> a device that does not directly support the legacy interface. In such
> scenarios, a group owner device can provide the legacy interface
> functionality for the group member devices. The driver of the owner
> device can then access the legacy interface of a member device on behalf
> of the legacy member device driver.
> 
> For example, with the SR-IOV group type, group members (VFs) can not
> present the legacy interface in an I/O BAR in BAR0 as expected by the
> legacy pci driver. If the legacy driver is running inside a virtual
> machine, the hypervisor executing the virtual machine can present a
> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> legacy driver accesses to this I/O BAR and forwards them to the group
> owner device (PF) using group administration commands.
> --------------------------------------------------------------------
> 
> Specifically, this driver adds support for a virtio-net VF to be exposed
> as a transitional device to a guest driver and allows the legacy IO BAR
> functionality on top.
> 
> This allows a VM which uses a legacy virtio-net driver in the guest to
> work transparently over a VF which its driver in the host is that new
> driver.
> 
> The driver can be extended easily to support some other types of virtio
> devices (e.g virtio-blk), by adding in a few places the specific type
> properties as was done for virtio-net.
> 
> For now, only the virtio-net use case was tested and as such we introduce
> the support only for such a device.
> 
> Practically,
> Upon probing a VF for a virtio-net device, in case its PF supports
> legacy access over the virtio admin commands and the VF doesn't have BAR
> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> transitional device with I/O BAR in BAR 0.
> 
> The existence of the simulated I/O bar is reported later on by
> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> exposes itself as a transitional device by overwriting some properties
> upon reading its config space.
> 
> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> guest may use it via read/write calls according to the virtio
> specification.
> 
> Any read/write towards the control parts of the BAR will be captured by
> the new driver and will be translated into admin commands towards the
> device.
> 
> Any data path read/write access (i.e. virtio driver notifications) will
> be forwarded to the physical BAR which its properties were supplied by
> the command VIRTIO_PCI_QUEUE_NOTIFY upon the probing/init flow.
> 
> With that code in place a legacy driver in the guest has the look and
> feel as if having a transitional device with legacy support for both its
> control and data path flows.
> 
> [1]
> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  MAINTAINERS                      |   6 +
>  drivers/vfio/pci/Kconfig         |   2 +
>  drivers/vfio/pci/Makefile        |   2 +
>  drivers/vfio/pci/virtio/Kconfig  |  15 +
>  drivers/vfio/pci/virtio/Makefile |   4 +
>  drivers/vfio/pci/virtio/cmd.c    |   4 +-
>  drivers/vfio/pci/virtio/cmd.h    |   8 +
>  drivers/vfio/pci/virtio/main.c   | 546 +++++++++++++++++++++++++++++++
>  8 files changed, 585 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/vfio/pci/virtio/Kconfig
>  create mode 100644 drivers/vfio/pci/virtio/Makefile
>  create mode 100644 drivers/vfio/pci/virtio/main.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index bf0f54c24f81..5098418c8389 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22624,6 +22624,12 @@ L:	kvm@vger.kernel.org
>  S:	Maintained
>  F:	drivers/vfio/pci/mlx5/
>  
> +VFIO VIRTIO PCI DRIVER
> +M:	Yishai Hadas <yishaih@nvidia.com>
> +L:	kvm@vger.kernel.org
> +S:	Maintained
> +F:	drivers/vfio/pci/virtio
> +
>  VFIO PCI DEVICE SPECIFIC DRIVERS
>  R:	Jason Gunthorpe <jgg@nvidia.com>
>  R:	Yishai Hadas <yishaih@nvidia.com>
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 8125e5f37832..18c397df566d 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>  
>  source "drivers/vfio/pci/pds/Kconfig"
>  
> +source "drivers/vfio/pci/virtio/Kconfig"
> +
>  endmenu
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 45167be462d8..046139a4eca5 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>  obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>  
>  obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> +
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> new file mode 100644
> index 000000000000..89eddce8b1bd
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Kconfig
> @@ -0,0 +1,15 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config VIRTIO_VFIO_PCI
> +        tristate "VFIO support for VIRTIO PCI devices"
> +        depends on VIRTIO_PCI
> +        select VFIO_PCI_CORE
> +        help
> +          This provides support for exposing VIRTIO VF devices using the VFIO
> +          framework that can work with a legacy virtio driver in the guest.
> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> +          not indicate I/O Space.
> +          As of that this driver emulated I/O BAR in software to let a VF be
> +          seen as a transitional device in the guest and let it work with
> +          a legacy driver.
> +
> +          If you don't know what to do here, say N.
> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
> new file mode 100644
> index 000000000000..584372648a03
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
> +virtio-vfio-pci-y := main.o cmd.o
> +
> diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> index f068239cdbb0..aea9d25fbf1d 100644
> --- a/drivers/vfio/pci/virtio/cmd.c
> +++ b/drivers/vfio/pci/virtio/cmd.c
> @@ -44,7 +44,7 @@ int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>  {
>  	struct virtio_device *virtio_dev =
>  		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> -	struct virtio_admin_cmd_data_lr_write *in;
> +	struct virtio_admin_cmd_legacy_wr_data *in;
>  	struct scatterlist in_sg;
>  	struct virtio_admin_cmd cmd = {};
>  	int ret;
> @@ -74,7 +74,7 @@ int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>  {
>  	struct virtio_device *virtio_dev =
>  		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> -	struct virtio_admin_cmd_data_lr_read *in;
> +	struct virtio_admin_cmd_legacy_rd_data *in;
>  	struct scatterlist in_sg, out_sg;
>  	struct virtio_admin_cmd cmd = {};
>  	int ret;
> diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
> index c2a3645f4b90..347b1dc85570 100644
> --- a/drivers/vfio/pci/virtio/cmd.h
> +++ b/drivers/vfio/pci/virtio/cmd.h
> @@ -13,7 +13,15 @@
>  
>  struct virtiovf_pci_core_device {
>  	struct vfio_pci_core_device core_device;
> +	u8 bar0_virtual_buf_size;
> +	u8 *bar0_virtual_buf;
> +	/* synchronize access to the virtual buf */
> +	struct mutex bar_mutex;
>  	int vf_id;
> +	void __iomem *notify_addr;
> +	u32 notify_offset;
> +	u8 notify_bar;
> +	u8 pci_cmd_io :1;
>  };
>  
>  int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
> new file mode 100644
> index 000000000000..2486991c49f3
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/main.c
> @@ -0,0 +1,546 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/pci.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +#include <linux/vfio.h>
> +#include <linux/vfio_pci_core.h>
> +#include <linux/virtio_pci.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_pci_modern.h>
> +
> +#include "cmd.h"
> +
> +#define VIRTIO_LEGACY_IO_BAR_HEADER_LEN 20
> +#define VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN 4
> +
> +static int virtiovf_issue_lr_cmd(struct virtiovf_pci_core_device *virtvdev,
> +				 loff_t pos, char __user *buf,
> +				 size_t count, bool read)
> +{
> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
> +	u16 opcode;
> +	int ret;
> +
> +	mutex_lock(&virtvdev->bar_mutex);
> +	if (read) {
> +		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
> +		ret = virtiovf_cmd_lr_read(virtvdev, opcode, pos,
> +					   count, bar0_buf + pos);
> +		if (ret)
> +			goto out;
> +		if (copy_to_user(buf, bar0_buf + pos, count))
> +			ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	if (copy_from_user(bar0_buf + pos, buf, count)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
> +	ret = virtiovf_cmd_lr_write(virtvdev, opcode, pos, count,
> +				    bar0_buf + pos);
> +out:
> +	mutex_unlock(&virtvdev->bar_mutex);
> +	return ret;
> +}
> +
> +static int
> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
> +			    loff_t pos, char __user *buf,
> +			    size_t count, bool read)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	u16 queue_notify;
> +	int ret;
> +
> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
> +		return -EINVAL;
> +
> +	switch (pos) {
> +	case VIRTIO_PCI_QUEUE_NOTIFY:
> +		if (count != sizeof(queue_notify))
> +			return -EINVAL;
> +		if (read) {
> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
> +						virtvdev->notify_addr);
> +			if (ret)
> +				return ret;
> +			if (copy_to_user(buf, &queue_notify,
> +					 sizeof(queue_notify)))
> +				return -EFAULT;
> +			break;
> +		}
> +
> +		if (copy_from_user(&queue_notify, buf, count))
> +			return -EFAULT;
> +
> +		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
> +					 virtvdev->notify_addr);
> +		break;
> +	default:
> +		ret = virtiovf_issue_lr_cmd(virtvdev, pos, buf, count, read);
> +	}
> +
> +	return ret ? ret : count;
> +}
> +
> +static bool range_contains_range(loff_t range1_start, size_t count1,
> +				 loff_t range2_start, size_t count2,
> +				 loff_t *start_offset)
> +{
> +	if (range1_start <= range2_start &&
> +	    range1_start + count1 >= range2_start + count2) {
> +		*start_offset = range2_start - range1_start;
> +		return true;
> +	}
> +	return false;
> +}
> +
> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> +					char __user *buf, size_t count,
> +					loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	loff_t copy_offset;
> +	__le32 val32;
> +	__le16 val16;
> +	u8 val8;
> +	int ret;
> +
> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (range_contains_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> +				 &copy_offset)) {
> +		val16 = cpu_to_le16(0x1000);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}

So we take a 0x1041 ("Virtio 1.0 network device") and turn it into a
0x1000 ("Virtio network device").  Are there no features implied by the
device ID?  NB, a byte-wise access would read the real device ID.
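
For illustration only, a rough overlap-based helper (hypothetical name and
signature, not something from this patch or from vfio-pci core) that would
let the emulation also patch partial accesses, e.g. a single-byte read at
PCI_DEVICE_ID, which with the containment check above would return the
device's real 0x41 low byte instead of the emulated 0x00:

static bool range_intersect_range(loff_t buf_start, size_t buf_count,
				  loff_t reg_start, size_t reg_count,
				  loff_t *buf_offset, size_t *intersect_count,
				  size_t *reg_offset)
{
	loff_t buf_end = buf_start + buf_count;
	loff_t reg_end = reg_start + reg_count;

	/* No overlap between the user access and the emulated register */
	if (buf_end <= reg_start || buf_start >= reg_end)
		return false;

	/* Where the overlap begins within the user buffer */
	*buf_offset = reg_start > buf_start ? reg_start - buf_start : 0;
	/* Where the overlap begins within the emulated register */
	*reg_offset = buf_start > reg_start ? buf_start - reg_start : 0;
	/* Number of overlapping bytes to patch */
	*intersect_count = min(buf_end, reg_end) - max(buf_start, reg_start);
	return true;
}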

> +
> +	if (virtvdev->pci_cmd_io &&
> +	    range_contains_range(pos, count, PCI_COMMAND, sizeof(val16),
> +				 &copy_offset)) {
> +		if (copy_from_user(&val16, buf, sizeof(val16)))
> +			return -EFAULT;
> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}

So we can't turn off I/O memory.

> +
> +	if (range_contains_range(pos, count, PCI_REVISION_ID, sizeof(val8),
> +				 &copy_offset)) {
> +		/* Transitional needs to have revision 0 */
> +		val8 = 0;
> +		if (copy_to_user(buf + copy_offset, &val8, sizeof(val8)))
> +			return -EFAULT;
> +	}

Surely some driver cares about this, right?  How is this supposed to
work in a world where libvirt parses modules.alias and automatically
loads this driver rather than vfio-pci for all 0x1041 devices?  We'd
need to denylist this driver to ever see the device for what it is.

> +
> +	if (range_contains_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
> +				 &copy_offset)) {
> +		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);
> +		if (copy_to_user(buf + copy_offset, &val32, sizeof(val32)))
> +			return -EFAULT;
> +	}

Sloppy BAR emulation compared to the real BARs.  QEMU obviously doesn't
care.

> +
> +	if (range_contains_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> +				 &copy_offset)) {
> +		/* Transitional devices use the PCI subsystem device id as
> +		 * virtio device id, same as legacy driver always did.
> +		 */

Non-networking multi-line comment style throughout please.
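
i.e.:

	/*
	 * Transitional devices use the PCI subsystem device ID as the
	 * virtio device ID, same as the legacy driver always did.
	 */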

> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}
> +
> +	return count;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
> +		       size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
> +				     ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);

If the heart of this driver is simply pretending to have an I/O BAR
where I/O accesses into that BAR are translated to accesses in the MMIO
BAR, why can't this be done in the VMM, ie. QEMU?  Could I/O to MMIO
translation in QEMU improve performance (ex. if the MMIO is mmap'd and
can be accessed without bouncing back into kernel code)?
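
At least the doorbell side would be trivial in the VMM: assuming the
notify area of the memory BAR is mmap'd, a legacy VIRTIO_PCI_QUEUE_NOTIFY
access is just a 16-bit store of the queue index (sketch; 'bar' and
'notify_off' are placeholders):

	volatile uint16_t *doorbell = (volatile uint16_t *)(bar + notify_off);

	*doorbell = queue_index;	/* no bounce back into kernel code */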


> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> +			size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> +		loff_t copy_offset;
> +		u16 cmd;
> +
> +		if (range_contains_range(pos, count, PCI_COMMAND, sizeof(cmd),
> +					 &copy_offset)) {
> +			if (copy_from_user(&cmd, buf + copy_offset, sizeof(cmd)))
> +				return -EFAULT;
> +			virtvdev->pci_cmd_io = (cmd & PCI_COMMAND_IO);

If we're tracking writes to PCI_COMMAND_IO, why did we statically
report I/O enabled in the read function previously?

> +		}
> +	}
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static int
> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
> +				   unsigned int cmd, unsigned long arg)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
> +	void __user *uarg = (void __user *)arg;
> +	struct vfio_region_info info = {};
> +
> +	if (copy_from_user(&info, uarg, minsz))
> +		return -EFAULT;
> +
> +	if (info.argsz < minsz)
> +		return -EINVAL;
> +
> +	switch (info.index) {
> +	case VFIO_PCI_BAR0_REGION_INDEX:
> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
> +		info.size = virtvdev->bar0_virtual_buf_size;
> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
> +			     VFIO_REGION_INFO_FLAG_WRITE;
> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static long
> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> +			     unsigned long arg)
> +{
> +	switch (cmd) {
> +	case VFIO_DEVICE_GET_REGION_INFO:
> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static int
> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	int ret;
> +
> +	/* Setup the BAR where the 'notify' exists to be used by vfio as well
> +	 * This will let us mmap it only once and use it when needed.
> +	 */
> +	ret = vfio_pci_core_setup_barmap(core_device,
> +					 virtvdev->notify_bar);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
> +			virtvdev->notify_offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
> +	int ret;
> +
> +	ret = vfio_pci_core_enable(vdev);
> +	if (ret)
> +		return ret;
> +
> +	if (virtvdev->bar0_virtual_buf) {
> +		/* upon close_device() the vfio_pci_core_disable() is called
> +		 * and will close all the previous mmaps, so it seems that the
> +		 * valid life cycle for the 'notify' addr is per open/close.
> +		 */
> +		ret = virtiovf_set_notify_addr(virtvdev);
> +		if (ret) {
> +			vfio_pci_core_disable(vdev);
> +			return ret;
> +		}
> +	}
> +
> +	vfio_pci_core_finish_enable(vdev);
> +	return 0;
> +}
> +
> +static void virtiovf_pci_close_device(struct vfio_device *core_vdev)
> +{
> +	vfio_pci_core_close_device(core_vdev);
> +}

Why does this function exist?

> +
> +static int virtiovf_get_device_config_size(unsigned short device)
> +{
> +	switch (device) {
> +	case 0x1041:
> +		/* network card */
> +		return offsetofend(struct virtio_net_config, status);
> +	default:
> +		return 0;
> +	}
> +}
> +
> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	u64 offset;
> +	int ret;
> +	u8 bar;
> +
> +	ret = virtiovf_cmd_lq_read_notify(virtvdev,
> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
> +				&bar, &offset);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_bar = bar;
> +	virtvdev->notify_offset = offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev;
> +	int ret;
> +
> +	ret = vfio_pci_core_init_dev(core_vdev);
> +	if (ret)
> +		return ret;
> +
> +	pdev = virtvdev->core_device.pdev;
> +	virtvdev->vf_id = pci_iov_vf_id(pdev);
> +	if (virtvdev->vf_id < 0)
> +		return -EINVAL;

vf_id is never used.

> +
> +	ret = virtiovf_read_notify_info(virtvdev);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->bar0_virtual_buf_size = VIRTIO_LEGACY_IO_BAR_HEADER_LEN +
> +		VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN +
> +		virtiovf_get_device_config_size(pdev->device);
> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
> +					     GFP_KERNEL);
> +	if (!virtvdev->bar0_virtual_buf)
> +		return -ENOMEM;
> +	mutex_init(&virtvdev->bar_mutex);
> +	return 0;
> +}
> +
> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +
> +	kfree(virtvdev->bar0_virtual_buf);
> +	vfio_pci_core_release_dev(core_vdev);
> +}
> +
> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
> +	.name = "virtio-transitional-vfio-pci",
> +	.init = virtiovf_pci_init_device,
> +	.release = virtiovf_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = virtiovf_pci_close_device,
> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
> +	.read = virtiovf_pci_core_read,
> +	.write = virtiovf_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};
> +
> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
> +	.name = "virtio-acc-vfio-pci",
> +	.init = vfio_pci_core_init_dev,
> +	.release = vfio_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = virtiovf_pci_close_device,
> +	.ioctl = vfio_pci_core_ioctl,
> +	.device_feature = vfio_pci_core_ioctl_feature,
> +	.read = vfio_pci_core_read,
> +	.write = vfio_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};

Why are we claiming devices that should just use vfio-pci instead?

> +
> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
> +{
> +	struct resource *res = pdev->resource;
> +
> +	return res->flags ? true : false;
> +}
> +
> +#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
> +	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> +
> +static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
> +{
> +	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
> +	u8 *buf;
> +	int ret;
> +
> +	/* Only virtio-net is supported/tested so far */
> +	if (pdev->device != 0x1041)
> +		return false;

Seems like the ID table should handle this, why are we preemptively
claiming all virtio devices... or actually all 0x1af4 devices, which
might not even be virtio, ex. the non-virtio ivshmem device is 0x1110.
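
Something like the below would at least limit this to the one device
that has actually been tested (sketch):

	static const struct pci_device_id virtiovf_pci_table[] = {
		/* modern virtio-net only; extend as other types get tested */
		{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
		{}
	};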

> +
> +	buf = kzalloc(buf_size, GFP_KERNEL);
> +	if (!buf)
> +		return false;
> +
> +	ret = virtiovf_cmd_list_query(pdev, buf, buf_size);
> +	if (ret)
> +		goto end;
> +
> +	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
> +		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
> +		ret = -EOPNOTSUPP;
> +		goto end;
> +	}
> +
> +	/* confirm the used commands */
> +	memset(buf, 0, buf_size);
> +	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
> +	ret = virtiovf_cmd_list_use(pdev, buf, buf_size);
> +
> +end:
> +	kfree(buf);
> +	return ret ? false : true;
> +}
> +
> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> +			      const struct pci_device_id *id)
> +{
> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> +	struct virtiovf_pci_core_device *virtvdev;
> +	int ret;
> +
> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> +		ops = &virtiovf_acc_vfio_pci_tran_ops;
> +
> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> +				     &pdev->dev, ops);
> +	if (IS_ERR(virtvdev))
> +		return PTR_ERR(virtvdev);
> +
> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
> +	if (ret)
> +		goto out;
> +	return 0;
> +out:
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +	return ret;
> +}
> +
> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> +
> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +}
> +
> +static const struct pci_device_id virtiovf_pci_table[] = {
> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },

libvirt will blindly use this driver for all devices matching this as
we've discussed how it should make use of modules.alias.  I don't think
this driver should be squatting on devices where it doesn't add value
and it's not clear whether this is adding or subtracting value in all
cases for the one NIC that it modifies.  How should libvirt choose when
and where to use this driver?  What regressions are we going to see
with VMs that previously saw "modern" virtio-net devices and now see a
legacy compatible device?  Thanks,

Alex

> +	{}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> +
> +static struct pci_driver virtiovf_pci_driver = {
> +	.name = KBUILD_MODNAME,
> +	.id_table = virtiovf_pci_table,
> +	.probe = virtiovf_pci_probe,
> +	.remove = virtiovf_pci_remove,
> +	.err_handler = &vfio_pci_core_err_handlers,
> +	.driver_managed_dma = true,
> +};
> +
> +module_pci_driver(virtiovf_pci_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
> +MODULE_DESCRIPTION(
> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-21 19:58     ` Alex Williamson
  0 siblings, 0 replies; 321+ messages in thread
From: Alex Williamson @ 2023-09-21 19:58 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: mst, jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, leonro, maorg

On Thu, 21 Sep 2023 15:40:40 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> Introduce a vfio driver over virtio devices to support the legacy
> interface functionality for VFs.
> 
> Background, from the virtio spec [1].
> --------------------------------------------------------------------
> In some systems, there is a need to support a virtio legacy driver with
> a device that does not directly support the legacy interface. In such
> scenarios, a group owner device can provide the legacy interface
> functionality for the group member devices. The driver of the owner
> device can then access the legacy interface of a member device on behalf
> of the legacy member device driver.
> 
> For example, with the SR-IOV group type, group members (VFs) can not
> present the legacy interface in an I/O BAR in BAR0 as expected by the
> legacy pci driver. If the legacy driver is running inside a virtual
> machine, the hypervisor executing the virtual machine can present a
> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> legacy driver accesses to this I/O BAR and forwards them to the group
> owner device (PF) using group administration commands.
> --------------------------------------------------------------------
> 
> Specifically, this driver adds support for a virtio-net VF to be exposed
> as a transitional device to a guest driver and allows the legacy IO BAR
> functionality on top.
> 
> This allows a VM which uses a legacy virtio-net driver in the guest to
> work transparently over a VF whose driver in the host is this new
> driver.
> 
> The driver can be extended easily to support some other types of virtio
> devices (e.g virtio-blk), by adding in a few places the specific type
> properties as was done for virtio-net.
> 
> For now, only the virtio-net use case was tested and as such we introduce
> the support only for such a device.
> 
> Practically,
> Upon probing a VF for a virtio-net device, in case its PF supports
> legacy access over the virtio admin commands and the VF doesn't have BAR
> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> transitional device with I/O BAR in BAR 0.
> 
> The existence of the simulated I/O bar is reported later on by
> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> exposes itself as a transitional device by overwriting some properties
> upon reading its config space.
> 
> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> guest may use it via read/write calls according to the virtio
> specification.
> 
> Any read/write towards the control parts of the BAR will be captured by
> the new driver and will be translated into admin commands towards the
> device.
> 
> Any data path read/write access (i.e. virtio driver notifications) will
> be forwarded to the physical BAR whose properties were supplied by
> the command VIRTIO_PCI_QUEUE_NOTIFY upon the probing/init flow.
> 
> With that code in place a legacy driver in the guest has the look and
> feel as if having a transitional device with legacy support for both its
> control and data path flows.
> 
> [1]
> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  MAINTAINERS                      |   6 +
>  drivers/vfio/pci/Kconfig         |   2 +
>  drivers/vfio/pci/Makefile        |   2 +
>  drivers/vfio/pci/virtio/Kconfig  |  15 +
>  drivers/vfio/pci/virtio/Makefile |   4 +
>  drivers/vfio/pci/virtio/cmd.c    |   4 +-
>  drivers/vfio/pci/virtio/cmd.h    |   8 +
>  drivers/vfio/pci/virtio/main.c   | 546 +++++++++++++++++++++++++++++++
>  8 files changed, 585 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/vfio/pci/virtio/Kconfig
>  create mode 100644 drivers/vfio/pci/virtio/Makefile
>  create mode 100644 drivers/vfio/pci/virtio/main.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index bf0f54c24f81..5098418c8389 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22624,6 +22624,12 @@ L:	kvm@vger.kernel.org
>  S:	Maintained
>  F:	drivers/vfio/pci/mlx5/
>  
> +VFIO VIRTIO PCI DRIVER
> +M:	Yishai Hadas <yishaih@nvidia.com>
> +L:	kvm@vger.kernel.org
> +S:	Maintained
> +F:	drivers/vfio/pci/virtio
> +
>  VFIO PCI DEVICE SPECIFIC DRIVERS
>  R:	Jason Gunthorpe <jgg@nvidia.com>
>  R:	Yishai Hadas <yishaih@nvidia.com>
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 8125e5f37832..18c397df566d 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>  
>  source "drivers/vfio/pci/pds/Kconfig"
>  
> +source "drivers/vfio/pci/virtio/Kconfig"
> +
>  endmenu
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 45167be462d8..046139a4eca5 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>  obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>  
>  obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> +
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> new file mode 100644
> index 000000000000..89eddce8b1bd
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Kconfig
> @@ -0,0 +1,15 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config VIRTIO_VFIO_PCI
> +        tristate "VFIO support for VIRTIO PCI devices"
> +        depends on VIRTIO_PCI
> +        select VFIO_PCI_CORE
> +        help
> +          This provides support for exposing VIRTIO VF devices using the VFIO
> +          framework that can work with a legacy virtio driver in the guest.
> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> +          not indicate I/O Space.
> +          Therefore, this driver emulates an I/O BAR in software to let a VF be
> +          seen as a transitional device in the guest and let it work with
> +          a legacy driver.
> +
> +          If you don't know what to do here, say N.
> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
> new file mode 100644
> index 000000000000..584372648a03
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
> +virtio-vfio-pci-y := main.o cmd.o
> +
> diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> index f068239cdbb0..aea9d25fbf1d 100644
> --- a/drivers/vfio/pci/virtio/cmd.c
> +++ b/drivers/vfio/pci/virtio/cmd.c
> @@ -44,7 +44,7 @@ int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>  {
>  	struct virtio_device *virtio_dev =
>  		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> -	struct virtio_admin_cmd_data_lr_write *in;
> +	struct virtio_admin_cmd_legacy_wr_data *in;
>  	struct scatterlist in_sg;
>  	struct virtio_admin_cmd cmd = {};
>  	int ret;
> @@ -74,7 +74,7 @@ int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>  {
>  	struct virtio_device *virtio_dev =
>  		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> -	struct virtio_admin_cmd_data_lr_read *in;
> +	struct virtio_admin_cmd_legacy_rd_data *in;
>  	struct scatterlist in_sg, out_sg;
>  	struct virtio_admin_cmd cmd = {};
>  	int ret;
> diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
> index c2a3645f4b90..347b1dc85570 100644
> --- a/drivers/vfio/pci/virtio/cmd.h
> +++ b/drivers/vfio/pci/virtio/cmd.h
> @@ -13,7 +13,15 @@
>  
>  struct virtiovf_pci_core_device {
>  	struct vfio_pci_core_device core_device;
> +	u8 bar0_virtual_buf_size;
> +	u8 *bar0_virtual_buf;
> +	/* synchronize access to the virtual buf */
> +	struct mutex bar_mutex;
>  	int vf_id;
> +	void __iomem *notify_addr;
> +	u32 notify_offset;
> +	u8 notify_bar;
> +	u8 pci_cmd_io :1;
>  };
>  
>  int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
> new file mode 100644
> index 000000000000..2486991c49f3
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/main.c
> @@ -0,0 +1,546 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/pci.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +#include <linux/vfio.h>
> +#include <linux/vfio_pci_core.h>
> +#include <linux/virtio_pci.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_pci_modern.h>
> +
> +#include "cmd.h"
> +
> +#define VIRTIO_LEGACY_IO_BAR_HEADER_LEN 20
> +#define VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN 4
> +
> +static int virtiovf_issue_lr_cmd(struct virtiovf_pci_core_device *virtvdev,
> +				 loff_t pos, char __user *buf,
> +				 size_t count, bool read)
> +{
> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
> +	u16 opcode;
> +	int ret;
> +
> +	mutex_lock(&virtvdev->bar_mutex);
> +	if (read) {
> +		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
> +		ret = virtiovf_cmd_lr_read(virtvdev, opcode, pos,
> +					   count, bar0_buf + pos);
> +		if (ret)
> +			goto out;
> +		if (copy_to_user(buf, bar0_buf + pos, count))
> +			ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	if (copy_from_user(bar0_buf + pos, buf, count)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
> +	ret = virtiovf_cmd_lr_write(virtvdev, opcode, pos, count,
> +				    bar0_buf + pos);
> +out:
> +	mutex_unlock(&virtvdev->bar_mutex);
> +	return ret;
> +}
> +
> +static int
> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
> +			    loff_t pos, char __user *buf,
> +			    size_t count, bool read)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	u16 queue_notify;
> +	int ret;
> +
> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
> +		return -EINVAL;
> +
> +	switch (pos) {
> +	case VIRTIO_PCI_QUEUE_NOTIFY:
> +		if (count != sizeof(queue_notify))
> +			return -EINVAL;
> +		if (read) {
> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
> +						virtvdev->notify_addr);
> +			if (ret)
> +				return ret;
> +			if (copy_to_user(buf, &queue_notify,
> +					 sizeof(queue_notify)))
> +				return -EFAULT;
> +			break;
> +		}
> +
> +		if (copy_from_user(&queue_notify, buf, count))
> +			return -EFAULT;
> +
> +		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
> +					 virtvdev->notify_addr);
> +		break;
> +	default:
> +		ret = virtiovf_issue_lr_cmd(virtvdev, pos, buf, count, read);
> +	}
> +
> +	return ret ? ret : count;
> +}
> +
> +static bool range_contains_range(loff_t range1_start, size_t count1,
> +				 loff_t range2_start, size_t count2,
> +				 loff_t *start_offset)
> +{
> +	if (range1_start <= range2_start &&
> +	    range1_start + count1 >= range2_start + count2) {
> +		*start_offset = range2_start - range1_start;
> +		return true;
> +	}
> +	return false;
> +}
> +
> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> +					char __user *buf, size_t count,
> +					loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	loff_t copy_offset;
> +	__le32 val32;
> +	__le16 val16;
> +	u8 val8;
> +	int ret;
> +
> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (range_contains_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> +				 &copy_offset)) {
> +		val16 = cpu_to_le16(0x1000);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}

So we take a 0x1041 ("Virtio 1.0 network device") and turn it into a
0x1000 ("Virtio network device").  Are there no features implied by the
device ID?  NB, a byte-wise access would read the real device ID.

> +
> +	if (virtvdev->pci_cmd_io &&
> +	    range_contains_range(pos, count, PCI_COMMAND, sizeof(val16),
> +				 &copy_offset)) {
> +		if (copy_from_user(&val16, buf, sizeof(val16)))
> +			return -EFAULT;
> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}

So we can't turn off I/O memory.

> +
> +	if (range_contains_range(pos, count, PCI_REVISION_ID, sizeof(val8),
> +				 &copy_offset)) {
> +		/* Transitional needs to have revision 0 */
> +		val8 = 0;
> +		if (copy_to_user(buf + copy_offset, &val8, sizeof(val8)))
> +			return -EFAULT;
> +	}

Surely some driver cares about this, right?  How is this supposed to
work in a world where libvirt parses modules.alias and automatically
loads this driver rather than vfio-pci for all 0x1041 devices?  We'd
need to denylist this driver to ever see the device for what it is.

> +
> +	if (range_contains_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
> +				 &copy_offset)) {
> +		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);
> +		if (copy_to_user(buf + copy_offset, &val32, sizeof(val32)))
> +			return -EFAULT;
> +	}

Sloppy BAR emulation compared to the real BARs.  QEMU obviously doesn't
care.
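
For reference, sizing-aware emulation would need to track writes to the
virtual BAR and mask them with its power-of-two size, roughly (sketch
only, 'bar0_write_val' is a hypothetical tracked field):

	u32 mask = ~(roundup_pow_of_two(virtvdev->bar0_virtual_buf_size) - 1);

	val32 = cpu_to_le32((virtvdev->bar0_write_val & mask) |
			    PCI_BASE_ADDRESS_SPACE_IO);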

> +
> +	if (range_contains_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> +				 &copy_offset)) {
> +		/* Transitional devices use the PCI subsystem device id as
> +		 * virtio device id, same as legacy driver always did.
> +		 */

Non-networking multi-line comment style throughout please.

> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}
> +
> +	return count;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
> +		       size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
> +				     ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);

If the heart of this driver is simply pretending to have an I/O BAR
where I/O accesses into that BAR are translated to accesses in the MMIO
BAR, why can't this be done in the VMM, ie. QEMU?  Could I/O to MMIO
translation in QEMU improve performance (ex. if the MMIO is mmap'd and
can be accessed without bouncing back into kernel code)?


> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> +			size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> +		loff_t copy_offset;
> +		u16 cmd;
> +
> +		if (range_contains_range(pos, count, PCI_COMMAND, sizeof(cmd),
> +					 &copy_offset)) {
> +			if (copy_from_user(&cmd, buf + copy_offset, sizeof(cmd)))
> +				return -EFAULT;
> +			virtvdev->pci_cmd_io = (cmd & PCI_COMMAND_IO);

If we're tracking writes to PCI_COMMAND_IO, why did we statically
report I/O enabled in the read function previously?

> +		}
> +	}
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static int
> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
> +				   unsigned int cmd, unsigned long arg)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
> +	void __user *uarg = (void __user *)arg;
> +	struct vfio_region_info info = {};
> +
> +	if (copy_from_user(&info, uarg, minsz))
> +		return -EFAULT;
> +
> +	if (info.argsz < minsz)
> +		return -EINVAL;
> +
> +	switch (info.index) {
> +	case VFIO_PCI_BAR0_REGION_INDEX:
> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
> +		info.size = virtvdev->bar0_virtual_buf_size;
> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
> +			     VFIO_REGION_INFO_FLAG_WRITE;
> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static long
> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> +			     unsigned long arg)
> +{
> +	switch (cmd) {
> +	case VFIO_DEVICE_GET_REGION_INFO:
> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static int
> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	int ret;
> +
> +	/* Setup the BAR where the 'notify' exists to be used by vfio as well
> +	 * This will let us mmap it only once and use it when needed.
> +	 */
> +	ret = vfio_pci_core_setup_barmap(core_device,
> +					 virtvdev->notify_bar);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
> +			virtvdev->notify_offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
> +	int ret;
> +
> +	ret = vfio_pci_core_enable(vdev);
> +	if (ret)
> +		return ret;
> +
> +	if (virtvdev->bar0_virtual_buf) {
> +		/* upon close_device() the vfio_pci_core_disable() is called
> +		 * and will close all the previous mmaps, so it seems that the
> +		 * valid life cycle for the 'notify' addr is per open/close.
> +		 */
> +		ret = virtiovf_set_notify_addr(virtvdev);
> +		if (ret) {
> +			vfio_pci_core_disable(vdev);
> +			return ret;
> +		}
> +	}
> +
> +	vfio_pci_core_finish_enable(vdev);
> +	return 0;
> +}
> +
> +static void virtiovf_pci_close_device(struct vfio_device *core_vdev)
> +{
> +	vfio_pci_core_close_device(core_vdev);
> +}

Why does this function exist?
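
i.e. the ops tables could presumably just point at the core helper
directly:

	.close_device = vfio_pci_core_close_device,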

> +
> +static int virtiovf_get_device_config_size(unsigned short device)
> +{
> +	switch (device) {
> +	case 0x1041:
> +		/* network card */
> +		return offsetofend(struct virtio_net_config, status);
> +	default:
> +		return 0;
> +	}
> +}
> +
> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	u64 offset;
> +	int ret;
> +	u8 bar;
> +
> +	ret = virtiovf_cmd_lq_read_notify(virtvdev,
> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
> +				&bar, &offset);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_bar = bar;
> +	virtvdev->notify_offset = offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev;
> +	int ret;
> +
> +	ret = vfio_pci_core_init_dev(core_vdev);
> +	if (ret)
> +		return ret;
> +
> +	pdev = virtvdev->core_device.pdev;
> +	virtvdev->vf_id = pci_iov_vf_id(pdev);
> +	if (virtvdev->vf_id < 0)
> +		return -EINVAL;

vf_id is never used.

> +
> +	ret = virtiovf_read_notify_info(virtvdev);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->bar0_virtual_buf_size = VIRTIO_LEGACY_IO_BAR_HEADER_LEN +
> +		VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN +
> +		virtiovf_get_device_config_size(pdev->device);
> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
> +					     GFP_KERNEL);
> +	if (!virtvdev->bar0_virtual_buf)
> +		return -ENOMEM;
> +	mutex_init(&virtvdev->bar_mutex);
> +	return 0;
> +}
> +
> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +
> +	kfree(virtvdev->bar0_virtual_buf);
> +	vfio_pci_core_release_dev(core_vdev);
> +}
> +
> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
> +	.name = "virtio-transitional-vfio-pci",
> +	.init = virtiovf_pci_init_device,
> +	.release = virtiovf_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = virtiovf_pci_close_device,
> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
> +	.read = virtiovf_pci_core_read,
> +	.write = virtiovf_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};
> +
> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
> +	.name = "virtio-acc-vfio-pci",
> +	.init = vfio_pci_core_init_dev,
> +	.release = vfio_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = virtiovf_pci_close_device,
> +	.ioctl = vfio_pci_core_ioctl,
> +	.device_feature = vfio_pci_core_ioctl_feature,
> +	.read = vfio_pci_core_read,
> +	.write = vfio_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};

Why are we claiming devices that should just use vfio-pci instead?

> +
> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
> +{
> +	struct resource *res = pdev->resource;
> +
> +	return res->flags ? true : false;
> +}
> +
> +#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
> +	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> +
> +static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
> +{
> +	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
> +	u8 *buf;
> +	int ret;
> +
> +	/* Only virtio-net is supported/tested so far */
> +	if (pdev->device != 0x1041)
> +		return false;

Seems like the ID table should handle this, why are we preemptively
claiming all virtio devices... or actually all 0x1af4 devices, which
might not even be virtio, ex. the non-virtio ivshmem device is 0x1110.

> +
> +	buf = kzalloc(buf_size, GFP_KERNEL);
> +	if (!buf)
> +		return false;
> +
> +	ret = virtiovf_cmd_list_query(pdev, buf, buf_size);
> +	if (ret)
> +		goto end;
> +
> +	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
> +		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
> +		ret = -EOPNOTSUPP;
> +		goto end;
> +	}
> +
> +	/* confirm the used commands */
> +	memset(buf, 0, buf_size);
> +	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
> +	ret = virtiovf_cmd_list_use(pdev, buf, buf_size);
> +
> +end:
> +	kfree(buf);
> +	return ret ? false : true;
> +}
> +
> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> +			      const struct pci_device_id *id)
> +{
> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> +	struct virtiovf_pci_core_device *virtvdev;
> +	int ret;
> +
> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> +		ops = &virtiovf_acc_vfio_pci_tran_ops;
> +
> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> +				     &pdev->dev, ops);
> +	if (IS_ERR(virtvdev))
> +		return PTR_ERR(virtvdev);
> +
> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
> +	if (ret)
> +		goto out;
> +	return 0;
> +out:
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +	return ret;
> +}
> +
> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> +
> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +}
> +
> +static const struct pci_device_id virtiovf_pci_table[] = {
> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },

libvirt will blindly use this driver for all devices matching this as
we've discussed how it should make use of modules.alias.  I don't think
this driver should be squatting on devices where it doesn't add value
and it's not clear whether this is adding or subtracting value in all
cases for the one NIC that it modifies.  How should libvirt choose when
and where to use this driver?  What regressions are we going to see
with VMs that previously saw "modern" virtio-net devices and now see a
legacy compatible device?  Thanks,

Alex

> +	{}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> +
> +static struct pci_driver virtiovf_pci_driver = {
> +	.name = KBUILD_MODNAME,
> +	.id_table = virtiovf_pci_table,
> +	.probe = virtiovf_pci_probe,
> +	.remove = virtiovf_pci_remove,
> +	.err_handler = &vfio_pci_core_err_handlers,
> +	.driver_managed_dma = true,
> +};
> +
> +module_pci_driver(virtiovf_pci_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
> +MODULE_DESCRIPTION(
> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 19:58     ` Alex Williamson
  (?)
@ 2023-09-21 20:01     ` Jason Gunthorpe
  2023-09-21 20:20         ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-21 20:01 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yishai Hadas, mst, jasowang, kvm, virtualization, parav, feliu,
	jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 01:58:32PM -0600, Alex Williamson wrote:

> > +static const struct pci_device_id virtiovf_pci_table[] = {
> > +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
> 
> libvirt will blindly use this driver for all devices matching this as
> we've discussed how it should make use of modules.alias.  I don't think
> this driver should be squatting on devices where it doesn't add value
> and it's not clear whether this is adding or subtracting value in all
> cases for the one NIC that it modifies.  How should libvirt choose when
> and where to use this driver?  What regressions are we going to see
> with VMs that previously saw "modern" virtio-net devices and now see a
> legacy compatible device?  Thanks,

Maybe this approach needs to use a subsystem ID match?
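
i.e. something along the lines of the below, with whatever subsystem IDs
end up identifying devices that actually implement this (values here are
placeholders):

	{ .vendor = PCI_VENDOR_ID_REDHAT_QUMRANET, .device = 0x1041,
	  .subvendor = PCI_VENDOR_ID_REDHAT_QUMRANET, .subdevice = 0x1041,
	  .override_only = PCI_ID_F_VFIO_DRIVER_OVERRIDE },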

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 19:53                     ` Jason Gunthorpe
@ 2023-09-21 20:16                         ` Michael S. Tsirkin
  2023-09-22  3:02                         ` Jason Wang
  1 sibling, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 20:16 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Thu, Sep 21, 2023 at 04:53:45PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 03:34:03PM -0400, Michael S. Tsirkin wrote:
> 
> > that's easy/practical.  If instead VDPA gives the same speed with just
> > shadow vq then keeping this hack in vfio seems like less of a problem.
> > Finally if VDPA is faster then maybe you will reconsider using it ;)
> 
> It is not all about the speed.
> 
> VDPA presents another large and complex software stack in the
> hypervisor that can be eliminated by simply using VFIO.

If all you want is passing through your card to guest
then yes this can be addressed "by simply using VFIO".

And let me give you a simple example just from this patchset:
it assumes guest uses MSIX and just breaks if it doesn't.
As VDPA emulates the device, it can emulate INT#x for the guest while
doing MSI on the host side. Yea, modern guests use MSIX but this is about
legacy, yes?


> VFIO is
> already required for other scenarios.

Required ... by some people? Most VMs I run don't use anything
outside of virtio.

> This is about reducing complexity, reducing attack surface and
> increasing maintainability of the hypervisor environment.
> 
> Jason

Generally you get better security if you don't let guests poke at
hardware when they don't have to. But sure, matter of preference -
use VFIO, it's great. I am worried about the specific patchset though.
It seems to deal with emulating virtio which seems more like a vdpa
thing. If you start adding virtio emulation to vfio then won't
you just end up with another vdpa? And if no why not?
And I don't buy the "we already invested in this vfio based solution",
sorry - that's not a reason upstream has to maintain it.

-- 
MST

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-21 20:16                         ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 20:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, Yishai Hadas, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 04:53:45PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 03:34:03PM -0400, Michael S. Tsirkin wrote:
> 
> > that's easy/practical.  If instead VDPA gives the same speed with just
> > shadow vq then keeping this hack in vfio seems like less of a problem.
> > Finally if VDPA is faster then maybe you will reconsider using it ;)
> 
> It is not all about the speed.
> 
> VDPA presents another large and complex software stack in the
> hypervisor that can be eliminated by simply using VFIO.

If all you want is passing through your card to guest
then yes this can be addressed "by simply using VFIO".

And let me give you a simple example just from this patchset:
it assumes guest uses MSIX and just breaks if it doesn't.
As VDPA emulates the device, it can emulate INT#x for the guest while
doing MSI on the host side. Yea, modern guests use MSIX but this is about
legacy, yes?


> VFIO is
> already required for other scenarios.

Required ... by some people? Most VMs I run don't use anything
outside of virtio.

> This is about reducing complexity, reducing attack surface and
> increasing maintainability of the hypervisor environment.
> 
> Jason

Generally you get better security if you don't let guests poke at
hardware when they don't have to. But sure, matter of preference -
use VFIO, it's great. I am worried about the specific patchset though.
It seems to deal with emulating virtio which seems more like a vdpa
thing. If you start adding virtio emulation to vfio then won't
you just end up with another vdpa? And if no why not?
And I don't buy the "we already invested in this vfio based solution",
sorry - that's not a reason upstream has to maintain it.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 20:01     ` Jason Gunthorpe
@ 2023-09-21 20:20         ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 20:20 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Thu, Sep 21, 2023 at 05:01:21PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 01:58:32PM -0600, Alex Williamson wrote:
> 
> > > +static const struct pci_device_id virtiovf_pci_table[] = {
> > > +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
> > 
> > libvirt will blindly use this driver for all devices matching this as
> > we've discussed how it should make use of modules.alias.  I don't think
> > this driver should be squatting on devices where it doesn't add value
> > and it's not clear whether this is adding or subtracting value in all
> > cases for the one NIC that it modifies.  How should libvirt choose when
> > and where to use this driver?  What regressions are we going to see
> > with VMs that previously saw "modern" virtio-net devices and now see a
> > legacy compatible device?  Thanks,
> 
> Maybe this approach needs to use a subsystem ID match?
> 
> Jason

Maybe make users load it manually?

Please don't bind to virtio by default, you will break
all guests.

-- 
MST

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-21 20:20         ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 20:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, Yishai Hadas, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 05:01:21PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 01:58:32PM -0600, Alex Williamson wrote:
> 
> > > +static const struct pci_device_id virtiovf_pci_table[] = {
> > > +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
> > 
> > libvirt will blindly use this driver for all devices matching this as
> > we've discussed how it should make use of modules.alias.  I don't think
> > this driver should be squatting on devices where it doesn't add value
> > and it's not clear whether this is adding or subtracting value in all
> > cases for the one NIC that it modifies.  How should libvirt choose when
> > and where to use this driver?  What regressions are we going to see
> > with VMs that previously saw "modern" virtio-net devices and now see a
> > legacy compatible device?  Thanks,
> 
> Maybe this approach needs to use a subsystem ID match?
> 
> Jason

Maybe make users load it manually?

Please don't bind to virtio by default, you will break
all guests.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-09-21 12:40   ` Yishai Hadas
@ 2023-09-21 20:34     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 20:34 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote:
> Expose admin commands over the virtio device, to be used by the
> vfio-virtio driver in the next patches.
> 
> It includes: list query/use, legacy write/read, read notify_info.
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  drivers/vfio/pci/virtio/cmd.c | 146 ++++++++++++++++++++++++++++++++++
>  drivers/vfio/pci/virtio/cmd.h |  27 +++++++
>  2 files changed, 173 insertions(+)
>  create mode 100644 drivers/vfio/pci/virtio/cmd.c
>  create mode 100644 drivers/vfio/pci/virtio/cmd.h
> 
> diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> new file mode 100644
> index 000000000000..f068239cdbb0
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/cmd.c
> @@ -0,0 +1,146 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include "cmd.h"
> +
> +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct scatterlist out_sg;
> +	struct virtio_admin_cmd cmd = {};
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	sg_init_one(&out_sg, buf, buf_size);
> +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.result_sg = &out_sg;
> +
> +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
> +}
> +

in/out seem all wrong here. In virtio terminology, in means from
device to driver, out means from driver to device.
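
E.g. in virtiovf_cmd_list_query() above the result buffer is written by
the device, so by that convention it would be named the other way around
(renaming sketch only):

	struct scatterlist in_sg;

	sg_init_one(&in_sg, buf, buf_size);
	cmd.result_sg = &in_sg;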

> +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct scatterlist in_sg;
> +	struct virtio_admin_cmd cmd = {};
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	sg_init_one(&in_sg, buf, buf_size);
> +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.data_sg = &in_sg;
> +
> +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
> +}
> +
> +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,


what is _lr short for?

> +			  u8 offset, u8 size, u8 *buf)
> +{
> +	struct virtio_device *virtio_dev =
> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> +	struct virtio_admin_cmd_data_lr_write *in;
> +	struct scatterlist in_sg;
> +	struct virtio_admin_cmd cmd = {};
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
> +	if (!in)
> +		return -ENOMEM;
> +
> +	in->offset = offset;
> +	memcpy(in->registers, buf, size);
> +	sg_init_one(&in_sg, in, sizeof(*in) + size);
> +	cmd.opcode = opcode;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.group_member_id = virtvdev->vf_id + 1;

weird. why + 1?

> +	cmd.data_sg = &in_sg;
> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> +
> +	kfree(in);
> +	return ret;
> +}

How do you know it's safe to send this command, in particular at
this time? This seems to be doing zero checks, and zero synchronization
with the PF driver.


> +
> +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> +			 u8 offset, u8 size, u8 *buf)
> +{
> +	struct virtio_device *virtio_dev =
> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> +	struct virtio_admin_cmd_data_lr_read *in;
> +	struct scatterlist in_sg, out_sg;
> +	struct virtio_admin_cmd cmd = {};
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	in = kzalloc(sizeof(*in), GFP_KERNEL);
> +	if (!in)
> +		return -ENOMEM;
> +
> +	in->offset = offset;
> +	sg_init_one(&in_sg, in, sizeof(*in));
> +	sg_init_one(&out_sg, buf, size);
> +	cmd.opcode = opcode;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.data_sg = &in_sg;
> +	cmd.result_sg = &out_sg;
> +	cmd.group_member_id = virtvdev->vf_id + 1;
> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> +
> +	kfree(in);
> +	return ret;
> +}
> +
> +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,

and what is lq short for?

> +				u8 req_bar_flags, u8 *bar, u64 *bar_offset)
> +{
> +	struct virtio_device *virtio_dev =
> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> +	struct virtio_admin_cmd_notify_info_result *out;
> +	struct scatterlist out_sg;
> +	struct virtio_admin_cmd cmd = {};
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	out = kzalloc(sizeof(*out), GFP_KERNEL);
> +	if (!out)
> +		return -ENOMEM;
> +
> +	sg_init_one(&out_sg, out, sizeof(*out));
> +	cmd.opcode = VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.result_sg = &out_sg;
> +	cmd.group_member_id = virtvdev->vf_id + 1;
> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> +	if (!ret) {
> +		struct virtio_admin_cmd_notify_info_data *entry;
> +		int i;
> +
> +		ret = -ENOENT;
> +		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
> +			entry = &out->entries[i];
> +			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
> +				break;
> +			if (entry->flags != req_bar_flags)
> +				continue;
> +			*bar = entry->bar;
> +			*bar_offset = le64_to_cpu(entry->offset);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +
> +	kfree(out);
> +	return ret;
> +}
> diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
> new file mode 100644
> index 000000000000..c2a3645f4b90
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/cmd.h
> @@ -0,0 +1,27 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> + */
> +
> +#ifndef VIRTIO_VFIO_CMD_H
> +#define VIRTIO_VFIO_CMD_H
> +
> +#include <linux/kernel.h>
> +#include <linux/virtio.h>
> +#include <linux/vfio_pci_core.h>
> +#include <linux/virtio_pci.h>
> +
> +struct virtiovf_pci_core_device {
> +	struct vfio_pci_core_device core_device;
> +	int vf_id;
> +};
> +
> +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
> +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> +			  u8 offset, u8 size, u8 *buf);
> +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> +			 u8 offset, u8 size, u8 *buf);
> +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
> +				u8 req_bar_flags, u8 *bar, u64 *bar_offset);
> +#endif /* VIRTIO_VFIO_CMD_H */
> -- 
> 2.27.0


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
@ 2023-09-21 20:34     ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 20:34 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote:
> Expose admin commands over the virtio device, to be used by the
> vfio-virtio driver in the next patches.
> 
> It includes: list query/use, legacy write/read, read notify_info.
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  drivers/vfio/pci/virtio/cmd.c | 146 ++++++++++++++++++++++++++++++++++
>  drivers/vfio/pci/virtio/cmd.h |  27 +++++++
>  2 files changed, 173 insertions(+)
>  create mode 100644 drivers/vfio/pci/virtio/cmd.c
>  create mode 100644 drivers/vfio/pci/virtio/cmd.h
> 
> diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> new file mode 100644
> index 000000000000..f068239cdbb0
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/cmd.c
> @@ -0,0 +1,146 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include "cmd.h"
> +
> +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct scatterlist out_sg;
> +	struct virtio_admin_cmd cmd = {};
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	sg_init_one(&out_sg, buf, buf_size);
> +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.result_sg = &out_sg;
> +
> +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
> +}
> +

in/out seem all wrong here. In virtio terminology, in means from
device to driver, out means from driver to device.
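
For example, the query helper above would then read roughly like this
(a sketch only - same logic, only the scatterlist naming changes):

int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
{
	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
	/* result is written by the device, i.e. "in" in virtio terms */
	struct scatterlist in_sg;
	struct virtio_admin_cmd cmd = {};

	if (!virtio_dev)
		return -ENOTCONN;

	sg_init_one(&in_sg, buf, buf_size);
	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
	cmd.result_sg = &in_sg;

	return virtio_admin_cmd_exec(virtio_dev, &cmd);
}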

> +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct scatterlist in_sg;
> +	struct virtio_admin_cmd cmd = {};
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	sg_init_one(&in_sg, buf, buf_size);
> +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.data_sg = &in_sg;
> +
> +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
> +}
> +
> +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,


what is _lr short for?

> +			  u8 offset, u8 size, u8 *buf)
> +{
> +	struct virtio_device *virtio_dev =
> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> +	struct virtio_admin_cmd_data_lr_write *in;
> +	struct scatterlist in_sg;
> +	struct virtio_admin_cmd cmd = {};
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
> +	if (!in)
> +		return -ENOMEM;
> +
> +	in->offset = offset;
> +	memcpy(in->registers, buf, size);
> +	sg_init_one(&in_sg, in, sizeof(*in) + size);
> +	cmd.opcode = opcode;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.group_member_id = virtvdev->vf_id + 1;

weird. why + 1?

> +	cmd.data_sg = &in_sg;
> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> +
> +	kfree(in);
> +	return ret;
> +}

How do you know it's safe to send this command, in particular at
this time? This seems to be doing zero checks, and zero synchronization
with the PF driver.


> +
> +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> +			 u8 offset, u8 size, u8 *buf)
> +{
> +	struct virtio_device *virtio_dev =
> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> +	struct virtio_admin_cmd_data_lr_read *in;
> +	struct scatterlist in_sg, out_sg;
> +	struct virtio_admin_cmd cmd = {};
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	in = kzalloc(sizeof(*in), GFP_KERNEL);
> +	if (!in)
> +		return -ENOMEM;
> +
> +	in->offset = offset;
> +	sg_init_one(&in_sg, in, sizeof(*in));
> +	sg_init_one(&out_sg, buf, size);
> +	cmd.opcode = opcode;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.data_sg = &in_sg;
> +	cmd.result_sg = &out_sg;
> +	cmd.group_member_id = virtvdev->vf_id + 1;
> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> +
> +	kfree(in);
> +	return ret;
> +}
> +
> +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,

and what is lq short for?

> +				u8 req_bar_flags, u8 *bar, u64 *bar_offset)
> +{
> +	struct virtio_device *virtio_dev =
> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> +	struct virtio_admin_cmd_notify_info_result *out;
> +	struct scatterlist out_sg;
> +	struct virtio_admin_cmd cmd = {};
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	out = kzalloc(sizeof(*out), GFP_KERNEL);
> +	if (!out)
> +		return -ENOMEM;
> +
> +	sg_init_one(&out_sg, out, sizeof(*out));
> +	cmd.opcode = VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.result_sg = &out_sg;
> +	cmd.group_member_id = virtvdev->vf_id + 1;
> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> +	if (!ret) {
> +		struct virtio_admin_cmd_notify_info_data *entry;
> +		int i;
> +
> +		ret = -ENOENT;
> +		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
> +			entry = &out->entries[i];
> +			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
> +				break;
> +			if (entry->flags != req_bar_flags)
> +				continue;
> +			*bar = entry->bar;
> +			*bar_offset = le64_to_cpu(entry->offset);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +
> +	kfree(out);
> +	return ret;
> +}
> diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
> new file mode 100644
> index 000000000000..c2a3645f4b90
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/cmd.h
> @@ -0,0 +1,27 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> + */
> +
> +#ifndef VIRTIO_VFIO_CMD_H
> +#define VIRTIO_VFIO_CMD_H
> +
> +#include <linux/kernel.h>
> +#include <linux/virtio.h>
> +#include <linux/vfio_pci_core.h>
> +#include <linux/virtio_pci.h>
> +
> +struct virtiovf_pci_core_device {
> +	struct vfio_pci_core_device core_device;
> +	int vf_id;
> +};
> +
> +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
> +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> +			  u8 offset, u8 size, u8 *buf);
> +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> +			 u8 offset, u8 size, u8 *buf);
> +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
> +				u8 req_bar_flags, u8 *bar, u64 *bar_offset);
> +#endif /* VIRTIO_VFIO_CMD_H */
> -- 
> 2.27.0


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 19:49                 ` Jason Gunthorpe
@ 2023-09-21 20:45                     ` Michael S. Tsirkin
  2023-09-22  3:01                     ` Jason Wang
  1 sibling, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 20:45 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Thu, Sep 21, 2023 at 04:49:46PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 03:13:10PM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 03:39:26PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Sep 21, 2023 at 12:53:04PM -0400, Michael S. Tsirkin wrote:
> > > > > vdpa is not vfio, I don't know how you can suggest vdpa is a
> > > > > replacement for a vfio driver. They are completely different
> > > > > things.
> > > > > Each side has its own strengths, and vfio especially is accelerating
> > > > > in its capability in way that vpda is not. eg if an iommufd conversion
> > > > > had been done by now for vdpa I might be more sympathetic.
> > > > 
> > > > Yea, I agree iommufd is a big problem with vdpa right now. Cindy was
> > > > sick and I didn't know and kept assuming she's working on this. I don't
> > > > think it's a huge amount of work though.  I'll take a look.
> > > > Is there anything else though? Do tell.
> > > 
> > > Confidential compute will never work with VDPA's approach.
> > 
> > I don't see how what this patchset is doing is different
> > wrt to Confidential compute - you trap IO accesses and emulate.
> > Care to elaborate?
> 
> This patch series isn't about confidential compute, you asked about
> the future. VFIO will support confidential compute in the future, VDPA
> will not.

Nonsense, it already works.

But I did not ask about the future since I do not believe it
can be confidently predicted. I asked what is missing in VDPA
now for you to add this feature there and not in VFIO.


> > > > There are a bunch of things that I think are important for virtio
> > > > that are completely out of scope for vfio, such as migrating
> > > > cross-vendor. 
> > > 
> > > VFIO supports migration, if you want to have cross-vendor migration
> > > then make a standard that describes the VFIO migration data format for
> > > virtio devices.
> > 
> > This has nothing to do with data formats - you need two devices to
> > behave identically. Which is what VDPA is about really.
> 
> We've been looking at VFIO live migration extensively. Device
> mediation, like VDPA does, is one legitimate approach for live
> migration. It suites a certain type of heterogeneous environment well.
> 
> But, it is equally legitimate to make the devices behave the same and
> have them process a common migration data.
> 
> This can happen in public with standards, or it can happen in private
> within a cloud operator's "private-standard" environment.
> 
> To date, in most of my discussions, I have not seen a strong appetite
> for such public standards. In part due to the complexity.
> 
> Regardles, it is not the kernel communities job to insist on one
> approach or the other.
>
> > > You are asking us to invest in the complexity of VDPA through out
> > > (keep it working, keep it secure, invest time in deploying and
> > > debugging in the field)
> > > 
> > > When it doesn't provide *ANY* value to the solution.
> > 
> > There's no "the solution"
> 
> Nonsense.

What, there's only one solution, that you use the definite article?

> > this sounds like a vendor only caring about solutions that involve
> > that vendor's hardware exclusively, a little.
> 
> Not really.
> 
> Understand the DPU provider is not the vendor here. The DPU provider
> gives a cloud operator a SDK to build these things. The operator is
> the vendor from your perspective.
> 
> In many cases live migration never leaves the operator's confines in
> the first place.
> 
> Even when it does, there is no real use case to live migrate a
> virtio-net function from, say, AWS to GCP.
> 
> You are pushing for a lot of complexity and software that solves a
> problem people in this space don't actually have.
> 
> As I said, VDPA is fine for the scenarios it addresses. It is an
> alternative, not a replacement, for VFIO.
> 
> Jason

Yea, VDPA does trap and emulate for config accesses, which is exactly
what this patch does. So why it belongs in vfio, muddying up its
passthrough model, is beyond me, except that apparently there's some
specific deployment that happens to use vfio, so now whatever
that deployment needs has to go into vfio whether it belongs there or not.


-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-21 20:45                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 20:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yishai Hadas, alex.williamson, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 04:49:46PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 03:13:10PM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 03:39:26PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Sep 21, 2023 at 12:53:04PM -0400, Michael S. Tsirkin wrote:
> > > > > vdpa is not vfio, I don't know how you can suggest vdpa is a
> > > > > replacement for a vfio driver. They are completely different
> > > > > things.
> > > > > Each side has its own strengths, and vfio especially is accelerating
> > > > > in its capability in way that vpda is not. eg if an iommufd conversion
> > > > > had been done by now for vdpa I might be more sympathetic.
> > > > 
> > > > Yea, I agree iommufd is a big problem with vdpa right now. Cindy was
> > > > sick and I didn't know and kept assuming she's working on this. I don't
> > > > think it's a huge amount of work though.  I'll take a look.
> > > > Is there anything else though? Do tell.
> > > 
> > > Confidential compute will never work with VDPA's approach.
> > 
> > I don't see how what this patchset is doing is different
> > wrt to Confidential compute - you trap IO accesses and emulate.
> > Care to elaborate?
> 
> This patch series isn't about confidential compute, you asked about
> the future. VFIO will support confidential compute in the future, VDPA
> will not.

Nonsense, it already works.

But I did not ask about the future since I do not believe it
can be confidently predicted. I asked what is missing in VDPA
now for you to add this feature there and not in VFIO.


> > > > There are a bunch of things that I think are important for virtio
> > > > that are completely out of scope for vfio, such as migrating
> > > > cross-vendor. 
> > > 
> > > VFIO supports migration, if you want to have cross-vendor migration
> > > then make a standard that describes the VFIO migration data format for
> > > virtio devices.
> > 
> > This has nothing to do with data formats - you need two devices to
> > behave identically. Which is what VDPA is about really.
> 
> We've been looking at VFIO live migration extensively. Device
> mediation, like VDPA does, is one legitimate approach for live
> migration. It suites a certain type of heterogeneous environment well.
> 
> But, it is equally legitimate to make the devices behave the same and
> have them process a common migration data.
> 
> This can happen in public with standards, or it can happen in private
> within a cloud operator's "private-standard" environment.
> 
> To date, in most of my discussions, I have not seen a strong appetite
> for such public standards. In part due to the complexity.
> 
> Regardles, it is not the kernel communities job to insist on one
> approach or the other.
>
> > > You are asking us to invest in the complexity of VDPA through out
> > > (keep it working, keep it secure, invest time in deploying and
> > > debugging in the field)
> > > 
> > > When it doesn't provide *ANY* value to the solution.
> > 
> > There's no "the solution"
> 
> Nonsense.

What, there's only one solution, that you use the definite article?

> > this sounds like a vendor only caring about solutions that involve
> > that vendor's hardware exclusively, a little.
> 
> Not really.
> 
> Understand the DPU provider is not the vendor here. The DPU provider
> gives a cloud operator a SDK to build these things. The operator is
> the vendor from your perspective.
> 
> In many cases live migration never leaves the operator's confines in
> the first place.
> 
> Even when it does, there is no real use case to live migrate a
> virtio-net function from, say, AWS to GCP.
> 
> You are pushing for a lot of complexity and software that solves a
> problem people in this space don't actually have.
> 
> As I said, VDPA is fine for the scenarios it addresses. It is an
> alternative, not a replacement, for VFIO.
> 
> Jason

Yea, VDPA does trap and emulate for config accesses, which is exactly
what this patch does. So why it belongs in vfio, muddying up its
passthrough model, is beyond me, except that apparently there's some
specific deployment that happens to use vfio, so now whatever
that deployment needs has to go into vfio whether it belongs there or not.


-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 19:51                 ` Jason Gunthorpe
@ 2023-09-21 20:55                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 20:55 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Thu, Sep 21, 2023 at 04:51:15PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 03:17:25PM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 03:39:26PM -0300, Jason Gunthorpe wrote:
> > > > What is the huge amount of work am I asking to do?
> > > 
> > > You are asking us to invest in the complexity of VDPA through out
> > > (keep it working, keep it secure, invest time in deploying and
> > > debugging in the field)
> > 
> > I'm asking you to do nothing of the kind - I am saying that this code
> > will have to be duplicated in vdpa,
> 
> Why would that be needed?

For the same reason it was developed in the first place - presumably
because it adds efficient legacy guest support with the right card?
I get it, you specifically don't need VDPA functionality, but I don't
see why this is universal, or common.


> > and so I am asking what exactly is missing to just keep it all
> > there.
> 
> VFIO. Seriously, we don't want unnecessary mediation in this path at
> all.

But which mediation is necessary is exactly up to the specific use-case.
I have no idea why you would want all of VFIO to e.g. pass access to
random config registers to the guest when it's a virtio device and the
config registers are all nicely listed in the spec. I know nvidia
hardware is so great, it has super robust cards with fewer security holes
than the vdpa driver, but I very much doubt this is universal for all
virtio offload cards.

> > note I didn't ask you to add iommufd to vdpa though that would be
> > nice ;)
> 
> I did once send someone to look.. It didn't succeed :(
> 
> Jason

Pity. Maybe there's some big difficulty blocking this? I'd like to know.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-21 20:55                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 20:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yishai Hadas, alex.williamson, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 04:51:15PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 03:17:25PM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 03:39:26PM -0300, Jason Gunthorpe wrote:
> > > > What is the huge amount of work am I asking to do?
> > > 
> > > You are asking us to invest in the complexity of VDPA through out
> > > (keep it working, keep it secure, invest time in deploying and
> > > debugging in the field)
> > 
> > I'm asking you to do nothing of the kind - I am saying that this code
> > will have to be duplicated in vdpa,
> 
> Why would that be needed?

For the same reason it was developed in the first place - presumably
because it adds efficient legacy guest support with the right card?
I get it, you specifically don't need VDPA functionality, but I don't
see why this is universal, or common.


> > and so I am asking what exactly is missing to just keep it all
> > there.
> 
> VFIO. Seriously, we don't want unnecessary mediation in this path at
> all.

But which mediation is necessary is exactly up to the specific use-case.
I have no idea why you would want all of VFIO to e.g. pass access to
random config registers to the guest when it's a virtio device and the
config registers are all nicely listed in the spec. I know nvidia
hardware is so great, it has super robust cards with fewer security holes
than the vdpa driver, but I very much doubt this is universal for all
virtio offload cards.

> > note I didn't ask you to add iommufd to vdpa though that would be
> > nice ;)
> 
> I did once send someone to look.. It didn't succeed :(
> 
> Jason

Pity. Maybe there's some big difficulty blocking this? I'd like to know.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 20:20         ` Michael S. Tsirkin
@ 2023-09-21 20:59           ` Alex Williamson
  -1 siblings, 0 replies; 321+ messages in thread
From: Alex Williamson @ 2023-09-21 20:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, maorg, virtualization, Jason Gunthorpe, jiri, leonro

On Thu, 21 Sep 2023 16:20:59 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Sep 21, 2023 at 05:01:21PM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 21, 2023 at 01:58:32PM -0600, Alex Williamson wrote:
> >   
> > > > +static const struct pci_device_id virtiovf_pci_table[] = {
> > > > +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },  
> > > 
> > > libvirt will blindly use this driver for all devices matching this as
> > > we've discussed how it should make use of modules.alias.  I don't think
> > > this driver should be squatting on devices where it doesn't add value
> > > and it's not clear whether this is adding or subtracting value in all
> > > cases for the one NIC that it modifies.  How should libvirt choose when
> > > and where to use this driver?  What regressions are we going to see
> > > with VMs that previously saw "modern" virtio-net devices and now see a
> > > legacy compatible device?  Thanks,  
> > 
> > Maybe this approach needs to use a subsystem ID match?
> > 
> > Jason  
> 
> Maybe make users load it manually?
> 
> Please don't bind to virtio by default, you will break
> all guests.

This would never bind by default, it's only bound as a vfio override
driver, but if libvirt were trying to determine the correct driver to
use with vfio for a 0x1af4 device, it'd land on this one.  Thanks,

Alex
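
For reference, a subsystem ID restricted match along the lines suggested
above could look roughly like this (the subsystem values are placeholders
only, real ones would come from the device being supported):

static const struct pci_device_id virtiovf_pci_table[] = {
	{
		.vendor = PCI_VENDOR_ID_REDHAT_QUMRANET,
		.device = PCI_ANY_ID,
		.subvendor = 0x1234,	/* placeholder */
		.subdevice = 0x5678,	/* placeholder */
		.override_only = PCI_ID_F_VFIO_DRIVER_OVERRIDE,
	},
	{}
};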


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-21 20:59           ` Alex Williamson
  0 siblings, 0 replies; 321+ messages in thread
From: Alex Williamson @ 2023-09-21 20:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Gunthorpe, Yishai Hadas, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, 21 Sep 2023 16:20:59 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Sep 21, 2023 at 05:01:21PM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 21, 2023 at 01:58:32PM -0600, Alex Williamson wrote:
> >   
> > > > +static const struct pci_device_id virtiovf_pci_table[] = {
> > > > +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },  
> > > 
> > > libvirt will blindly use this driver for all devices matching this as
> > > we've discussed how it should make use of modules.alias.  I don't think
> > > this driver should be squatting on devices where it doesn't add value
> > > and it's not clear whether this is adding or subtracting value in all
> > > cases for the one NIC that it modifies.  How should libvirt choose when
> > > and where to use this driver?  What regressions are we going to see
> > > with VMs that previously saw "modern" virtio-net devices and now see a
> > > legacy compatible device?  Thanks,  
> > 
> > Maybe this approach needs to use a subsystem ID match?
> > 
> > Jason  
> 
> Maybe make users load it manually?
> 
> Please don't bind to virtio by default, you will break
> all guests.

This would never bind by default, it's only bound as a vfio override
driver, but if libvirt were trying to determine the correct driver to
use with vfio for a 0x1af4 device, it'd land on this one.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 20:16                         ` Michael S. Tsirkin
  (?)
@ 2023-09-21 22:48                         ` Jason Gunthorpe
  2023-09-22  9:47                             ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-21 22:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Yishai Hadas, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 04:16:25PM -0400, Michael S. Tsirkin wrote:
> On Thu, Sep 21, 2023 at 04:53:45PM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 21, 2023 at 03:34:03PM -0400, Michael S. Tsirkin wrote:
> > 
> > > that's easy/practical.  If instead VDPA gives the same speed with just
> > > shadow vq then keeping this hack in vfio seems like less of a problem.
> > > Finally if VDPA is faster then maybe you will reconsider using it ;)
> > 
> > It is not all about the speed.
> > 
> > VDPA presents another large and complex software stack in the
> > hypervisor that can be eliminated by simply using VFIO.
> 
> If all you want is passing through your card to guest
> then yes this can be addressed "by simply using VFIO".

That is pretty much the goal, yes.

> And let me give you a simple example just from this patchset:
> it assumes guest uses MSIX and just breaks if it doesn't.

It does? Really? Where did you see that?

> > VFIO is
> > already required for other scenarios.
> 
> Required ... by some people? Most VMs I run don't use anything
> outside of virtio.

Yes, some people. The sorts of people who run large data centers.

> It seems to deal with emulating virtio which seems more like a vdpa
> thing.

Alex described it right, it creates an SW trapped IO bar that relays
the doorbell to an admin queue command.
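
In rough terms it is just this kind of relay (a sketch, reusing the
cmd.h helpers from patch 10; the opcode name and the vfio rw plumbing
around it are assumptions):

static ssize_t virtiovf_legacy_bar_write(struct virtiovf_pci_core_device *virtvdev,
					 u8 offset, u8 size, u8 *data)
{
	int ret;

	/* Trapped guest write to the emulated legacy I/O BAR: forward it
	 * to the owner (PF) as a single admin command, nothing is parsed
	 * or queued in the hypervisor itself.
	 */
	ret = virtiovf_cmd_lr_write(virtvdev,
				    VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE,
				    offset, size, data);
	return ret ? ret : size;
}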

> If you start adding virtio emulation to vfio then won't
> you just end up with another vdpa? And if no why not?
> And I don't buy the "we already invested in this vfio based solution",
> sorry - that's not a reason upstream has to maintain it.

I think you would be well justified to object to actual mediation,
like processing queues in VFIO or otherwise complex things.

Fortunately there is no need to do that with DPU HW. The legacy IO BAR
is a weird quirk that just cannot be done without a software trap, and
the OASIS standardization effort was for exactly this kind of
simplistic transformation.

I also don't buy the "upstream has to maintain it" line. The team that
submitted it will maintain it just fine, thank you.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 20:45                     ` Michael S. Tsirkin
  (?)
@ 2023-09-21 22:55                     ` Jason Gunthorpe
  2023-09-22  3:02                         ` Jason Wang
  2023-09-22 11:23                         ` Michael S. Tsirkin
  -1 siblings, 2 replies; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-21 22:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yishai Hadas, alex.williamson, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 04:45:45PM -0400, Michael S. Tsirkin wrote:
> On Thu, Sep 21, 2023 at 04:49:46PM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 21, 2023 at 03:13:10PM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Sep 21, 2023 at 03:39:26PM -0300, Jason Gunthorpe wrote:
> > > > On Thu, Sep 21, 2023 at 12:53:04PM -0400, Michael S. Tsirkin wrote:
> > > > > > vdpa is not vfio, I don't know how you can suggest vdpa is a
> > > > > > replacement for a vfio driver. They are completely different
> > > > > > things.
> > > > > > Each side has its own strengths, and vfio especially is accelerating
> > > > > > in its capability in way that vpda is not. eg if an iommufd conversion
> > > > > > had been done by now for vdpa I might be more sympathetic.
> > > > > 
> > > > > Yea, I agree iommufd is a big problem with vdpa right now. Cindy was
> > > > > sick and I didn't know and kept assuming she's working on this. I don't
> > > > > think it's a huge amount of work though.  I'll take a look.
> > > > > Is there anything else though? Do tell.
> > > > 
> > > > Confidential compute will never work with VDPA's approach.
> > > 
> > > I don't see how what this patchset is doing is different
> > > wrt to Confidential compute - you trap IO accesses and emulate.
> > > Care to elaborate?
> > 
> > This patch series isn't about confidential compute, you asked about
> > the future. VFIO will support confidential compute in the future, VDPA
> > will not.
> 
> Nonsense it already works.

That isn't what I'm talking about. With a real PCI function and TDISP
we can actually DMA directly from the guest's memory without needing
the ugly bounce buffer hack. Then you can get decent performance.

> But I did not ask about the future since I do not believe it
> can be confidently predicted. I asked what is missing in VDPA
> now for you to add this feature there and not in VFIO.

I don't see that VDPA needs this, VDPA should process the IO BAR on
its own with its own logic, just like everything else it does.

This is specifically about avoiding mediation by relaying the IO BAR
operations directly to the device itself.

That is the entire irony: this whole scheme was designed and
standardized *specifically* to avoid complex mediation, and here you
are saying we should just use mediation.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 20:55                     ` Michael S. Tsirkin
  (?)
@ 2023-09-21 23:08                     ` Jason Gunthorpe
  -1 siblings, 0 replies; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-21 23:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yishai Hadas, alex.williamson, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 04:55:05PM -0400, Michael S. Tsirkin wrote:
> But which mediation is necessary is exactly up to the specific use-case.
> I have no idea why would you want all of VFIO to e.g. pass access to
> random config registers to the guest when it's a virtio device and the
> config registers are all nicely listed in the spec. I know nvidia
> hardware is so great, it has super robust cards with less security holes
> than the vdpa driver, but I very much doubt this is universal for all
> virtio offload cards.

The great thing about choice is that people can choose the
configuration that best meets their situation and needs.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 19:49                 ` Jason Gunthorpe
@ 2023-09-22  3:01                     ` Jason Wang
  2023-09-22  3:01                     ` Jason Wang
  1 sibling, 0 replies; 321+ messages in thread
From: Jason Wang @ 2023-09-22  3:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Michael S. Tsirkin, maorg, virtualization, jiri, leonro

On Fri, Sep 22, 2023 at 3:49 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Sep 21, 2023 at 03:13:10PM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 03:39:26PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Sep 21, 2023 at 12:53:04PM -0400, Michael S. Tsirkin wrote:
> > > > > vdpa is not vfio, I don't know how you can suggest vdpa is a
> > > > > replacement for a vfio driver. They are completely different
> > > > > things.
> > > > > Each side has its own strengths, and vfio especially is accelerating
> > > > > in its capability in way that vpda is not. eg if an iommufd conversion
> > > > > had been done by now for vdpa I might be more sympathetic.
> > > >
> > > > Yea, I agree iommufd is a big problem with vdpa right now. Cindy was
> > > > sick and I didn't know and kept assuming she's working on this. I don't
> > > > think it's a huge amount of work though.  I'll take a look.
> > > > Is there anything else though? Do tell.
> > >
> > > Confidential compute will never work with VDPA's approach.
> >
> > I don't see how what this patchset is doing is different
> > wrt to Confidential compute - you trap IO accesses and emulate.
> > Care to elaborate?
>
> This patch series isn't about confidential compute, you asked about
> the future. VFIO will support confidential compute in the future, VDPA
> will not.
>
> > > > There are a bunch of things that I think are important for virtio
> > > > that are completely out of scope for vfio, such as migrating
> > > > cross-vendor.
> > >
> > > VFIO supports migration, if you want to have cross-vendor migration
> > > then make a standard that describes the VFIO migration data format for
> > > virtio devices.
> >
> > This has nothing to do with data formats - you need two devices to
> > behave identically. Which is what VDPA is about really.
>
> We've been looking at VFIO live migration extensively. Device
> mediation, like VDPA does, is one legitimate approach for live
> migration. It suites a certain type of heterogeneous environment well.
>
> But, it is equally legitimate to make the devices behave the same and
> have them process a common migration data.
>
> This can happen in public with standards, or it can happen in private
> within a cloud operator's "private-standard" environment.
>
> To date, in most of my discussions, I have not seen a strong appetite
> for such public standards. In part due to the complexity.
>
> Regardles, it is not the kernel communities job to insist on one
> approach or the other.
>
> > > You are asking us to invest in the complexity of VDPA through out
> > > (keep it working, keep it secure, invest time in deploying and
> > > debugging in the field)
> > >
> > > When it doesn't provide *ANY* value to the solution.
> >
> > There's no "the solution"
>
> Nonsense.
>
> > this sounds like a vendor only caring about solutions that involve
> > that vendor's hardware exclusively, a little.
>
> Not really.
>
> Understand the DPU provider is not the vendor here. The DPU provider
> gives a cloud operator a SDK to build these things. The operator is
> the vendor from your perspective.
>
> In many cases live migration never leaves the operator's confines in
> the first place.
>
> Even when it does, there is no real use case to live migrate a
> virtio-net function from, say, AWS to GCP.

It can happen inside a single cloud vendor. For various reasons, DPUs must
be purchased from different vendors. And vDPA has been used in that
case.

I've asked them to present this, probably somewhere like KVM Forum.

>
> You are pushing for a lot of complexity and software that solves a
> problem people in this space don't actually have.
>
> As I said, VDPA is fine for the scenarios it addresses. It is an
> alternative, not a replacement, for VFIO.

We never tried to replace VFIO. I don't see any problem with just using
the current VFIO to assign a virtio-pci device to the guest.

The problem is the mediation (or what you called relaying) layer
you've invented.

Thanks

>
> Jason
>


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-22  3:01                     ` Jason Wang
  0 siblings, 0 replies; 321+ messages in thread
From: Jason Wang @ 2023-09-22  3:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Michael S. Tsirkin, Yishai Hadas, alex.williamson, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Fri, Sep 22, 2023 at 3:49 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Sep 21, 2023 at 03:13:10PM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 03:39:26PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Sep 21, 2023 at 12:53:04PM -0400, Michael S. Tsirkin wrote:
> > > > > vdpa is not vfio, I don't know how you can suggest vdpa is a
> > > > > replacement for a vfio driver. They are completely different
> > > > > things.
> > > > > Each side has its own strengths, and vfio especially is accelerating
> > > > > in its capability in way that vpda is not. eg if an iommufd conversion
> > > > > had been done by now for vdpa I might be more sympathetic.
> > > >
> > > > Yea, I agree iommufd is a big problem with vdpa right now. Cindy was
> > > > sick and I didn't know and kept assuming she's working on this. I don't
> > > > think it's a huge amount of work though.  I'll take a look.
> > > > Is there anything else though? Do tell.
> > >
> > > Confidential compute will never work with VDPA's approach.
> >
> > I don't see how what this patchset is doing is different
> > wrt to Confidential compute - you trap IO accesses and emulate.
> > Care to elaborate?
>
> This patch series isn't about confidential compute, you asked about
> the future. VFIO will support confidential compute in the future, VDPA
> will not.
>
> > > > There are a bunch of things that I think are important for virtio
> > > > that are completely out of scope for vfio, such as migrating
> > > > cross-vendor.
> > >
> > > VFIO supports migration, if you want to have cross-vendor migration
> > > then make a standard that describes the VFIO migration data format for
> > > virtio devices.
> >
> > This has nothing to do with data formats - you need two devices to
> > behave identically. Which is what VDPA is about really.
>
> We've been looking at VFIO live migration extensively. Device
> mediation, like VDPA does, is one legitimate approach for live
> migration. It suites a certain type of heterogeneous environment well.
>
> But, it is equally legitimate to make the devices behave the same and
> have them process a common migration data.
>
> This can happen in public with standards, or it can happen in private
> within a cloud operator's "private-standard" environment.
>
> To date, in most of my discussions, I have not seen a strong appetite
> for such public standards. In part due to the complexity.
>
> Regardles, it is not the kernel communities job to insist on one
> approach or the other.
>
> > > You are asking us to invest in the complexity of VDPA through out
> > > (keep it working, keep it secure, invest time in deploying and
> > > debugging in the field)
> > >
> > > When it doesn't provide *ANY* value to the solution.
> >
> > There's no "the solution"
>
> Nonsense.
>
> > this sounds like a vendor only caring about solutions that involve
> > that vendor's hardware exclusively, a little.
>
> Not really.
>
> Understand the DPU provider is not the vendor here. The DPU provider
> gives a cloud operator a SDK to build these things. The operator is
> the vendor from your perspective.
>
> In many cases live migration never leaves the operator's confines in
> the first place.
>
> Even when it does, there is no real use case to live migrate a
> virtio-net function from, say, AWS to GCP.

It can happen inside a single cloud vendor. For various reasons, DPUs must
be purchased from different vendors. And vDPA has been used in that
case.

I've asked them to present this, probably somewhere like KVM Forum.

>
> You are pushing for a lot of complexity and software that solves a
> problem people in this space don't actually have.
>
> As I said, VDPA is fine for the scenarios it addresses. It is an
> alternative, not a replacement, for VFIO.

We never tried to replace VFIO. I don't see any problem with just using
the current VFIO to assign a virtio-pci device to the guest.

The problem is the mediation (or what you called relaying) layer
you've invented.

Thanks

>
> Jason
>


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 20:16                         ` Michael S. Tsirkin
@ 2023-09-22  3:02                           ` Jason Wang
  -1 siblings, 0 replies; 321+ messages in thread
From: Jason Wang @ 2023-09-22  3:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, maorg, virtualization, Jason Gunthorpe, jiri, leonro

On Fri, Sep 22, 2023 at 4:16 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Sep 21, 2023 at 04:53:45PM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 21, 2023 at 03:34:03PM -0400, Michael S. Tsirkin wrote:
> >
> > > that's easy/practical.  If instead VDPA gives the same speed with just
> > > shadow vq then keeping this hack in vfio seems like less of a problem.
> > > Finally if VDPA is faster then maybe you will reconsider using it ;)
> >
> > It is not all about the speed.
> >
> > VDPA presents another large and complex software stack in the
> > hypervisor that can be eliminated by simply using VFIO.
>
> If all you want is passing through your card to guest
> then yes this can be addressed "by simply using VFIO".

+1.

And what's more, using an MMIO BAR0, it can work for legacy.

I have handy virtio hardware from one vendor that works like this,
and I see it is done by a lot of other vendors.

Thanks


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-22  3:02                           ` Jason Wang
  0 siblings, 0 replies; 321+ messages in thread
From: Jason Wang @ 2023-09-22  3:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Gunthorpe, Alex Williamson, Yishai Hadas, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Fri, Sep 22, 2023 at 4:16 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Sep 21, 2023 at 04:53:45PM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 21, 2023 at 03:34:03PM -0400, Michael S. Tsirkin wrote:
> >
> > > that's easy/practical.  If instead VDPA gives the same speed with just
> > > shadow vq then keeping this hack in vfio seems like less of a problem.
> > > Finally if VDPA is faster then maybe you will reconsider using it ;)
> >
> > It is not all about the speed.
> >
> > VDPA presents another large and complex software stack in the
> > hypervisor that can be eliminated by simply using VFIO.
>
> If all you want is passing through your card to guest
> then yes this can be addressed "by simply using VFIO".

+1.

And what's more, using an MMIO BAR0, it can work for legacy.

I have handy virtio hardware from one vendor that works like this,
and I see it is done by a lot of other vendors.

Thanks


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 22:55                     ` Jason Gunthorpe
@ 2023-09-22  3:02                         ` Jason Wang
  2023-09-22 11:23                         ` Michael S. Tsirkin
  1 sibling, 0 replies; 321+ messages in thread
From: Jason Wang @ 2023-09-22  3:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Michael S. Tsirkin, maorg, virtualization, jiri, leonro

On Fri, Sep 22, 2023 at 6:55 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Sep 21, 2023 at 04:45:45PM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 04:49:46PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Sep 21, 2023 at 03:13:10PM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Sep 21, 2023 at 03:39:26PM -0300, Jason Gunthorpe wrote:
> > > > > On Thu, Sep 21, 2023 at 12:53:04PM -0400, Michael S. Tsirkin wrote:
> > > > > > > vdpa is not vfio, I don't know how you can suggest vdpa is a
> > > > > > > replacement for a vfio driver. They are completely different
> > > > > > > things.
> > > > > > > Each side has its own strengths, and vfio especially is accelerating
> > > > > > > in its capability in way that vpda is not. eg if an iommufd conversion
> > > > > > > had been done by now for vdpa I might be more sympathetic.
> > > > > >
> > > > > > Yea, I agree iommufd is a big problem with vdpa right now. Cindy was
> > > > > > sick and I didn't know and kept assuming she's working on this. I don't
> > > > > > think it's a huge amount of work though.  I'll take a look.
> > > > > > Is there anything else though? Do tell.
> > > > >
> > > > > Confidential compute will never work with VDPA's approach.
> > > >
> > > > I don't see how what this patchset is doing is different
> > > > wrt to Confidential compute - you trap IO accesses and emulate.
> > > > Care to elaborate?
> > >
> > > This patch series isn't about confidential compute, you asked about
> > > the future. VFIO will support confidential compute in the future, VDPA
> > > will not.

What blocks vDPA from supporting that?

> >
> > Nonsense it already works.
>
> That isn't what I'm talking about. With a real PCI function and TDISP
> we can actually DMA directly from the guest's memory without needing
> the ugly bounce buffer hack. Then you can get decent performance.

This series requires trapping the legacy I/O BAR in VFIO. Why can
TDISP work when trapping in VFIO but not in vDPA? If it works in
neither, how is TDISP relevant here?

>
> > But I did not ask about the future since I do not believe it
> > can be confidently predicted. I asked what is missing in VDPA
> > now for you to add this feature there and not in VFIO.
>
> I don't see that VDPA needs this, VDPA should process the IO BAR on
> its own with its own logic, just like everything else it does.
>
> This is specifically about avoiding mediation by relaying directly the
> IO BAR operations to the device itself.

So we had:

1) a new virtio specific driver for VFIO
2) the existing vp_vdpa driver

How much difference is there between them in the context of the mediation or
relaying? Or is it hard to introduce admin commands on the vDPA bus?

> That is the entire irony, this whole scheme was designed and
> standardized *specifically* to avoid complex mediation and here you
> are saying we should just use mediation.

No, using "simple VFIO passthrough" is just fine.

Thanks

>
> Jason
>


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-22  3:02                         ` Jason Wang
  0 siblings, 0 replies; 321+ messages in thread
From: Jason Wang @ 2023-09-22  3:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Michael S. Tsirkin, Yishai Hadas, alex.williamson, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Fri, Sep 22, 2023 at 6:55 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Sep 21, 2023 at 04:45:45PM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 04:49:46PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Sep 21, 2023 at 03:13:10PM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Sep 21, 2023 at 03:39:26PM -0300, Jason Gunthorpe wrote:
> > > > > On Thu, Sep 21, 2023 at 12:53:04PM -0400, Michael S. Tsirkin wrote:
> > > > > > > vdpa is not vfio, I don't know how you can suggest vdpa is a
> > > > > > > replacement for a vfio driver. They are completely different
> > > > > > > things.
> > > > > > > Each side has its own strengths, and vfio especially is accelerating
> > > > > > > in its capability in way that vpda is not. eg if an iommufd conversion
> > > > > > > had been done by now for vdpa I might be more sympathetic.
> > > > > >
> > > > > > Yea, I agree iommufd is a big problem with vdpa right now. Cindy was
> > > > > > sick and I didn't know and kept assuming she's working on this. I don't
> > > > > > think it's a huge amount of work though.  I'll take a look.
> > > > > > Is there anything else though? Do tell.
> > > > >
> > > > > Confidential compute will never work with VDPA's approach.
> > > >
> > > > I don't see how what this patchset is doing is different
> > > > wrt to Confidential compute - you trap IO accesses and emulate.
> > > > Care to elaborate?
> > >
> > > This patch series isn't about confidential compute, you asked about
> > > the future. VFIO will support confidential compute in the future, VDPA
> > > will not.

What blocks vDPA from supporting that?

> >
> > Nonsense it already works.
>
> That isn't what I'm talking about. With a real PCI function and TDISP
> we can actually DMA directly from the guest's memory without needing
> the ugly bounce buffer hack. Then you can get decent performance.

This series requires trapping the legacy I/O BAR in VFIO. Why can
TDISP work when trapping in VFIO but not in vDPA? If it works in
neither, how is TDISP relevant here?

>
> > But I did not ask about the future since I do not believe it
> > can be confidently predicted. I asked what is missing in VDPA
> > now for you to add this feature there and not in VFIO.
>
> I don't see that VDPA needs this, VDPA should process the IO BAR on
> its own with its own logic, just like everything else it does.
>
> This is specifically about avoiding mediation by relaying directly the
> IO BAR operations to the device itself.

So we had:

1) a new virtio specific driver for VFIO
2) the existing vp_vdpa driver

How much difference is there between them in the context of the mediation or
relaying? Or is it hard to introduce admin commands on the vDPA bus?

> That is the entire irony, this whole scheme was designed and
> standardized *specifically* to avoid complex mediation and here you
> are saying we should just use mediation.

No, using "simple VFIO passthrough" is just fine.

Thanks

>
> Jason
>


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 19:53                     ` Jason Gunthorpe
@ 2023-09-22  3:02                         ` Jason Wang
  2023-09-22  3:02                         ` Jason Wang
  1 sibling, 0 replies; 321+ messages in thread
From: Jason Wang @ 2023-09-22  3:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Michael S. Tsirkin, Alex Williamson, Yishai Hadas, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Fri, Sep 22, 2023 at 3:53 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Sep 21, 2023 at 03:34:03PM -0400, Michael S. Tsirkin wrote:
>
> > that's easy/practical.  If instead VDPA gives the same speed with just
> > shadow vq then keeping this hack in vfio seems like less of a problem.
> > Finally if VDPA is faster then maybe you will reconsider using it ;)
>
> It is not all about the speed.
>
> VDPA presents another large and complex software stack in the
> hypervisor that can be eliminated by simply using VFIO.

vDPA supports standard virtio devices, so how do you define complexity?

From the application's point of view, what it wants is a simple virtio
device, not a virtio-pci device. That is what vDPA tries to present.

By simply counting LOCs: vdpa + vhost + vp_vdpa is much less code than
what VFIO has. It's not hard to expect that it will still be much less
even once iommufd is done.

Thanks



> VFIO is
> already required for other scenarios.
>
> This is about reducing complexity, reducing attack surface and
> increasing maintainability of the hypervisor environment.
>
> Jason
>
>


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-22  3:02                         ` Jason Wang
  0 siblings, 0 replies; 321+ messages in thread
From: Jason Wang @ 2023-09-22  3:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Michael S. Tsirkin, maorg, virtualization, jiri, leonro

On Fri, Sep 22, 2023 at 3:53 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Sep 21, 2023 at 03:34:03PM -0400, Michael S. Tsirkin wrote:
>
> > that's easy/practical.  If instead VDPA gives the same speed with just
> > shadow vq then keeping this hack in vfio seems like less of a problem.
> > Finally if VDPA is faster then maybe you will reconsider using it ;)
>
> It is not all about the speed.
>
> VDPA presents another large and complex software stack in the
> hypervisor that can be eliminated by simply using VFIO.

vDPA supports standard virtio devices, so how do you define complexity?

From the application's point of view, what it wants is a simple virtio
device, not a virtio-pci device. That is what vDPA tries to present.

By simply counting LOCs: vdpa + vhost + vp_vdpa is much less code than
what VFIO has. It's not hard to expect that it will still be much less
even once iommufd is done.

Thanks



> VFIO is
> already required for other scenarios.
>
> This is about reducing complexity, reducing attack surface and
> increasing maintainability of the hypervisor environment.
>
> Jason
>
>


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 18:39             ` Jason Gunthorpe
@ 2023-09-22  3:45                 ` Zhu, Lingshan
  2023-09-21 19:17                 ` Michael S. Tsirkin
  2023-09-22  3:45                 ` Zhu, Lingshan
  2 siblings, 0 replies; 321+ messages in thread
From: Zhu, Lingshan @ 2023-09-22  3:45 UTC (permalink / raw)
  To: Jason Gunthorpe, Michael S. Tsirkin
  Cc: Yishai Hadas, alex.williamson, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg



On 9/22/2023 2:39 AM, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 12:53:04PM -0400, Michael S. Tsirkin wrote:
>>> vdpa is not vfio, I don't know how you can suggest vdpa is a
>>> replacement for a vfio driver. They are completely different
>>> things.
>>> Each side has its own strengths, and vfio especially is accelerating
>>> in its capability in way that vpda is not. eg if an iommufd conversion
>>> had been done by now for vdpa I might be more sympathetic.
>> Yea, I agree iommufd is a big problem with vdpa right now. Cindy was
>> sick and I didn't know and kept assuming she's working on this. I don't
>> think it's a huge amount of work though.  I'll take a look.
>> Is there anything else though? Do tell.
> Confidential compute will never work with VDPA's approach.
I don't understand why vDPA cannot and will never support Confidential
Computing.

Do you see any blockers?
>
>> There are a bunch of things that I think are important for virtio
>> that are completely out of scope for vfio, such as migrating
>> cross-vendor.
> VFIO supports migration, if you want to have cross-vendor migration
> then make a standard that describes the VFIO migration data format for
> virtio devices.
>
>> What is the huge amount of work am I asking to do?
> You are asking us to invest in the complexity of VDPA through out
> (keep it working, keep it secure, invest time in deploying and
> debugging in the field)
>
> When it doesn't provide *ANY* value to the solution.
>
> The starting point is a completely working vfio PCI function and the
> end goal is to put that function into a VM. That is VFIO, not VDPA.
>
> VPDA is fine for what it does, but it is not a reasonable replacement
> for VFIO.
>
> Jason




* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 22:48                         ` Jason Gunthorpe
@ 2023-09-22  9:47                             ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-22  9:47 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Thu, Sep 21, 2023 at 07:48:36PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 04:16:25PM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 04:53:45PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Sep 21, 2023 at 03:34:03PM -0400, Michael S. Tsirkin wrote:
> > > 
> > > > that's easy/practical.  If instead VDPA gives the same speed with just
> > > > shadow vq then keeping this hack in vfio seems like less of a problem.
> > > > Finally if VDPA is faster then maybe you will reconsider using it ;)
> > > 
> > > It is not all about the speed.
> > > 
> > > VDPA presents another large and complex software stack in the
> > > hypervisor that can be eliminated by simply using VFIO.
> > 
> > If all you want is passing through your card to guest
> > then yes this can be addressed "by simply using VFIO".
> 
> That is pretty much the goal, yes.
> 
> > And let me give you a simple example just from this patchset:
> > it assumes guest uses MSIX and just breaks if it doesn't.
> 
> It does? Really? Where did you see that?

This thing apparently:

+               opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
+                       VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
+                       VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;

That "true" is supposed to reflect whether the guest enabled MSI-X or not.
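
A minimal sketch of what I mean (not from the patch; the msix_enabled
field and helper below are made up for illustration): the opcode
selection would key off the MSI-X enable state the guest last
programmed, not a hard-coded "true":

	static u16 virtiovf_legacy_cfg_opcode(struct virtiovf_pci_core_device *virtvdev,
					      loff_t pos, bool read)
	{
		/* Assumed field, updated when the guest writes the MSI-X
		 * capability's Message Control register via config space.
		 */
		bool msix = virtvdev->msix_enabled;

		if (pos < VIRTIO_PCI_CONFIG_OFF(msix))
			return read ? VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
				      VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE;
		return read ? VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ :
			      VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
	}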


> > > VFIO is
> > > already required for other scenarios.
> > 
> > Required ... by some people? Most VMs I run don't use anything
> > outside of virtio.
> 
> Yes, some people. The sorts of people who run large data centers.
>
> > It seems to deal with emulating virtio which seems more like a vdpa
> > thing.
> 
> Alex described it right, it creates an SW trapped IO bar that relays
> the doorbell to an admin queue command.
> 
> > If you start adding virtio emulation to vfio then won't
> > you just end up with another vdpa? And if no why not?
> > And I don't buy the "we already invested in this vfio based solution",
> > sorry - that's not a reason upstream has to maintain it.
> 
> I think you would be well justified to object to actual mediation,
> like processing queues in VFIO or otherwise complex things.

This mediation is kind of smallish, I agree. Not completely devoid of
logic though.

> Fortunately there is no need to do that with DPU HW. The legacy IO BAR
> is a weird quirk that just cannot be done without a software trap, and
> the OASIS standardization effort was for exactly this kind of
> simplistic transformation.
> 
> I also don't buy the "upstream has to maintain it" line. The team that
> submitted it will maintain it just fine, thank you.

It will require maintenance effort when virtio changes are made.  For
example, it pokes at the device state - I don't see specific races right
now, but in the past we did e.g. reset the device to recover from errors,
and we might start doing it again.

If more of the logic were under the virtio directory, where we'll
remember to keep it in the loop and will be able to reuse it from vdpa
down the road, I would be more sympathetic.

-- 
MST




* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-09-21 12:40   ` Yishai Hadas
@ 2023-09-22  9:54     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-22  9:54 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote:
> Expose admin commands over the virtio device, to be used by the
> vfio-virtio driver in the next patches.
> 
> It includes: list query/use, legacy write/read, read notify_info.
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>


This stuff is pure virtio spec. I think it should live under
drivers/virtio, too.
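
For example (rough sketch only; the function name, file and header below
are made up, nothing like this exists today), the admin command wrappers
could be exported from drivers/virtio and merely consumed by the vfio
driver:

	/* drivers/virtio/virtio_pci_admin.c (hypothetical) */
	int virtio_pci_admin_legacy_list_query(struct pci_dev *pdev, u8 *buf,
					       int buf_size)
	{
		struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
		struct scatterlist out_sg;
		struct virtio_admin_cmd cmd = {};

		if (!virtio_dev)
			return -ENOTCONN;

		sg_init_one(&out_sg, buf, buf_size);
		cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
		cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
		cmd.result_sg = &out_sg;
		return virtio_admin_cmd_exec(virtio_dev, &cmd);
	}
	EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_list_query);

	/* include/linux/virtio_pci_admin.h (hypothetical) */
	int virtio_pci_admin_legacy_list_query(struct pci_dev *pdev, u8 *buf,
					       int buf_size);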

> ---
>  drivers/vfio/pci/virtio/cmd.c | 146 ++++++++++++++++++++++++++++++++++
>  drivers/vfio/pci/virtio/cmd.h |  27 +++++++
>  2 files changed, 173 insertions(+)
>  create mode 100644 drivers/vfio/pci/virtio/cmd.c
>  create mode 100644 drivers/vfio/pci/virtio/cmd.h
> 
> diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> new file mode 100644
> index 000000000000..f068239cdbb0
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/cmd.c
> @@ -0,0 +1,146 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include "cmd.h"
> +
> +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct scatterlist out_sg;
> +	struct virtio_admin_cmd cmd = {};
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	sg_init_one(&out_sg, buf, buf_size);
> +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.result_sg = &out_sg;
> +
> +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
> +}
> +
> +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct scatterlist in_sg;
> +	struct virtio_admin_cmd cmd = {};
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	sg_init_one(&in_sg, buf, buf_size);
> +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.data_sg = &in_sg;
> +
> +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
> +}
> +
> +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> +			  u8 offset, u8 size, u8 *buf)
> +{
> +	struct virtio_device *virtio_dev =
> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> +	struct virtio_admin_cmd_data_lr_write *in;
> +	struct scatterlist in_sg;
> +	struct virtio_admin_cmd cmd = {};
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
> +	if (!in)
> +		return -ENOMEM;
> +
> +	in->offset = offset;
> +	memcpy(in->registers, buf, size);
> +	sg_init_one(&in_sg, in, sizeof(*in) + size);
> +	cmd.opcode = opcode;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.group_member_id = virtvdev->vf_id + 1;
> +	cmd.data_sg = &in_sg;
> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> +
> +	kfree(in);
> +	return ret;
> +}
> +
> +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> +			 u8 offset, u8 size, u8 *buf)
> +{
> +	struct virtio_device *virtio_dev =
> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> +	struct virtio_admin_cmd_data_lr_read *in;
> +	struct scatterlist in_sg, out_sg;
> +	struct virtio_admin_cmd cmd = {};
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	in = kzalloc(sizeof(*in), GFP_KERNEL);
> +	if (!in)
> +		return -ENOMEM;
> +
> +	in->offset = offset;
> +	sg_init_one(&in_sg, in, sizeof(*in));
> +	sg_init_one(&out_sg, buf, size);
> +	cmd.opcode = opcode;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.data_sg = &in_sg;
> +	cmd.result_sg = &out_sg;
> +	cmd.group_member_id = virtvdev->vf_id + 1;
> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> +
> +	kfree(in);
> +	return ret;
> +}
> +
> +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
> +				u8 req_bar_flags, u8 *bar, u64 *bar_offset)
> +{
> +	struct virtio_device *virtio_dev =
> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> +	struct virtio_admin_cmd_notify_info_result *out;
> +	struct scatterlist out_sg;
> +	struct virtio_admin_cmd cmd = {};
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENOTCONN;
> +
> +	out = kzalloc(sizeof(*out), GFP_KERNEL);
> +	if (!out)
> +		return -ENOMEM;
> +
> +	sg_init_one(&out_sg, out, sizeof(*out));
> +	cmd.opcode = VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO;
> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> +	cmd.result_sg = &out_sg;
> +	cmd.group_member_id = virtvdev->vf_id + 1;
> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> +	if (!ret) {
> +		struct virtio_admin_cmd_notify_info_data *entry;
> +		int i;
> +
> +		ret = -ENOENT;
> +		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
> +			entry = &out->entries[i];
> +			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
> +				break;
> +			if (entry->flags != req_bar_flags)
> +				continue;
> +			*bar = entry->bar;
> +			*bar_offset = le64_to_cpu(entry->offset);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +
> +	kfree(out);
> +	return ret;
> +}
> diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
> new file mode 100644
> index 000000000000..c2a3645f4b90
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/cmd.h
> @@ -0,0 +1,27 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> + */
> +
> +#ifndef VIRTIO_VFIO_CMD_H
> +#define VIRTIO_VFIO_CMD_H
> +
> +#include <linux/kernel.h>
> +#include <linux/virtio.h>
> +#include <linux/vfio_pci_core.h>
> +#include <linux/virtio_pci.h>
> +
> +struct virtiovf_pci_core_device {
> +	struct vfio_pci_core_device core_device;
> +	int vf_id;
> +};
> +
> +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
> +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> +			  u8 offset, u8 size, u8 *buf);
> +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> +			 u8 offset, u8 size, u8 *buf);
> +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
> +				u8 req_bar_flags, u8 *bar, u64 *bar_offset);
> +#endif /* VIRTIO_VFIO_CMD_H */
> -- 
> 2.27.0




* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 12:40   ` Yishai Hadas
@ 2023-09-22 10:10     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-22 10:10 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Thu, Sep 21, 2023 at 03:40:40PM +0300, Yishai Hadas wrote:
> Introduce a vfio driver over virtio devices to support the legacy
> interface functionality for VFs.
> 
> Background, from the virtio spec [1].
> --------------------------------------------------------------------
> In some systems, there is a need to support a virtio legacy driver with
> a device that does not directly support the legacy interface. In such
> scenarios, a group owner device can provide the legacy interface
> functionality for the group member devices. The driver of the owner
> device can then access the legacy interface of a member device on behalf
> of the legacy member device driver.
> 
> For example, with the SR-IOV group type, group members (VFs) can not
> present the legacy interface in an I/O BAR in BAR0 as expected by the
> legacy pci driver. If the legacy driver is running inside a virtual
> machine, the hypervisor executing the virtual machine can present a
> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> legacy driver accesses to this I/O BAR and forwards them to the group
> owner device (PF) using group administration commands.
> --------------------------------------------------------------------
> 
> Specifically, this driver adds support for a virtio-net VF to be exposed
> as a transitional device to a guest driver and allows the legacy IO BAR
> functionality on top.
> 
> This allows a VM which uses a legacy virtio-net driver in the guest to
> work transparently over a VF which its driver in the host is that new
> driver.
> 
> The driver can be extended easily to support some other types of virtio
> devices (e.g virtio-blk), by adding in a few places the specific type
> properties as was done for virtio-net.
> 
> For now, only the virtio-net use case was tested and as such we introduce
> the support only for such a device.
> 
> Practically,
> Upon probing a VF for a virtio-net device, in case its PF supports
> legacy access over the virtio admin commands and the VF doesn't have BAR
> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> transitional device with I/O BAR in BAR 0.
> 
> The existence of the simulated I/O bar is reported later on by
> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> exposes itself as a transitional device by overwriting some properties
> upon reading its config space.
> 
> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> guest may use it via read/write calls according to the virtio
> specification.
> 
> Any read/write towards the control parts of the BAR will be captured by
> the new driver and will be translated into admin commands towards the
> device.
> 
> Any data path read/write access (i.e. virtio driver notifications) will
> be forwarded to the physical BAR which its properties were supplied by
> the command VIRTIO_PCI_QUEUE_NOTIFY upon the probing/init flow.
> 
> With that code in place a legacy driver in the guest has the look and
> feel as if having a transitional device with legacy support for both its
> control and data path flows.
> 
> [1]
> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  MAINTAINERS                      |   6 +
>  drivers/vfio/pci/Kconfig         |   2 +
>  drivers/vfio/pci/Makefile        |   2 +
>  drivers/vfio/pci/virtio/Kconfig  |  15 +
>  drivers/vfio/pci/virtio/Makefile |   4 +
>  drivers/vfio/pci/virtio/cmd.c    |   4 +-
>  drivers/vfio/pci/virtio/cmd.h    |   8 +
>  drivers/vfio/pci/virtio/main.c   | 546 +++++++++++++++++++++++++++++++
>  8 files changed, 585 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/vfio/pci/virtio/Kconfig
>  create mode 100644 drivers/vfio/pci/virtio/Makefile
>  create mode 100644 drivers/vfio/pci/virtio/main.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index bf0f54c24f81..5098418c8389 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22624,6 +22624,12 @@ L:	kvm@vger.kernel.org
>  S:	Maintained
>  F:	drivers/vfio/pci/mlx5/
>  
> +VFIO VIRTIO PCI DRIVER
> +M:	Yishai Hadas <yishaih@nvidia.com>
> +L:	kvm@vger.kernel.org
> +S:	Maintained
> +F:	drivers/vfio/pci/virtio
> +
>  VFIO PCI DEVICE SPECIFIC DRIVERS
>  R:	Jason Gunthorpe <jgg@nvidia.com>
>  R:	Yishai Hadas <yishaih@nvidia.com>
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 8125e5f37832..18c397df566d 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>  
>  source "drivers/vfio/pci/pds/Kconfig"
>  
> +source "drivers/vfio/pci/virtio/Kconfig"
> +
>  endmenu
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 45167be462d8..046139a4eca5 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>  obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>  
>  obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> +
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> new file mode 100644
> index 000000000000..89eddce8b1bd
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Kconfig
> @@ -0,0 +1,15 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config VIRTIO_VFIO_PCI
> +        tristate "VFIO support for VIRTIO PCI devices"
> +        depends on VIRTIO_PCI
> +        select VFIO_PCI_CORE
> +        help
> +          This provides support for exposing VIRTIO VF devices using the VFIO
> +          framework that can work with a legacy virtio driver in the guest.
> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> +          not indicate I/O Space.
> +          As of that this driver emulated I/O BAR in software to let a VF be
> +          seen as a transitional device in the guest and let it work with
> +          a legacy driver.
> +
> +          If you don't know what to do here, say N.
> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
> new file mode 100644
> index 000000000000..584372648a03
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
> +virtio-vfio-pci-y := main.o cmd.o
> +
> diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> index f068239cdbb0..aea9d25fbf1d 100644
> --- a/drivers/vfio/pci/virtio/cmd.c
> +++ b/drivers/vfio/pci/virtio/cmd.c
> @@ -44,7 +44,7 @@ int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>  {
>  	struct virtio_device *virtio_dev =
>  		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> -	struct virtio_admin_cmd_data_lr_write *in;
> +	struct virtio_admin_cmd_legacy_wr_data *in;
>  	struct scatterlist in_sg;
>  	struct virtio_admin_cmd cmd = {};
>  	int ret;
> @@ -74,7 +74,7 @@ int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>  {
>  	struct virtio_device *virtio_dev =
>  		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> -	struct virtio_admin_cmd_data_lr_read *in;
> +	struct virtio_admin_cmd_legacy_rd_data *in;
>  	struct scatterlist in_sg, out_sg;
>  	struct virtio_admin_cmd cmd = {};
>  	int ret;
> diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
> index c2a3645f4b90..347b1dc85570 100644
> --- a/drivers/vfio/pci/virtio/cmd.h
> +++ b/drivers/vfio/pci/virtio/cmd.h
> @@ -13,7 +13,15 @@
>  
>  struct virtiovf_pci_core_device {
>  	struct vfio_pci_core_device core_device;
> +	u8 bar0_virtual_buf_size;
> +	u8 *bar0_virtual_buf;
> +	/* synchronize access to the virtual buf */
> +	struct mutex bar_mutex;
>  	int vf_id;
> +	void __iomem *notify_addr;
> +	u32 notify_offset;
> +	u8 notify_bar;
> +	u8 pci_cmd_io :1;
>  };
>  
>  int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
> new file mode 100644
> index 000000000000..2486991c49f3
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/main.c
> @@ -0,0 +1,546 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/pci.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +#include <linux/vfio.h>
> +#include <linux/vfio_pci_core.h>
> +#include <linux/virtio_pci.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_pci_modern.h>
> +
> +#include "cmd.h"
> +
> +#define VIRTIO_LEGACY_IO_BAR_HEADER_LEN 20
> +#define VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN 4
> +
> +static int virtiovf_issue_lr_cmd(struct virtiovf_pci_core_device *virtvdev,
> +				 loff_t pos, char __user *buf,
> +				 size_t count, bool read)
> +{
> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
> +	u16 opcode;
> +	int ret;
> +
> +	mutex_lock(&virtvdev->bar_mutex);
> +	if (read) {
> +		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?

This "true" seems wrong. You need to know whether the guest wants
MSI-X enabled for the device or not.


> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
> +		ret = virtiovf_cmd_lr_read(virtvdev, opcode, pos,
> +					   count, bar0_buf + pos);
> +		if (ret)
> +			goto out;
> +		if (copy_to_user(buf, bar0_buf + pos, count))
> +			ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	if (copy_from_user(bar0_buf + pos, buf, count)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;


same

> +	ret = virtiovf_cmd_lr_write(virtvdev, opcode, pos, count,
> +				    bar0_buf + pos);
> +out:
> +	mutex_unlock(&virtvdev->bar_mutex);
> +	return ret;
> +}
> +
> +static int
> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
> +			    loff_t pos, char __user *buf,
> +			    size_t count, bool read)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	u16 queue_notify;
> +	int ret;
> +
> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
> +		return -EINVAL;
> +
> +	switch (pos) {
> +	case VIRTIO_PCI_QUEUE_NOTIFY:
> +		if (count != sizeof(queue_notify))
> +			return -EINVAL;
> +		if (read) {
> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
> +						virtvdev->notify_addr);
> +			if (ret)
> +				return ret;
> +			if (copy_to_user(buf, &queue_notify,
> +					 sizeof(queue_notify)))
> +				return -EFAULT;
> +			break;
> +		}
> +
> +		if (copy_from_user(&queue_notify, buf, count))
> +			return -EFAULT;
> +
> +		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
> +					 virtvdev->notify_addr);
> +		break;
> +	default:
> +		ret = virtiovf_issue_lr_cmd(virtvdev, pos, buf, count, read);
> +	}
> +
> +	return ret ? ret : count;
> +}
> +
> +static bool range_contains_range(loff_t range1_start, size_t count1,
> +				 loff_t range2_start, size_t count2,
> +				 loff_t *start_offset)
> +{
> +	if (range1_start <= range2_start &&
> +	    range1_start + count1 >= range2_start + count2) {
> +		*start_offset = range2_start - range1_start;
> +		return true;
> +	}
> +	return false;
> +}
> +
> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> +					char __user *buf, size_t count,
> +					loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	loff_t copy_offset;
> +	__le32 val32;
> +	__le16 val16;
> +	u8 val8;
> +	int ret;
> +
> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (range_contains_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> +				 &copy_offset)) {
> +		val16 = cpu_to_le16(0x1000);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}
> +
> +	if (virtvdev->pci_cmd_io &&
> +	    range_contains_range(pos, count, PCI_COMMAND, sizeof(val16),
> +				 &copy_offset)) {
> +		if (copy_from_user(&val16, buf, sizeof(val16)))
> +			return -EFAULT;
> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}
> +
> +	if (range_contains_range(pos, count, PCI_REVISION_ID, sizeof(val8),
> +				 &copy_offset)) {
> +		/* Transional needs to have revision 0 */
> +		val8 = 0;
> +		if (copy_to_user(buf + copy_offset, &val8, sizeof(val8)))
> +			return -EFAULT;
> +	}
> +
> +	if (range_contains_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
> +				 &copy_offset)) {
> +		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);
> +		if (copy_to_user(buf + copy_offset, &val32, sizeof(val32)))
> +			return -EFAULT;
> +	}
> +
> +	if (range_contains_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> +				 &copy_offset)) {
> +		/* Transitional devices use the PCI subsystem device id as
> +		 * virtio device id, same as legacy driver always did.
> +		 */
> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}
> +
> +	return count;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
> +		       size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
> +				     ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> +			size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> +		loff_t copy_offset;
> +		u16 cmd;
> +
> +		if (range_contains_range(pos, count, PCI_COMMAND, sizeof(cmd),
> +					 &copy_offset)) {
> +			if (copy_from_user(&cmd, buf + copy_offset, sizeof(cmd)))
> +				return -EFAULT;
> +			virtvdev->pci_cmd_io = (cmd & PCI_COMMAND_IO);
> +		}
> +	}
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static int
> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
> +				   unsigned int cmd, unsigned long arg)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
> +	void __user *uarg = (void __user *)arg;
> +	struct vfio_region_info info = {};
> +
> +	if (copy_from_user(&info, uarg, minsz))
> +		return -EFAULT;
> +
> +	if (info.argsz < minsz)
> +		return -EINVAL;
> +
> +	switch (info.index) {
> +	case VFIO_PCI_BAR0_REGION_INDEX:
> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
> +		info.size = virtvdev->bar0_virtual_buf_size;
> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
> +			     VFIO_REGION_INFO_FLAG_WRITE;
> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static long
> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> +			     unsigned long arg)
> +{
> +	switch (cmd) {
> +	case VFIO_DEVICE_GET_REGION_INFO:
> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static int
> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	int ret;
> +
> +	/* Setup the BAR where the 'notify' exists to be used by vfio as well
> +	 * This will let us mmap it only once and use it when needed.
> +	 */
> +	ret = vfio_pci_core_setup_barmap(core_device,
> +					 virtvdev->notify_bar);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
> +			virtvdev->notify_offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
> +	int ret;
> +
> +	ret = vfio_pci_core_enable(vdev);
> +	if (ret)
> +		return ret;
> +
> +	if (virtvdev->bar0_virtual_buf) {
> +		/* upon close_device() the vfio_pci_core_disable() is called
> +		 * and will close all the previous mmaps, so it seems that the
> +		 * valid life cycle for the 'notify' addr is per open/close.
> +		 */
> +		ret = virtiovf_set_notify_addr(virtvdev);
> +		if (ret) {
> +			vfio_pci_core_disable(vdev);
> +			return ret;
> +		}
> +	}
> +
> +	vfio_pci_core_finish_enable(vdev);
> +	return 0;
> +}
> +
> +static void virtiovf_pci_close_device(struct vfio_device *core_vdev)
> +{
> +	vfio_pci_core_close_device(core_vdev);
> +}
> +
> +static int virtiovf_get_device_config_size(unsigned short device)
> +{
> +	switch (device) {
> +	case 0x1041:
> +		/* network card */
> +		return offsetofend(struct virtio_net_config, status);
> +	default:
> +		return 0;
> +	}
> +}
> +
> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	u64 offset;
> +	int ret;
> +	u8 bar;
> +
> +	ret = virtiovf_cmd_lq_read_notify(virtvdev,
> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
> +				&bar, &offset);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_bar = bar;
> +	virtvdev->notify_offset = offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev;
> +	int ret;
> +
> +	ret = vfio_pci_core_init_dev(core_vdev);
> +	if (ret)
> +		return ret;
> +
> +	pdev = virtvdev->core_device.pdev;
> +	virtvdev->vf_id = pci_iov_vf_id(pdev);
> +	if (virtvdev->vf_id < 0)
> +		return -EINVAL;
> +
> +	ret = virtiovf_read_notify_info(virtvdev);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->bar0_virtual_buf_size = VIRTIO_LEGACY_IO_BAR_HEADER_LEN +
> +		VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN +
> +		virtiovf_get_device_config_size(pdev->device);
> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
> +					     GFP_KERNEL);
> +	if (!virtvdev->bar0_virtual_buf)
> +		return -ENOMEM;
> +	mutex_init(&virtvdev->bar_mutex);
> +	return 0;
> +}


There is very little vfio-specific code above. I feel that with a bit of
refactoring the logic parts can be moved to virtio, with just the vfio
things under vfio (mostly, vfio lets the user disable memory or set a low
PM state, so it has to be careful not to access the device in such cases).
E.g.:


	virtio_legacy_translate_offset(....)
	copy_from_user(...)
	vfio_pci_iowrite16(...)

and virtio_legacy_translate_offset would live under virtio.

Something similar for config space hacks.
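
For the legacy BAR write path that could look roughly like this (sketch
only; virtio_pci_admin_legacy_io_translate() and msix_enabled are made-up
names for the part that would live under drivers/virtio):

	u16 opcode;
	int ret;

	/* drivers/virtio: pure spec knowledge, picks common vs device cfg */
	ret = virtio_pci_admin_legacy_io_translate(pos, msix_enabled,
						   true /* write */, &opcode);
	if (ret)
		return ret;

	/* drivers/vfio/pci/virtio: only the vfio-specific pieces */
	if (copy_from_user(bar0_buf + pos, buf, count))
		return -EFAULT;
	ret = virtiovf_cmd_lr_write(virtvdev, opcode, pos, count,
				    bar0_buf + pos);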


> +
> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +
> +	kfree(virtvdev->bar0_virtual_buf);
> +	vfio_pci_core_release_dev(core_vdev);
> +}
> +
> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
> +	.name = "virtio-transitional-vfio-pci",
> +	.init = virtiovf_pci_init_device,
> +	.release = virtiovf_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = virtiovf_pci_close_device,
> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
> +	.read = virtiovf_pci_core_read,
> +	.write = virtiovf_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};
> +
> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
> +	.name = "virtio-acc-vfio-pci",
> +	.init = vfio_pci_core_init_dev,
> +	.release = vfio_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = virtiovf_pci_close_device,
> +	.ioctl = vfio_pci_core_ioctl,
> +	.device_feature = vfio_pci_core_ioctl_feature,
> +	.read = vfio_pci_core_read,
> +	.write = vfio_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};
> +
> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
> +{
> +	struct resource *res = pdev->resource;
> +
> +	return res->flags ? true : false;
> +}
> +
> +#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
> +	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> +
> +static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
> +{
> +	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
> +	u8 *buf;
> +	int ret;
> +
> +	/* Only virtio-net is supported/tested so far */
> +	if (pdev->device != 0x1041)
> +		return false;
> +
> +	buf = kzalloc(buf_size, GFP_KERNEL);
> +	if (!buf)
> +		return false;
> +
> +	ret = virtiovf_cmd_list_query(pdev, buf, buf_size);
> +	if (ret)
> +		goto end;
> +
> +	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
> +		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
> +		ret = -EOPNOTSUPP;
> +		goto end;
> +	}
> +
> +	/* confirm the used commands */
> +	memset(buf, 0, buf_size);
> +	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
> +	ret = virtiovf_cmd_list_use(pdev, buf, buf_size);
> +
> +end:
> +	kfree(buf);
> +	return ret ? false : true;
> +}


This is virtio stuff too.

> +
> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> +			      const struct pci_device_id *id)
> +{
> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> +	struct virtiovf_pci_core_device *virtvdev;
> +	int ret;
> +
> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> +		ops = &virtiovf_acc_vfio_pci_tran_ops;
> +
> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> +				     &pdev->dev, ops);
> +	if (IS_ERR(virtvdev))
> +		return PTR_ERR(virtvdev);
> +
> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
> +	if (ret)
> +		goto out;
> +	return 0;
> +out:
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +	return ret;
> +}
> +
> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> +
> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +}
> +
> +static const struct pci_device_id virtiovf_pci_table[] = {
> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
> +	{}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> +
> +static struct pci_driver virtiovf_pci_driver = {
> +	.name = KBUILD_MODNAME,
> +	.id_table = virtiovf_pci_table,
> +	.probe = virtiovf_pci_probe,
> +	.remove = virtiovf_pci_remove,
> +	.err_handler = &vfio_pci_core_err_handlers,
> +	.driver_managed_dma = true,
> +};
> +
> +module_pci_driver(virtiovf_pci_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
> +MODULE_DESCRIPTION(
> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");
> -- 
> 2.27.0



* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-22 10:10     ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-22 10:10 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 03:40:40PM +0300, Yishai Hadas wrote:
> Introduce a vfio driver over virtio devices to support the legacy
> interface functionality for VFs.
> 
> Background, from the virtio spec [1].
> --------------------------------------------------------------------
> In some systems, there is a need to support a virtio legacy driver with
> a device that does not directly support the legacy interface. In such
> scenarios, a group owner device can provide the legacy interface
> functionality for the group member devices. The driver of the owner
> device can then access the legacy interface of a member device on behalf
> of the legacy member device driver.
> 
> For example, with the SR-IOV group type, group members (VFs) can not
> present the legacy interface in an I/O BAR in BAR0 as expected by the
> legacy pci driver. If the legacy driver is running inside a virtual
> machine, the hypervisor executing the virtual machine can present a
> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> legacy driver accesses to this I/O BAR and forwards them to the group
> owner device (PF) using group administration commands.
> --------------------------------------------------------------------
> 
> Specifically, this driver adds support for a virtio-net VF to be exposed
> as a transitional device to a guest driver and allows the legacy IO BAR
> functionality on top.
> 
> This allows a VM which uses a legacy virtio-net driver in the guest to
> work transparently over a VF which its driver in the host is that new
> driver.
> 
> The driver can be extended easily to support some other types of virtio
> devices (e.g virtio-blk), by adding in a few places the specific type
> properties as was done for virtio-net.
> 
> For now, only the virtio-net use case was tested and as such we introduce
> the support only for such a device.
> 
> Practically,
> Upon probing a VF for a virtio-net device, in case its PF supports
> legacy access over the virtio admin commands and the VF doesn't have BAR
> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> transitional device with I/O BAR in BAR 0.
> 
> The existence of the simulated I/O bar is reported later on by
> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> exposes itself as a transitional device by overwriting some properties
> upon reading its config space.
> 
> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> guest may use it via read/write calls according to the virtio
> specification.
> 
> Any read/write towards the control parts of the BAR will be captured by
> the new driver and will be translated into admin commands towards the
> device.
> 
> Any data path read/write access (i.e. virtio driver notifications) will
> be forwarded to the physical BAR which its properties were supplied by
> the command VIRTIO_PCI_QUEUE_NOTIFY upon the probing/init flow.
> 
> With that code in place a legacy driver in the guest has the look and
> feel as if having a transitional device with legacy support for both its
> control and data path flows.
> 
> [1]
> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  MAINTAINERS                      |   6 +
>  drivers/vfio/pci/Kconfig         |   2 +
>  drivers/vfio/pci/Makefile        |   2 +
>  drivers/vfio/pci/virtio/Kconfig  |  15 +
>  drivers/vfio/pci/virtio/Makefile |   4 +
>  drivers/vfio/pci/virtio/cmd.c    |   4 +-
>  drivers/vfio/pci/virtio/cmd.h    |   8 +
>  drivers/vfio/pci/virtio/main.c   | 546 +++++++++++++++++++++++++++++++
>  8 files changed, 585 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/vfio/pci/virtio/Kconfig
>  create mode 100644 drivers/vfio/pci/virtio/Makefile
>  create mode 100644 drivers/vfio/pci/virtio/main.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index bf0f54c24f81..5098418c8389 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22624,6 +22624,12 @@ L:	kvm@vger.kernel.org
>  S:	Maintained
>  F:	drivers/vfio/pci/mlx5/
>  
> +VFIO VIRTIO PCI DRIVER
> +M:	Yishai Hadas <yishaih@nvidia.com>
> +L:	kvm@vger.kernel.org
> +S:	Maintained
> +F:	drivers/vfio/pci/virtio
> +
>  VFIO PCI DEVICE SPECIFIC DRIVERS
>  R:	Jason Gunthorpe <jgg@nvidia.com>
>  R:	Yishai Hadas <yishaih@nvidia.com>
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 8125e5f37832..18c397df566d 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>  
>  source "drivers/vfio/pci/pds/Kconfig"
>  
> +source "drivers/vfio/pci/virtio/Kconfig"
> +
>  endmenu
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 45167be462d8..046139a4eca5 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>  obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>  
>  obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> +
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> new file mode 100644
> index 000000000000..89eddce8b1bd
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Kconfig
> @@ -0,0 +1,15 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config VIRTIO_VFIO_PCI
> +        tristate "VFIO support for VIRTIO PCI devices"
> +        depends on VIRTIO_PCI
> +        select VFIO_PCI_CORE
> +        help
> +          This provides support for exposing VIRTIO VF devices using the VFIO
> +          framework that can work with a legacy virtio driver in the guest.
> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> +          not indicate I/O Space.
> +          As of that this driver emulated I/O BAR in software to let a VF be
> +          seen as a transitional device in the guest and let it work with
> +          a legacy driver.
> +
> +          If you don't know what to do here, say N.
> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
> new file mode 100644
> index 000000000000..584372648a03
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
> +virtio-vfio-pci-y := main.o cmd.o
> +
> diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> index f068239cdbb0..aea9d25fbf1d 100644
> --- a/drivers/vfio/pci/virtio/cmd.c
> +++ b/drivers/vfio/pci/virtio/cmd.c
> @@ -44,7 +44,7 @@ int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>  {
>  	struct virtio_device *virtio_dev =
>  		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> -	struct virtio_admin_cmd_data_lr_write *in;
> +	struct virtio_admin_cmd_legacy_wr_data *in;
>  	struct scatterlist in_sg;
>  	struct virtio_admin_cmd cmd = {};
>  	int ret;
> @@ -74,7 +74,7 @@ int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>  {
>  	struct virtio_device *virtio_dev =
>  		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> -	struct virtio_admin_cmd_data_lr_read *in;
> +	struct virtio_admin_cmd_legacy_rd_data *in;
>  	struct scatterlist in_sg, out_sg;
>  	struct virtio_admin_cmd cmd = {};
>  	int ret;
> diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
> index c2a3645f4b90..347b1dc85570 100644
> --- a/drivers/vfio/pci/virtio/cmd.h
> +++ b/drivers/vfio/pci/virtio/cmd.h
> @@ -13,7 +13,15 @@
>  
>  struct virtiovf_pci_core_device {
>  	struct vfio_pci_core_device core_device;
> +	u8 bar0_virtual_buf_size;
> +	u8 *bar0_virtual_buf;
> +	/* synchronize access to the virtual buf */
> +	struct mutex bar_mutex;
>  	int vf_id;
> +	void __iomem *notify_addr;
> +	u32 notify_offset;
> +	u8 notify_bar;
> +	u8 pci_cmd_io :1;
>  };
>  
>  int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
> new file mode 100644
> index 000000000000..2486991c49f3
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/main.c
> @@ -0,0 +1,546 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/pci.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +#include <linux/vfio.h>
> +#include <linux/vfio_pci_core.h>
> +#include <linux/virtio_pci.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_pci_modern.h>
> +
> +#include "cmd.h"
> +
> +#define VIRTIO_LEGACY_IO_BAR_HEADER_LEN 20
> +#define VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN 4
> +
> +static int virtiovf_issue_lr_cmd(struct virtiovf_pci_core_device *virtvdev,
> +				 loff_t pos, char __user *buf,
> +				 size_t count, bool read)
> +{
> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
> +	u16 opcode;
> +	int ret;
> +
> +	mutex_lock(&virtvdev->bar_mutex);
> +	if (read) {
> +		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?

This "true" seems wrong. You need to know whether the guest has
MSI-X enabled for the device or not.
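
E.g. something along these lines (just a sketch; it assumes the driver
tracks the guest-visible MSI-X enable state in some hypothetical
virtvdev->msix_enabled flag):

	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(virtvdev->msix_enabled)) ?
		VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
		VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;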


> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
> +		ret = virtiovf_cmd_lr_read(virtvdev, opcode, pos,
> +					   count, bar0_buf + pos);
> +		if (ret)
> +			goto out;
> +		if (copy_to_user(buf, bar0_buf + pos, count))
> +			ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	if (copy_from_user(bar0_buf + pos, buf, count)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;


same

> +	ret = virtiovf_cmd_lr_write(virtvdev, opcode, pos, count,
> +				    bar0_buf + pos);
> +out:
> +	mutex_unlock(&virtvdev->bar_mutex);
> +	return ret;
> +}
> +
> +static int
> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
> +			    loff_t pos, char __user *buf,
> +			    size_t count, bool read)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	u16 queue_notify;
> +	int ret;
> +
> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
> +		return -EINVAL;
> +
> +	switch (pos) {
> +	case VIRTIO_PCI_QUEUE_NOTIFY:
> +		if (count != sizeof(queue_notify))
> +			return -EINVAL;
> +		if (read) {
> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
> +						virtvdev->notify_addr);
> +			if (ret)
> +				return ret;
> +			if (copy_to_user(buf, &queue_notify,
> +					 sizeof(queue_notify)))
> +				return -EFAULT;
> +			break;
> +		}
> +
> +		if (copy_from_user(&queue_notify, buf, count))
> +			return -EFAULT;
> +
> +		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
> +					 virtvdev->notify_addr);
> +		break;
> +	default:
> +		ret = virtiovf_issue_lr_cmd(virtvdev, pos, buf, count, read);
> +	}
> +
> +	return ret ? ret : count;
> +}
> +
> +static bool range_contains_range(loff_t range1_start, size_t count1,
> +				 loff_t range2_start, size_t count2,
> +				 loff_t *start_offset)
> +{
> +	if (range1_start <= range2_start &&
> +	    range1_start + count1 >= range2_start + count2) {
> +		*start_offset = range2_start - range1_start;
> +		return true;
> +	}
> +	return false;
> +}
> +
> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> +					char __user *buf, size_t count,
> +					loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	loff_t copy_offset;
> +	__le32 val32;
> +	__le16 val16;
> +	u8 val8;
> +	int ret;
> +
> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (range_contains_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> +				 &copy_offset)) {
> +		val16 = cpu_to_le16(0x1000);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}
> +
> +	if (virtvdev->pci_cmd_io &&
> +	    range_contains_range(pos, count, PCI_COMMAND, sizeof(val16),
> +				 &copy_offset)) {
> +		if (copy_from_user(&val16, buf, sizeof(val16)))
> +			return -EFAULT;
> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}
> +
> +	if (range_contains_range(pos, count, PCI_REVISION_ID, sizeof(val8),
> +				 &copy_offset)) {
> +		/* Transitional needs to have revision 0 */
> +		val8 = 0;
> +		if (copy_to_user(buf + copy_offset, &val8, sizeof(val8)))
> +			return -EFAULT;
> +	}
> +
> +	if (range_contains_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
> +				 &copy_offset)) {
> +		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);
> +		if (copy_to_user(buf + copy_offset, &val32, sizeof(val32)))
> +			return -EFAULT;
> +	}
> +
> +	if (range_contains_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> +				 &copy_offset)) {
> +		/* Transitional devices use the PCI subsystem device id as
> +		 * virtio device id, same as legacy driver always did.
> +		 */
> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}
> +
> +	return count;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
> +		       size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
> +				     ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> +			size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> +		loff_t copy_offset;
> +		u16 cmd;
> +
> +		if (range_contains_range(pos, count, PCI_COMMAND, sizeof(cmd),
> +					 &copy_offset)) {
> +			if (copy_from_user(&cmd, buf + copy_offset, sizeof(cmd)))
> +				return -EFAULT;
> +			virtvdev->pci_cmd_io = (cmd & PCI_COMMAND_IO);
> +		}
> +	}
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static int
> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
> +				   unsigned int cmd, unsigned long arg)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
> +	void __user *uarg = (void __user *)arg;
> +	struct vfio_region_info info = {};
> +
> +	if (copy_from_user(&info, uarg, minsz))
> +		return -EFAULT;
> +
> +	if (info.argsz < minsz)
> +		return -EINVAL;
> +
> +	switch (info.index) {
> +	case VFIO_PCI_BAR0_REGION_INDEX:
> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
> +		info.size = virtvdev->bar0_virtual_buf_size;
> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
> +			     VFIO_REGION_INFO_FLAG_WRITE;
> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static long
> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> +			     unsigned long arg)
> +{
> +	switch (cmd) {
> +	case VFIO_DEVICE_GET_REGION_INFO:
> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static int
> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	int ret;
> +
> +	/* Setup the BAR where the 'notify' exists to be used by vfio as well
> +	 * This will let us mmap it only once and use it when needed.
> +	 */
> +	ret = vfio_pci_core_setup_barmap(core_device,
> +					 virtvdev->notify_bar);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
> +			virtvdev->notify_offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
> +	int ret;
> +
> +	ret = vfio_pci_core_enable(vdev);
> +	if (ret)
> +		return ret;
> +
> +	if (virtvdev->bar0_virtual_buf) {
> +		/* upon close_device() the vfio_pci_core_disable() is called
> +		 * and will close all the previous mmaps, so it seems that the
> +		 * valid life cycle for the 'notify' addr is per open/close.
> +		 */
> +		ret = virtiovf_set_notify_addr(virtvdev);
> +		if (ret) {
> +			vfio_pci_core_disable(vdev);
> +			return ret;
> +		}
> +	}
> +
> +	vfio_pci_core_finish_enable(vdev);
> +	return 0;
> +}
> +
> +static void virtiovf_pci_close_device(struct vfio_device *core_vdev)
> +{
> +	vfio_pci_core_close_device(core_vdev);
> +}
> +
> +static int virtiovf_get_device_config_size(unsigned short device)
> +{
> +	switch (device) {
> +	case 0x1041:
> +		/* network card */
> +		return offsetofend(struct virtio_net_config, status);
> +	default:
> +		return 0;
> +	}
> +}
> +
> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	u64 offset;
> +	int ret;
> +	u8 bar;
> +
> +	ret = virtiovf_cmd_lq_read_notify(virtvdev,
> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
> +				&bar, &offset);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_bar = bar;
> +	virtvdev->notify_offset = offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev;
> +	int ret;
> +
> +	ret = vfio_pci_core_init_dev(core_vdev);
> +	if (ret)
> +		return ret;
> +
> +	pdev = virtvdev->core_device.pdev;
> +	virtvdev->vf_id = pci_iov_vf_id(pdev);
> +	if (virtvdev->vf_id < 0)
> +		return -EINVAL;
> +
> +	ret = virtiovf_read_notify_info(virtvdev);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->bar0_virtual_buf_size = VIRTIO_LEGACY_IO_BAR_HEADER_LEN +
> +		VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN +
> +		virtiovf_get_device_config_size(pdev->device);
> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
> +					     GFP_KERNEL);
> +	if (!virtvdev->bar0_virtual_buf)
> +		return -ENOMEM;
> +	mutex_init(&virtvdev->bar_mutex);
> +	return 0;
> +}


There is very little vfio-specific code above. I feel that with a bit of
refactoring the logic parts can be moved to virtio, with just the
vfio things under vfio (mostly, vfio lets the user disable memory or set a low PM state,
so it has to be careful not to access the device in such cases).
E.g.:


	virtio_legacy_translate_offset(....)
	copy_from_user(...)
	vfio_pci_iowrite16(...)

and virtio_legacy_translate_offset would live under virtio.

Something similar for config space hacks.
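
For instance (a rough sketch only; the name and signature are made up,
nothing like this exists yet):

	/*
	 * Hypothetical helper under drivers/virtio/: map a legacy BAR0
	 * offset to the matching admin command opcode and device offset,
	 * taking the MSI-X enable state into account.
	 */
	int virtio_pci_legacy_translate_offset(loff_t pos, bool msix_enabled,
					       u16 *opcode, loff_t *offset);

and the vfio driver would then only keep the copy_{to,from}_user() and
vfio_pci_ioread/iowrite parts.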


> +
> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +
> +	kfree(virtvdev->bar0_virtual_buf);
> +	vfio_pci_core_release_dev(core_vdev);
> +}
> +
> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
> +	.name = "virtio-transitional-vfio-pci",
> +	.init = virtiovf_pci_init_device,
> +	.release = virtiovf_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = virtiovf_pci_close_device,
> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
> +	.read = virtiovf_pci_core_read,
> +	.write = virtiovf_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};
> +
> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
> +	.name = "virtio-acc-vfio-pci",
> +	.init = vfio_pci_core_init_dev,
> +	.release = vfio_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = virtiovf_pci_close_device,
> +	.ioctl = vfio_pci_core_ioctl,
> +	.device_feature = vfio_pci_core_ioctl_feature,
> +	.read = vfio_pci_core_read,
> +	.write = vfio_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};
> +
> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
> +{
> +	struct resource *res = pdev->resource;
> +
> +	return res->flags ? true : false;
> +}
> +
> +#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
> +	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> +
> +static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
> +{
> +	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
> +	u8 *buf;
> +	int ret;
> +
> +	/* Only virtio-net is supported/tested so far */
> +	if (pdev->device != 0x1041)
> +		return false;
> +
> +	buf = kzalloc(buf_size, GFP_KERNEL);
> +	if (!buf)
> +		return false;
> +
> +	ret = virtiovf_cmd_list_query(pdev, buf, buf_size);
> +	if (ret)
> +		goto end;
> +
> +	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
> +		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
> +		ret = -EOPNOTSUPP;
> +		goto end;
> +	}
> +
> +	/* confirm the used commands */
> +	memset(buf, 0, buf_size);
> +	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
> +	ret = virtiovf_cmd_list_use(pdev, buf, buf_size);
> +
> +end:
> +	kfree(buf);
> +	return ret ? false : true;
> +}


This is virtio stuff too.

> +
> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> +			      const struct pci_device_id *id)
> +{
> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> +	struct virtiovf_pci_core_device *virtvdev;
> +	int ret;
> +
> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> +		ops = &virtiovf_acc_vfio_pci_tran_ops;
> +
> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> +				     &pdev->dev, ops);
> +	if (IS_ERR(virtvdev))
> +		return PTR_ERR(virtvdev);
> +
> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
> +	if (ret)
> +		goto out;
> +	return 0;
> +out:
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +	return ret;
> +}
> +
> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> +
> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +}
> +
> +static const struct pci_device_id virtiovf_pci_table[] = {
> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
> +	{}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> +
> +static struct pci_driver virtiovf_pci_driver = {
> +	.name = KBUILD_MODNAME,
> +	.id_table = virtiovf_pci_table,
> +	.probe = virtiovf_pci_probe,
> +	.remove = virtiovf_pci_remove,
> +	.err_handler = &vfio_pci_core_err_handlers,
> +	.driver_managed_dma = true,
> +};
> +
> +module_pci_driver(virtiovf_pci_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
> +MODULE_DESCRIPTION(
> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");
> -- 
> 2.27.0


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-22 11:23                         ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-22 11:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yishai Hadas, alex.williamson, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 07:55:26PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 04:45:45PM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 04:49:46PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Sep 21, 2023 at 03:13:10PM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Sep 21, 2023 at 03:39:26PM -0300, Jason Gunthorpe wrote:
> > > > > On Thu, Sep 21, 2023 at 12:53:04PM -0400, Michael S. Tsirkin wrote:
> > > > > > > vdpa is not vfio, I don't know how you can suggest vdpa is a
> > > > > > > replacement for a vfio driver. They are completely different
> > > > > > > things.
> > > > > > > Each side has its own strengths, and vfio especially is accelerating
> > > > > > > in its capability in way that vpda is not. eg if an iommufd conversion
> > > > > > > had been done by now for vdpa I might be more sympathetic.
> > > > > > 
> > > > > > Yea, I agree iommufd is a big problem with vdpa right now. Cindy was
> > > > > > sick and I didn't know and kept assuming she's working on this. I don't
> > > > > > think it's a huge amount of work though.  I'll take a look.
> > > > > > Is there anything else though? Do tell.
> > > > > 
> > > > > Confidential compute will never work with VDPA's approach.
> > > > 
> > > > I don't see how what this patchset is doing is different
> > > > wrt to Confidential compute - you trap IO accesses and emulate.
> > > > Care to elaborate?
> > > 
> > > This patch series isn't about confidential compute, you asked about
> > > the future. VFIO will support confidential compute in the future, VDPA
> > > will not.
> > 
> > Nonsense it already works.
> 
> That isn't what I'm talking about. With a real PCI function and TDISP
> we can actually DMA directly from the guest's memory without needing
> the ugly bounce buffer hack. Then you can get decent performance.

Aha, TDISP.  But that one clearly does not need and can not use
this kind of hack?

> > But I did not ask about the future since I do not believe it
> > can be confidently predicted. I asked what is missing in VDPA
> > now for you to add this feature there and not in VFIO.
> 
> I don't see that VDPA needs this, VDPA should process the IO BAR on
> its own with its own logic, just like everything else it does.

First, there's some logic here, such as translating legacy IO
offsets to modern ones, that could be reused.

But also, this is not just the IO BAR, which indeed can be easily done in
software.  When a device operates in legacy mode there are subtle
differences from modern mode, such as a different header size for the net
device.
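
(Roughly, from memory — worth double-checking against the spec:

	sizeof(struct virtio_net_hdr)             /* legacy, !MRG_RXBUF: 10 bytes */
	sizeof(struct virtio_net_hdr_mrg_rxbuf)   /* VIRTIO 1.0 always: 12 bytes */

so whoever does the mediation has to know which layout the driver
expects.)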

> This is specifically about avoiding mediation by relaying directly the
> IO BAR operations to the device itself.
> 
> That is the entire irony, this whole scheme was designed and
> standardized *specifically* to avoid complex mediation and here you
> are saying we should just use mediation.
> 
> Jason

Not exactly. What I had in mind is just having the logic in
the vdpa module so users don't need to know what the device
supports and what it doesn't. If we can, we bypass mediation
(to simplify the software stack); if we cannot, we do not.

Looking at it from the user's POV, it is just super confusing that
card ABC would need to be used with VDPA to drive legacy while
card DEF needs to be used with VFIO. And both VFIO and VDPA
will happily bind, too. Oh man ...


-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-22  3:01                     ` Jason Wang
  (?)
@ 2023-09-22 12:11                     ` Jason Gunthorpe
  2023-09-25  2:34                         ` Jason Wang
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-22 12:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Yishai Hadas, alex.williamson, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Fri, Sep 22, 2023 at 11:01:23AM +0800, Jason Wang wrote:

> > Even when it does, there is no real use case to live migrate a
> > virtio-net function from, say, AWS to GCP.
> 
> It can happen inside a single cloud vendor. For some reasons, DPU must
> be purchased from different vendors. And vDPA has been used in that
> case.

Nope, you misunderstand the DPU scenario.

Look at something like vmware DPU enablement. vmware runs the software
side of the DPU and all their supported DPU HW, from every vendor,
generates the same PCI functions on the x86. They are the same because
the same software on the DPU side is creating them.

There is no reason to put a mediation layer in the x86 if you also
control the DPU.

Cloud vendors will similarly use DPUs to create PCI functions that
meet the cloud vendor's internal specification, regardless of DPU
vendor.

Fundamentally, if you control the DPU SW and the hypervisor software
you do not need hypervisor mediation, because everything you could do
in hypervisor mediation can just be done in the DPU. Putting it in the
DPU is better in every regard.

So, as I keep saying, in this scenario the goal is no mediation in the
hypervisor. It is pointless, everything you think you need to do there
is actually already being done in the DPU.

Once you commit to this configuration you are committed to VFIO in the
hypervisor. eg your DPU is likely also making NVMe and other PCI
functions too.

> The problem is the mediation (or what you called relaying) layer
> you've invented.

It is not mediation, it is implementing the OASIS spec for VFIO
support of IO BAR.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-22 11:23                         ` Michael S. Tsirkin
  (?)
@ 2023-09-22 12:15                         ` Jason Gunthorpe
  -1 siblings, 0 replies; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-22 12:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yishai Hadas, alex.williamson, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Fri, Sep 22, 2023 at 07:23:28AM -0400, Michael S. Tsirkin wrote:
> On Thu, Sep 21, 2023 at 07:55:26PM -0300, Jason Gunthorpe wrote:

> Looking at it from user's POV, it is just super confusing that
> card ABC would need to be used with VDPA to drive legacy while
> card DEF needs to be used with VFIO. And both VFIO and VDPA
> will happily bind, too. Oh man ...

It is standard VFIO stuff. If you don't attach vfio then you get the
normal kernel virtio-net driver. If you turn that into VDPA then you get
that.

If you attach VFIO to the PCI function then you get VFIO.

There is nothing special here, we have good infrastructure to support
doing this already.

User gets to pick. I don't understand why you think the kernel side
should deny this choice.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-22  3:02                           ` Jason Wang
  (?)
@ 2023-09-22 12:22                           ` Jason Gunthorpe
  2023-09-22 12:25                               ` Parav Pandit via Virtualization
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-22 12:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Alex Williamson, Yishai Hadas, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Fri, Sep 22, 2023 at 11:02:21AM +0800, Jason Wang wrote:

> And what's more, using MMIO BAR0 then it can work for legacy.

Oh? How? Our team didn't think so.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-22  9:47                             ` Michael S. Tsirkin
  (?)
@ 2023-09-22 12:23                             ` Jason Gunthorpe
  2023-09-22 15:45                                 ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-22 12:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Yishai Hadas, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Fri, Sep 22, 2023 at 05:47:23AM -0400, Michael S. Tsirkin wrote:

> it will require maintenance effort when virtio changes are made.  For
> example it pokes at the device state - I don't see specific races right
> now but in the past we did e.g. reset the device to recover from errors
> and we might start doing it again.
> 
> If more of the logic is under virtio directory where we'll remember
> to keep it in loop, and will be able to reuse it from vdpa
> down the road, I would be more sympathetic.

This is inevitable, the VFIO live migration driver will need all this
infrastructure too.

Jason
 

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-22  3:02                         ` Jason Wang
  (?)
@ 2023-09-22 12:25                         ` Jason Gunthorpe
  2023-09-22 15:39                             ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-22 12:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Alex Williamson, Yishai Hadas, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Fri, Sep 22, 2023 at 11:02:50AM +0800, Jason Wang wrote:
> On Fri, Sep 22, 2023 at 3:53 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Thu, Sep 21, 2023 at 03:34:03PM -0400, Michael S. Tsirkin wrote:
> >
> > > that's easy/practical.  If instead VDPA gives the same speed with just
> > > shadow vq then keeping this hack in vfio seems like less of a problem.
> > > Finally if VDPA is faster then maybe you will reconsider using it ;)
> >
> > It is not all about the speed.
> >
> > VDPA presents another large and complex software stack in the
> > hypervisor that can be eliminated by simply using VFIO.
> 
> vDPA supports standard virtio devices so how did you define
> complexity?

As I said, VFIO is already required for other devices in these VMs. So
anything incremental over base-line vfio-pci is complexity to
minimize.

Everything vdpa does is either redundant or unnecessary compared to
VFIO in these environments.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-22 12:22                           ` Jason Gunthorpe
@ 2023-09-22 12:25                               ` Parav Pandit via Virtualization
  0 siblings, 0 replies; 321+ messages in thread
From: Parav Pandit @ 2023-09-22 12:25 UTC (permalink / raw)
  To: Jason Gunthorpe, Jason Wang
  Cc: Michael S. Tsirkin, Alex Williamson, Yishai Hadas, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	Leon Romanovsky, Maor Gottlieb


> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, September 22, 2023 5:53 PM


> > And what's more, using MMIO BAR0 then it can work for legacy.
> 
> Oh? How? Our team didn't think so.

It does not. It was already discussed.
The device reset in legacy mode is not synchronous.
The drivers do not wait for the reset to complete; the interface was written for the sw backend.
Hence an MMIO BAR0 is not the best option in real implementations.
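
(For reference, roughly from memory — worth double-checking against the
virtio-pci drivers: the legacy vp_reset() only does

	iowrite8(0, ioaddr + VIRTIO_PCI_STATUS);
	ioread8(ioaddr + VIRTIO_PCI_STATUS);	/* flush, but no completion poll */

while the modern driver polls device_status until it reads back 0.)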

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 19:58     ` Alex Williamson
  (?)
  (?)
@ 2023-09-22 12:37     ` Jason Gunthorpe
  2023-09-22 12:59         ` Parav Pandit via Virtualization
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-22 12:37 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yishai Hadas, mst, jasowang, kvm, virtualization, parav, feliu,
	jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Sep 21, 2023 at 01:58:32PM -0600, Alex Williamson wrote:

> If the heart of this driver is simply pretending to have an I/O BAR
> where I/O accesses into that BAR are translated to accesses in the MMIO
> BAR, why can't this be done in the VMM, ie. QEMU?  

That isn't exactly what it does: the IO bar access is translated into
an admin queue command on the PF and executed by the PCI function.
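
(Roughly, in terms of this series, a guest write to the emulated IO BAR
ends up as something like

	virtiovf_cmd_lr_write(virtvdev, VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE,
			      pos, count, bar0_buf + pos);

i.e. an admin command issued through the PF's admin queue, not a plain
store into a memory BAR.)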

So it would be difficult to do that in qemu without also somehow
wiring up qemu to access the PF's kernel driver's admin queue.

It would have been nice if it was a trivial 1:1 translation to the
MMIO bar, but it seems that didn't entirely work with existing VMs. So
OASIS standardized this approach.

The bigger picture is there is also a live migration standard & driver
in the works that will re-use all this admin queue infrastructure
anyhow, so the best course is to keep this in the kernel.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-22 12:37     ` Jason Gunthorpe
@ 2023-09-22 12:59         ` Parav Pandit via Virtualization
  0 siblings, 0 replies; 321+ messages in thread
From: Parav Pandit @ 2023-09-22 12:59 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: Yishai Hadas, mst, jasowang, kvm, virtualization, Feng Liu,
	Jiri Pirko, kevin.tian, joao.m.martins, Leon Romanovsky,
	Maor Gottlieb


> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, September 22, 2023 6:07 PM
> 
> On Thu, Sep 21, 2023 at 01:58:32PM -0600, Alex Williamson wrote:
> 
> > If the heart of this driver is simply pretending to have an I/O BAR
> > where I/O accesses into that BAR are translated to accesses in the
> > MMIO BAR, why can't this be done in the VMM, ie. QEMU?
> 
> That isn't exactly what it does, the IO bar access is translated into an admin
> queue command on the PF and executed by the PCI function.
> 
> So it would be difficult to do that in qemu without also somehow wiring up
> qemu to access the PF's kernel driver's admin queue.
> 
> It would have been nice if it was a trivial 1:1 translation to the MMIO bar, but it
> seems that didn't entirely work with existing VMs. So OASIS standardized this
> approach.
> 
> The bigger picture is there is also a live migration standard & driver in the
> works that will re-use all this admin queue infrastructure anyhow, so the best
> course is to keep this in the kernel.

Additionally, in the future the AQ of the PF will also be used to provision the VFs (the virtio OASIS spec calls them member devices); that framework also resides in the kernel.
Such PFs are in use by the kernel driver.

+1 for keeping this framework in the kernel.

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-22 15:13                                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-22 15:13 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jason Gunthorpe, Jason Wang, Alex Williamson, Yishai Hadas, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	Leon Romanovsky, Maor Gottlieb

On Fri, Sep 22, 2023 at 12:25:06PM +0000, Parav Pandit wrote:
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, September 22, 2023 5:53 PM
> 
> 
> > > And what's more, using MMIO BAR0 then it can work for legacy.
> > 
> > Oh? How? Our team didn't think so.
> 
> It does not. It was already discussed.
> The device reset in legacy is not synchronous.
> The drivers do not wait for reset to complete; it was written for the sw backend.
> Hence MMIO BAR0 is not the best option in real implementations.

Or maybe they made it synchronous in hardware, that's all.
After all, the same is true for the IO BAR0, e.g. for the PF: IO writes are posted anyway.

Whether that's possible would depend on the hardware architecture.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-22 15:13                                 ` Michael S. Tsirkin
  (?)
@ 2023-09-22 15:15                                 ` Jason Gunthorpe
  2023-09-22 15:40                                     ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-22 15:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Jason Wang, Alex Williamson, Yishai Hadas, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	Leon Romanovsky, Maor Gottlieb

On Fri, Sep 22, 2023 at 11:13:18AM -0400, Michael S. Tsirkin wrote:
> On Fri, Sep 22, 2023 at 12:25:06PM +0000, Parav Pandit wrote:
> > 
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Friday, September 22, 2023 5:53 PM
> > 
> > 
> > > > And what's more, using MMIO BAR0 then it can work for legacy.
> > > 
> > > Oh? How? Our team didn't think so.
> > 
> > It does not. It was already discussed.
> > The device reset in legacy is not synchronous.
> > The drivers do not wait for reset to complete; it was written for the sw backend.
> > Hence MMIO BAR0 is not the best option in real implementations.
> 
> Or maybe they made it synchronous in hardware, that's all.
> After all same is true for the IO BAR0 e.g. for the PF: IO writes
> are posted anyway.

IO writes are not posted in PCI.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-22 15:39                             ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-22 15:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jason Wang, Alex Williamson, Yishai Hadas, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Fri, Sep 22, 2023 at 09:25:01AM -0300, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2023 at 11:02:50AM +0800, Jason Wang wrote:
> > On Fri, Sep 22, 2023 at 3:53 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > > On Thu, Sep 21, 2023 at 03:34:03PM -0400, Michael S. Tsirkin wrote:
> > >
> > > > that's easy/practical.  If instead VDPA gives the same speed with just
> > > > shadow vq then keeping this hack in vfio seems like less of a problem.
> > > > Finally if VDPA is faster then maybe you will reconsider using it ;)
> > >
> > > It is not all about the speed.
> > >
> > > VDPA presents another large and complex software stack in the
> > > hypervisor that can be eliminated by simply using VFIO.
> > 
> > vDPA supports standard virtio devices so how did you define
> > complexity?
> 
> As I said, VFIO is already required for other devices in these VMs. So
> anything incremental over base-line vfio-pci is complexity to
> minimize.
> 
> Everything vdpa does is either redundant or unnecessary compared to
> VFIO in these environments.
> 
> Jason

Yes, but you know, there are all kinds of environments.  I guess you
consider yours the most mainstream and important, and are sure it will
always stay like this.  But if there's a driver that does what you need
then you use that. You really should be explaining what vdpa
*does not* do that you need.

But anyway, if Alex wants to maintain this it's not too bad,
but I would like to see more code move into a library
living under the virtio directory. As it is structured now
it will make virtio core development harder.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-22 15:15                                 ` Jason Gunthorpe
@ 2023-09-22 15:40                                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-22 15:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Parav Pandit, Jason Wang, Alex Williamson, Yishai Hadas, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	Leon Romanovsky, Maor Gottlieb

On Fri, Sep 22, 2023 at 12:15:34PM -0300, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2023 at 11:13:18AM -0400, Michael S. Tsirkin wrote:
> > On Fri, Sep 22, 2023 at 12:25:06PM +0000, Parav Pandit wrote:
> > > 
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Friday, September 22, 2023 5:53 PM
> > > 
> > > 
> > > > > And what's more, using MMIO BAR0 then it can work for legacy.
> > > > 
> > > > Oh? How? Our team didn't think so.
> > > 
> > > It does not. It was already discussed.
> > > The device reset in legacy is not synchronous.
> > > The drivers do not wait for reset to complete; it was written for the sw backend.
> > > Hence MMIO BAR0 is not the best option in real implementations.
> > 
> > Or maybe they made it synchronous in hardware, that's all.
> > After all same is true for the IO BAR0 e.g. for the PF: IO writes
> > are posted anyway.
> 
> IO writes are not posted in PCI.

Aha, I was confused. Thanks for the correction. I guess you just buffer
subsequent transactions while reset is going on and reset quickly enough
for it to be seamless then?

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-22 15:45                                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-22 15:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, Yishai Hadas, jasowang, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Fri, Sep 22, 2023 at 09:23:28AM -0300, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2023 at 05:47:23AM -0400, Michael S. Tsirkin wrote:
> 
> > it will require maintenance effort when virtio changes are made.  For
> > example it pokes at the device state - I don't see specific races right
> > now but in the past we did e.g. reset the device to recover from errors
> > and we might start doing it again.
> > 
> > If more of the logic is under virtio directory where we'll remember
> > to keep it in loop, and will be able to reuse it from vdpa
> > down the road, I would be more sympathetic.
> 
> This is inevitable, the VFIO live migration driver will need all this
> infrastructure too.
> 
> Jason
>  

I am not sure what you are saying and what is inevitable.
VDPA for sure will want live migration support.  I am not at all
sympathetic to efforts that want to duplicate that support for virtio
under VFIO. Put it in a library under the virtio directory,
with a sane, well-documented interface.
I don't maintain VFIO and Alex can merge what he wants,
but I won't merge patches that export virtio internals in a way
that will make virtio maintenance harder.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 12:40   ` Yishai Hadas
@ 2023-09-22 15:53     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-22 15:53 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Thu, Sep 21, 2023 at 03:40:40PM +0300, Yishai Hadas wrote:
> Introduce a vfio driver over virtio devices to support the legacy
> interface functionality for VFs.
> 
> Background, from the virtio spec [1].
> --------------------------------------------------------------------
> In some systems, there is a need to support a virtio legacy driver with
> a device that does not directly support the legacy interface. In such
> scenarios, a group owner device can provide the legacy interface
> functionality for the group member devices. The driver of the owner
> device can then access the legacy interface of a member device on behalf
> of the legacy member device driver.
> 
> For example, with the SR-IOV group type, group members (VFs) can not
> present the legacy interface in an I/O BAR in BAR0 as expected by the
> legacy pci driver. If the legacy driver is running inside a virtual
> machine, the hypervisor executing the virtual machine can present a
> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> legacy driver accesses to this I/O BAR and forwards them to the group
> owner device (PF) using group administration commands.
> --------------------------------------------------------------------
> 
> Specifically, this driver adds support for a virtio-net VF to be exposed
> as a transitional device to a guest driver and allows the legacy IO BAR
> functionality on top.
> 
> This allows a VM which uses a legacy virtio-net driver in the guest to
> work transparently over a VF whose host-side driver is this new
> driver.
> 
> The driver can be extended easily to support other types of virtio
> devices (e.g. virtio-blk) by adding the type-specific properties in a
> few places, as was done for virtio-net.
> 
> For now, only the virtio-net use case was tested and as such we introduce
> the support only for such a device.
> 
> Practically,
> Upon probing a VF of a virtio-net device, if its PF supports legacy
> access over the virtio admin commands and the VF doesn't have BAR 0, we
> set specific 'vfio_device_ops' that let us simulate in SW a
> transitional device with an I/O BAR in BAR 0.
> 
> The existence of the simulated I/O BAR is reported later on by
> overriding the VFIO_DEVICE_GET_REGION_INFO command, and the device
> exposes itself as a transitional device by overriding some properties
> upon reading its config space.
> 
> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> guest may use it via read/write calls according to the virtio
> specification.
> 
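
As an illustration, a minimal sketch of how a hypervisor-side consumer
reaches this emulated BAR. The ioctl/region plumbing is standard vfio
uapi; device_fd and queue_index are placeholders, nothing below is
taken from this series.

	/* needs <sys/ioctl.h>, <unistd.h>, <stdint.h>,
	 * <linux/vfio.h> and <linux/virtio_pci.h>
	 */
	struct vfio_region_info info = {
		.argsz = sizeof(info),
		.index = VFIO_PCI_BAR0_REGION_INDEX,
	};
	uint16_t qn = queue_index;	/* queue to notify, little-endian on the wire */

	ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &info);
	/* a guest outw() to the queue-notify register is forwarded here */
	pwrite(device_fd, &qn, sizeof(qn), info.offset + VIRTIO_PCI_QUEUE_NOTIFY);
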
> Any read/write towards the control parts of the BAR will be captured by
> the new driver and will be translated into admin commands towards the
> device.
> 
> Any data path read/write access (i.e. virtio driver notifications) will
> be forwarded to the physical BAR whose properties were supplied by
> the VIRTIO_PCI_QUEUE_NOTIFY command upon the probing/init flow.
> 
> With that code in place a legacy driver in the guest has the look and
> feel as if having a transitional device with legacy support for both its
> control and data path flows.
> 
> [1]
> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  MAINTAINERS                      |   6 +
>  drivers/vfio/pci/Kconfig         |   2 +
>  drivers/vfio/pci/Makefile        |   2 +
>  drivers/vfio/pci/virtio/Kconfig  |  15 +
>  drivers/vfio/pci/virtio/Makefile |   4 +
>  drivers/vfio/pci/virtio/cmd.c    |   4 +-
>  drivers/vfio/pci/virtio/cmd.h    |   8 +
>  drivers/vfio/pci/virtio/main.c   | 546 +++++++++++++++++++++++++++++++
>  8 files changed, 585 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/vfio/pci/virtio/Kconfig
>  create mode 100644 drivers/vfio/pci/virtio/Makefile
>  create mode 100644 drivers/vfio/pci/virtio/main.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index bf0f54c24f81..5098418c8389 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22624,6 +22624,12 @@ L:	kvm@vger.kernel.org
>  S:	Maintained
>  F:	drivers/vfio/pci/mlx5/
>  
> +VFIO VIRTIO PCI DRIVER
> +M:	Yishai Hadas <yishaih@nvidia.com>
> +L:	kvm@vger.kernel.org
> +S:	Maintained
> +F:	drivers/vfio/pci/virtio
> +
>  VFIO PCI DEVICE SPECIFIC DRIVERS
>  R:	Jason Gunthorpe <jgg@nvidia.com>
>  R:	Yishai Hadas <yishaih@nvidia.com>
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 8125e5f37832..18c397df566d 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>  
>  source "drivers/vfio/pci/pds/Kconfig"
>  
> +source "drivers/vfio/pci/virtio/Kconfig"
> +
>  endmenu
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 45167be462d8..046139a4eca5 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>  obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>  
>  obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> +
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> new file mode 100644
> index 000000000000..89eddce8b1bd
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Kconfig
> @@ -0,0 +1,15 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config VIRTIO_VFIO_PCI
> +        tristate "VFIO support for VIRTIO PCI devices"
> +        depends on VIRTIO_PCI
> +        select VFIO_PCI_CORE
> +        help
> +          This provides support for exposing VIRTIO VF devices using the VFIO
> +          framework that can work with a legacy virtio driver in the guest.
> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> +          not indicate I/O Space.
> +          Because of that, this driver emulates an I/O BAR in software to let a
> +          VF be seen as a transitional device in the guest and let it work with
> +          a legacy driver.
> +
> +          If you don't know what to do here, say N.
> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
> new file mode 100644
> index 000000000000..584372648a03
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
> +virtio-vfio-pci-y := main.o cmd.o
> +
> diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> index f068239cdbb0..aea9d25fbf1d 100644
> --- a/drivers/vfio/pci/virtio/cmd.c
> +++ b/drivers/vfio/pci/virtio/cmd.c
> @@ -44,7 +44,7 @@ int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>  {
>  	struct virtio_device *virtio_dev =
>  		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> -	struct virtio_admin_cmd_data_lr_write *in;
> +	struct virtio_admin_cmd_legacy_wr_data *in;
>  	struct scatterlist in_sg;
>  	struct virtio_admin_cmd cmd = {};
>  	int ret;
> @@ -74,7 +74,7 @@ int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>  {
>  	struct virtio_device *virtio_dev =
>  		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> -	struct virtio_admin_cmd_data_lr_read *in;
> +	struct virtio_admin_cmd_legacy_rd_data *in;
>  	struct scatterlist in_sg, out_sg;
>  	struct virtio_admin_cmd cmd = {};
>  	int ret;
> diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
> index c2a3645f4b90..347b1dc85570 100644
> --- a/drivers/vfio/pci/virtio/cmd.h
> +++ b/drivers/vfio/pci/virtio/cmd.h
> @@ -13,7 +13,15 @@
>  
>  struct virtiovf_pci_core_device {
>  	struct vfio_pci_core_device core_device;
> +	u8 bar0_virtual_buf_size;
> +	u8 *bar0_virtual_buf;
> +	/* synchronize access to the virtual buf */
> +	struct mutex bar_mutex;
>  	int vf_id;
> +	void __iomem *notify_addr;
> +	u32 notify_offset;
> +	u8 notify_bar;
> +	u8 pci_cmd_io :1;
>  };
>  
>  int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
> new file mode 100644
> index 000000000000..2486991c49f3
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/main.c
> @@ -0,0 +1,546 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/pci.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +#include <linux/vfio.h>
> +#include <linux/vfio_pci_core.h>
> +#include <linux/virtio_pci.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_pci_modern.h>
> +
> +#include "cmd.h"
> +
> +#define VIRTIO_LEGACY_IO_BAR_HEADER_LEN 20
> +#define VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN 4
> +
> +static int virtiovf_issue_lr_cmd(struct virtiovf_pci_core_device *virtvdev,
> +				 loff_t pos, char __user *buf,
> +				 size_t count, bool read)
> +{
> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
> +	u16 opcode;
> +	int ret;
> +
> +	mutex_lock(&virtvdev->bar_mutex);
> +	if (read) {
> +		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
> +		ret = virtiovf_cmd_lr_read(virtvdev, opcode, pos,
> +					   count, bar0_buf + pos);
> +		if (ret)
> +			goto out;
> +		if (copy_to_user(buf, bar0_buf + pos, count))
> +			ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	if (copy_from_user(bar0_buf + pos, buf, count)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
> +	ret = virtiovf_cmd_lr_write(virtvdev, opcode, pos, count,
> +				    bar0_buf + pos);
> +out:
> +	mutex_unlock(&virtvdev->bar_mutex);
> +	return ret;
> +}
> +
> +static int
> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
> +			    loff_t pos, char __user *buf,
> +			    size_t count, bool read)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	u16 queue_notify;
> +	int ret;
> +
> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
> +		return -EINVAL;
> +
> +	switch (pos) {
> +	case VIRTIO_PCI_QUEUE_NOTIFY:
> +		if (count != sizeof(queue_notify))
> +			return -EINVAL;
> +		if (read) {
> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
> +						virtvdev->notify_addr);
> +			if (ret)
> +				return ret;
> +			if (copy_to_user(buf, &queue_notify,
> +					 sizeof(queue_notify)))
> +				return -EFAULT;
> +			break;
> +		}
> +
> +		if (copy_from_user(&queue_notify, buf, count))
> +			return -EFAULT;
> +
> +		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
> +					 virtvdev->notify_addr);
> +		break;
> +	default:
> +		ret = virtiovf_issue_lr_cmd(virtvdev, pos, buf, count, read);
> +	}
> +
> +	return ret ? ret : count;
> +}
> +
> +static bool range_contains_range(loff_t range1_start, size_t count1,
> +				 loff_t range2_start, size_t count2,
> +				 loff_t *start_offset)
> +{
> +	if (range1_start <= range2_start &&
> +	    range1_start + count1 >= range2_start + count2) {
> +		*start_offset = range2_start - range1_start;
> +		return true;
> +	}
> +	return false;
> +}
> +
> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> +					char __user *buf, size_t count,
> +					loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	loff_t copy_offset;
> +	__le32 val32;
> +	__le16 val16;
> +	u8 val8;
> +	int ret;
> +
> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (range_contains_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> +				 &copy_offset)) {
> +		val16 = cpu_to_le16(0x1000);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}
> +
> +	if (virtvdev->pci_cmd_io &&
> +	    range_contains_range(pos, count, PCI_COMMAND, sizeof(val16),
> +				 &copy_offset)) {
> +		if (copy_from_user(&val16, buf, sizeof(val16)))
> +			return -EFAULT;
> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}
> +
> +	if (range_contains_range(pos, count, PCI_REVISION_ID, sizeof(val8),
> +				 &copy_offset)) {
> +		/* Transitional needs to have revision 0 */
> +		val8 = 0;
> +		if (copy_to_user(buf + copy_offset, &val8, sizeof(val8)))
> +			return -EFAULT;
> +	}
> +
> +	if (range_contains_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
> +				 &copy_offset)) {
> +		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);
> +		if (copy_to_user(buf + copy_offset, &val32, sizeof(val32)))
> +			return -EFAULT;
> +	}
> +
> +	if (range_contains_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> +				 &copy_offset)) {
> +		/* Transitional devices use the PCI subsystem device id as
> +		 * virtio device id, same as legacy driver always did.
> +		 */
> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> +			return -EFAULT;
> +	}
> +
> +	return count;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
> +		       size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
> +				     ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> +			size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> +		loff_t copy_offset;
> +		u16 cmd;
> +
> +		if (range_contains_range(pos, count, PCI_COMMAND, sizeof(cmd),
> +					 &copy_offset)) {
> +			if (copy_from_user(&cmd, buf + copy_offset, sizeof(cmd)))
> +				return -EFAULT;
> +			virtvdev->pci_cmd_io = (cmd & PCI_COMMAND_IO);
> +		}
> +	}
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static int
> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
> +				   unsigned int cmd, unsigned long arg)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
> +	void __user *uarg = (void __user *)arg;
> +	struct vfio_region_info info = {};
> +
> +	if (copy_from_user(&info, uarg, minsz))
> +		return -EFAULT;
> +
> +	if (info.argsz < minsz)
> +		return -EINVAL;
> +
> +	switch (info.index) {
> +	case VFIO_PCI_BAR0_REGION_INDEX:
> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
> +		info.size = virtvdev->bar0_virtual_buf_size;
> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
> +			     VFIO_REGION_INFO_FLAG_WRITE;
> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static long
> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> +			     unsigned long arg)
> +{
> +	switch (cmd) {
> +	case VFIO_DEVICE_GET_REGION_INFO:
> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static int
> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	int ret;
> +
> +	/* Setup the BAR where the 'notify' exists to be used by vfio as well
> +	 * This will let us mmap it only once and use it when needed.
> +	 */
> +	ret = vfio_pci_core_setup_barmap(core_device,
> +					 virtvdev->notify_bar);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
> +			virtvdev->notify_offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
> +	int ret;
> +
> +	ret = vfio_pci_core_enable(vdev);
> +	if (ret)
> +		return ret;
> +
> +	if (virtvdev->bar0_virtual_buf) {
> +		/* upon close_device() the vfio_pci_core_disable() is called
> +		 * and will close all the previous mmaps, so it seems that the
> +		 * valid life cycle for the 'notify' addr is per open/close.
> +		 */
> +		ret = virtiovf_set_notify_addr(virtvdev);
> +		if (ret) {
> +			vfio_pci_core_disable(vdev);
> +			return ret;
> +		}
> +	}
> +
> +	vfio_pci_core_finish_enable(vdev);
> +	return 0;
> +}
> +
> +static void virtiovf_pci_close_device(struct vfio_device *core_vdev)
> +{
> +	vfio_pci_core_close_device(core_vdev);
> +}
> +
> +static int virtiovf_get_device_config_size(unsigned short device)
> +{
> +	switch (device) {
> +	case 0x1041:
> +		/* network card */
> +		return offsetofend(struct virtio_net_config, status);
> +	default:
> +		return 0;
> +	}
> +}
> +
> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	u64 offset;
> +	int ret;
> +	u8 bar;
> +
> +	ret = virtiovf_cmd_lq_read_notify(virtvdev,
> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
> +				&bar, &offset);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_bar = bar;
> +	virtvdev->notify_offset = offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev;
> +	int ret;
> +
> +	ret = vfio_pci_core_init_dev(core_vdev);
> +	if (ret)
> +		return ret;
> +
> +	pdev = virtvdev->core_device.pdev;
> +	virtvdev->vf_id = pci_iov_vf_id(pdev);
> +	if (virtvdev->vf_id < 0)
> +		return -EINVAL;
> +
> +	ret = virtiovf_read_notify_info(virtvdev);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->bar0_virtual_buf_size = VIRTIO_LEGACY_IO_BAR_HEADER_LEN +
> +		VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN +
> +		virtiovf_get_device_config_size(pdev->device);
> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
> +					     GFP_KERNEL);
> +	if (!virtvdev->bar0_virtual_buf)
> +		return -ENOMEM;
> +	mutex_init(&virtvdev->bar_mutex);
> +	return 0;
> +}
> +
> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +
> +	kfree(virtvdev->bar0_virtual_buf);
> +	vfio_pci_core_release_dev(core_vdev);
> +}
> +
> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
> +	.name = "virtio-transitional-vfio-pci",
> +	.init = virtiovf_pci_init_device,
> +	.release = virtiovf_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = virtiovf_pci_close_device,
> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
> +	.read = virtiovf_pci_core_read,
> +	.write = virtiovf_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};
> +
> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
> +	.name = "virtio-acc-vfio-pci",
> +	.init = vfio_pci_core_init_dev,
> +	.release = vfio_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = virtiovf_pci_close_device,
> +	.ioctl = vfio_pci_core_ioctl,
> +	.device_feature = vfio_pci_core_ioctl_feature,
> +	.read = vfio_pci_core_read,
> +	.write = vfio_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};
> +
> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
> +{
> +	struct resource *res = pdev->resource;
> +
> +	return res->flags ? true : false;
> +}
> +
> +#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
> +	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> +
> +static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
> +{
> +	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
> +	u8 *buf;
> +	int ret;
> +
> +	/* Only virtio-net is supported/tested so far */
> +	if (pdev->device != 0x1041)
> +		return false;
> +
> +	buf = kzalloc(buf_size, GFP_KERNEL);
> +	if (!buf)
> +		return false;
> +
> +	ret = virtiovf_cmd_list_query(pdev, buf, buf_size);
> +	if (ret)
> +		goto end;
> +
> +	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
> +		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
> +		ret = -EOPNOTSUPP;
> +		goto end;
> +	}
> +
> +	/* confirm the used commands */
> +	memset(buf, 0, buf_size);
> +	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
> +	ret = virtiovf_cmd_list_use(pdev, buf, buf_size);
> +
> +end:
> +	kfree(buf);
> +	return ret ? false : true;
> +}
> +
> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> +			      const struct pci_device_id *id)
> +{
> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> +	struct virtiovf_pci_core_device *virtvdev;
> +	int ret;
> +
> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)

I see this is the reason you set MSIX to true. But I think it's a
misunderstanding - that true means MSIX is enabled by guest, not that
it exists.
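
For reference, a minimal sketch of that distinction (the helper is
illustrative, not something the patch defines): pdev->msix_cap only
says the capability exists; whether MSI-X is currently enabled is the
enable bit in the capability's message control word.

	/* needs <linux/pci.h>; capability present vs. MSI-X actually enabled */
	static bool msix_is_enabled(struct pci_dev *pdev)
	{
		u16 flags;

		if (!pdev->msix_cap)	/* capability not present at all */
			return false;
		pci_read_config_word(pdev, pdev->msix_cap + PCI_MSIX_FLAGS, &flags);
		return flags & PCI_MSIX_FLAGS_ENABLE;	/* set only once a driver enables MSI-X */
	}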


> +		ops = &virtiovf_acc_vfio_pci_tran_ops;



Actually, I remember there's a problem with just always doing
transitional, and that is VIRTIO_F_ACCESS_PLATFORM - some configs just
break in weird ways as the device will go through an IOMMU. It would be
nicer, I think, if userspace had the last word on whether it wants to
enable legacy or not, even if the hardware supports it.
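
One way to give userspace/the admin that last word, sketched purely as
an illustration (the module parameter name and its wiring are
hypothetical, not part of this series):

	/* hypothetical opt-out knob, default keeps the proposed behaviour */
	static bool enable_legacy_io = true;
	module_param(enable_legacy_io, bool, 0444);
	MODULE_PARM_DESC(enable_legacy_io,
			 "Expose an emulated legacy I/O BAR 0 for capable virtio VFs");

	...

	if (enable_legacy_io && pdev->is_virtfn &&
	    virtiovf_support_legacy_access(pdev) &&
	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
		ops = &virtiovf_acc_vfio_pci_tran_ops;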


> +
> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> +				     &pdev->dev, ops);
> +	if (IS_ERR(virtvdev))
> +		return PTR_ERR(virtvdev);
> +
> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
> +	if (ret)
> +		goto out;
> +	return 0;
> +out:
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +	return ret;
> +}
> +
> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> +
> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +}
> +
> +static const struct pci_device_id virtiovf_pci_table[] = {
> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
> +	{}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> +
> +static struct pci_driver virtiovf_pci_driver = {
> +	.name = KBUILD_MODNAME,
> +	.id_table = virtiovf_pci_table,
> +	.probe = virtiovf_pci_probe,
> +	.remove = virtiovf_pci_remove,
> +	.err_handler = &vfio_pci_core_err_handlers,
> +	.driver_managed_dma = true,
> +};
> +
> +module_pci_driver(virtiovf_pci_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
> +MODULE_DESCRIPTION(
> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");
> -- 
> 2.27.0


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-22 15:39                             ` Michael S. Tsirkin
  (?)
@ 2023-09-22 16:19                             ` Jason Gunthorpe
  2023-09-25 18:16                                 ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-22 16:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Alex Williamson, Yishai Hadas, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Fri, Sep 22, 2023 at 11:39:19AM -0400, Michael S. Tsirkin wrote:
> On Fri, Sep 22, 2023 at 09:25:01AM -0300, Jason Gunthorpe wrote:
> > On Fri, Sep 22, 2023 at 11:02:50AM +0800, Jason Wang wrote:
> > > On Fri, Sep 22, 2023 at 3:53 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > > >
> > > > On Thu, Sep 21, 2023 at 03:34:03PM -0400, Michael S. Tsirkin wrote:
> > > >
> > > > > that's easy/practical.  If instead VDPA gives the same speed with just
> > > > > shadow vq then keeping this hack in vfio seems like less of a problem.
> > > > > Finally if VDPA is faster then maybe you will reconsider using it ;)
> > > >
> > > > It is not all about the speed.
> > > >
> > > > VDPA presents another large and complex software stack in the
> > > > hypervisor that can be eliminated by simply using VFIO.
> > > 
> > > vDPA supports standard virtio devices so how did you define
> > > complexity?
> > 
> > As I said, VFIO is already required for other devices in these VMs. So
> > anything incremental over base-line vfio-pci is complexity to
> > minimize.
> > 
> > Everything vdpa does is either redundant or unnecessary compared to
> > VFIO in these environments.
> > 
> > Jason
> 
> Yes but you know. There are all kinds of environments.  I guess you
> consider yours the most mainstream and important, and are sure it will
> always stay like this.  But if there's a driver that does what you need
> then you use that.

Come on, you are the one saying we cannot do things in the best way
possible because you want your way of doing things to be the only way
allowed. Which of us thinks "yours the most mainstream and important" ??

I'm not telling you to throw away VDPA, I'm saying there are
legitimate real-world use cases where VFIO is the appropriate
interface, not VDPA.

I want choice, not dogmatic exclusion that there is Only One True Way.

> You really should be explaining what vdpa *does not* do that you
> need.

I think I've done that enough, but if you have been following my
explanation you should see that the entire point of this design is to
allow a virtio device to be created inside a DPU to a specific
detailed specification (an AWS virtio-net device, for instance).

The implementation is in the DPU, and only the DPU.

At the end of the day VDPA uses mediation and creates some
RedHat/VDPA/Qemu virtio-net device in the guest. It is emphatically
NOT a perfect recreation of the "AWS virtio-net" we started out with.

It entirely fails to achieve the most important thing it needs to do!

Yishai will rework the series with your remarks, we can look again on
v2, thanks for all the input!

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-22 15:40                                     ` Michael S. Tsirkin
  (?)
@ 2023-09-22 16:22                                     ` Jason Gunthorpe
  2023-09-25 17:36                                         ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-22 16:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Jason Wang, Alex Williamson, Yishai Hadas, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	Leon Romanovsky, Maor Gottlieb

On Fri, Sep 22, 2023 at 11:40:58AM -0400, Michael S. Tsirkin wrote:
> On Fri, Sep 22, 2023 at 12:15:34PM -0300, Jason Gunthorpe wrote:
> > On Fri, Sep 22, 2023 at 11:13:18AM -0400, Michael S. Tsirkin wrote:
> > > On Fri, Sep 22, 2023 at 12:25:06PM +0000, Parav Pandit wrote:
> > > > 
> > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > Sent: Friday, September 22, 2023 5:53 PM
> > > > 
> > > > 
> > > > > > And what's more, using MMIO BAR0 then it can work for legacy.
> > > > > 
> > > > > Oh? How? Our team didn't think so.
> > > > 
> > > > It does not. It was already discussed.
> > > > The device reset in legacy is not synchronous.
> > > > The drivers do not wait for reset to complete; it was written for the sw backend.
> > > > Hence MMIO BAR0 is not the best option in real implementations.
> > > 
> > > Or maybe they made it synchronous in hardware, that's all.
> > > After all same is true for the IO BAR0 e.g. for the PF: IO writes
> > > are posted anyway.
> > 
> > IO writes are not posted in PCI.
> 
> Aha, I was confused. Thanks for the correction. I guess you just buffer
> subsequent transactions while reset is going on and reset quickly enough
> for it to be seamless then?

From a hardware perspective the CPU issues a non-posted IO write and
then it stops processing until the far side returns an IO completion.

Using that you can emulate what the SW virtio model did and delay the
CPU from restarting until the reset is completed.

Since MMIO is always posted, this is not possible to emulate directly
using MMIO.

Converting IO into non-posted admin commands is a fairly close
recreation of what actual HW would do.
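
To make the synchronous-vs-asynchronous point concrete, a rough
guest-side sketch (register and helper names follow the virtio-pci
code, but this is an illustration, not a quote of either driver):

	/* legacy interface: a single I/O write, the driver does not wait */
	outb(0, ioaddr + VIRTIO_PCI_STATUS);
	/* ...and immediately carries on; only the non-posted nature of the
	 * I/O write (or its admin-command emulation) makes this safe
	 */

	/* modern interface: the driver polls until the device reports reset done */
	vp_iowrite8(0, &cfg->device_status);
	while (vp_ioread8(&cfg->device_status) != 0)
		msleep(1);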

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 07/11] virtio-pci: Introduce admin commands
  2023-09-21 12:40   ` Yishai Hadas
@ 2023-09-24  5:18     ` kernel test robot
  -1 siblings, 0 replies; 321+ messages in thread
From: kernel test robot @ 2023-09-24  5:18 UTC (permalink / raw)
  To: Yishai Hadas, alex.williamson, mst, jasowang, jgg
  Cc: oe-kbuild-all, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, leonro, yishaih, maorg

Hi Yishai,

kernel test robot noticed the following build errors:

[auto build test ERROR on awilliam-vfio/for-linus]
[also build test ERROR on mst-vhost/linux-next linus/master v6.6-rc2 next-20230921]
[cannot apply to awilliam-vfio/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yishai-Hadas/virtio-pci-Use-virtio-pci-device-layer-vq-info-instead-of-generic-one/20230922-062611
base:   https://github.com/awilliam/linux-vfio.git for-linus
patch link:    https://lore.kernel.org/r/20230921124040.145386-8-yishaih%40nvidia.com
patch subject: [PATCH vfio 07/11] virtio-pci: Introduce admin commands
config: i386-randconfig-012-20230924 (https://download.01.org/0day-ci/archive/20230924/202309241353.ykr3cC2K-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230924/202309241353.ykr3cC2K-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309241353.ykr3cC2K-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from <command-line>:
>> ./usr/include/linux/virtio_pci.h:250:9: error: unknown type name 'u8'
     250 |         u8 offset; /* Starting offset of the register(s) to write. */
         |         ^~
   ./usr/include/linux/virtio_pci.h:251:9: error: unknown type name 'u8'
     251 |         u8 reserved[7];
         |         ^~
   ./usr/include/linux/virtio_pci.h:252:9: error: unknown type name 'u8'
     252 |         u8 registers[];
         |         ^~
   ./usr/include/linux/virtio_pci.h:256:9: error: unknown type name 'u8'
     256 |         u8 offset; /* Starting offset of the register(s) to read. */
         |         ^~
   ./usr/include/linux/virtio_pci.h:266:9: error: unknown type name 'u8'
     266 |         u8 flags; /* 0 = end of list, 1 = owner device, 2 = member device */
         |         ^~
   ./usr/include/linux/virtio_pci.h:267:9: error: unknown type name 'u8'
     267 |         u8 bar; /* BAR of the member or the owner device */
         |         ^~
   ./usr/include/linux/virtio_pci.h:268:9: error: unknown type name 'u8'
     268 |         u8 padding[6];
         |         ^~
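
These errors come from exporting the kernel-internal u8 typedef through a
uapi header; userspace builds only see the __u8/__le64 style types from
<linux/types.h>. A minimal sketch of the affected declaration using the
exported types (struct name assumed for illustration, field names taken from
the error output above) would be:

#include <linux/types.h>

/* Sketch only, not the actual patch. */
struct virtio_admin_cmd_legacy_wr_data {
	__u8 offset;	/* Starting offset of the register(s) to write. */
	__u8 reserved[7];
	__u8 registers[];
};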

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-22 12:25                               ` Parav Pandit via Virtualization
@ 2023-09-25  2:30                                 ` Jason Wang
  -1 siblings, 0 replies; 321+ messages in thread
From: Jason Wang @ 2023-09-25  2:30 UTC (permalink / raw)
  To: Parav Pandit
  Cc: kvm, Michael S. Tsirkin, Maor Gottlieb, virtualization,
	Jason Gunthorpe, Jiri Pirko, Leon Romanovsky

On Fri, Sep 22, 2023 at 8:25 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, September 22, 2023 5:53 PM
>
>
> > > And what's more, using MMIO BAR0 then it can work for legacy.
> >
> > Oh? How? Our team didn't think so.
>
> It does not. It was already discussed.
> The device reset in legacy is not synchronous.

How do you know this?

> The drivers do not wait for reset to complete; it was written for the sw backend.

Do you see there's a flush after reset in the legacy driver?

static void vp_reset(struct virtio_device *vdev)
{
        struct virtio_pci_device *vp_dev = to_vp_device(vdev);
        /* 0 status means a reset. */
        vp_legacy_set_status(&vp_dev->ldev, 0);
        /* Flush out the status write, and flush in device writes,
         * including MSi-X interrupts, if any. */
        vp_legacy_get_status(&vp_dev->ldev);
        /* Flush pending VQ/configuration callbacks. */
        vp_synchronize_vectors(vdev);
}

Thanks



> Hence MMIO BAR0 is not the best option in real implementations.
>


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-22 12:11                     ` Jason Gunthorpe
@ 2023-09-25  2:34                         ` Jason Wang
  0 siblings, 0 replies; 321+ messages in thread
From: Jason Wang @ 2023-09-25  2:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Michael S. Tsirkin, Yishai Hadas, alex.williamson, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Fri, Sep 22, 2023 at 8:11 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Fri, Sep 22, 2023 at 11:01:23AM +0800, Jason Wang wrote:
>
> > > Even when it does, there is no real use case to live migrate a
> > > virtio-net function from, say, AWS to GCP.
> >
> > It can happen inside a single cloud vendor. For some reasons, DPU must
> > be purchased from different vendors. And vDPA has been used in that
> > case.
>
> Nope, you misunderstand the DPU scenario.
>
> Look at something like vmware DPU enablement. vmware runs the software
> side of the DPU and all their supported DPU HW, from every vendor,
> generates the same PCI functions on the x86. They are the same because
> the same software on the DPU side is creating them.
>
> There is no reason to put a mediation layer in the x86 if you also
> control the DPU.
>
> Cloud vendors will similarly use DPUs to create a PCI functions that
> meet the cloud vendor's internal specification.

This can only work if:

1) the internal specification has finer grain than the virtio spec
2) so it can define what is not implemented in the virtio spec (like
migration and compatibility)

All of the above doesn't seem possible or realistic now, and it
actually risks being incompatible with the virtio spec. In the
future, when virtio has live migration support, they will want to be
able to migrate between virtio and vDPA.

As I said, vDPA has been used for cross vendor live migration for a while.

> Regardless of DPU
> vendor.
>
> Fundamentally if you control the DPU SW and the hypervisor software
> you do not need hypervisor meditation because everything you could do
> in hypervisor mediation can just be done in the DPU. Putting it in the
> DPU is better in every regard.
>
> So, as I keep saying, in this scenario the goal is no mediation in the
> hypervisor.

That's pretty fine, but I don't see how trapping + relaying is not
mediation. Does it really matter what happens after trapping?

> It is pointless, everything you think you need to do there
> is actually already being done in the DPU.

Well, migration or even Qemu could be offloaded to the DPU as well. If
that's the direction, that's pretty fine.

Thanks

>
> Jason
>


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 07/11] virtio-pci: Introduce admin commands
  2023-09-21 12:40   ` Yishai Hadas
@ 2023-09-25  3:18     ` kernel test robot
  -1 siblings, 0 replies; 321+ messages in thread
From: kernel test robot @ 2023-09-25  3:18 UTC (permalink / raw)
  To: Yishai Hadas, alex.williamson, mst, jasowang, jgg
  Cc: llvm, oe-kbuild-all, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, leonro, yishaih, maorg

Hi Yishai,

kernel test robot noticed the following build warnings:

[auto build test WARNING on awilliam-vfio/for-linus]
[also build test WARNING on linus/master v6.6-rc3 next-20230921]
[cannot apply to awilliam-vfio/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yishai-Hadas/virtio-pci-Use-virtio-pci-device-layer-vq-info-instead-of-generic-one/20230922-062611
base:   https://github.com/awilliam/linux-vfio.git for-linus
patch link:    https://lore.kernel.org/r/20230921124040.145386-8-yishaih%40nvidia.com
patch subject: [PATCH vfio 07/11] virtio-pci: Introduce admin commands
config: x86_64-rhel-8.3-rust (https://download.01.org/0day-ci/archive/20230925/202309251120.rWbiAZYM-lkp@intel.com/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230925/202309251120.rWbiAZYM-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309251120.rWbiAZYM-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from drivers/virtio/virtio_pci_modern_dev.c:3:
   In file included from include/linux/virtio_pci_modern.h:6:
>> include/uapi/linux/virtio_pci.h:270:4: warning: attribute '__packed__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
   }; __packed
      ^
   include/linux/compiler_attributes.h:304:56: note: expanded from macro '__packed'
   #define __packed                        __attribute__((__packed__))
                                                          ^
   1 warning generated.


vim +270 include/uapi/linux/virtio_pci.h

   264	
   265	struct virtio_admin_cmd_notify_info_data {
   266		u8 flags; /* 0 = end of list, 1 = owner device, 2 = member device */
   267		u8 bar; /* BAR of the member or the owner device */
   268		u8 padding[6];
   269		__le64 offset; /* Offset within bar. */
 > 270	}; __packed
   271	
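
The warning itself is purely about placement: written after the closing
brace's semicolon, the attribute attaches to nothing. A corrected sketch of
the same declaration (also switched to the uapi-exported types flagged by the
earlier robot report) would be:

struct virtio_admin_cmd_notify_info_data {
	__u8 flags;	/* 0 = end of list, 1 = owner device, 2 = member device */
	__u8 bar;	/* BAR of the member or the owner device */
	__u8 padding[6];
	__le64 offset;	/* Offset within bar. */
} __packed;	/* attribute before the ';' so it applies to the struct */

Given the explicit padding the structure is already naturally aligned, so the
packed attribute may not even be needed; an exported header would also
typically spell it as __attribute__((packed)) rather than the kernel-internal
__packed macro.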

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 20:55                     ` Michael S. Tsirkin
@ 2023-09-25  4:44                       ` Zhu, Lingshan
  -1 siblings, 0 replies; 321+ messages in thread
From: Zhu, Lingshan @ 2023-09-25  4:44 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Gunthorpe
  Cc: kvm, maorg, virtualization, jiri, leonro



On 9/22/2023 4:55 AM, Michael S. Tsirkin wrote:
> On Thu, Sep 21, 2023 at 04:51:15PM -0300, Jason Gunthorpe wrote:
>> On Thu, Sep 21, 2023 at 03:17:25PM -0400, Michael S. Tsirkin wrote:
>>> On Thu, Sep 21, 2023 at 03:39:26PM -0300, Jason Gunthorpe wrote:
>>>>> What is the huge amount of work am I asking to do?
>>>> You are asking us to invest in the complexity of VDPA through out
>>>> (keep it working, keep it secure, invest time in deploying and
>>>> debugging in the field)
>>> I'm asking you to do nothing of the kind - I am saying that this code
>>> will have to be duplicated in vdpa,
>> Why would that be needed?
> For the same reason it was developed in the 1st place - presumably
> because it adds efficient legacy guest support with the right card?
> I get it, you specifically don't need VDPA functionality, but I don't
> see why is this universal, or common.
>
>
>>> and so I am asking what exactly is missing to just keep it all
>>> there.
>> VFIO. Seriously, we don't want unnecessary mediation in this path at
>> all.
> But which mediation is necessary is exactly up to the specific use-case.
> I have no idea why would you want all of VFIO to e.g. pass access to
> random config registers to the guest when it's a virtio device and the
> config registers are all nicely listed in the spec. I know nvidia
> hardware is so great, it has super robust cards with less security holes
> than the vdpa driver, but I very much doubt this is universal for all
> virtio offload cards.
I agree with MST.
>>> note I didn't ask you to add iommufd to vdpa though that would be
>>> nice ;)
>> I did once send someone to look.. It didn't succeed :(
>>
>> Jason
> Pity. Maybe there's some big difficulty blocking this? I'd like to know.
>


^ permalink raw reply	[flat|nested] 321+ messages in thread

* RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-25  2:30                                 ` Jason Wang
@ 2023-09-25  8:26                                   ` Parav Pandit via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Parav Pandit @ 2023-09-25  8:26 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jason Gunthorpe, Michael S. Tsirkin, Alex Williamson,
	Yishai Hadas, kvm, virtualization, Feng Liu, Jiri Pirko,
	kevin.tian, joao.m.martins, Leon Romanovsky, Maor Gottlieb



> From: Jason Wang <jasowang@redhat.com>
> Sent: Monday, September 25, 2023 8:00 AM
> 
> On Fri, Sep 22, 2023 at 8:25 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Friday, September 22, 2023 5:53 PM
> >
> >
> > > > And what's more, using MMIO BAR0 then it can work for legacy.
> > >
> > > Oh? How? Our team didn't think so.
> >
> > It does not. It was already discussed.
> > The device reset in legacy is not synchronous.
> 
> How do you know this?
>
Not sure of the motivation for redoing the same discussion that was already done in OASIS with you and others in the past.

Anyways, please find the answer below.

About reset:
The legacy device specification did not enforce the below-cited driver requirement of 1.0.

"The driver SHOULD consider a driver-initiated reset complete when it reads device status as 0."
 
[1] https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf

> > The drivers do not wait for reset to complete; it was written for the sw
> backend.
> 
> Do you see there's a flush after reset in the legacy driver?
> 
Yes. It only flushes the write by reading it. The driver does not _wait_ for the reset to complete within the device, as cited above.

Please see the reset flow of the 1.x device below.
In fact the comment in the 1.x path also needs to be updated to indicate that the driver needs to wait for the device to finish the reset.
I will send a separate patch improving this comment in vp_reset() to match the spec.

static void vp_reset(struct virtio_device *vdev)
{
        struct virtio_pci_device *vp_dev = to_vp_device(vdev);
        struct virtio_pci_modern_device *mdev = &vp_dev->mdev;

        /* 0 status means a reset. */
        vp_modern_set_status(mdev, 0);
        /* After writing 0 to device_status, the driver MUST wait for a read of
         * device_status to return 0 before reinitializing the device.
         * This will flush out the status write, and flush in device writes,
         * including MSI-X interrupts, if any.
         */
        while (vp_modern_get_status(mdev))
                msleep(1);
        /* Flush pending VQ/configuration callbacks. */
        vp_synchronize_vectors(vdev);
}


> static void vp_reset(struct virtio_device *vdev) {
>         struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>         /* 0 status means a reset. */
>         vp_legacy_set_status(&vp_dev->ldev, 0);
>         /* Flush out the status write, and flush in device writes,
>          * including MSi-X interrupts, if any. */
>         vp_legacy_get_status(&vp_dev->ldev);
>         /* Flush pending VQ/configuration callbacks. */
>         vp_synchronize_vectors(vdev);
> }
> 
> Thanks
> 
> 
> 
> > Hence MMIO BAR0 is not the best option in real implementations.
> >


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-25  2:34                         ` Jason Wang
  (?)
@ 2023-09-25 12:26                         ` Jason Gunthorpe
  2023-09-25 19:44                             ` Michael S. Tsirkin
  2023-09-26  4:37                             ` Jason Wang
  -1 siblings, 2 replies; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-25 12:26 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Yishai Hadas, alex.williamson, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Mon, Sep 25, 2023 at 10:34:54AM +0800, Jason Wang wrote:

> > Cloud vendors will similarly use DPUs to create a PCI functions that
> > meet the cloud vendor's internal specification.
> 
> This can only work if:
> 
> 1) the internal specification has finer garin than virtio spec
> 2) so it can define what is not implemented in the virtio spec (like
> migration and compatibility)

Yes, and that is what is happening. Realistically the "spec" is just a
piece of software that the Cloud vendor owns, which is simply ported to
multiple DPU vendors.

It is the same as VDPA. If VDPA can make multiple NIC vendors
consistent then why do you have a hard time believing we can do the
same thing just on the ARM side of a DPU?

> All of the above doesn't seem to be possible or realistic now, and it
> actually has a risk to be not compatible with virtio spec. In the
> future when virtio has live migration supported, they want to be able
> to migrate between virtio and vDPA.

Well, that is for the spec to design. 

> > So, as I keep saying, in this scenario the goal is no mediation in the
> > hypervisor.
> 
> That's pretty fine, but I don't think trapping + relying is not
> mediation. Does it really matter what happens after trapping?

It is not mediation in the sense that the kernel driver does not in
any way make decisions on the behavior of the device. It simply
transforms an IO operation into a device command and relays it to the
device. The device still fully controls its own behavior.

VDPA is very different from this. You might call them both mediation,
sure, but then you need another word to describe the additional
changes VDPA is doing.

> > It is pointless, everything you think you need to do there
> > is actually already being done in the DPU.
> 
> Well, migration or even Qemu could be offloaded to DPU as well. If
> that's the direction that's pretty fine.

That's silly, of course qemu/kvm can't run in the DPU.

However, we can empty qemu and the hypervisor out so all it does is
run kvm and run vfio. In this model the DPU does all the OVS, storage,
"VDPA", etc. qemu is just a passive relay of the DPU PCI functions
into the VM's vPCI functions.

So, everything VDPA was doing in the environment is migrated into the
DPU.

In this model the DPU is an extension of the hypervisor/qemu
environment and we shift code from x86 side to arm side to increase
security, save power and increase total system performance.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-22 16:22                                     ` Jason Gunthorpe
@ 2023-09-25 17:36                                         ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-25 17:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Parav Pandit, Jason Wang, Alex Williamson, Yishai Hadas, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	Leon Romanovsky, Maor Gottlieb

On Fri, Sep 22, 2023 at 01:22:33PM -0300, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2023 at 11:40:58AM -0400, Michael S. Tsirkin wrote:
> > On Fri, Sep 22, 2023 at 12:15:34PM -0300, Jason Gunthorpe wrote:
> > > On Fri, Sep 22, 2023 at 11:13:18AM -0400, Michael S. Tsirkin wrote:
> > > > On Fri, Sep 22, 2023 at 12:25:06PM +0000, Parav Pandit wrote:
> > > > > 
> > > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > Sent: Friday, September 22, 2023 5:53 PM
> > > > > 
> > > > > 
> > > > > > > And what's more, using MMIO BAR0 then it can work for legacy.
> > > > > > 
> > > > > > Oh? How? Our team didn't think so.
> > > > > 
> > > > > It does not. It was already discussed.
> > > > > The device reset in legacy is not synchronous.
> > > > > The drivers do not wait for reset to complete; it was written for the sw backend.
> > > > > Hence MMIO BAR0 is not the best option in real implementations.
> > > > 
> > > > Or maybe they made it synchronous in hardware, that's all.
> > > > After all same is true for the IO BAR0 e.g. for the PF: IO writes
> > > > are posted anyway.
> > > 
> > > IO writes are not posted in PCI.
> > 
> > Aha, I was confused. Thanks for the correction. I guess you just buffer
> > subsequent transactions while reset is going on and reset quickly enough
> > for it to be seamless then?
> 
> From a hardware perspective the CPU issues a non-posted IO write and
> then it stops processing until the far side returns an IO completion.
> 
> Using that you can emulate what the SW virtio model did and delay the
> CPU from restarting until the reset is completed.
> 
> Since MMIO is always posted, this is not possible to emulate directly
> using MMIO.
> 
> Converting IO into non-posted admin commands is a fairly close
> recreation to what actual HW would do.
> 
> Jason

I thought you asked how it is possible for hardware to support reset if
all it does is replace IO BAR with memory BAR. The answer is that since
2011 the reset is followed by a read of the status field (which isn't much
older than MSIX support from 2009 - which this code assumes).  If one
uses a Linux driver from 2011 and on then all you need to do is defer
response to this read until after the reset is complete.
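
In device-emulation terms that deferral could look roughly like the
hypothetical sketch below (none of the names come from the series; it only
illustrates stalling the status read until the internal reset has finished):

/* Hypothetical device/DPU-side pseudo-code, sketch only. */
static u8 emu_read_device_status(struct emu_virtio_dev *dev)
{
	/* A post-2011 Linux driver reads the status register after writing 0.
	 * Holding off this read's completion until the internal reset has
	 * finished makes the reset appear synchronous to that driver.
	 */
	if (dev->reset_in_progress)
		emu_wait_for_reset_done(dev);	/* quiesce DMA, tear down VQs */

	return dev->device_status;		/* reads back as 0 after reset */
}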

If you are using older drivers or other OSes, then a reset using a posted
write after the device has operated for a while might not be safe, so e.g.
you might trigger races if you remove drivers from the system or
trigger hot unplug.  For example:

	static void virtio_pci_remove(struct pci_dev *pci_dev)
	{

	....

		unregister_virtio_device(&vp_dev->vdev);

	^^^^ triggers reset, then releases memory

	....

		pci_disable_device(pci_dev);

	^^^ blocks DMA by clearing bus master

	}

Here you could see some DMA into memory that has just been released.


As Jason mentions hardware exists that is used under one of these two
restrictions on the guest (Linux since 2011 or no resets while DMA is
going on), and it works fine with these existing guests.

Given the restrictions, virtio TC didn't elect to standardize this
approach and instead opted for the heavier approach of
converting IO into non-posted admin commands in software.


-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-22 16:19                             ` Jason Gunthorpe
@ 2023-09-25 18:16                                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-25 18:16 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Fri, Sep 22, 2023 at 01:19:28PM -0300, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2023 at 11:39:19AM -0400, Michael S. Tsirkin wrote:
> > On Fri, Sep 22, 2023 at 09:25:01AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Sep 22, 2023 at 11:02:50AM +0800, Jason Wang wrote:
> > > > On Fri, Sep 22, 2023 at 3:53 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > > > >
> > > > > On Thu, Sep 21, 2023 at 03:34:03PM -0400, Michael S. Tsirkin wrote:
> > > > >
> > > > > > that's easy/practical.  If instead VDPA gives the same speed with just
> > > > > > shadow vq then keeping this hack in vfio seems like less of a problem.
> > > > > > Finally if VDPA is faster then maybe you will reconsider using it ;)
> > > > >
> > > > > It is not all about the speed.
> > > > >
> > > > > VDPA presents another large and complex software stack in the
> > > > > hypervisor that can be eliminated by simply using VFIO.
> > > > 
> > > > vDPA supports standard virtio devices so how did you define
> > > > complexity?
> > > 
> > > As I said, VFIO is already required for other devices in these VMs. So
> > > anything incremental over base-line vfio-pci is complexity to
> > > minimize.
> > > 
> > > Everything vdpa does is either redundant or unnecessary compared to
> > > VFIO in these environments.
> > > 
> > > Jason
> > 
> > Yes but you know. There are all kind of environments.  I guess you
> > consider yours the most mainstream and important, and are sure it will
> > always stay like this.  But if there's a driver that does what you need
> > then you use that.
> 
> Come on, you are the one saying we cannot do things in the best way
> possible because you want your way of doing things to be the only way
> allowed. Which of us thinks "yours the most mainstream and important" ??
> 
> I'm not telling you to throw away VPDA, I'm saying there are
> legimitate real world use cases where VFIO is the appropriate
> interface, not VDPA.
> 
> I want choice, not dogmatic exclusion that there is Only One True Way.

I don't particularly think there's only one way; vfio is already there.
I am specifically thinking about this patch, for example: it
muddies the waters a bit. Normally vfio exposes the device
with the same ID, but suddenly this one changes the ID as visible to the guest.
But again, whether doing this kind of thing is OK is more up to Alex than me.

I do want to understand if there's a use-case that vdpa does not address
simply because it might be worthwhile to extend it to do so, and a
bunch of people working on it are at Red Hat and I might have some input
into how that labor is allocated. But if the use-case is simply "has to
be vfio and not vdpa" then I guess not.




> > You really should be explaining what vdpa *does not* do that you
> > need.
> 
> I think I've done that enough, but if you have been following my
> explanation you should see that the entire point of this design is to
> allow a virtio device to be created inside a DPU to a specific
> detailed specification (eg an AWS virtio-net device, for instance)
> 
> The implementation is in the DPU, and only the DPU.
> 
> At the end of the day VDPA uses mediation and creates some
> RedHat/VDPA/Qemu virtio-net device in the guest. It is emphatically
> NOT a perfect recreation of the "AWS virtio-net" we started out with.
> 
> It entirely fails to achieve the most important thing it needs to do!

It could be that we are using mediation differently - in my world it's
when there's some host software on the path between guest and hardware,
and this qualifies.  The difference between what this patch does and
what vdpa does seems quantitative, not qualitative. Which might be
enough to motivate this work, I don't mind. But you seem to feel
it is qualitative and I am genuinely curious about it, because
if yes then it might lead e.g. the virtio standard in new directions.

I can *imagine* all kinds of reasons to want to use vfio as compared to vdpa;
here are some examples I came up with, quickly:
- maybe you have drivers that poke at registers not in virtio spec:
  vfio allows that, vdpa by design does not
- maybe you are using vfio with a lot of devices already and don't want
  to special-case handling for virtio devices on the host
Do any of the above motivations ring a bell? Some of the things you
said seem to hint at that. If yes, maybe include this in the cover
letter.

There is also a question of capability. Specifically iommufd support
is lacking in vdpa (though there are finally some RFC patches to
address that). All this is fine, could be enough to motivate
a work like this one. But I am very curious to know if there
is any other capability lacking in vdpa. I asked already and you
didn't answer so I guess not?




> Yishai will rework the series with your remarks, we can look again on
> v2, thanks for all the input!
> 
> Jason


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-25  8:26                                   ` Parav Pandit via Virtualization
@ 2023-09-25 18:36                                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-25 18:36 UTC (permalink / raw)
  To: Parav Pandit
  Cc: kvm, Maor Gottlieb, virtualization, Jason Gunthorpe, Jiri Pirko,
	Leon Romanovsky

On Mon, Sep 25, 2023 at 08:26:33AM +0000, Parav Pandit wrote:
> 
> 
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Monday, September 25, 2023 8:00 AM
> > 
> > On Fri, Sep 22, 2023 at 8:25 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Friday, September 22, 2023 5:53 PM
> > >
> > >
> > > > > And what's more, using MMIO BAR0 then it can work for legacy.
> > > >
> > > > Oh? How? Our team didn't think so.
> > >
> > > It does not. It was already discussed.
> > > The device reset in legacy is not synchronous.
> > 
> > How do you know this?
> >
> Not sure the motivation of same discussion done in the OASIS with you and others in past.
> 
> Anyways, please find the answer below.
> 
> About reset,
> The legacy device specification has not enforced below cited 1.0 driver requirement of 1.0.
> 
> "The driver SHOULD consider a driver-initiated reset complete when it reads device status as 0."
>  
> [1] https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf

Basically, I think any drivers that did not read status (Linux pre-2011)
before freeing memory under DMA have a reset path that is racy wrt DMA, since
memory writes are posted, and IO writes, while not posted, have a completion
that does not order posted transactions, e.g. from the PCI Express spec:
        D2b
        An I/O or Configuration Write Completion is permitted to pass a Posted Request.
Having said that, there were a ton of driver races discovered on this
path in the years since; I suspect that if one cares about this then
just avoiding stress on reset is wise.



> > > The drivers do not wait for reset to complete; it was written for the sw
> > backend.
> > 
> > Do you see there's a flush after reset in the legacy driver?
> > 
> Yes. it only flushes the write by reading it. The driver does not get _wait_ for the reset to complete within the device like above.

One can conceivably do that wait in hardware, though. Just defer completion until
the read is done.

> Please see the reset flow of 1.x device as below.
> In fact the comment of the 1.x device also needs to be updated to indicate that driver need to wait for the device to finish the reset.
> I will send separate patch for improving this comment of vp_reset() to match the spec.
> 
> static void vp_reset(struct virtio_device *vdev)
> {
>         struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>         struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
> 
>         /* 0 status means a reset. */
>         vp_modern_set_status(mdev, 0);
>         /* After writing 0 to device_status, the driver MUST wait for a read of
>          * device_status to return 0 before reinitializing the device.
>          * This will flush out the status write, and flush in device writes,
>          * including MSI-X interrupts, if any.
>          */
>         while (vp_modern_get_status(mdev))
>                 msleep(1);
>         /* Flush pending VQ/configuration callbacks. */
>         vp_synchronize_vectors(vdev);
> }
> 
> 
> > static void vp_reset(struct virtio_device *vdev) {
> >         struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >         /* 0 status means a reset. */
> >         vp_legacy_set_status(&vp_dev->ldev, 0);
> >         /* Flush out the status write, and flush in device writes,
> >          * including MSI-X interrupts, if any. */
> >         vp_legacy_get_status(&vp_dev->ldev);
> >         /* Flush pending VQ/configuration callbacks. */
> >         vp_synchronize_vectors(vdev);
> > }
> > 
> > Thanks
> > 
> > 
> > 
> > > Hence MMIO BAR0 is not the best option in real implementations.
> > >
> 


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-25 18:16                                 ` Michael S. Tsirkin
  (?)
@ 2023-09-25 18:53                                 ` Jason Gunthorpe
  2023-09-25 19:52                                     ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-25 18:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Alex Williamson, Yishai Hadas, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Mon, Sep 25, 2023 at 02:16:30PM -0400, Michael S. Tsirkin wrote:

> I do want to understand if there's a use-case that vdpa does not address
> simply because it might be worth while to extend it to do so, and a
> bunch of people working on it are at Red Hat and I might have some input
> into how that labor is allocated. But if the use-case is simply "has to
> be vfio and not vdpa" then I guess not.

If you strip away all the philosophical arguing, VDPA has no way to
isolate the control and data virtqs to different IOMMU configurations
with this single PCI function.

The existing HW VDPA drivers provided device specific ways to handle
this.

Without DMA isolation you can't assign the high speed data virtqs to
the VM without mediating them as well.

> It could be that we are using mediation differently - in my world it's
> when there's some host software on the path between guest and hardware,
> and this qualifies.  

That is pretty general. As I said to Jason, if you want to use it that
way then you need to make up a new word to describe what VDPA does as
there is a clear difference in scope between this VFIO patch (relay IO
commands to the device) and VDPA (intercept all the control plane,
control virtq and bring it to a RedHat/qemu standard common behavior)
 
> There is also a question of capability. Specifically iommufd support
> is lacking in vdpa (though there are finally some RFC patches to
> address that). All this is fine, could be enough to motivate
> a work like this one.

I've answered many times; you just don't seem to like the answers or
dismiss them as not relevant to you.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-25 12:26                         ` Jason Gunthorpe
@ 2023-09-25 19:44                             ` Michael S. Tsirkin
  2023-09-26  4:37                             ` Jason Wang
  1 sibling, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-25 19:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jason Wang, Yishai Hadas, alex.williamson, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Mon, Sep 25, 2023 at 09:26:07AM -0300, Jason Gunthorpe wrote:
> > > So, as I keep saying, in this scenario the goal is no mediation in the
> > > hypervisor.
> > 
> > That's pretty fine, but I don't think trapping + relying is not
> > mediation. Does it really matter what happens after trapping?
> 
> It is not mediation in the sense that the kernel driver does not in
> any way make decisions on the behavior of the device. It simply
> transforms an IO operation into a device command and relays it to the
> device. The device still fully controls its own behavior.
> 
> VDPA is very different from this. You might call them both mediation,
> sure, but then you need another word to describe the additional
> changes VDPA is doing.
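
[For concreteness, the kind of IO-to-device-command relay described above
might look roughly like the sketch below. All names here are hypothetical,
for illustration only; this is not the actual code from this series.]

struct legacy_admin_cmd {                       /* hypothetical */
        u16 opcode;                     /* e.g. "legacy common cfg write" */
        unsigned int member_id;         /* VF number within the PF group */
        u8 offset;                      /* offset into the legacy config */
        const void *data;
        u8 len;
};

static int relay_legacy_io_write(struct pf_handle *pf, unsigned int vf_id,
                                 u8 offset, const void *buf, u8 len)
{
        struct legacy_admin_cmd cmd = {
                .opcode    = LEGACY_COMMON_CFG_WRITE,   /* hypothetical */
                .member_id = vf_id,
                .offset    = offset,
                .data      = buf,
                .len       = len,
        };

        /* No policy decision is made here: the trapped guest I/O access
         * is simply forwarded to the owner device (PF), which remains in
         * full control of its own behavior. */
        return pf_send_admin_cmd(pf, &cmd);             /* hypothetical */
}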

Sorry about hijacking the thread a little bit, but could you
call out some of the changes that are the most problematic
for you?

> > > It is pointless, everything you think you need to do there
> > > is actually already being done in the DPU.
> > 
> > Well, migration or even Qemu could be offloaded to DPU as well. If
> > that's the direction that's pretty fine.
> 
> That's silly, of course qemu/kvm can't run in the DPU.
> 
> However, we can empty qemu and the hypervisor out so all it does is
> run kvm and run vfio. In this model the DPU does all the OVS, storage,
> "VDPA", etc. qemu is just a passive relay of the DPU PCI functions
> into VM's vPCI functions.
> 
> So, everything VDPA was doing in the environment is migrated into the
> DPU.
> 
> In this model the DPU is an extension of the hypervisor/qemu
> environment and we shift code from x86 side to arm side to increase
> security, save power and increase total system performance.
> 
> Jason

I think I begin to understand. On the DPU you have some virtio
devices but also some non-virtio devices.  So you have to
use VFIO to talk to the DPU. Reusing VFIO to talk to virtio
devices too, simplifies things for you. If guests will see
vendor-specific devices from the DPU anyway, it will be impossible
to migrate such guests away from the DPU so the cross-vendor
migration capability is less important in this use-case.
Is this a good summary?


-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-25 18:53                                 ` Jason Gunthorpe
@ 2023-09-25 19:52                                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-25 19:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jason Wang, Alex Williamson, Yishai Hadas, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Mon, Sep 25, 2023 at 03:53:18PM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 25, 2023 at 02:16:30PM -0400, Michael S. Tsirkin wrote:
> 
> > I do want to understand if there's a use-case that vdpa does not address
> > simply because it might be worth while to extend it to do so, and a
> > bunch of people working on it are at Red Hat and I might have some input
> > into how that labor is allocated. But if the use-case is simply "has to
> > be vfio and not vdpa" then I guess not.
> 
> If you strip away all the philosophical arguing, VDPA has no way to
> isolate the control and data virtqs to different IOMMU configurations
> with this single PCI function.

Aha, so address space/PASID support then?

> The existing HW VDPA drivers provided device specific ways to handle
> this.
> 
> Without DMA isolation you can't assign the high speed data virtqs to
> the VM without mediating them as well.
> 
> > It could be that we are using mediation differently - in my world it's
> > when there's some host software on the path between guest and hardware,
> > and this qualifies.  
> 
> That is pretty general. As I said to Jason, if you want to use it that
> way then you need to make up a new word to describe what VDPA does as
> there is a clear difference in scope between this VFIO patch (relay IO
> commands to the device) and VDPA (intercept all the control plane,
> control virtq and bring it to a RedHat/qemu standard common behavior)

IIUC VDPA itself does not really bring it to either RedHat or qemu
standard, it just allows userspace to control behaviour - if userspace
is qemu then it's qemu deciding how it behaves. Which I guess this
doesn't. Right?  RedHat's not in the picture at all I think.

> > There is also a question of capability. Specifically iommufd support
> > is lacking in vdpa (though there are finally some RFC patches to
> > address that). All this is fine, could be enough to motivate
> > a work like this one.
> 
> I've answered many times; you just don't seem to like the answers or
> dismiss them as not relevant to you.
> 
> Jason


Not really; I think I lack some of the picture, so I don't fully
understand. Or maybe I missed something else.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-25 19:44                             ` Michael S. Tsirkin
  (?)
@ 2023-09-26  0:40                             ` Jason Gunthorpe
  2023-09-26  5:34                                 ` Michael S. Tsirkin
  2023-09-26  5:42                                 ` Michael S. Tsirkin
  -1 siblings, 2 replies; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-26  0:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Yishai Hadas, alex.williamson, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Mon, Sep 25, 2023 at 03:44:11PM -0400, Michael S. Tsirkin wrote:
> > VDPA is very different from this. You might call them both mediation,
> > sure, but then you need another word to describe the additional
> > changes VDPA is doing.
> 
> Sorry about hijacking the thread a little bit, but could you
> call out some of the changes that are the most problematic
> for you?

I don't really know these details. The operators have an existing
virtio world that is ABI toward the VM for them, and they do not want
*anything* to change. The VM should be unaware whether the virtio device is
created by old hypervisor software or new DPU software. It presents
exactly the same ABI.

So the challenge really is to convince that VDPA delivers that, and
frankly, I don't think it does. ABI toward the VM is very important
here.

> > In this model the DPU is an extension of the hypervisor/qemu
> > environment and we shift code from x86 side to arm side to increase
> > security, save power and increase total system performance.
> 
> I think I begin to understand. On the DPU you have some virtio
> devices but also some non-virtio devices.  So you have to
> use VFIO to talk to the DPU. Reusing VFIO to talk to virtio
> devices too, simplifies things for you. 

Yes

> If guests will see vendor-specific devices from the DPU anyway, it
> will be impossible to migrate such guests away from the DPU so the
> cross-vendor migration capability is less important in this
> use-case.  Is this a good summary?

Well, sort of. As I said before, the vendor here is the cloud
operator, not the DPU supplier. The guest will see an AWS virtio-net
function, for example.

The operator will ensure that all their different implementations of
this function will interwork for migration.

So within the closed world of a single operator live migration will
work just fine.

Since the hypervisors controlled by the operator only migrate within
the operator's own environment anyhow, it is an already solved problem.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-25  8:26                                   ` Parav Pandit via Virtualization
@ 2023-09-26  2:32                                     ` Jason Wang
  -1 siblings, 0 replies; 321+ messages in thread
From: Jason Wang @ 2023-09-26  2:32 UTC (permalink / raw)
  To: Parav Pandit
  Cc: kvm, Michael S. Tsirkin, Maor Gottlieb, virtualization,
	Jason Gunthorpe, Jiri Pirko, Leon Romanovsky

On Mon, Sep 25, 2023 at 4:26 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Monday, September 25, 2023 8:00 AM
> >
> > On Fri, Sep 22, 2023 at 8:25 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Friday, September 22, 2023 5:53 PM
> > >
> > >
> > > > > And what's more, using MMIO BAR0 then it can work for legacy.
> > > >
> > > > Oh? How? Our team didn't think so.
> > >
> > > It does not. It was already discussed.
> > > The device reset in legacy is not synchronous.
> >
> > How do you know this?
> >
> Not sure the motivation of same discussion done in the OASIS with you and others in past.

That is exactly the same point.

It's too late to define the legacy behaviour accurately in the spec so
people will be lost in the legacy maze easily.

>
> Anyways, please find the answer below.
>
> About reset,
> The legacy device specification has not enforced below cited 1.0 driver requirement of 1.0.
>
> "The driver SHOULD consider a driver-initiated reset complete when it reads device status as 0."

We are talking about how to make devices work for legacy drivers, so
it has nothing to do with 1.0.

>
> [1] https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf
>
> > > The drivers do not wait for reset to complete; it was written for the sw
> > backend.
> >
> > Do you see there's a flush after reset in the legacy driver?
> >
> Yes. It only flushes the write by reading it. The driver does not _wait_ for the reset to complete within the device, like above.

These are implementation details in legacy. The device needs to make
sure the reset is done before get_status returns, so the driver can work.

That's all.


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-25 18:36                                     ` Michael S. Tsirkin
@ 2023-09-26  2:34                                       ` Zhu, Lingshan
  -1 siblings, 0 replies; 321+ messages in thread
From: Zhu, Lingshan @ 2023-09-26  2:34 UTC (permalink / raw)
  To: Michael S. Tsirkin, Parav Pandit
  Cc: kvm, Maor Gottlieb, virtualization, Jason Gunthorpe, Jiri Pirko,
	Leon Romanovsky



On 9/26/2023 2:36 AM, Michael S. Tsirkin wrote:
> On Mon, Sep 25, 2023 at 08:26:33AM +0000, Parav Pandit wrote:
>>
>>> From: Jason Wang <jasowang@redhat.com>
>>> Sent: Monday, September 25, 2023 8:00 AM
>>>
>>> On Fri, Sep 22, 2023 at 8:25 PM Parav Pandit <parav@nvidia.com> wrote:
>>>>
>>>>> From: Jason Gunthorpe <jgg@nvidia.com>
>>>>> Sent: Friday, September 22, 2023 5:53 PM
>>>>
>>>>>> And what's more, using MMIO BAR0 then it can work for legacy.
>>>>> Oh? How? Our team didn't think so.
>>>> It does not. It was already discussed.
>>>> The device reset in legacy is not synchronous.
>>> How do you know this?
>>>
>> Not sure the motivation of same discussion done in the OASIS with you and others in past.
>>
>> Anyways, please find the answer below.
>>
>> About reset,
>> The legacy device specification has not enforced below cited 1.0 driver requirement of 1.0.
>>
>> "The driver SHOULD consider a driver-initiated reset complete when it reads device status as 0."
>>   
>> [1] https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf
> Basically, I think any drivers that did not read status back (Linux pre-2011)
> before freeing memory under DMA have a reset path that is racy wrt DMA, since
> memory writes are posted, and I/O writes, while not posted, have completions
> that do not order posted transactions. E.g., from the PCI Express spec:
>          D2b
>          An I/O or Configuration Write Completion is permitted to pass a Posted Request.
> Having said that, there were a ton of driver races discovered on this
> path in the years since; I suspect that if one cares about this, then
> just avoiding stress on reset is wise.
>
>
>
>>>> The drivers do not wait for reset to complete; it was written for the sw
>>> backend.
>>>
>>> Do you see there's a flush after reset in the legacy driver?
>>>
>> Yes. It only flushes the write by reading it. The driver does not _wait_ for the reset to complete within the device, like above.
> One can conceivably do that wait in hardware, though: just defer completion
> until the read is done.
I agree with MST. At least Intel devices work fine with vfio-pci and 
legacy driver without any changes.
So far so good.

Thanks
Zhu Lingshan
>
>> Please see the reset flow of 1.x device as below.
>> In fact the comment of the 1.x device also needs to be updated to indicate that the driver needs to wait for the device to finish the reset.
>> I will send a separate patch improving this comment of vp_reset() to match the spec.
>>
>> static void vp_reset(struct virtio_device *vdev)
>> {
>>          struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>          struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
>>
>>          /* 0 status means a reset. */
>>          vp_modern_set_status(mdev, 0);
>>          /* After writing 0 to device_status, the driver MUST wait for a read of
>>           * device_status to return 0 before reinitializing the device.
>>           * This will flush out the status write, and flush in device writes,
>>           * including MSI-X interrupts, if any.
>>           */
>>          while (vp_modern_get_status(mdev))
>>                  msleep(1);
>>          /* Flush pending VQ/configuration callbacks. */
>>          vp_synchronize_vectors(vdev);
>> }
>>
>>
>>> static void vp_reset(struct virtio_device *vdev) {
>>>          struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>>          /* 0 status means a reset. */
>>>          vp_legacy_set_status(&vp_dev->ldev, 0);
>>>          /* Flush out the status write, and flush in device writes,
>>>           * including MSI-X interrupts, if any. */
>>>          vp_legacy_get_status(&vp_dev->ldev);
>>>          /* Flush pending VQ/configuration callbacks. */
>>>          vp_synchronize_vectors(vdev);
>>> }
>>>
>>> Thanks
>>>
>>>
>>>
>>>> Hence MMIO BAR0 is not the best option in real implementations.
>>>>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization


^ permalink raw reply	[flat|nested] 321+ messages in thread

* RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-25 18:36                                     ` Michael S. Tsirkin
@ 2023-09-26  3:45                                       ` Parav Pandit via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Parav Pandit @ 2023-09-26  3:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Jason Gunthorpe, Alex Williamson, Yishai Hadas, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	Leon Romanovsky, Maor Gottlieb



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, September 26, 2023 12:06 AM

> One can conceivably do that wait in hardware, though: just defer completion
> until the read is done.
>
Once OASIS defines such a new interface, and if some hw vendor _actually_ wants to build such complex hw, maybe the vfio driver can adapt to it.
When we worked with you, we discussed that such hw does not have enough returns, and hence the technical committee chose to proceed with admin commands.
I will skip re-discussing it all over again here.

The current virtio spec is delivering the best trade-off of functionality, performance, and lightweight implementation, with a forward path towards more features, such as migration, as Jason explained.
All with near-zero driver, qemu, and sw involvement for a rapidly growing feature set...

^ permalink raw reply	[flat|nested] 321+ messages in thread

* RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-26  2:32                                     ` Jason Wang
@ 2023-09-26  4:01                                       ` Parav Pandit via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Parav Pandit @ 2023-09-26  4:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jason Gunthorpe, Michael S. Tsirkin, Alex Williamson,
	Yishai Hadas, kvm, virtualization, Feng Liu, Jiri Pirko,
	kevin.tian, joao.m.martins, Leon Romanovsky, Maor Gottlieb



> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 26, 2023 8:03 AM
> 
> These are implementation details in legacy. The device needs to make sure
> the reset is done before get_status returns, so the driver can work.
It is part of the 0.9.5 and 1.x specifications, as I quoted those texts above.

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-26  4:01                                       ` Parav Pandit via Virtualization
@ 2023-09-26  4:37                                         ` Jason Wang
  -1 siblings, 0 replies; 321+ messages in thread
From: Jason Wang @ 2023-09-26  4:37 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jason Gunthorpe, Michael S. Tsirkin, Alex Williamson,
	Yishai Hadas, kvm, virtualization, Feng Liu, Jiri Pirko,
	kevin.tian, joao.m.martins, Leon Romanovsky, Maor Gottlieb

On Tue, Sep 26, 2023 at 12:01 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Tuesday, September 26, 2023 8:03 AM
> >
> > These are implementation details in legacy. The device needs to make sure
> > the reset is done before get_status returns, so the driver can work.
> It is part of the 0.9.5 and 1.x specifications, as I quoted those texts above.

What I meant is: legacy devices need to find their way to make legacy
drivers work. That's how legacy works.

It's too late to add any normative requirements to the 0.9.5 spec, so the device
behaviour is actually defined by the legacy drivers. That is why it is
tricky.

If you can't find a way to make legacy drivers work, use modern.

That's it.

Thanks


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-26  3:45                                       ` Parav Pandit via Virtualization
@ 2023-09-26  4:37                                         ` Jason Wang
  -1 siblings, 0 replies; 321+ messages in thread
From: Jason Wang @ 2023-09-26  4:37 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S. Tsirkin, Jason Gunthorpe, Alex Williamson,
	Yishai Hadas, kvm, virtualization, Feng Liu, Jiri Pirko,
	kevin.tian, joao.m.martins, Leon Romanovsky, Maor Gottlieb

On Tue, Sep 26, 2023 at 11:45 AM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, September 26, 2023 12:06 AM
>
> > One can conceivably do that wait in hardware, though: just defer completion
> > until the read is done.
> >
> Once OASIS defines such a new interface, and if some hw vendor _actually_ wants to build such complex hw, maybe the vfio driver can adapt to it.

It is you who is trying to revive legacy in the spec. We all know legacy
is tricky but works.

> When we worked with you, we discussed that such hw does not have enough returns, and hence the technical committee chose to proceed with admin commands.

I don't think my questions regarding the legacy transport got good
answers at that time. What's more, we all know the spec allows us to fix,
work around, or even deprecate a feature.

Thanks


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-25 12:26                         ` Jason Gunthorpe
@ 2023-09-26  4:37                             ` Jason Wang
  2023-09-26  4:37                             ` Jason Wang
  1 sibling, 0 replies; 321+ messages in thread
From: Jason Wang @ 2023-09-26  4:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Michael S. Tsirkin, maorg, virtualization, jiri, leonro

On Mon, Sep 25, 2023 at 8:26 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Mon, Sep 25, 2023 at 10:34:54AM +0800, Jason Wang wrote:
>
> > > Cloud vendors will similarly use DPUs to create a PCI functions that
> > > meet the cloud vendor's internal specification.
> >
> > This can only work if:
> >
> > 1) the internal specification has finer grain than virtio spec
> > 2) so it can define what is not implemented in the virtio spec (like
> > migration and compatibility)
>
> Yes, and that is what is happening. Realistically the "spec" is just a
> piece of software that the Cloud vendor owns which is simply ported to
> multiple DPU vendors.
>
> It is the same as VDPA. If VDPA can make multiple NIC vendors
> consistent then why do you have a hard time believing we can do the
> same thing just on the ARM side of a DPU?

I don't. We all know vDPA can do more than virtio.

>
> > All of the above doesn't seem to be possible or realistic now, and it
> > actually has a risk to be not compatible with virtio spec. In the
> > future when virtio has live migration supported, they want to be able
> > to migrate between virtio and vDPA.
>
> Well, that is for the spec to design.

Right, so if we'd consider migration from virtio to vDPA, it needs to
be designed in a way that allows more involvement from the hypervisor,
rather than coupling it with a specific interface (like admin
virtqueues).

>
> > > So, as I keep saying, in this scenario the goal is no mediation in the
> > > hypervisor.
> >
> > That's pretty fine, but I don't think trapping + relying is not
> > mediation. Does it really matter what happens after trapping?
>
> It is not mediation in the sense that the kernel driver does not in
> any way make decisions on the behavior of the device. It simply
> transforms an IO operation into a device command and relays it to the
> device. The device still fully controls its own behavior.
>
> VDPA is very different from this. You might call them both mediation,
> sure, but then you need another word to describe the additional
> changes VDPA is doing.
>
> > > It is pointless, everything you think you need to do there
> > > is actually already being done in the DPU.
> >
> > Well, migration or even Qemu could be offloaded to DPU as well. If
> > that's the direction that's pretty fine.
>
> That's silly, of course qemu/kvm can't run in the DPU.

KVM can't for sure but part of Qemu could. This model has been used.

>
> However, we can empty qemu and the hypervisor out so all it does is
> run kvm and run vfio. In this model the DPU does all the OVS, storage,
> "VDPA", etc. qemu is just a passive relay of the DPU PCI functions
> into VM's vPCI functions.
>
> So, everything VDPA was doing in the environment is migrated into the
> DPU.

It really depends on the use cases. For example, in the case of DPU
what if we want to provide multiple virtio devices through a single
VF?

>
> In this model the DPU is an extension of the hypervisor/qemu
> environment and we shift code from x86 side to arm side to increase
> security, save power and increase total system performance.

That's pretty fine.

Thanks

>
> Jason
>


^ permalink raw reply	[flat|nested] 321+ messages in thread

* RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-26  4:37                                         ` Jason Wang
@ 2023-09-26  5:27                                           ` Parav Pandit via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Parav Pandit @ 2023-09-26  5:27 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jason Gunthorpe, Michael S. Tsirkin, Alex Williamson,
	Yishai Hadas, kvm, virtualization, Feng Liu, Jiri Pirko,
	kevin.tian, joao.m.martins, Leon Romanovsky, Maor Gottlieb



> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 26, 2023 10:07 AM


> 
> If you can't find a way to make legacy drivers work, use modern.
>
Understood.
This vfio series makes the legacy drivers work.
Thanks.
 
> That's it.
> 
> Thanks


^ permalink raw reply	[flat|nested] 321+ messages in thread

* RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-26  4:37                             ` Jason Wang
@ 2023-09-26  5:33                               ` Parav Pandit via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Parav Pandit @ 2023-09-26  5:33 UTC (permalink / raw)
  To: Jason Wang, Jason Gunthorpe
  Cc: Michael S. Tsirkin, Yishai Hadas, alex.williamson, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	Leon Romanovsky, Maor Gottlieb



> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 26, 2023 10:08 AM

> Right, so if we'd consider migration from virtio to vDPA, it needs to be designed
> in a way that allows more involvement from the hypervisor, rather than coupling it
> with a specific interface (like admin virtqueues).
It is not attached to the admin virtqueues.
One way to use it is via admin commands, as in [1].
One can define it without admin commands by explaining the technical difficulties where admin commands may not or cannot work.

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-26  0:40                             ` Jason Gunthorpe
@ 2023-09-26  5:34                                 ` Michael S. Tsirkin
  2023-09-26  5:42                                 ` Michael S. Tsirkin
  1 sibling, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-26  5:34 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Mon, Sep 25, 2023 at 09:40:59PM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 25, 2023 at 03:44:11PM -0400, Michael S. Tsirkin wrote:
> > > VDPA is very different from this. You might call them both mediation,
> > > sure, but then you need another word to describe the additional
> > > changes VDPA is doing.
> > 
> > Sorry about hijacking the thread a little bit, but could you
> > call out some of the changes that are the most problematic
> > for you?
> 
> I don't really know these details. The operators have an existing
> virtio world that is ABI toward the VM for them, and they do not want
> *anything* to change. The VM should be unaware if the virtio device is
> created by old hypervisor software or new DPU software. It presents
> exactly the same ABI.
> 
> So the challenge really is to convince that VDPA delivers that, and
> frankly, I don't think it does. ABI toward the VM is very important
> here.

And to complete the picture, it is the DPU software/firmware that
is responsible for maintaining this ABI in your ideal world?


> > > In this model the DPU is an extension of the hypervisor/qemu
> > > environment and we shift code from x86 side to arm side to increase
> > > security, save power and increase total system performance.
> > 
> > I think I begin to understand. On the DPU you have some virtio
> > devices but also some non-virtio devices.  So you have to
> > use VFIO to talk to the DPU. Reusing VFIO to talk to virtio
> > devices too, simplifies things for you. 
> 
> Yes
> 
> > If guests will see vendor-specific devices from the DPU anyway, it
> > will be impossible to migrate such guests away from the DPU so the
> > cross-vendor migration capability is less important in this
> > use-case.  Is this a good summary?
> 
> Well, sort of. As I said before, the vendor here is the cloud
> operator, not the DPU supplier. The guest will see an AWS virtio-net
> function, for example.
> 
> The operator will ensure that all their different implementations of
> this function will interwork for migration.
> 
> So within the closed world of a single operator live migration will
> work just fine.
> 
> Since the hypervisors controlled by the operator only migrate within
> the operator's own environment anyhow, it is an already solved problem.
> 
> Jason


Okay the picture emerges I think. Thanks! I'll try to summarize later
for everyone's benefit.


-- 
MST

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-26  0:40                             ` Jason Gunthorpe
@ 2023-09-26  5:42                                 ` Michael S. Tsirkin
  2023-09-26  5:42                                 ` Michael S. Tsirkin
  1 sibling, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-26  5:42 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Mon, Sep 25, 2023 at 09:40:59PM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 25, 2023 at 03:44:11PM -0400, Michael S. Tsirkin wrote:
> > > VDPA is very different from this. You might call them both mediation,
> > > sure, but then you need another word to describe the additional
> > > changes VDPA is doing.
> > 
> > Sorry about hijacking the thread a little bit, but could you
> > call out some of the changes that are the most problematic
> > for you?
> 
> I don't really know these details.

Maybe you should then desist from saying things like "It entirely fails
to achieve the most important thing it needs to do!" You are not making
any new friends by saying this about a piece of software without
knowing the details.

-- 
MST

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 08/11] vfio/pci: Expose vfio_pci_core_setup_barmap()
  2023-09-21 16:35     ` Alex Williamson
@ 2023-09-26  9:45       ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas @ 2023-09-26  9:45 UTC (permalink / raw)
  To: Alex Williamson
  Cc: mst, jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, leonro, maorg

On 21/09/2023 19:35, Alex Williamson wrote:
> On Thu, 21 Sep 2023 15:40:37 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
>
>> Expose vfio_pci_core_setup_barmap() to be used by drivers.
>>
>> This will let drivers mmap a BAR and re-use it from both vfio and the
>> driver when applicable.
>>
>> This API will be used in the next patches by the coming vfio/virtio
>> driver.
>>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> ---
>>   drivers/vfio/pci/vfio_pci_core.c | 25 +++++++++++++++++++++++++
>>   drivers/vfio/pci/vfio_pci_rdwr.c | 28 ++--------------------------
>>   include/linux/vfio_pci_core.h    |  1 +
>>   3 files changed, 28 insertions(+), 26 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 1929103ee59a..b56111ed8a8c 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -684,6 +684,31 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
>>   }
>>   EXPORT_SYMBOL_GPL(vfio_pci_core_disable);
>>   
>> +int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
>> +{
>> +	struct pci_dev *pdev = vdev->pdev;
>> +	void __iomem *io;
>> +	int ret;
>> +
>> +	if (vdev->barmap[bar])
>> +		return 0;
>> +
>> +	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
>> +	if (ret)
>> +		return ret;
>> +
>> +	io = pci_iomap(pdev, bar, 0);
>> +	if (!io) {
>> +		pci_release_selected_regions(pdev, 1 << bar);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	vdev->barmap[bar] = io;
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(vfio_pci_core_setup_barmap);
> Not to endorse the rest of this yet, but minimally _GPL, same for the
> following patch.  Thanks,
>
> Alex

Sure, will change to EXPORT_SYMBOL_GPL as part of V1.
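
i.e., the export line would then simply become (sketch only, no other change implied):

    EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);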

Yishai

>> +
>>   void vfio_pci_core_close_device(struct vfio_device *core_vdev)
>>   {
>>   	struct vfio_pci_core_device *vdev =
>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
>> index e27de61ac9fe..6f08b3ecbb89 100644
>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>> @@ -200,30 +200,6 @@ static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
>>   	return done;
>>   }
>>   
>> -static int vfio_pci_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
>> -{
>> -	struct pci_dev *pdev = vdev->pdev;
>> -	int ret;
>> -	void __iomem *io;
>> -
>> -	if (vdev->barmap[bar])
>> -		return 0;
>> -
>> -	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
>> -	if (ret)
>> -		return ret;
>> -
>> -	io = pci_iomap(pdev, bar, 0);
>> -	if (!io) {
>> -		pci_release_selected_regions(pdev, 1 << bar);
>> -		return -ENOMEM;
>> -	}
>> -
>> -	vdev->barmap[bar] = io;
>> -
>> -	return 0;
>> -}
>> -
>>   ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>>   			size_t count, loff_t *ppos, bool iswrite)
>>   {
>> @@ -262,7 +238,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>>   		}
>>   		x_end = end;
>>   	} else {
>> -		int ret = vfio_pci_setup_barmap(vdev, bar);
>> +		int ret = vfio_pci_core_setup_barmap(vdev, bar);
>>   		if (ret) {
>>   			done = ret;
>>   			goto out;
>> @@ -438,7 +414,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
>>   		return -EINVAL;
>>   #endif
>>   
>> -	ret = vfio_pci_setup_barmap(vdev, bar);
>> +	ret = vfio_pci_core_setup_barmap(vdev, bar);
>>   	if (ret)
>>   		return ret;
>>   
>> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
>> index 562e8754869d..67ac58e20e1d 100644
>> --- a/include/linux/vfio_pci_core.h
>> +++ b/include/linux/vfio_pci_core.h
>> @@ -127,6 +127,7 @@ int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf);
>>   int vfio_pci_core_enable(struct vfio_pci_core_device *vdev);
>>   void vfio_pci_core_disable(struct vfio_pci_core_device *vdev);
>>   void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
>> +int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
>>   pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
>>   						pci_channel_state_t state);
>>   



^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-09-21 20:34     ` Michael S. Tsirkin
@ 2023-09-26 10:51       ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas @ 2023-09-26 10:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On 21/09/2023 23:34, Michael S. Tsirkin wrote:
> On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote:
>> Expose admin commands over the virtio device, to be used by the
>> vfio-virtio driver in the next patches.
>>
>> It includes: list query/use, legacy write/read, read notify_info.
>>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> ---
>>   drivers/vfio/pci/virtio/cmd.c | 146 ++++++++++++++++++++++++++++++++++
>>   drivers/vfio/pci/virtio/cmd.h |  27 +++++++
>>   2 files changed, 173 insertions(+)
>>   create mode 100644 drivers/vfio/pci/virtio/cmd.c
>>   create mode 100644 drivers/vfio/pci/virtio/cmd.h
>>
>> diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
>> new file mode 100644
>> index 000000000000..f068239cdbb0
>> --- /dev/null
>> +++ b/drivers/vfio/pci/virtio/cmd.c
>> @@ -0,0 +1,146 @@
>> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
>> +/*
>> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
>> + */
>> +
>> +#include "cmd.h"
>> +
>> +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
>> +{
>> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>> +	struct scatterlist out_sg;
>> +	struct virtio_admin_cmd cmd = {};
>> +
>> +	if (!virtio_dev)
>> +		return -ENOTCONN;
>> +
>> +	sg_init_one(&out_sg, buf, buf_size);
>> +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
>> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
>> +	cmd.result_sg = &out_sg;
>> +
>> +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
>> +}
>> +
> in/out seem all wrong here. In virtio terminology, in means from
> device to driver, out means from driver to device.
I referred here to in/out from the vfio POV, i.e. the side that prepares the
command.

However, I can change it to follow the virtio terminology as you
suggested if that makes more sense.
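
For example, in virtiovf_cmd_list_query() the result buffer is written by the
device, so under virtio terminology it would be an "in" buffer; a minimal
sketch of what such a rename could look like:

    struct scatterlist in_sg;	/* device -> driver, "in" in virtio terms */

    sg_init_one(&in_sg, buf, buf_size);
    cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
    cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
    cmd.result_sg = &in_sg;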

Please see also my coming answer on your suggestion to put all of this 
in the virtio layer.

>> +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
>> +{
>> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>> +	struct scatterlist in_sg;
>> +	struct virtio_admin_cmd cmd = {};
>> +
>> +	if (!virtio_dev)
>> +		return -ENOTCONN;
>> +
>> +	sg_init_one(&in_sg, buf, buf_size);
>> +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE;
>> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
>> +	cmd.data_sg = &in_sg;
>> +
>> +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
>> +}
>> +
>> +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>
> what is _lr short for?

This was an acronym for legacy_read.

The actual command is determined by the given opcode, which can be one
of LEGACY_COMMON_CFG_READ or LEGACY_DEV_CFG_READ.

I can rename it to '_legacy_read' (i.e. virtiovf_issue_legacy_read_cmd) 
to be clearer.
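
For instance, the prototype would then read as follows (a sketch of the rename
only, the signature itself unchanged):

    int virtiovf_issue_legacy_read_cmd(struct virtiovf_pci_core_device *virtvdev,
				       u16 opcode, u8 offset, u8 size, u8 *buf);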

>
>> +			  u8 offset, u8 size, u8 *buf)
>> +{
>> +	struct virtio_device *virtio_dev =
>> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
>> +	struct virtio_admin_cmd_data_lr_write *in;
>> +	struct scatterlist in_sg;
>> +	struct virtio_admin_cmd cmd = {};
>> +	int ret;
>> +
>> +	if (!virtio_dev)
>> +		return -ENOTCONN;
>> +
>> +	in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
>> +	if (!in)
>> +		return -ENOMEM;
>> +
>> +	in->offset = offset;
>> +	memcpy(in->registers, buf, size);
>> +	sg_init_one(&in_sg, in, sizeof(*in) + size);
>> +	cmd.opcode = opcode;
>> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
>> +	cmd.group_member_id = virtvdev->vf_id + 1;
> weird. why + 1?

This follows the virtio spec in that area.

"When sending commands with the SR-IOV group type, the driver specify a 
value for group_member_id
between 1 and NumVFs inclusive."

The 'virtvdev->vf_id' was set upon vfio/virtio driver initialization by 
pci_iov_vf_id(), whose first index is 0.
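
A minimal sketch of a comment that could be added at the assignment (the
wording is just an example):

    /*
     * The spec's SR-IOV group_member_id is 1-based (1..NumVFs), while
     * pci_iov_vf_id() returns a 0-based index, hence the +1.
     */
    cmd.group_member_id = virtvdev->vf_id + 1;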

>> +	cmd.data_sg = &in_sg;
>> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
>> +
>> +	kfree(in);
>> +	return ret;
>> +}
> How do you know it's safe to send this command, in particular at
> this time? This seems to be doing zero checks, and zero synchronization
> with the PF driver.
>
virtiovf_cmd_lr_read() and the other helpers operate on a virtio VF and get 
its PF by calling virtio_pci_vf_get_pf_dev().

The VF can't go away via 'disable sriov' as it's owned/used by vfio.

The PF can't go away via rmmod/modprobe -r of virtio, because of the 'module 
in use' dependencies between VFIO and VIRTIO.

The check below [1] was added only from a clean-code perspective; it might 
theoretically fail in case the given VF doesn't use a virtio driver.

[1] if (!virtio_dev)
         return -ENOTCONN;

So, it looks safe as is.

>> +
>> +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>> +			 u8 offset, u8 size, u8 *buf)
>> +{
>> +	struct virtio_device *virtio_dev =
>> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
>> +	struct virtio_admin_cmd_data_lr_read *in;
>> +	struct scatterlist in_sg, out_sg;
>> +	struct virtio_admin_cmd cmd = {};
>> +	int ret;
>> +
>> +	if (!virtio_dev)
>> +		return -ENOTCONN;
>> +
>> +	in = kzalloc(sizeof(*in), GFP_KERNEL);
>> +	if (!in)
>> +		return -ENOMEM;
>> +
>> +	in->offset = offset;
>> +	sg_init_one(&in_sg, in, sizeof(*in));
>> +	sg_init_one(&out_sg, buf, size);
>> +	cmd.opcode = opcode;
>> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
>> +	cmd.data_sg = &in_sg;
>> +	cmd.result_sg = &out_sg;
>> +	cmd.group_member_id = virtvdev->vf_id + 1;
>> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
>> +
>> +	kfree(in);
>> +	return ret;
>> +}
>> +
>> +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
> and what is lq short for?

To be more explicit, I may rename it to virtiovf_cmd_legacy_notify_info() 
to follow the spec opcode.

Yishai

>
>> +				u8 req_bar_flags, u8 *bar, u64 *bar_offset)
>> +{
>> +	struct virtio_device *virtio_dev =
>> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
>> +	struct virtio_admin_cmd_notify_info_result *out;
>> +	struct scatterlist out_sg;
>> +	struct virtio_admin_cmd cmd = {};
>> +	int ret;
>> +
>> +	if (!virtio_dev)
>> +		return -ENOTCONN;
>> +
>> +	out = kzalloc(sizeof(*out), GFP_KERNEL);
>> +	if (!out)
>> +		return -ENOMEM;
>> +
>> +	sg_init_one(&out_sg, out, sizeof(*out));
>> +	cmd.opcode = VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO;
>> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
>> +	cmd.result_sg = &out_sg;
>> +	cmd.group_member_id = virtvdev->vf_id + 1;
>> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
>> +	if (!ret) {
>> +		struct virtio_admin_cmd_notify_info_data *entry;
>> +		int i;
>> +
>> +		ret = -ENOENT;
>> +		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
>> +			entry = &out->entries[i];
>> +			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
>> +				break;
>> +			if (entry->flags != req_bar_flags)
>> +				continue;
>> +			*bar = entry->bar;
>> +			*bar_offset = le64_to_cpu(entry->offset);
>> +			ret = 0;
>> +			break;
>> +		}
>> +	}
>> +
>> +	kfree(out);
>> +	return ret;
>> +}
>> diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
>> new file mode 100644
>> index 000000000000..c2a3645f4b90
>> --- /dev/null
>> +++ b/drivers/vfio/pci/virtio/cmd.h
>> @@ -0,0 +1,27 @@
>> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
>> +/*
>> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
>> + */
>> +
>> +#ifndef VIRTIO_VFIO_CMD_H
>> +#define VIRTIO_VFIO_CMD_H
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/virtio.h>
>> +#include <linux/vfio_pci_core.h>
>> +#include <linux/virtio_pci.h>
>> +
>> +struct virtiovf_pci_core_device {
>> +	struct vfio_pci_core_device core_device;
>> +	int vf_id;
>> +};
>> +
>> +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
>> +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
>> +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>> +			  u8 offset, u8 size, u8 *buf);
>> +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>> +			 u8 offset, u8 size, u8 *buf);
>> +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
>> +				u8 req_bar_flags, u8 *bar, u64 *bar_offset);
>> +#endif /* VIRTIO_VFIO_CMD_H */
>> -- 
>> 2.27.0



^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-09-22  9:54     ` Michael S. Tsirkin
@ 2023-09-26 11:14       ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas @ 2023-09-26 11:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On 22/09/2023 12:54, Michael S. Tsirkin wrote:
> On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote:
>> Expose admin commands over the virtio device, to be used by the
>> vfio-virtio driver in the next patches.
>>
>> It includes: list query/use, legacy write/read, read notify_info.
>>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>
> This stuff is pure virtio spec. I think it should live under
> drivers/virtio, too.

The motivation to put it in the vfio layer comes from the following main reasons:

1) Having it inside virtio may require exporting a symbol/function per 
command.

    Today this would end up with 5 exported symbols, and in the future 
(e.g. live migration) with many more.

    With the current code we export only 2 generic symbols, 
virtio_pci_vf_get_pf_dev() and virtio_admin_cmd_exec(), which can accommodate 
any further extension.

2) For now there is no logic in this vfio layer; however, in the future 
we may have some DMA/other logic that better fits the 
caller/client layer (i.e. vfio).

By the way, this follows what was already done between the vfio/mlx5 and 
mlx5_core modules, where mlx5_core exposes generic APIs to execute a 
command and to get the PF from a given mlx5 VF.

This way, further commands can be added/extended easily and cleanly.
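
As a rough illustration only (the helper name and opcode below are
hypothetical, not part of this series), any future command can be composed in
the vfio layer on top of those two exports alone:

    /* Hypothetical sketch: a new admin command built from the two
     * generic virtio exports, no extra virtio symbols needed.
     */
    int virtiovf_cmd_future_query(struct virtiovf_pci_core_device *virtvdev,
				  u8 *buf, int buf_size)
    {
	struct virtio_device *virtio_dev =
		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
	struct scatterlist result_sg;
	struct virtio_admin_cmd cmd = {};

	if (!virtio_dev)
		return -ENOTCONN;

	sg_init_one(&result_sg, buf, buf_size);
	cmd.opcode = VIRTIO_ADMIN_CMD_FUTURE_QUERY;	/* placeholder opcode */
	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
	cmd.group_member_id = virtvdev->vf_id + 1;
	cmd.result_sg = &result_sg;

	return virtio_admin_cmd_exec(virtio_dev, &cmd);
    }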

See for example here [1, 2].

[1] 
https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/vfio/pci/mlx5/cmd.c#L210

[2] 
https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/vfio/pci/mlx5/cmd.c#L683

Yishai

>
>> ---
>>   drivers/vfio/pci/virtio/cmd.c | 146 ++++++++++++++++++++++++++++++++++
>>   drivers/vfio/pci/virtio/cmd.h |  27 +++++++
>>   2 files changed, 173 insertions(+)
>>   create mode 100644 drivers/vfio/pci/virtio/cmd.c
>>   create mode 100644 drivers/vfio/pci/virtio/cmd.h
>>
>> diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
>> new file mode 100644
>> index 000000000000..f068239cdbb0
>> --- /dev/null
>> +++ b/drivers/vfio/pci/virtio/cmd.c
>> @@ -0,0 +1,146 @@
>> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
>> +/*
>> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
>> + */
>> +
>> +#include "cmd.h"
>> +
>> +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
>> +{
>> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>> +	struct scatterlist out_sg;
>> +	struct virtio_admin_cmd cmd = {};
>> +
>> +	if (!virtio_dev)
>> +		return -ENOTCONN;
>> +
>> +	sg_init_one(&out_sg, buf, buf_size);
>> +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
>> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
>> +	cmd.result_sg = &out_sg;
>> +
>> +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
>> +}
>> +
>> +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
>> +{
>> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>> +	struct scatterlist in_sg;
>> +	struct virtio_admin_cmd cmd = {};
>> +
>> +	if (!virtio_dev)
>> +		return -ENOTCONN;
>> +
>> +	sg_init_one(&in_sg, buf, buf_size);
>> +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE;
>> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
>> +	cmd.data_sg = &in_sg;
>> +
>> +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
>> +}
>> +
>> +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>> +			  u8 offset, u8 size, u8 *buf)
>> +{
>> +	struct virtio_device *virtio_dev =
>> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
>> +	struct virtio_admin_cmd_data_lr_write *in;
>> +	struct scatterlist in_sg;
>> +	struct virtio_admin_cmd cmd = {};
>> +	int ret;
>> +
>> +	if (!virtio_dev)
>> +		return -ENOTCONN;
>> +
>> +	in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
>> +	if (!in)
>> +		return -ENOMEM;
>> +
>> +	in->offset = offset;
>> +	memcpy(in->registers, buf, size);
>> +	sg_init_one(&in_sg, in, sizeof(*in) + size);
>> +	cmd.opcode = opcode;
>> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
>> +	cmd.group_member_id = virtvdev->vf_id + 1;
>> +	cmd.data_sg = &in_sg;
>> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
>> +
>> +	kfree(in);
>> +	return ret;
>> +}
>> +
>> +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>> +			 u8 offset, u8 size, u8 *buf)
>> +{
>> +	struct virtio_device *virtio_dev =
>> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
>> +	struct virtio_admin_cmd_data_lr_read *in;
>> +	struct scatterlist in_sg, out_sg;
>> +	struct virtio_admin_cmd cmd = {};
>> +	int ret;
>> +
>> +	if (!virtio_dev)
>> +		return -ENOTCONN;
>> +
>> +	in = kzalloc(sizeof(*in), GFP_KERNEL);
>> +	if (!in)
>> +		return -ENOMEM;
>> +
>> +	in->offset = offset;
>> +	sg_init_one(&in_sg, in, sizeof(*in));
>> +	sg_init_one(&out_sg, buf, size);
>> +	cmd.opcode = opcode;
>> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
>> +	cmd.data_sg = &in_sg;
>> +	cmd.result_sg = &out_sg;
>> +	cmd.group_member_id = virtvdev->vf_id + 1;
>> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
>> +
>> +	kfree(in);
>> +	return ret;
>> +}
>> +
>> +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
>> +				u8 req_bar_flags, u8 *bar, u64 *bar_offset)
>> +{
>> +	struct virtio_device *virtio_dev =
>> +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
>> +	struct virtio_admin_cmd_notify_info_result *out;
>> +	struct scatterlist out_sg;
>> +	struct virtio_admin_cmd cmd = {};
>> +	int ret;
>> +
>> +	if (!virtio_dev)
>> +		return -ENOTCONN;
>> +
>> +	out = kzalloc(sizeof(*out), GFP_KERNEL);
>> +	if (!out)
>> +		return -ENOMEM;
>> +
>> +	sg_init_one(&out_sg, out, sizeof(*out));
>> +	cmd.opcode = VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO;
>> +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
>> +	cmd.result_sg = &out_sg;
>> +	cmd.group_member_id = virtvdev->vf_id + 1;
>> +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
>> +	if (!ret) {
>> +		struct virtio_admin_cmd_notify_info_data *entry;
>> +		int i;
>> +
>> +		ret = -ENOENT;
>> +		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
>> +			entry = &out->entries[i];
>> +			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
>> +				break;
>> +			if (entry->flags != req_bar_flags)
>> +				continue;
>> +			*bar = entry->bar;
>> +			*bar_offset = le64_to_cpu(entry->offset);
>> +			ret = 0;
>> +			break;
>> +		}
>> +	}
>> +
>> +	kfree(out);
>> +	return ret;
>> +}
>> diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
>> new file mode 100644
>> index 000000000000..c2a3645f4b90
>> --- /dev/null
>> +++ b/drivers/vfio/pci/virtio/cmd.h
>> @@ -0,0 +1,27 @@
>> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
>> +/*
>> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
>> + */
>> +
>> +#ifndef VIRTIO_VFIO_CMD_H
>> +#define VIRTIO_VFIO_CMD_H
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/virtio.h>
>> +#include <linux/vfio_pci_core.h>
>> +#include <linux/virtio_pci.h>
>> +
>> +struct virtiovf_pci_core_device {
>> +	struct vfio_pci_core_device core_device;
>> +	int vf_id;
>> +};
>> +
>> +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
>> +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
>> +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>> +			  u8 offset, u8 size, u8 *buf);
>> +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>> +			 u8 offset, u8 size, u8 *buf);
>> +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
>> +				u8 req_bar_flags, u8 *bar, u64 *bar_offset);
>> +#endif /* VIRTIO_VFIO_CMD_H */
>> -- 
>> 2.27.0



^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-09-26 10:51       ` Yishai Hadas via Virtualization
@ 2023-09-26 11:25         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-26 11:25 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Tue, Sep 26, 2023 at 01:51:13PM +0300, Yishai Hadas wrote:
> On 21/09/2023 23:34, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote:
> > > Expose admin commands over the virtio device, to be used by the
> > > vfio-virtio driver in the next patches.
> > > 
> > > It includes: list query/use, legacy write/read, read notify_info.
> > > 
> > > Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > > ---
> > >   drivers/vfio/pci/virtio/cmd.c | 146 ++++++++++++++++++++++++++++++++++
> > >   drivers/vfio/pci/virtio/cmd.h |  27 +++++++
> > >   2 files changed, 173 insertions(+)
> > >   create mode 100644 drivers/vfio/pci/virtio/cmd.c
> > >   create mode 100644 drivers/vfio/pci/virtio/cmd.h
> > > 
> > > diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> > > new file mode 100644
> > > index 000000000000..f068239cdbb0
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/cmd.c
> > > @@ -0,0 +1,146 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > > +/*
> > > + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> > > + */
> > > +
> > > +#include "cmd.h"
> > > +
> > > +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
> > > +{
> > > +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > > +	struct scatterlist out_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	sg_init_one(&out_sg, buf, buf_size);
> > > +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.result_sg = &out_sg;
> > > +
> > > +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +}
> > > +
> > in/out seem all wrong here. In virtio terminology, in means from
> > device to driver, out means from driver to device.
> I referred here to in/out from vfio POV who prepares the command.
> 
> However, I can replace it to follow the virtio terminology as you suggested
> if this more makes sense.
> 
> Please see also my coming answer on your suggestion to put all of this in
> the virtio layer.
> 
> > > +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
> > > +{
> > > +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > > +	struct scatterlist in_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	sg_init_one(&in_sg, buf, buf_size);
> > > +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.data_sg = &in_sg;
> > > +
> > > +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +}
> > > +
> > > +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > 
> > what is _lr short for?
> 
> This was an acronym to legacy_read.
> 
> The actual command is according to the given opcode which can be one among
> LEGACY_COMMON_CFG_READ, LEGACY_DEV_CFG_READ.
> 
> I can rename it to '_legacy_read' (i.e. virtiovf_issue_legacy_read_cmd) to
> be clearer.
> 
> > 
> > > +			  u8 offset, u8 size, u8 *buf)
> > > +{
> > > +	struct virtio_device *virtio_dev =
> > > +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > +	struct virtio_admin_cmd_data_lr_write *in;
> > > +	struct scatterlist in_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +	int ret;
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
> > > +	if (!in)
> > > +		return -ENOMEM;
> > > +
> > > +	in->offset = offset;
> > > +	memcpy(in->registers, buf, size);
> > > +	sg_init_one(&in_sg, in, sizeof(*in) + size);
> > > +	cmd.opcode = opcode;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.group_member_id = virtvdev->vf_id + 1;
> > weird. why + 1?
> 
> This follows the virtio spec in that area.
> 
> "When sending commands with the SR-IOV group type, the driver specify a
> value for group_member_id
> between 1 and NumVFs inclusive."

Ah, I get it. Pls add a comment.

> The 'virtvdev->vf_id' was set upon vfio/virtio driver initialization by
> pci_iov_vf_id() which its first index is 0.
> 
> > > +	cmd.data_sg = &in_sg;
> > > +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +
> > > +	kfree(in);
> > > +	return ret;
> > > +}
> > How do you know it's safe to send this command, in particular at
> > this time? This seems to be doing zero checks, and zero synchronization
> > with the PF driver.
> > 
> virtiovf_cmd_lr_read() and the other helpers get a virtio VF and look up its
> PF by calling virtio_pci_vf_get_pf_dev().
> 
> The VF can't go away via 'disable sriov' as it's owned/used by vfio.
> 
> The PF can't go away via rmmod/modprobe -r of virtio, because of the 'module
> in use' dependency between VFIO and VIRTIO.
> 
> The check below [1] was added only from a clean-code perspective; it might
> theoretically fail in case the given VF isn't bound to a virtio driver.
> 
> [1] if (!virtio_dev)
>         return -ENOTCONN;
> 
> So, it looks safe as is.

Can the device be unbound from the module right after you did the check?
What about suspend - can this be called while suspend is in progress?


More importantly, virtio can decide to reset the device for its
own internal reasons (e.g. to recover from an error).
We used to do it when attaching XDP, and we can start doing it again.
That's one of the reasons why I want all this code under virtio, so we'll remember.


> > > +
> > > +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > > +			 u8 offset, u8 size, u8 *buf)
> > > +{
> > > +	struct virtio_device *virtio_dev =
> > > +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > +	struct virtio_admin_cmd_data_lr_read *in;
> > > +	struct scatterlist in_sg, out_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +	int ret;
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	in = kzalloc(sizeof(*in), GFP_KERNEL);
> > > +	if (!in)
> > > +		return -ENOMEM;
> > > +
> > > +	in->offset = offset;
> > > +	sg_init_one(&in_sg, in, sizeof(*in));
> > > +	sg_init_one(&out_sg, buf, size);
> > > +	cmd.opcode = opcode;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.data_sg = &in_sg;
> > > +	cmd.result_sg = &out_sg;
> > > +	cmd.group_member_id = virtvdev->vf_id + 1;
> > > +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +
> > > +	kfree(in);
> > > +	return ret;
> > > +}
> > > +
> > > +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
> > and what is lq short for?
> 
> To be more explicit, I can rename it to virtiovf_cmd_legacy_notify_info() to
> follow the spec opcode.
> 
> Yishai
> 
> > 
> > > +				u8 req_bar_flags, u8 *bar, u64 *bar_offset)
> > > +{
> > > +	struct virtio_device *virtio_dev =
> > > +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > +	struct virtio_admin_cmd_notify_info_result *out;
> > > +	struct scatterlist out_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +	int ret;
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	out = kzalloc(sizeof(*out), GFP_KERNEL);
> > > +	if (!out)
> > > +		return -ENOMEM;
> > > +
> > > +	sg_init_one(&out_sg, out, sizeof(*out));
> > > +	cmd.opcode = VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.result_sg = &out_sg;
> > > +	cmd.group_member_id = virtvdev->vf_id + 1;
> > > +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +	if (!ret) {
> > > +		struct virtio_admin_cmd_notify_info_data *entry;
> > > +		int i;
> > > +
> > > +		ret = -ENOENT;
> > > +		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
> > > +			entry = &out->entries[i];
> > > +			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
> > > +				break;
> > > +			if (entry->flags != req_bar_flags)
> > > +				continue;
> > > +			*bar = entry->bar;
> > > +			*bar_offset = le64_to_cpu(entry->offset);
> > > +			ret = 0;
> > > +			break;
> > > +		}
> > > +	}
> > > +
> > > +	kfree(out);
> > > +	return ret;
> > > +}
> > > diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
> > > new file mode 100644
> > > index 000000000000..c2a3645f4b90
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/cmd.h
> > > @@ -0,0 +1,27 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > > +/*
> > > + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> > > + */
> > > +
> > > +#ifndef VIRTIO_VFIO_CMD_H
> > > +#define VIRTIO_VFIO_CMD_H
> > > +
> > > +#include <linux/kernel.h>
> > > +#include <linux/virtio.h>
> > > +#include <linux/vfio_pci_core.h>
> > > +#include <linux/virtio_pci.h>
> > > +
> > > +struct virtiovf_pci_core_device {
> > > +	struct vfio_pci_core_device core_device;
> > > +	int vf_id;
> > > +};
> > > +
> > > +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> > > +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
> > > +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > > +			  u8 offset, u8 size, u8 *buf);
> > > +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > > +			 u8 offset, u8 size, u8 *buf);
> > > +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
> > > +				u8 req_bar_flags, u8 *bar, u64 *bar_offset);
> > > +#endif /* VIRTIO_VFIO_CMD_H */
> > > -- 
> > > 2.27.0
> 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
@ 2023-09-26 11:25         ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-26 11:25 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Tue, Sep 26, 2023 at 01:51:13PM +0300, Yishai Hadas wrote:
> On 21/09/2023 23:34, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote:
> > > Expose admin commands over the virtio device, to be used by the
> > > vfio-virtio driver in the next patches.
> > > 
> > > It includes: list query/use, legacy write/read, read notify_info.
> > > 
> > > Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > > ---
> > >   drivers/vfio/pci/virtio/cmd.c | 146 ++++++++++++++++++++++++++++++++++
> > >   drivers/vfio/pci/virtio/cmd.h |  27 +++++++
> > >   2 files changed, 173 insertions(+)
> > >   create mode 100644 drivers/vfio/pci/virtio/cmd.c
> > >   create mode 100644 drivers/vfio/pci/virtio/cmd.h
> > > 
> > > diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> > > new file mode 100644
> > > index 000000000000..f068239cdbb0
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/cmd.c
> > > @@ -0,0 +1,146 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > > +/*
> > > + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> > > + */
> > > +
> > > +#include "cmd.h"
> > > +
> > > +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
> > > +{
> > > +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > > +	struct scatterlist out_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	sg_init_one(&out_sg, buf, buf_size);
> > > +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.result_sg = &out_sg;
> > > +
> > > +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +}
> > > +
> > in/out seem all wrong here. In virtio terminology, in means from
> > device to driver, out means from driver to device.
> I referred here to in/out from the vfio POV, i.e. the side that prepares
> the command.
> 
> However, I can rename them to follow the virtio terminology as you suggested
> if that makes more sense.
> 
> Please see also my coming answer on your suggestion to put all of this in
> the virtio layer.
> 
> > > +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
> > > +{
> > > +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > > +	struct scatterlist in_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	sg_init_one(&in_sg, buf, buf_size);
> > > +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.data_sg = &in_sg;
> > > +
> > > +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +}
> > > +
> > > +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > 
> > what is _lr short for?
> 
> This was an acronym for legacy_read.
> 
> The actual command is determined by the given opcode, which can be either
> LEGACY_COMMON_CFG_READ or LEGACY_DEV_CFG_READ.
> 
> I can rename it to '_legacy_read' (i.e. virtiovf_issue_legacy_read_cmd) to
> be clearer.
> 
> > 
> > > +			  u8 offset, u8 size, u8 *buf)
> > > +{
> > > +	struct virtio_device *virtio_dev =
> > > +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > +	struct virtio_admin_cmd_data_lr_write *in;
> > > +	struct scatterlist in_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +	int ret;
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
> > > +	if (!in)
> > > +		return -ENOMEM;
> > > +
> > > +	in->offset = offset;
> > > +	memcpy(in->registers, buf, size);
> > > +	sg_init_one(&in_sg, in, sizeof(*in) + size);
> > > +	cmd.opcode = opcode;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.group_member_id = virtvdev->vf_id + 1;
> > weird. why + 1?
> 
> This follows the virtio spec in that area.
> 
> "When sending commands with the SR-IOV group type, the driver specify a
> value for group_member_id
> between 1 and NumVFs inclusive."

Ah, I get it. Pls add a comment.

> The 'virtvdev->vf_id' was set upon vfio/virtio driver initialization by
> pci_iov_vf_id(), whose first index is 0.
> 
> > > +	cmd.data_sg = &in_sg;
> > > +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +
> > > +	kfree(in);
> > > +	return ret;
> > > +}
> > How do you know it's safe to send this command, in particular at
> > this time? This seems to be doing zero checks, and zero synchronization
> > with the PF driver.
> > 
> virtiovf_cmd_lr_read() and the other helpers get a virtio VF and look up its
> PF by calling virtio_pci_vf_get_pf_dev().
> 
> The VF can't go away via 'disable sriov' as it's owned/used by vfio.
> 
> The PF can't go away via rmmod/modprobe -r of virtio, because of the 'module
> in use' dependency between VFIO and VIRTIO.
> 
> The check below [1] was added only from a clean-code perspective; it might
> theoretically fail in case the given VF isn't bound to a virtio driver.
> 
> [1] if (!virtio_dev)
>         return -ENOTCONN;
> 
> So, it looks safe as is.

Can the device be unbound from the module right after you did the check?
What about suspend - can this be called while suspend is in progress?


More importantly, virtio can decide to reset the device for its
own internal reasons (e.g. to recover from an error).
We used to do it when attaching XDP, and we can start doing it again.
That's one of the reasons why I want all this code under virtio, so we'll remember.


> > > +
> > > +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > > +			 u8 offset, u8 size, u8 *buf)
> > > +{
> > > +	struct virtio_device *virtio_dev =
> > > +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > +	struct virtio_admin_cmd_data_lr_read *in;
> > > +	struct scatterlist in_sg, out_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +	int ret;
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	in = kzalloc(sizeof(*in), GFP_KERNEL);
> > > +	if (!in)
> > > +		return -ENOMEM;
> > > +
> > > +	in->offset = offset;
> > > +	sg_init_one(&in_sg, in, sizeof(*in));
> > > +	sg_init_one(&out_sg, buf, size);
> > > +	cmd.opcode = opcode;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.data_sg = &in_sg;
> > > +	cmd.result_sg = &out_sg;
> > > +	cmd.group_member_id = virtvdev->vf_id + 1;
> > > +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +
> > > +	kfree(in);
> > > +	return ret;
> > > +}
> > > +
> > > +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
> > and what is lq short for?
> 
> To be more explicit, I can rename it to virtiovf_cmd_legacy_notify_info() to
> follow the spec opcode.
> 
> Yishai
> 
> > 
> > > +				u8 req_bar_flags, u8 *bar, u64 *bar_offset)
> > > +{
> > > +	struct virtio_device *virtio_dev =
> > > +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > +	struct virtio_admin_cmd_notify_info_result *out;
> > > +	struct scatterlist out_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +	int ret;
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	out = kzalloc(sizeof(*out), GFP_KERNEL);
> > > +	if (!out)
> > > +		return -ENOMEM;
> > > +
> > > +	sg_init_one(&out_sg, out, sizeof(*out));
> > > +	cmd.opcode = VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.result_sg = &out_sg;
> > > +	cmd.group_member_id = virtvdev->vf_id + 1;
> > > +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +	if (!ret) {
> > > +		struct virtio_admin_cmd_notify_info_data *entry;
> > > +		int i;
> > > +
> > > +		ret = -ENOENT;
> > > +		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
> > > +			entry = &out->entries[i];
> > > +			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
> > > +				break;
> > > +			if (entry->flags != req_bar_flags)
> > > +				continue;
> > > +			*bar = entry->bar;
> > > +			*bar_offset = le64_to_cpu(entry->offset);
> > > +			ret = 0;
> > > +			break;
> > > +		}
> > > +	}
> > > +
> > > +	kfree(out);
> > > +	return ret;
> > > +}
> > > diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
> > > new file mode 100644
> > > index 000000000000..c2a3645f4b90
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/cmd.h
> > > @@ -0,0 +1,27 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > > +/*
> > > + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> > > + */
> > > +
> > > +#ifndef VIRTIO_VFIO_CMD_H
> > > +#define VIRTIO_VFIO_CMD_H
> > > +
> > > +#include <linux/kernel.h>
> > > +#include <linux/virtio.h>
> > > +#include <linux/vfio_pci_core.h>
> > > +#include <linux/virtio_pci.h>
> > > +
> > > +struct virtiovf_pci_core_device {
> > > +	struct vfio_pci_core_device core_device;
> > > +	int vf_id;
> > > +};
> > > +
> > > +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> > > +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
> > > +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > > +			  u8 offset, u8 size, u8 *buf);
> > > +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > > +			 u8 offset, u8 size, u8 *buf);
> > > +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
> > > +				u8 req_bar_flags, u8 *bar, u64 *bar_offset);
> > > +#endif /* VIRTIO_VFIO_CMD_H */
> > > -- 
> > > 2.27.0
> 


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-09-26 11:14       ` Yishai Hadas via Virtualization
@ 2023-09-26 11:41         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-26 11:41 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Tue, Sep 26, 2023 at 02:14:01PM +0300, Yishai Hadas wrote:
> On 22/09/2023 12:54, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote:
> > > Expose admin commands over the virtio device, to be used by the
> > > vfio-virtio driver in the next patches.
> > > 
> > > It includes: list query/use, legacy write/read, read notify_info.
> > > 
> > > Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > 
> > This stuff is pure virtio spec. I think it should live under
> > drivers/virtio, too.
> 
> The motivation to put it in the vfio layer came from the following main reasons:
> 
> 1) Having it inside virtio may require exporting a symbol/function per
> command.
> 
>    Today this would end up with 5 exported symbols, and in the future (e.g.
> live migration) with many more.
>
>    With the current code we export only 2 generic symbols,
> virtio_pci_vf_get_pf_dev() and virtio_admin_cmd_exec(), which may fit any
> further extension.

Except, there's no reasonable way for virtio to know what is done with
the device then. You are not using just 2 symbols at all, instead you
are using the rich vq API, which was explicitly designed around the driver
running the device - the one actually loaded and running - being responsible
for serializing accesses. And I *think* your use won't conflict ATM
mostly by luck. Witness the hack in patch 01 as exhibit 1 - nothing
at all even hints at the fact that the reason for the complicated
dance is because another driver pokes at some of the vqs.


> 2) For now there is no logic in this vfio layer; however, in the future we
> may have some DMA/other logic that would better fit in the caller/client
> layer (i.e. vfio).

You are poking at the device without any locks etc. Maybe it looks like
no logic to you but it does not look like that from where I stand.
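
(To make the concern concrete, a purely hypothetical sketch of the kind of
serialization being asked about; the lock and the helper below do not exist in
the posted patches and are only placeholders:)

	int virtio_admin_cmd_exec(struct virtio_device *vdev,
				  struct virtio_admin_cmd *cmd)
	{
		struct virtio_pci_device *vp_dev = to_vp_device(vdev);
		int ret;

		/* Serialize external callers (e.g. vfio) against the owning
		 * driver's own admin vq usage and against device resets.
		 */
		mutex_lock(&vp_dev->admin_vq_lock);	/* assumed field */
		ret = vp_avq_cmd_submit(vp_dev, cmd);	/* assumed helper */
		mutex_unlock(&vp_dev->admin_vq_lock);
		return ret;
	}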

> By the way, this follows what was already done between the vfio/mlx5 and
> mlx5_core modules, where mlx5_core exposes generic APIs to execute a command
> and to get the PF from a given mlx5 VF.

This is up to mlx5 maintainers. In particular they only need to worry
that their patches work with specific hardware which they likely have.
virtio has to work with multiple vendors - hardware and software -
and exposing a low level API that I can't test on my laptop
is not at all my ideal.

> This way, we can enable further commands to be added/extended
> easily/cleanly.

Something for vfio maintainer to consider in case it was
assumed that it's just this one weird thing
but otherwise it's all generic vfio. It's not going to stop there,
will it? The duplication of functionality with vdpa will continue :(


I am much more interested in adding reusable functionality that
everyone benefits from than in vfio poking at the device in its
own weird ways that only benefit specific hardware.


> See for example here [1, 2].
> 
> [1] https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/vfio/pci/mlx5/cmd.c#L210
> 
> [2] https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/vfio/pci/mlx5/cmd.c#L683
> 
> Yishai



> > 
> > > ---
> > >   drivers/vfio/pci/virtio/cmd.c | 146 ++++++++++++++++++++++++++++++++++
> > >   drivers/vfio/pci/virtio/cmd.h |  27 +++++++
> > >   2 files changed, 173 insertions(+)
> > >   create mode 100644 drivers/vfio/pci/virtio/cmd.c
> > >   create mode 100644 drivers/vfio/pci/virtio/cmd.h
> > > 
> > > diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> > > new file mode 100644
> > > index 000000000000..f068239cdbb0
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/cmd.c
> > > @@ -0,0 +1,146 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > > +/*
> > > + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> > > + */
> > > +
> > > +#include "cmd.h"
> > > +
> > > +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
> > > +{
> > > +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > > +	struct scatterlist out_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	sg_init_one(&out_sg, buf, buf_size);
> > > +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.result_sg = &out_sg;
> > > +
> > > +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +}
> > > +
> > > +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
> > > +{
> > > +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > > +	struct scatterlist in_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	sg_init_one(&in_sg, buf, buf_size);
> > > +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.data_sg = &in_sg;
> > > +
> > > +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +}
> > > +
> > > +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > > +			  u8 offset, u8 size, u8 *buf)
> > > +{
> > > +	struct virtio_device *virtio_dev =
> > > +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > +	struct virtio_admin_cmd_data_lr_write *in;
> > > +	struct scatterlist in_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +	int ret;
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
> > > +	if (!in)
> > > +		return -ENOMEM;
> > > +
> > > +	in->offset = offset;
> > > +	memcpy(in->registers, buf, size);
> > > +	sg_init_one(&in_sg, in, sizeof(*in) + size);
> > > +	cmd.opcode = opcode;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.group_member_id = virtvdev->vf_id + 1;
> > > +	cmd.data_sg = &in_sg;
> > > +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +
> > > +	kfree(in);
> > > +	return ret;
> > > +}
> > > +
> > > +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > > +			 u8 offset, u8 size, u8 *buf)
> > > +{
> > > +	struct virtio_device *virtio_dev =
> > > +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > +	struct virtio_admin_cmd_data_lr_read *in;
> > > +	struct scatterlist in_sg, out_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +	int ret;
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	in = kzalloc(sizeof(*in), GFP_KERNEL);
> > > +	if (!in)
> > > +		return -ENOMEM;
> > > +
> > > +	in->offset = offset;
> > > +	sg_init_one(&in_sg, in, sizeof(*in));
> > > +	sg_init_one(&out_sg, buf, size);
> > > +	cmd.opcode = opcode;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.data_sg = &in_sg;
> > > +	cmd.result_sg = &out_sg;
> > > +	cmd.group_member_id = virtvdev->vf_id + 1;
> > > +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +
> > > +	kfree(in);
> > > +	return ret;
> > > +}
> > > +
> > > +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
> > > +				u8 req_bar_flags, u8 *bar, u64 *bar_offset)
> > > +{
> > > +	struct virtio_device *virtio_dev =
> > > +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > +	struct virtio_admin_cmd_notify_info_result *out;
> > > +	struct scatterlist out_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +	int ret;
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	out = kzalloc(sizeof(*out), GFP_KERNEL);
> > > +	if (!out)
> > > +		return -ENOMEM;
> > > +
> > > +	sg_init_one(&out_sg, out, sizeof(*out));
> > > +	cmd.opcode = VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.result_sg = &out_sg;
> > > +	cmd.group_member_id = virtvdev->vf_id + 1;
> > > +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +	if (!ret) {
> > > +		struct virtio_admin_cmd_notify_info_data *entry;
> > > +		int i;
> > > +
> > > +		ret = -ENOENT;
> > > +		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
> > > +			entry = &out->entries[i];
> > > +			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
> > > +				break;
> > > +			if (entry->flags != req_bar_flags)
> > > +				continue;
> > > +			*bar = entry->bar;
> > > +			*bar_offset = le64_to_cpu(entry->offset);
> > > +			ret = 0;
> > > +			break;
> > > +		}
> > > +	}
> > > +
> > > +	kfree(out);
> > > +	return ret;
> > > +}
> > > diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
> > > new file mode 100644
> > > index 000000000000..c2a3645f4b90
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/cmd.h
> > > @@ -0,0 +1,27 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > > +/*
> > > + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> > > + */
> > > +
> > > +#ifndef VIRTIO_VFIO_CMD_H
> > > +#define VIRTIO_VFIO_CMD_H
> > > +
> > > +#include <linux/kernel.h>
> > > +#include <linux/virtio.h>
> > > +#include <linux/vfio_pci_core.h>
> > > +#include <linux/virtio_pci.h>
> > > +
> > > +struct virtiovf_pci_core_device {
> > > +	struct vfio_pci_core_device core_device;
> > > +	int vf_id;
> > > +};
> > > +
> > > +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> > > +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
> > > +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > > +			  u8 offset, u8 size, u8 *buf);
> > > +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > > +			 u8 offset, u8 size, u8 *buf);
> > > +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
> > > +				u8 req_bar_flags, u8 *bar, u64 *bar_offset);
> > > +#endif /* VIRTIO_VFIO_CMD_H */
> > > -- 
> > > 2.27.0
> 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
@ 2023-09-26 11:41         ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-26 11:41 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Tue, Sep 26, 2023 at 02:14:01PM +0300, Yishai Hadas wrote:
> On 22/09/2023 12:54, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote:
> > > Expose admin commands over the virtio device, to be used by the
> > > vfio-virtio driver in the next patches.
> > > 
> > > It includes: list query/use, legacy write/read, read notify_info.
> > > 
> > > Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > 
> > This stuff is pure virtio spec. I think it should live under
> > drivers/virtio, too.
> 
> The motivation to put it in the vfio layer came from the following main reasons:
> 
> 1) Having it inside virtio may require exporting a symbol/function per
> command.
> 
>    Today this would end up with 5 exported symbols, and in the future (e.g.
> live migration) with many more.
>
>    With the current code we export only 2 generic symbols,
> virtio_pci_vf_get_pf_dev() and virtio_admin_cmd_exec(), which may fit any
> further extension.

Except, there's no reasonable way for virtio to know what is done with
the device then. You are not using just 2 symbols at all, instead you
are using the rich vq API, which was explicitly designed around the driver
running the device - the one actually loaded and running - being responsible
for serializing accesses. And I *think* your use won't conflict ATM
mostly by luck. Witness the hack in patch 01 as exhibit 1 - nothing
at all even hints at the fact that the reason for the complicated
dance is because another driver pokes at some of the vqs.


> 2) For now there is no logic in this vfio layer; however, in the future we
> may have some DMA/other logic that would better fit in the caller/client
> layer (i.e. vfio).

You are poking at the device without any locks etc. Maybe it looks like
no logic to you but it does not look like that from where I stand.

> By the way, this follows what was already done between the vfio/mlx5 and
> mlx5_core modules, where mlx5_core exposes generic APIs to execute a command
> and to get the PF from a given mlx5 VF.

This is up to mlx5 maintainers. In particular they only need to worry
that their patches work with specific hardware which they likely have.
virtio has to work with multiple vendors - hardware and software -
and exposing a low level API that I can't test on my laptop
is not at all my ideal.

> This way, we can enable further commands to be added/extended
> easily/cleanly.

Something for vfio maintainer to consider in case it was
assumed that it's just this one weird thing
but otherwise it's all generic vfio. It's not going to stop there,
will it? The duplication of functionality with vdpa will continue :(


I am much more interested in adding reusable functionality that
everyone benefits from than in vfio poking at the device in its
own weird ways that only benefit specific hardware.


> See for example here [1, 2].
> 
> [1] https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/vfio/pci/mlx5/cmd.c#L210
> 
> [2] https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/vfio/pci/mlx5/cmd.c#L683
> 
> Yishai



> > 
> > > ---
> > >   drivers/vfio/pci/virtio/cmd.c | 146 ++++++++++++++++++++++++++++++++++
> > >   drivers/vfio/pci/virtio/cmd.h |  27 +++++++
> > >   2 files changed, 173 insertions(+)
> > >   create mode 100644 drivers/vfio/pci/virtio/cmd.c
> > >   create mode 100644 drivers/vfio/pci/virtio/cmd.h
> > > 
> > > diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> > > new file mode 100644
> > > index 000000000000..f068239cdbb0
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/cmd.c
> > > @@ -0,0 +1,146 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > > +/*
> > > + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> > > + */
> > > +
> > > +#include "cmd.h"
> > > +
> > > +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
> > > +{
> > > +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > > +	struct scatterlist out_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	sg_init_one(&out_sg, buf, buf_size);
> > > +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.result_sg = &out_sg;
> > > +
> > > +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +}
> > > +
> > > +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
> > > +{
> > > +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > > +	struct scatterlist in_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	sg_init_one(&in_sg, buf, buf_size);
> > > +	cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.data_sg = &in_sg;
> > > +
> > > +	return virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +}
> > > +
> > > +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > > +			  u8 offset, u8 size, u8 *buf)
> > > +{
> > > +	struct virtio_device *virtio_dev =
> > > +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > +	struct virtio_admin_cmd_data_lr_write *in;
> > > +	struct scatterlist in_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +	int ret;
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
> > > +	if (!in)
> > > +		return -ENOMEM;
> > > +
> > > +	in->offset = offset;
> > > +	memcpy(in->registers, buf, size);
> > > +	sg_init_one(&in_sg, in, sizeof(*in) + size);
> > > +	cmd.opcode = opcode;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.group_member_id = virtvdev->vf_id + 1;
> > > +	cmd.data_sg = &in_sg;
> > > +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +
> > > +	kfree(in);
> > > +	return ret;
> > > +}
> > > +
> > > +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > > +			 u8 offset, u8 size, u8 *buf)
> > > +{
> > > +	struct virtio_device *virtio_dev =
> > > +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > +	struct virtio_admin_cmd_data_lr_read *in;
> > > +	struct scatterlist in_sg, out_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +	int ret;
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	in = kzalloc(sizeof(*in), GFP_KERNEL);
> > > +	if (!in)
> > > +		return -ENOMEM;
> > > +
> > > +	in->offset = offset;
> > > +	sg_init_one(&in_sg, in, sizeof(*in));
> > > +	sg_init_one(&out_sg, buf, size);
> > > +	cmd.opcode = opcode;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.data_sg = &in_sg;
> > > +	cmd.result_sg = &out_sg;
> > > +	cmd.group_member_id = virtvdev->vf_id + 1;
> > > +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +
> > > +	kfree(in);
> > > +	return ret;
> > > +}
> > > +
> > > +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
> > > +				u8 req_bar_flags, u8 *bar, u64 *bar_offset)
> > > +{
> > > +	struct virtio_device *virtio_dev =
> > > +		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > +	struct virtio_admin_cmd_notify_info_result *out;
> > > +	struct scatterlist out_sg;
> > > +	struct virtio_admin_cmd cmd = {};
> > > +	int ret;
> > > +
> > > +	if (!virtio_dev)
> > > +		return -ENOTCONN;
> > > +
> > > +	out = kzalloc(sizeof(*out), GFP_KERNEL);
> > > +	if (!out)
> > > +		return -ENOMEM;
> > > +
> > > +	sg_init_one(&out_sg, out, sizeof(*out));
> > > +	cmd.opcode = VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO;
> > > +	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > +	cmd.result_sg = &out_sg;
> > > +	cmd.group_member_id = virtvdev->vf_id + 1;
> > > +	ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +	if (!ret) {
> > > +		struct virtio_admin_cmd_notify_info_data *entry;
> > > +		int i;
> > > +
> > > +		ret = -ENOENT;
> > > +		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
> > > +			entry = &out->entries[i];
> > > +			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
> > > +				break;
> > > +			if (entry->flags != req_bar_flags)
> > > +				continue;
> > > +			*bar = entry->bar;
> > > +			*bar_offset = le64_to_cpu(entry->offset);
> > > +			ret = 0;
> > > +			break;
> > > +		}
> > > +	}
> > > +
> > > +	kfree(out);
> > > +	return ret;
> > > +}
> > > diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
> > > new file mode 100644
> > > index 000000000000..c2a3645f4b90
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/cmd.h
> > > @@ -0,0 +1,27 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > > +/*
> > > + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> > > + */
> > > +
> > > +#ifndef VIRTIO_VFIO_CMD_H
> > > +#define VIRTIO_VFIO_CMD_H
> > > +
> > > +#include <linux/kernel.h>
> > > +#include <linux/virtio.h>
> > > +#include <linux/vfio_pci_core.h>
> > > +#include <linux/virtio_pci.h>
> > > +
> > > +struct virtiovf_pci_core_device {
> > > +	struct vfio_pci_core_device core_device;
> > > +	int vf_id;
> > > +};
> > > +
> > > +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> > > +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
> > > +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > > +			  u8 offset, u8 size, u8 *buf);
> > > +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > > +			 u8 offset, u8 size, u8 *buf);
> > > +int virtiovf_cmd_lq_read_notify(struct virtiovf_pci_core_device *virtvdev,
> > > +				u8 req_bar_flags, u8 *bar, u64 *bar_offset);
> > > +#endif /* VIRTIO_VFIO_CMD_H */
> > > -- 
> > > 2.27.0
> 


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-26  2:32                                     ` Jason Wang
@ 2023-09-26 11:49                                       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-26 11:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: kvm, Maor Gottlieb, virtualization, Jason Gunthorpe, Jiri Pirko,
	Leon Romanovsky

On Tue, Sep 26, 2023 at 10:32:39AM +0800, Jason Wang wrote:
> It's the implementation details in legacy. The device needs to make
> sure (reset) the driver can work (is done before get_status return).

I think that there's no way to make it reliably work for all legacy drivers.

They just assumed a software backend and did not bother with DMA
ordering. You can try to avoid resets, they are not that common so
things will tend to mostly work if you don't stress them to much with
things like hot plug/unplug in a loop.  Or you can try to use a driver
after 2011 which is more aware of hardware ordering and flushes the
reset write with a read.  One of these two tricks, I think, is the magic
behind the device exposing memory bar 0 that you mention.
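
(For reference, a sketch of the post-2011 legacy driver pattern described
above - the reset write is flushed by reading the status register back;
'ioaddr' here stands for the legacy I/O BAR mapping:)

	/* writing 0 to the status register resets the device */
	iowrite8(0, ioaddr + VIRTIO_PCI_STATUS);
	/* flush out the posted reset write before touching the device again */
	(void)ioread8(ioaddr + VIRTIO_PCI_STATUS);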

-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-26 11:49                                       ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-26 11:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Jason Gunthorpe, Alex Williamson, Yishai Hadas,
	kvm, virtualization, Feng Liu, Jiri Pirko, kevin.tian,
	joao.m.martins, Leon Romanovsky, Maor Gottlieb

On Tue, Sep 26, 2023 at 10:32:39AM +0800, Jason Wang wrote:
> It's the implementation details in legacy. The device needs to make
> sure (reset) the driver can work (is done before get_status return).

I think that there's no way to make it reliably work for all legacy drivers.

They just assumed a software backend and did not bother with DMA
ordering. You can try to avoid resets, they are not that common so
things will tend to mostly work if you don't stress them too much with
things like hot plug/unplug in a loop.  Or you can try to use a driver
after 2011 which is more aware of hardware ordering and flushes the
reset write with a read.  One of these two tricks, I think, is the magic
behind the device exposing memory bar 0 that you mention.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-26  5:42                                 ` Michael S. Tsirkin
  (?)
@ 2023-09-26 13:50                                 ` Jason Gunthorpe
  2023-09-27 21:38                                     ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-26 13:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Yishai Hadas, alex.williamson, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Tue, Sep 26, 2023 at 01:42:52AM -0400, Michael S. Tsirkin wrote:
> On Mon, Sep 25, 2023 at 09:40:59PM -0300, Jason Gunthorpe wrote:
> > On Mon, Sep 25, 2023 at 03:44:11PM -0400, Michael S. Tsirkin wrote:
> > > > VDPA is very different from this. You might call them both mediation,
> > > > sure, but then you need another word to describe the additional
> > > > changes VPDA is doing.
> > > 
> > > Sorry about hijacking the thread a little bit, but could you
> > > call out some of the changes that are the most problematic
> > > for you?
> > 
> > I don't really know these details.
> 
> Maybe, you then should desist from saying things like "It entirely fails
> to achieve the most important thing it needs to do!" You are not making
> any new friends with saying this about a piece of software without
> knowing the details.

I can't tell you what cloud operators are doing, but I can say with
confidence that it is not the same as VDPA. As I said, if you want to
know more details you need to ask a cloud operator.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-21 19:58     ` Alex Williamson
@ 2023-09-26 15:20       ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas @ 2023-09-26 15:20 UTC (permalink / raw)
  To: Alex Williamson
  Cc: mst, jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, leonro, maorg

On 21/09/2023 22:58, Alex Williamson wrote:
> On Thu, 21 Sep 2023 15:40:40 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
>
>> Introduce a vfio driver over virtio devices to support the legacy
>> interface functionality for VFs.
>>
>> Background, from the virtio spec [1].
>> --------------------------------------------------------------------
>> In some systems, there is a need to support a virtio legacy driver with
>> a device that does not directly support the legacy interface. In such
>> scenarios, a group owner device can provide the legacy interface
>> functionality for the group member devices. The driver of the owner
>> device can then access the legacy interface of a member device on behalf
>> of the legacy member device driver.
>>
>> For example, with the SR-IOV group type, group members (VFs) can not
>> present the legacy interface in an I/O BAR in BAR0 as expected by the
>> legacy pci driver. If the legacy driver is running inside a virtual
>> machine, the hypervisor executing the virtual machine can present a
>> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
>> legacy driver accesses to this I/O BAR and forwards them to the group
>> owner device (PF) using group administration commands.
>> --------------------------------------------------------------------
>>
>> Specifically, this driver adds support for a virtio-net VF to be exposed
>> as a transitional device to a guest driver and allows the legacy IO BAR
>> functionality on top.
>>
>> This allows a VM which uses a legacy virtio-net driver in the guest to
>> work transparently over a VF which its driver in the host is that new
>> driver.
>>
>> The driver can be extended easily to support some other types of virtio
>> devices (e.g virtio-blk), by adding in a few places the specific type
>> properties as was done for virtio-net.
>>
>> For now, only the virtio-net use case was tested and as such we introduce
>> the support only for such a device.
>>
>> Practically,
>> Upon probing a VF for a virtio-net device, in case its PF supports
>> legacy access over the virtio admin commands and the VF doesn't have BAR
>> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
>> transitional device with I/O BAR in BAR 0.
>>
>> The existence of the simulated I/O bar is reported later on by
>> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
>> exposes itself as a transitional device by overwriting some properties
>> upon reading its config space.
>>
>> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
>> guest may use it via read/write calls according to the virtio
>> specification.
>>
>> Any read/write towards the control parts of the BAR will be captured by
>> the new driver and will be translated into admin commands towards the
>> device.
>>
>> Any data path read/write access (i.e. virtio driver notifications) will
>> be forwarded to the physical BAR which its properties were supplied by
>> the command VIRTIO_PCI_QUEUE_NOTIFY upon the probing/init flow.
>>
>> With that code in place a legacy driver in the guest has the look and
>> feel as if having a transitional device with legacy support for both its
>> control and data path flows.
>>
>> [1]
>> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
>>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> ---
>>   MAINTAINERS                      |   6 +
>>   drivers/vfio/pci/Kconfig         |   2 +
>>   drivers/vfio/pci/Makefile        |   2 +
>>   drivers/vfio/pci/virtio/Kconfig  |  15 +
>>   drivers/vfio/pci/virtio/Makefile |   4 +
>>   drivers/vfio/pci/virtio/cmd.c    |   4 +-
>>   drivers/vfio/pci/virtio/cmd.h    |   8 +
>>   drivers/vfio/pci/virtio/main.c   | 546 +++++++++++++++++++++++++++++++
>>   8 files changed, 585 insertions(+), 2 deletions(-)
>>   create mode 100644 drivers/vfio/pci/virtio/Kconfig
>>   create mode 100644 drivers/vfio/pci/virtio/Makefile
>>   create mode 100644 drivers/vfio/pci/virtio/main.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index bf0f54c24f81..5098418c8389 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -22624,6 +22624,12 @@ L:	kvm@vger.kernel.org
>>   S:	Maintained
>>   F:	drivers/vfio/pci/mlx5/
>>   
>> +VFIO VIRTIO PCI DRIVER
>> +M:	Yishai Hadas <yishaih@nvidia.com>
>> +L:	kvm@vger.kernel.org
>> +S:	Maintained
>> +F:	drivers/vfio/pci/virtio
>> +
>>   VFIO PCI DEVICE SPECIFIC DRIVERS
>>   R:	Jason Gunthorpe <jgg@nvidia.com>
>>   R:	Yishai Hadas <yishaih@nvidia.com>
>> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
>> index 8125e5f37832..18c397df566d 100644
>> --- a/drivers/vfio/pci/Kconfig
>> +++ b/drivers/vfio/pci/Kconfig
>> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>>   
>>   source "drivers/vfio/pci/pds/Kconfig"
>>   
>> +source "drivers/vfio/pci/virtio/Kconfig"
>> +
>>   endmenu
>> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
>> index 45167be462d8..046139a4eca5 100644
>> --- a/drivers/vfio/pci/Makefile
>> +++ b/drivers/vfio/pci/Makefile
>> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>>   obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>>   
>>   obj-$(CONFIG_PDS_VFIO_PCI) += pds/
>> +
>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
>> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
>> new file mode 100644
>> index 000000000000..89eddce8b1bd
>> --- /dev/null
>> +++ b/drivers/vfio/pci/virtio/Kconfig
>> @@ -0,0 +1,15 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +config VIRTIO_VFIO_PCI
>> +        tristate "VFIO support for VIRTIO PCI devices"
>> +        depends on VIRTIO_PCI
>> +        select VFIO_PCI_CORE
>> +        help
>> +          This provides support for exposing VIRTIO VF devices using the VFIO
>> +          framework that can work with a legacy virtio driver in the guest.
>> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
>> +          not indicate I/O Space.
>> +          As of that this driver emulated I/O BAR in software to let a VF be
>> +          seen as a transitional device in the guest and let it work with
>> +          a legacy driver.
>> +
>> +          If you don't know what to do here, say N.
>> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
>> new file mode 100644
>> index 000000000000..584372648a03
>> --- /dev/null
>> +++ b/drivers/vfio/pci/virtio/Makefile
>> @@ -0,0 +1,4 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
>> +virtio-vfio-pci-y := main.o cmd.o
>> +
>> diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
>> index f068239cdbb0..aea9d25fbf1d 100644
>> --- a/drivers/vfio/pci/virtio/cmd.c
>> +++ b/drivers/vfio/pci/virtio/cmd.c
>> @@ -44,7 +44,7 @@ int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>>   {
>>   	struct virtio_device *virtio_dev =
>>   		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
>> -	struct virtio_admin_cmd_data_lr_write *in;
>> +	struct virtio_admin_cmd_legacy_wr_data *in;
>>   	struct scatterlist in_sg;
>>   	struct virtio_admin_cmd cmd = {};
>>   	int ret;
>> @@ -74,7 +74,7 @@ int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
>>   {
>>   	struct virtio_device *virtio_dev =
>>   		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
>> -	struct virtio_admin_cmd_data_lr_read *in;
>> +	struct virtio_admin_cmd_legacy_rd_data *in;
>>   	struct scatterlist in_sg, out_sg;
>>   	struct virtio_admin_cmd cmd = {};
>>   	int ret;
>> diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
>> index c2a3645f4b90..347b1dc85570 100644
>> --- a/drivers/vfio/pci/virtio/cmd.h
>> +++ b/drivers/vfio/pci/virtio/cmd.h
>> @@ -13,7 +13,15 @@
>>   
>>   struct virtiovf_pci_core_device {
>>   	struct vfio_pci_core_device core_device;
>> +	u8 bar0_virtual_buf_size;
>> +	u8 *bar0_virtual_buf;
>> +	/* synchronize access to the virtual buf */
>> +	struct mutex bar_mutex;
>>   	int vf_id;
>> +	void __iomem *notify_addr;
>> +	u32 notify_offset;
>> +	u8 notify_bar;
>> +	u8 pci_cmd_io :1;
>>   };
>>   
>>   int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
>> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
>> new file mode 100644
>> index 000000000000..2486991c49f3
>> --- /dev/null
>> +++ b/drivers/vfio/pci/virtio/main.c
>> @@ -0,0 +1,546 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
>> + */
>> +
>> +#include <linux/device.h>
>> +#include <linux/module.h>
>> +#include <linux/mutex.h>
>> +#include <linux/pci.h>
>> +#include <linux/pm_runtime.h>
>> +#include <linux/types.h>
>> +#include <linux/uaccess.h>
>> +#include <linux/vfio.h>
>> +#include <linux/vfio_pci_core.h>
>> +#include <linux/virtio_pci.h>
>> +#include <linux/virtio_net.h>
>> +#include <linux/virtio_pci_modern.h>
>> +
>> +#include "cmd.h"
>> +
>> +#define VIRTIO_LEGACY_IO_BAR_HEADER_LEN 20
>> +#define VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN 4
>> +
>> +static int virtiovf_issue_lr_cmd(struct virtiovf_pci_core_device *virtvdev,
>> +				 loff_t pos, char __user *buf,
>> +				 size_t count, bool read)
>> +{
>> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
>> +	u16 opcode;
>> +	int ret;
>> +
>> +	mutex_lock(&virtvdev->bar_mutex);
>> +	if (read) {
>> +		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
>> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
>> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
>> +		ret = virtiovf_cmd_lr_read(virtvdev, opcode, pos,
>> +					   count, bar0_buf + pos);
>> +		if (ret)
>> +			goto out;
>> +		if (copy_to_user(buf, bar0_buf + pos, count))
>> +			ret = -EFAULT;
>> +		goto out;
>> +	}
>> +
>> +	if (copy_from_user(bar0_buf + pos, buf, count)) {
>> +		ret = -EFAULT;
>> +		goto out;
>> +	}
>> +
>> +	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
>> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
>> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
>> +	ret = virtiovf_cmd_lr_write(virtvdev, opcode, pos, count,
>> +				    bar0_buf + pos);
>> +out:
>> +	mutex_unlock(&virtvdev->bar_mutex);
>> +	return ret;
>> +}
>> +
>> +static int
>> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
>> +			    loff_t pos, char __user *buf,
>> +			    size_t count, bool read)
>> +{
>> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
>> +	u16 queue_notify;
>> +	int ret;
>> +
>> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
>> +		return -EINVAL;
>> +
>> +	switch (pos) {
>> +	case VIRTIO_PCI_QUEUE_NOTIFY:
>> +		if (count != sizeof(queue_notify))
>> +			return -EINVAL;
>> +		if (read) {
>> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
>> +						virtvdev->notify_addr);
>> +			if (ret)
>> +				return ret;
>> +			if (copy_to_user(buf, &queue_notify,
>> +					 sizeof(queue_notify)))
>> +				return -EFAULT;
>> +			break;
>> +		}
>> +
>> +		if (copy_from_user(&queue_notify, buf, count))
>> +			return -EFAULT;
>> +
>> +		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
>> +					 virtvdev->notify_addr);
>> +		break;
>> +	default:
>> +		ret = virtiovf_issue_lr_cmd(virtvdev, pos, buf, count, read);
>> +	}
>> +
>> +	return ret ? ret : count;
>> +}
>> +
>> +static bool range_contains_range(loff_t range1_start, size_t count1,
>> +				 loff_t range2_start, size_t count2,
>> +				 loff_t *start_offset)
>> +{
>> +	if (range1_start <= range2_start &&
>> +	    range1_start + count1 >= range2_start + count2) {
>> +		*start_offset = range2_start - range1_start;
>> +		return true;
>> +	}
>> +	return false;
>> +}
>> +
>> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
>> +					char __user *buf, size_t count,
>> +					loff_t *ppos)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>> +	loff_t copy_offset;
>> +	__le32 val32;
>> +	__le16 val16;
>> +	u8 val8;
>> +	int ret;
>> +
>> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	if (range_contains_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
>> +				 &copy_offset)) {
>> +		val16 = cpu_to_le16(0x1000);
>> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
>> +			return -EFAULT;
>> +	}
> So we take a 0x1041 ("Virtio 1.0 network device") and turn it into a
> 0x1000 ("Virtio network device").  Are there no features implied by the
> device ID?  NB, a byte-wise access would read the real device ID.

From the spec POV, 0x1000 is a transitional device which covers the
functionality of the 0x1041 device and the legacy device, so we should be
fine here.

Re the byte-wise access, do we have such an access from QEMU? I
couldn't see a partial read of a config field.
Because of that I preferred to keep the code simple and not manage such a
partial flow.
However, if we are still concerned about it, I can allow that partial
read as part of V1.

What do you think?
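
(If we do go for it, one possible direction as a rough sketch - compute the
overlap instead of requiring full containment, so byte-wise config accesses
are patched as well; the helper name and signature are only a suggestion:)

	static bool range_intersect_range(loff_t range1_start, size_t count1,
					  loff_t range2_start, size_t count2,
					  loff_t *start_offset,
					  size_t *intersect_count,
					  size_t *register_offset)
	{
		if (range1_start <= range2_start &&
		    range2_start < range1_start + count1) {
			*start_offset = range2_start - range1_start;
			*intersect_count = min_t(size_t, count2,
					range1_start + count1 - range2_start);
			*register_offset = 0;
			return true;
		}

		if (range2_start < range1_start &&
		    range1_start < range2_start + count2) {
			*start_offset = 0;
			*intersect_count = min_t(size_t, count1,
					range2_start + count2 - range1_start);
			*register_offset = range1_start - range2_start;
			return true;
		}

		return false;
	}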

>> +
>> +	if (virtvdev->pci_cmd_io &&
>> +	    range_contains_range(pos, count, PCI_COMMAND, sizeof(val16),
>> +				 &copy_offset)) {
>> +		if (copy_from_user(&val16, buf, sizeof(val16)))
>> +			return -EFAULT;
>> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
>> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
>> +			return -EFAULT;
>> +	}
> So we can't turn off I/O memory.

See below as part of virtiovf_pci_core_write(): it can be turned off, and
the next virtiovf_pci_read_config() won't turn it back on in that case.

This is what the 'virtvdev->pci_cmd_io' field is used for.
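
(In other words, the intended interplay is roughly the following - just a
rearrangement of the quoted code for clarity, not new logic:)

	/* write path: remember whether the guest enabled I/O decoding */
	virtvdev->pci_cmd_io = (cmd & PCI_COMMAND_IO);

	/* read path: only advertise PCI_COMMAND_IO while it is enabled */
	if (virtvdev->pci_cmd_io &&
	    range_contains_range(pos, count, PCI_COMMAND, sizeof(val16),
				 &copy_offset)) {
		...
		val16 |= cpu_to_le16(PCI_COMMAND_IO);
		...
	}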

>
>> +
>> +	if (range_contains_range(pos, count, PCI_REVISION_ID, sizeof(val8),
>> +				 &copy_offset)) {
>> +		/* Transitional needs to have revision 0 */
>> +		val8 = 0;
>> +		if (copy_to_user(buf + copy_offset, &val8, sizeof(val8)))
>> +			return -EFAULT;
>> +	}
> Surely some driver cares about this, right?  How is this supposed to
> work in a world where libvirt parses modules.alias and automatically
> loads this driver rather than vfio-pci for all 0x1041 devices?  We'd
> need to denylist this driver to ever see the device for what it is.

This was needed so the guest driver can support both modern and legacy
access; it can still choose the modern one.

Please see below re libvirt.

>
>> +
>> +	if (range_contains_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
>> +				 &copy_offset)) {
>> +		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);
>> +		if (copy_to_user(buf + copy_offset, &val32, sizeof(val32)))
>> +			return -EFAULT;
>> +	}
> Sloppy BAR emulation compared to the real BARs.  QEMU obviously doesn't
> care.

From what I could see, QEMU needs the 'PCI_BASE_ADDRESS_SPACE_IO' bit.

It doesn't really care about the address, as you wrote; this is why it
was just left as zero here.
Does that make sense to you?

>
>> +
>> +	if (range_contains_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
>> +				 &copy_offset)) {
>> +		/* Transitional devices use the PCI subsystem device id as
>> +		 * virtio device id, same as legacy driver always did.
>> +		 */
> Non-networking multi-line comment style throughout please.

Sure, will handle as part of V1.
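
(For example, the comment quoted above would become, in the standard kernel
multi-line style:)

	/*
	 * Transitional devices use the PCI subsystem device id as
	 * virtio device id, same as legacy driver always did.
	 */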

>
>> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
>> +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
>> +			return -EFAULT;
>> +	}
>> +
>> +	return count;
>> +}
>> +
>> +static ssize_t
>> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
>> +		       size_t count, loff_t *ppos)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>> +	int ret;
>> +
>> +	if (!count)
>> +		return 0;
>> +
>> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
>> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
>> +
>> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
>> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
>> +
>> +	ret = pm_runtime_resume_and_get(&pdev->dev);
>> +	if (ret) {
>> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
>> +				     ret);
>> +		return -EIO;
>> +	}
>> +
>> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
> If the heart of this driver is simply pretending to have an I/O BAR
> where I/O accesses into that BAR are translated to accesses in the MMIO
> BAR, why can't this be done in the VMM, ie. QEMU?  Could I/O to MMIO
> translation in QEMU improve performance (ex. if the MMIO is mmap'd and
> can be accessed without bouncing back into kernel code)?
>
Access to the I/O BAR control registers is not converted to MMIO but
into admin commands.
Those admin commands are transported over an admin queue owned by the
hypervisor driver.
In the future the hypervisor driver may use the admin queue for other
tasks as well, such as device MSI-X configuration, feature provisioning
and device migration commands (dirty page tracking, device state
read/write), and possibly more.
Only the driver notification register (i.e. the kick/doorbell register)
is converted to MMIO.
Hence, the VFIO solution looks like the better approach to match the
current UAPI.
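
In code terms the split is basically what translate_io_bar_to_mem_bar()
above does (simplified):

	switch (pos) {
	case VIRTIO_PCI_QUEUE_NOTIFY:
		/* data path: forwarded to the device's notify area (MMIO) */
		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
					 virtvdev->notify_addr);
		break;
	default:
		/*
		 * Control path: translated into a legacy admin command
		 * that is sent over the PF's admin queue.
		 */
		ret = virtiovf_issue_lr_cmd(virtvdev, pos, buf, count, read);
	}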


>> +	pm_runtime_put(&pdev->dev);
>> +	return ret;
>> +}
>> +
>> +static ssize_t
>> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
>> +			size_t count, loff_t *ppos)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>> +	int ret;
>> +
>> +	if (!count)
>> +		return 0;
>> +
>> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
>> +		loff_t copy_offset;
>> +		u16 cmd;
>> +
>> +		if (range_contains_range(pos, count, PCI_COMMAND, sizeof(cmd),
>> +					 &copy_offset)) {
>> +			if (copy_from_user(&cmd, buf + copy_offset, sizeof(cmd)))
>> +				return -EFAULT;
>> +			virtvdev->pci_cmd_io = (cmd & PCI_COMMAND_IO);
> If we're tracking writes to PCI_COMMAND_IO, why did we statically
> report I/O enabled in the read function previously?

In case it is turned off here, we won't turn it back on upon the
read(); please see the note above in that area.


>> +		}
>> +	}
>> +
>> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
>> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
>> +
>> +	ret = pm_runtime_resume_and_get(&pdev->dev);
>> +	if (ret) {
>> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
>> +		return -EIO;
>> +	}
>> +
>> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
>> +	pm_runtime_put(&pdev->dev);
>> +	return ret;
>> +}
>> +
>> +static int
>> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
>> +				   unsigned int cmd, unsigned long arg)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
>> +	void __user *uarg = (void __user *)arg;
>> +	struct vfio_region_info info = {};
>> +
>> +	if (copy_from_user(&info, uarg, minsz))
>> +		return -EFAULT;
>> +
>> +	if (info.argsz < minsz)
>> +		return -EINVAL;
>> +
>> +	switch (info.index) {
>> +	case VFIO_PCI_BAR0_REGION_INDEX:
>> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
>> +		info.size = virtvdev->bar0_virtual_buf_size;
>> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
>> +			     VFIO_REGION_INFO_FLAG_WRITE;
>> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
>> +	default:
>> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
>> +	}
>> +}
>> +
>> +static long
>> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
>> +			     unsigned long arg)
>> +{
>> +	switch (cmd) {
>> +	case VFIO_DEVICE_GET_REGION_INFO:
>> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
>> +	default:
>> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
>> +	}
>> +}
>> +
>> +static int
>> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
>> +{
>> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
>> +	int ret;
>> +
>> +	/* Setup the BAR where the 'notify' exists to be used by vfio as well
>> +	 * This will let us mmap it only once and use it when needed.
>> +	 */
>> +	ret = vfio_pci_core_setup_barmap(core_device,
>> +					 virtvdev->notify_bar);
>> +	if (ret)
>> +		return ret;
>> +
>> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
>> +			virtvdev->notify_offset;
>> +	return 0;
>> +}
>> +
>> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
>> +	int ret;
>> +
>> +	ret = vfio_pci_core_enable(vdev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (virtvdev->bar0_virtual_buf) {
>> +		/* upon close_device() the vfio_pci_core_disable() is called
>> +		 * and will close all the previous mmaps, so it seems that the
>> +		 * valid life cycle for the 'notify' addr is per open/close.
>> +		 */
>> +		ret = virtiovf_set_notify_addr(virtvdev);
>> +		if (ret) {
>> +			vfio_pci_core_disable(vdev);
>> +			return ret;
>> +		}
>> +	}
>> +
>> +	vfio_pci_core_finish_enable(vdev);
>> +	return 0;
>> +}
>> +
>> +static void virtiovf_pci_close_device(struct vfio_device *core_vdev)
>> +{
>> +	vfio_pci_core_close_device(core_vdev);
>> +}
> Why does this function exist?

For symmetry reasons: as we have virtiovf_pci_open_device(), I added a
close() counterpart as well.
However, we can just set vfio_pci_core_close_device() directly in the
ops and drop this code.
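
I.e. for V1 something like:

	static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
		...
		.close_device = vfio_pci_core_close_device,
		...
	};

and virtiovf_pci_close_device() goes away.
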
>
>> +
>> +static int virtiovf_get_device_config_size(unsigned short device)
>> +{
>> +	switch (device) {
>> +	case 0x1041:
>> +		/* network card */
>> +		return offsetofend(struct virtio_net_config, status);
>> +	default:
>> +		return 0;
>> +	}
>> +}
>> +
>> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
>> +{
>> +	u64 offset;
>> +	int ret;
>> +	u8 bar;
>> +
>> +	ret = virtiovf_cmd_lq_read_notify(virtvdev,
>> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
>> +				&bar, &offset);
>> +	if (ret)
>> +		return ret;
>> +
>> +	virtvdev->notify_bar = bar;
>> +	virtvdev->notify_offset = offset;
>> +	return 0;
>> +}
>> +
>> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	struct pci_dev *pdev;
>> +	int ret;
>> +
>> +	ret = vfio_pci_core_init_dev(core_vdev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	pdev = virtvdev->core_device.pdev;
>> +	virtvdev->vf_id = pci_iov_vf_id(pdev);
>> +	if (virtvdev->vf_id < 0)
>> +		return -EINVAL;
> vf_id is never used.

It's used as part of the virtio commands; see the previous preparation
patch.

>
>> +
>> +	ret = virtiovf_read_notify_info(virtvdev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	virtvdev->bar0_virtual_buf_size = VIRTIO_LEGACY_IO_BAR_HEADER_LEN +
>> +		VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN +
>> +		virtiovf_get_device_config_size(pdev->device);
>> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
>> +					     GFP_KERNEL);
>> +	if (!virtvdev->bar0_virtual_buf)
>> +		return -ENOMEM;
>> +	mutex_init(&virtvdev->bar_mutex);
>> +	return 0;
>> +}
>> +
>> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +
>> +	kfree(virtvdev->bar0_virtual_buf);
>> +	vfio_pci_core_release_dev(core_vdev);
>> +}
>> +
>> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
>> +	.name = "virtio-transitional-vfio-pci",
>> +	.init = virtiovf_pci_init_device,
>> +	.release = virtiovf_pci_core_release_dev,
>> +	.open_device = virtiovf_pci_open_device,
>> +	.close_device = virtiovf_pci_close_device,
>> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
>> +	.read = virtiovf_pci_core_read,
>> +	.write = virtiovf_pci_core_write,
>> +	.mmap = vfio_pci_core_mmap,
>> +	.request = vfio_pci_core_request,
>> +	.match = vfio_pci_core_match,
>> +	.bind_iommufd = vfio_iommufd_physical_bind,
>> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
>> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
>> +};
>> +
>> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
>> +	.name = "virtio-acc-vfio-pci",
>> +	.init = vfio_pci_core_init_dev,
>> +	.release = vfio_pci_core_release_dev,
>> +	.open_device = virtiovf_pci_open_device,
>> +	.close_device = virtiovf_pci_close_device,
>> +	.ioctl = vfio_pci_core_ioctl,
>> +	.device_feature = vfio_pci_core_ioctl_feature,
>> +	.read = vfio_pci_core_read,
>> +	.write = vfio_pci_core_write,
>> +	.mmap = vfio_pci_core_mmap,
>> +	.request = vfio_pci_core_request,
>> +	.match = vfio_pci_core_match,
>> +	.bind_iommufd = vfio_iommufd_physical_bind,
>> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
>> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
>> +};
> Why are we claiming devices that should just use vfio-pci instead?


Upon probe we may choose to set those default vfio-pci ops in case the
device is not legacy capable.
This eliminates any usage of the new driver functionality when it's
not applicable.

>
>> +
>> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
>> +{
>> +	struct resource *res = pdev->resource;
>> +
>> +	return res->flags ? true : false;
>> +}
>> +
>> +#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
>> +	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
>> +
>> +static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
>> +{
>> +	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
>> +	u8 *buf;
>> +	int ret;
>> +
>> +	/* Only virtio-net is supported/tested so far */
>> +	if (pdev->device != 0x1041)
>> +		return false;
> Seems like the ID table should handle this, why are we preemptively
> claiming all virtio devices... or actually all 0x1af4 devices, which
> might not even be virtio, ex. the non-virtio ivshmem devices is 0x1110.

Makes sense, I'll change the ID table from PCI_ANY_ID to 0x1041 and
clean up that code.
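
I.e. the V1 ID table would become something like:

	static const struct pci_device_id virtiovf_pci_table[] = {
		{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET,
						  0x1041) },
		{}
	};

so the driver only binds to virtio-net (0x1041) devices to begin with.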

>> +
>> +	buf = kzalloc(buf_size, GFP_KERNEL);
>> +	if (!buf)
>> +		return false;
>> +
>> +	ret = virtiovf_cmd_list_query(pdev, buf, buf_size);
>> +	if (ret)
>> +		goto end;
>> +
>> +	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
>> +		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
>> +		ret = -EOPNOTSUPP;
>> +		goto end;
>> +	}
>> +
>> +	/* confirm the used commands */
>> +	memset(buf, 0, buf_size);
>> +	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
>> +	ret = virtiovf_cmd_list_use(pdev, buf, buf_size);
>> +
>> +end:
>> +	kfree(buf);
>> +	return ret ? false : true;
>> +}
>> +
>> +static int virtiovf_pci_probe(struct pci_dev *pdev,
>> +			      const struct pci_device_id *id)
>> +{
>> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
>> +	struct virtiovf_pci_core_device *virtvdev;
>> +	int ret;
>> +
>> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
>> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
>> +		ops = &virtiovf_acc_vfio_pci_tran_ops;
>> +
>> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
>> +				     &pdev->dev, ops);
>> +	if (IS_ERR(virtvdev))
>> +		return PTR_ERR(virtvdev);
>> +
>> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
>> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
>> +	if (ret)
>> +		goto out;
>> +	return 0;
>> +out:
>> +	vfio_put_device(&virtvdev->core_device.vdev);
>> +	return ret;
>> +}
>> +
>> +static void virtiovf_pci_remove(struct pci_dev *pdev)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
>> +
>> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
>> +	vfio_put_device(&virtvdev->core_device.vdev);
>> +}
>> +
>> +static const struct pci_device_id virtiovf_pci_table[] = {
>> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
> libvirt will blindly use this driver for all devices matching this as
> we've discussed how it should make use of modules.alias.  I don't think
> this driver should be squatting on devices where it doesn't add value
> and it's not clear whether this is adding or subtracting value in all
> cases for the one NIC that it modifies.


When the device is not legacy capable, we choose the vfio-pci default
ops as pointed out above; otherwise we may choose the new functionality
to enable it in the guest.

>    How should libvirt choose when
> and where to use this driver?  What regressions are we going to see
> with VMs that previously saw "modern" virtio-net devices and now see a
> legacy compatible device?  Thanks,
We don't expect a regression here; a modern driver in the guest will
continue using its direct access flow.

Do you see a real concern that would prevent enabling it by default and
instead require some pre-configuration before the probe phase to
activate it?
If so, any specific suggestion on how to manage that?

Thanks,
Yishai

> Alex
>
>> +	{}
>> +};
>> +
>> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
>> +
>> +static struct pci_driver virtiovf_pci_driver = {
>> +	.name = KBUILD_MODNAME,
>> +	.id_table = virtiovf_pci_table,
>> +	.probe = virtiovf_pci_probe,
>> +	.remove = virtiovf_pci_remove,
>> +	.err_handler = &vfio_pci_core_err_handlers,
>> +	.driver_managed_dma = true,
>> +};
>> +
>> +module_pci_driver(virtiovf_pci_driver);
>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
>> +MODULE_DESCRIPTION(
>> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");



^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-26 15:20       ` Yishai Hadas via Virtualization
@ 2023-09-26 17:00         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-26 17:00 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Tue, Sep 26, 2023 at 06:20:45PM +0300, Yishai Hadas wrote:
> On 21/09/2023 22:58, Alex Williamson wrote:
> > On Thu, 21 Sep 2023 15:40:40 +0300
> > Yishai Hadas <yishaih@nvidia.com> wrote:
> > 
> > > Introduce a vfio driver over virtio devices to support the legacy
> > > interface functionality for VFs.
> > > 
> > > Background, from the virtio spec [1].
> > > --------------------------------------------------------------------
> > > In some systems, there is a need to support a virtio legacy driver with
> > > a device that does not directly support the legacy interface. In such
> > > scenarios, a group owner device can provide the legacy interface
> > > functionality for the group member devices. The driver of the owner
> > > device can then access the legacy interface of a member device on behalf
> > > of the legacy member device driver.
> > > 
> > > For example, with the SR-IOV group type, group members (VFs) can not
> > > present the legacy interface in an I/O BAR in BAR0 as expected by the
> > > legacy pci driver. If the legacy driver is running inside a virtual
> > > machine, the hypervisor executing the virtual machine can present a
> > > virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> > > legacy driver accesses to this I/O BAR and forwards them to the group
> > > owner device (PF) using group administration commands.
> > > --------------------------------------------------------------------
> > > 
> > > Specifically, this driver adds support for a virtio-net VF to be exposed
> > > as a transitional device to a guest driver and allows the legacy IO BAR
> > > functionality on top.
> > > 
> > > This allows a VM which uses a legacy virtio-net driver in the guest to
> > > work transparently over a VF which its driver in the host is that new
> > > driver.
> > > 
> > > The driver can be extended easily to support some other types of virtio
> > > devices (e.g virtio-blk), by adding in a few places the specific type
> > > properties as was done for virtio-net.
> > > 
> > > For now, only the virtio-net use case was tested and as such we introduce
> > > the support only for such a device.
> > > 
> > > Practically,
> > > Upon probing a VF for a virtio-net device, in case its PF supports
> > > legacy access over the virtio admin commands and the VF doesn't have BAR
> > > 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> > > transitional device with I/O BAR in BAR 0.
> > > 
> > > The existence of the simulated I/O bar is reported later on by
> > > overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> > > exposes itself as a transitional device by overwriting some properties
> > > upon reading its config space.
> > > 
> > > Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> > > guest may use it via read/write calls according to the virtio
> > > specification.
> > > 
> > > Any read/write towards the control parts of the BAR will be captured by
> > > the new driver and will be translated into admin commands towards the
> > > device.
> > > 
> > > Any data path read/write access (i.e. virtio driver notifications) will
> > > be forwarded to the physical BAR which its properties were supplied by
> > > the command VIRTIO_PCI_QUEUE_NOTIFY upon the probing/init flow.
> > > 
> > > With that code in place a legacy driver in the guest has the look and
> > > feel as if having a transitional device with legacy support for both its
> > > control and data path flows.
> > > 
> > > [1]
> > > https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> > > 
> > > Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > > ---
> > >   MAINTAINERS                      |   6 +
> > >   drivers/vfio/pci/Kconfig         |   2 +
> > >   drivers/vfio/pci/Makefile        |   2 +
> > >   drivers/vfio/pci/virtio/Kconfig  |  15 +
> > >   drivers/vfio/pci/virtio/Makefile |   4 +
> > >   drivers/vfio/pci/virtio/cmd.c    |   4 +-
> > >   drivers/vfio/pci/virtio/cmd.h    |   8 +
> > >   drivers/vfio/pci/virtio/main.c   | 546 +++++++++++++++++++++++++++++++
> > >   8 files changed, 585 insertions(+), 2 deletions(-)
> > >   create mode 100644 drivers/vfio/pci/virtio/Kconfig
> > >   create mode 100644 drivers/vfio/pci/virtio/Makefile
> > >   create mode 100644 drivers/vfio/pci/virtio/main.c
> > > 
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index bf0f54c24f81..5098418c8389 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -22624,6 +22624,12 @@ L:	kvm@vger.kernel.org
> > >   S:	Maintained
> > >   F:	drivers/vfio/pci/mlx5/
> > > +VFIO VIRTIO PCI DRIVER
> > > +M:	Yishai Hadas <yishaih@nvidia.com>
> > > +L:	kvm@vger.kernel.org
> > > +S:	Maintained
> > > +F:	drivers/vfio/pci/virtio
> > > +
> > >   VFIO PCI DEVICE SPECIFIC DRIVERS
> > >   R:	Jason Gunthorpe <jgg@nvidia.com>
> > >   R:	Yishai Hadas <yishaih@nvidia.com>
> > > diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> > > index 8125e5f37832..18c397df566d 100644
> > > --- a/drivers/vfio/pci/Kconfig
> > > +++ b/drivers/vfio/pci/Kconfig
> > > @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
> > >   source "drivers/vfio/pci/pds/Kconfig"
> > > +source "drivers/vfio/pci/virtio/Kconfig"
> > > +
> > >   endmenu
> > > diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> > > index 45167be462d8..046139a4eca5 100644
> > > --- a/drivers/vfio/pci/Makefile
> > > +++ b/drivers/vfio/pci/Makefile
> > > @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
> > >   obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
> > >   obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> > > +
> > > +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> > > diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> > > new file mode 100644
> > > index 000000000000..89eddce8b1bd
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/Kconfig
> > > @@ -0,0 +1,15 @@
> > > +# SPDX-License-Identifier: GPL-2.0-only
> > > +config VIRTIO_VFIO_PCI
> > > +        tristate "VFIO support for VIRTIO PCI devices"
> > > +        depends on VIRTIO_PCI
> > > +        select VFIO_PCI_CORE
> > > +        help
> > > +          This provides support for exposing VIRTIO VF devices using the VFIO
> > > +          framework that can work with a legacy virtio driver in the guest.
> > > +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> > > +          not indicate I/O Space.
> > > +          As of that this driver emulated I/O BAR in software to let a VF be
> > > +          seen as a transitional device in the guest and let it work with
> > > +          a legacy driver.
> > > +
> > > +          If you don't know what to do here, say N.
> > > diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
> > > new file mode 100644
> > > index 000000000000..584372648a03
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/Makefile
> > > @@ -0,0 +1,4 @@
> > > +# SPDX-License-Identifier: GPL-2.0-only
> > > +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
> > > +virtio-vfio-pci-y := main.o cmd.o
> > > +
> > > diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> > > index f068239cdbb0..aea9d25fbf1d 100644
> > > --- a/drivers/vfio/pci/virtio/cmd.c
> > > +++ b/drivers/vfio/pci/virtio/cmd.c
> > > @@ -44,7 +44,7 @@ int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > >   {
> > >   	struct virtio_device *virtio_dev =
> > >   		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > -	struct virtio_admin_cmd_data_lr_write *in;
> > > +	struct virtio_admin_cmd_legacy_wr_data *in;
> > >   	struct scatterlist in_sg;
> > >   	struct virtio_admin_cmd cmd = {};
> > >   	int ret;
> > > @@ -74,7 +74,7 @@ int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > >   {
> > >   	struct virtio_device *virtio_dev =
> > >   		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > -	struct virtio_admin_cmd_data_lr_read *in;
> > > +	struct virtio_admin_cmd_legacy_rd_data *in;
> > >   	struct scatterlist in_sg, out_sg;
> > >   	struct virtio_admin_cmd cmd = {};
> > >   	int ret;
> > > diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
> > > index c2a3645f4b90..347b1dc85570 100644
> > > --- a/drivers/vfio/pci/virtio/cmd.h
> > > +++ b/drivers/vfio/pci/virtio/cmd.h
> > > @@ -13,7 +13,15 @@
> > >   struct virtiovf_pci_core_device {
> > >   	struct vfio_pci_core_device core_device;
> > > +	u8 bar0_virtual_buf_size;
> > > +	u8 *bar0_virtual_buf;
> > > +	/* synchronize access to the virtual buf */
> > > +	struct mutex bar_mutex;
> > >   	int vf_id;
> > > +	void __iomem *notify_addr;
> > > +	u32 notify_offset;
> > > +	u8 notify_bar;
> > > +	u8 pci_cmd_io :1;
> > >   };
> > >   int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> > > diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
> > > new file mode 100644
> > > index 000000000000..2486991c49f3
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/main.c
> > > @@ -0,0 +1,546 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> > > + */
> > > +
> > > +#include <linux/device.h>
> > > +#include <linux/module.h>
> > > +#include <linux/mutex.h>
> > > +#include <linux/pci.h>
> > > +#include <linux/pm_runtime.h>
> > > +#include <linux/types.h>
> > > +#include <linux/uaccess.h>
> > > +#include <linux/vfio.h>
> > > +#include <linux/vfio_pci_core.h>
> > > +#include <linux/virtio_pci.h>
> > > +#include <linux/virtio_net.h>
> > > +#include <linux/virtio_pci_modern.h>
> > > +
> > > +#include "cmd.h"
> > > +
> > > +#define VIRTIO_LEGACY_IO_BAR_HEADER_LEN 20
> > > +#define VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN 4
> > > +
> > > +static int virtiovf_issue_lr_cmd(struct virtiovf_pci_core_device *virtvdev,
> > > +				 loff_t pos, char __user *buf,
> > > +				 size_t count, bool read)
> > > +{
> > > +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
> > > +	u16 opcode;
> > > +	int ret;
> > > +
> > > +	mutex_lock(&virtvdev->bar_mutex);
> > > +	if (read) {
> > > +		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
> > > +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
> > > +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
> > > +		ret = virtiovf_cmd_lr_read(virtvdev, opcode, pos,
> > > +					   count, bar0_buf + pos);
> > > +		if (ret)
> > > +			goto out;
> > > +		if (copy_to_user(buf, bar0_buf + pos, count))
> > > +			ret = -EFAULT;
> > > +		goto out;
> > > +	}
> > > +
> > > +	if (copy_from_user(bar0_buf + pos, buf, count)) {
> > > +		ret = -EFAULT;
> > > +		goto out;
> > > +	}
> > > +
> > > +	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
> > > +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
> > > +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
> > > +	ret = virtiovf_cmd_lr_write(virtvdev, opcode, pos, count,
> > > +				    bar0_buf + pos);
> > > +out:
> > > +	mutex_unlock(&virtvdev->bar_mutex);
> > > +	return ret;
> > > +}
> > > +
> > > +static int
> > > +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
> > > +			    loff_t pos, char __user *buf,
> > > +			    size_t count, bool read)
> > > +{
> > > +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> > > +	u16 queue_notify;
> > > +	int ret;
> > > +
> > > +	if (pos + count > virtvdev->bar0_virtual_buf_size)
> > > +		return -EINVAL;
> > > +
> > > +	switch (pos) {
> > > +	case VIRTIO_PCI_QUEUE_NOTIFY:
> > > +		if (count != sizeof(queue_notify))
> > > +			return -EINVAL;
> > > +		if (read) {
> > > +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
> > > +						virtvdev->notify_addr);
> > > +			if (ret)
> > > +				return ret;
> > > +			if (copy_to_user(buf, &queue_notify,
> > > +					 sizeof(queue_notify)))
> > > +				return -EFAULT;
> > > +			break;
> > > +		}
> > > +
> > > +		if (copy_from_user(&queue_notify, buf, count))
> > > +			return -EFAULT;
> > > +
> > > +		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
> > > +					 virtvdev->notify_addr);
> > > +		break;
> > > +	default:
> > > +		ret = virtiovf_issue_lr_cmd(virtvdev, pos, buf, count, read);
> > > +	}
> > > +
> > > +	return ret ? ret : count;
> > > +}
> > > +
> > > +static bool range_contains_range(loff_t range1_start, size_t count1,
> > > +				 loff_t range2_start, size_t count2,
> > > +				 loff_t *start_offset)
> > > +{
> > > +	if (range1_start <= range2_start &&
> > > +	    range1_start + count1 >= range2_start + count2) {
> > > +		*start_offset = range2_start - range1_start;
> > > +		return true;
> > > +	}
> > > +	return false;
> > > +}
> > > +
> > > +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> > > +					char __user *buf, size_t count,
> > > +					loff_t *ppos)
> > > +{
> > > +	struct virtiovf_pci_core_device *virtvdev = container_of(
> > > +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> > > +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> > > +	loff_t copy_offset;
> > > +	__le32 val32;
> > > +	__le16 val16;
> > > +	u8 val8;
> > > +	int ret;
> > > +
> > > +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> > > +	if (ret < 0)
> > > +		return ret;
> > > +
> > > +	if (range_contains_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> > > +				 &copy_offset)) {
> > > +		val16 = cpu_to_le16(0x1000);
> > > +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> > > +			return -EFAULT;
> > > +	}
> > So we take a 0x1041 ("Virtio 1.0 network device") and turn it into a
> > 0x1000 ("Virtio network device").  Are there no features implied by the
> > device ID?  NB, a byte-wise access would read the real device ID.
> 
> From spec POV 0x1000 is a transitional device which covers the functionality
> of 0x1041 device and the legacy device, so we should be fine here.
> 
> Re the byte-wise access, do we have such an access from QEMU? I couldn't
> see a partial read of a config field.
> Because of that, I preferred to keep the code simple and not manage such a
> partial flow.
> However, if we are still concerned about it, I can allow that partial read
> as part of V1.
> 
> What do you think ?
> 
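
FWIW, if we do decide to handle partial/byte-wise accesses, it mostly means
replacing the "range fully contains field" test with an overlap test and
copying only the intersecting bytes. A rough, untested sketch (the helper
name and signature below are mine, not part of this series):

static bool range_intersect_range(loff_t buf_start, size_t buf_cnt,
				  loff_t reg_start, size_t reg_cnt,
				  loff_t *buf_offset, loff_t *reg_offset,
				  size_t *copy_len)
{
	loff_t buf_end = buf_start + buf_cnt;
	loff_t reg_end = reg_start + reg_cnt;
	loff_t start = max(buf_start, reg_start);

	/* no overlap between the user access and the emulated field */
	if (buf_start >= reg_end || reg_start >= buf_end)
		return false;

	*buf_offset = start - buf_start;
	*reg_offset = start - reg_start;
	*copy_len = min(buf_end, reg_end) - start;
	return true;
}

The device ID emulation above would then do something like
copy_to_user(buf + buf_offset, (u8 *)&val16 + reg_offset, copy_len), so even
a 1-byte read of the ID still sees the emulated 0x1000.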
> > > +
> > > +	if (virtvdev->pci_cmd_io &&
> > > +	    range_contains_range(pos, count, PCI_COMMAND, sizeof(val16),
> > > +				 &copy_offset)) {
> > > +		if (copy_from_user(&val16, buf, sizeof(val16)))
> > > +			return -EFAULT;
> > > +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> > > +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> > > +			return -EFAULT;
> > > +	}
> > So we can't turn off I/O memory.
> 
> See virtiovf_pci_core_write() below: it can be turned off, and the next
> virtiovf_pci_read_config() won't turn it back on in that case.
> 
> This is what the 'virtvdev->pci_cmd_io' field is used for.
> 
> > 
> > > +
> > > +	if (range_contains_range(pos, count, PCI_REVISION_ID, sizeof(val8),
> > > +				 &copy_offset)) {
> > > +		/* Transitional needs to have revision 0 */
> > > +		val8 = 0;
> > > +		if (copy_to_user(buf + copy_offset, &val8, sizeof(val8)))
> > > +			return -EFAULT;
> > > +	}
> > Surely some driver cares about this, right?  How is this supposed to
> > work in a world where libvirt parses modules.alias and automatically
> > loads this driver rather than vfio-pci for all 0x1041 devices?  We'd
> > need to denylist this driver to ever see the device for what it is.

I think I'm missing something. What in this patch might make
libvirt load this driver automatically?



> 
> This was needed by the guest driver to support both modern and legacy
> access; it can still choose the modern one.
> 
> Please see below re libvirt.
> 
> > 
> > > +
> > > +	if (range_contains_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
> > > +				 &copy_offset)) {
> > > +		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);
> > > +		if (copy_to_user(buf + copy_offset, &val32, sizeof(val32)))
> > > +			return -EFAULT;
> > > +	}
> > Sloppy BAR emulation compared to the real BARs.  QEMU obviously doesn't
> > care.
> 
> From what I could see, QEMU needs the bit for 'PCI_BASE_ADDRESS_SPACE_IO'.
> 
> It doesn't really care about the address, as you wrote; this is why it was
> just left as zero here.
> Does that make sense to you?

I mean if all you care about is QEMU then you should just keep all this
code in QEMU. Once you have some behaviour in UAPI you can never take
it back; even if it's a bug, userspace will come to depend on it.


> > 
> > > +
> > > +	if (range_contains_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> > > +				 &copy_offset)) {
> > > +		/* Transitional devices use the PCI subsystem device id as
> > > +		 * virtio device id, same as legacy driver always did.
> > > +		 */
> > Non-networking multi-line comment style throughout please.
> 
> Sure, will handle as part of V1.
> 
> > 
> > > +		val16 = cpu_to_le16(VIRTIO_ID_NET);
> > > +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> > > +			return -EFAULT;
> > > +	}
> > > +
> > > +	return count;
> > > +}
> > > +
> > > +static ssize_t
> > > +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
> > > +		       size_t count, loff_t *ppos)
> > > +{
> > > +	struct virtiovf_pci_core_device *virtvdev = container_of(
> > > +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> > > +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> > > +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> > > +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> > > +	int ret;
> > > +
> > > +	if (!count)
> > > +		return 0;
> > > +
> > > +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
> > > +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
> > > +
> > > +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> > > +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
> > > +
> > > +	ret = pm_runtime_resume_and_get(&pdev->dev);
> > > +	if (ret) {
> > > +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
> > > +				     ret);
> > > +		return -EIO;
> > > +	}
> > > +
> > > +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
> > If the heart of this driver is simply pretending to have an I/O BAR
> > where I/O accesses into that BAR are translated to accesses in the MMIO
> > BAR, why can't this be done in the VMM, ie. QEMU?  Could I/O to MMIO
> > translation in QEMU improve performance (ex. if the MMIO is mmap'd and
> > can be accessed without bouncing back into kernel code)?

Hmm. Not this patch, but Jason tells me there are devices which actually do have
it implemented like this (with an MMIO BAR). You have to convert writes
into MMIO write+MMIO read to make it robust.
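
A minimal sketch of that write-then-readback pattern (illustrative only, not
from this series; it assumes the target register window is already mapped):

static void notify_write_robust(void __iomem *addr, u16 val)
{
	/* MMIO writes are posted ... */
	iowrite16(val, addr);
	/* ... so read back from the same BAR to flush/confirm the write */
	(void)ioread16(addr);
}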


> The I/O BAR control register accesses are not converted to MMIO but into admin
> commands.
> Such admin commands are transported using an admin queue owned by the hypervisor
> driver.
> The hypervisor driver may in the future use the admin queue for other tasks such as
> device MSI-X config, feature provisioning, and device migration commands (dirty
> page tracking, device state read/write), and maybe more.
> Only the driver notification register (i.e. kick/doorbell register) is
> converted to the MMIO.
> Hence, the VFIO solution looks the better approach to match current UAPI.
> 
> > > +	pm_runtime_put(&pdev->dev);
> > > +	return ret;
> > > +}
> > > +
> > > +static ssize_t
> > > +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> > > +			size_t count, loff_t *ppos)
> > > +{
> > > +	struct virtiovf_pci_core_device *virtvdev = container_of(
> > > +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> > > +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> > > +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> > > +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> > > +	int ret;
> > > +
> > > +	if (!count)
> > > +		return 0;
> > > +
> > > +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> > > +		loff_t copy_offset;
> > > +		u16 cmd;
> > > +
> > > +		if (range_contains_range(pos, count, PCI_COMMAND, sizeof(cmd),
> > > +					 &copy_offset)) {
> > > +			if (copy_from_user(&cmd, buf + copy_offset, sizeof(cmd)))
> > > +				return -EFAULT;
> > > +			virtvdev->pci_cmd_io = (cmd & PCI_COMMAND_IO);
> > If we're tracking writes to PCI_COMMAND_IO, why did we statically
> > report I/O enabled in the read function previously?
> 
> In case it is turned off here, we won't turn it back on upon the
> read(); please see the note above in that area.
> 
> 
> > > +		}
> > > +	}
> > > +
> > > +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> > > +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
> > > +
> > > +	ret = pm_runtime_resume_and_get(&pdev->dev);
> > > +	if (ret) {
> > > +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
> > > +		return -EIO;
> > > +	}
> > > +
> > > +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
> > > +	pm_runtime_put(&pdev->dev);
> > > +	return ret;
> > > +}
> > > +
> > > +static int
> > > +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
> > > +				   unsigned int cmd, unsigned long arg)
> > > +{
> > > +	struct virtiovf_pci_core_device *virtvdev = container_of(
> > > +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> > > +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
> > > +	void __user *uarg = (void __user *)arg;
> > > +	struct vfio_region_info info = {};
> > > +
> > > +	if (copy_from_user(&info, uarg, minsz))
> > > +		return -EFAULT;
> > > +
> > > +	if (info.argsz < minsz)
> > > +		return -EINVAL;
> > > +
> > > +	switch (info.index) {
> > > +	case VFIO_PCI_BAR0_REGION_INDEX:
> > > +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
> > > +		info.size = virtvdev->bar0_virtual_buf_size;
> > > +		info.flags = VFIO_REGION_INFO_FLAG_READ |
> > > +			     VFIO_REGION_INFO_FLAG_WRITE;
> > > +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
> > > +	default:
> > > +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> > > +	}
> > > +}
> > > +
> > > +static long
> > > +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> > > +			     unsigned long arg)
> > > +{
> > > +	switch (cmd) {
> > > +	case VFIO_DEVICE_GET_REGION_INFO:
> > > +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
> > > +	default:
> > > +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> > > +	}
> > > +}
> > > +
> > > +static int
> > > +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
> > > +{
> > > +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> > > +	int ret;
> > > +
> > > +	/* Setup the BAR where the 'notify' exists to be used by vfio as well
> > > +	 * This will let us mmap it only once and use it when needed.
> > > +	 */
> > > +	ret = vfio_pci_core_setup_barmap(core_device,
> > > +					 virtvdev->notify_bar);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
> > > +			virtvdev->notify_offset;
> > > +	return 0;
> > > +}
> > > +
> > > +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
> > > +{
> > > +	struct virtiovf_pci_core_device *virtvdev = container_of(
> > > +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> > > +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
> > > +	int ret;
> > > +
> > > +	ret = vfio_pci_core_enable(vdev);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	if (virtvdev->bar0_virtual_buf) {
> > > +		/* upon close_device() the vfio_pci_core_disable() is called
> > > +		 * and will close all the previous mmaps, so it seems that the
> > > +		 * valid life cycle for the 'notify' addr is per open/close.
> > > +		 */
> > > +		ret = virtiovf_set_notify_addr(virtvdev);
> > > +		if (ret) {
> > > +			vfio_pci_core_disable(vdev);
> > > +			return ret;
> > > +		}
> > > +	}
> > > +
> > > +	vfio_pci_core_finish_enable(vdev);
> > > +	return 0;
> > > +}
> > > +
> > > +static void virtiovf_pci_close_device(struct vfio_device *core_vdev)
> > > +{
> > > +	vfio_pci_core_close_device(core_vdev);
> > > +}
> > Why does this function exist?
> 
> For symmetry reasons, as we have virtiovf_pci_open_device() I also put in
> the close() one.
> However, we can just set vfio_pci_core_close_device() on the ops and drop
> this code.
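
That is, the ops initializer would then simply carry (sketch):

	.close_device = vfio_pci_core_close_device,

and the one-line wrapper can be dropped.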
> > 
> > > +
> > > +static int virtiovf_get_device_config_size(unsigned short device)
> > > +{
> > > +	switch (device) {
> > > +	case 0x1041:
> > > +		/* network card */
> > > +		return offsetofend(struct virtio_net_config, status);
> > > +	default:
> > > +		return 0;
> > > +	}
> > > +}
> > > +
> > > +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
> > > +{
> > > +	u64 offset;
> > > +	int ret;
> > > +	u8 bar;
> > > +
> > > +	ret = virtiovf_cmd_lq_read_notify(virtvdev,
> > > +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
> > > +				&bar, &offset);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	virtvdev->notify_bar = bar;
> > > +	virtvdev->notify_offset = offset;
> > > +	return 0;
> > > +}
> > > +
> > > +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> > > +{
> > > +	struct virtiovf_pci_core_device *virtvdev = container_of(
> > > +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> > > +	struct pci_dev *pdev;
> > > +	int ret;
> > > +
> > > +	ret = vfio_pci_core_init_dev(core_vdev);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	pdev = virtvdev->core_device.pdev;
> > > +	virtvdev->vf_id = pci_iov_vf_id(pdev);
> > > +	if (virtvdev->vf_id < 0)
> > > +		return -EINVAL;
> > vf_id is never used.
> 
> It's used as part of the virtio commands, see the previous preparation
> patch.
> 
> > 
> > > +
> > > +	ret = virtiovf_read_notify_info(virtvdev);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	virtvdev->bar0_virtual_buf_size = VIRTIO_LEGACY_IO_BAR_HEADER_LEN +
> > > +		VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN +
> > > +		virtiovf_get_device_config_size(pdev->device);
> > > +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
> > > +					     GFP_KERNEL);
> > > +	if (!virtvdev->bar0_virtual_buf)
> > > +		return -ENOMEM;
> > > +	mutex_init(&virtvdev->bar_mutex);
> > > +	return 0;
> > > +}
> > > +
> > > +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
> > > +{
> > > +	struct virtiovf_pci_core_device *virtvdev = container_of(
> > > +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> > > +
> > > +	kfree(virtvdev->bar0_virtual_buf);
> > > +	vfio_pci_core_release_dev(core_vdev);
> > > +}
> > > +
> > > +static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
> > > +	.name = "virtio-transitional-vfio-pci",
> > > +	.init = virtiovf_pci_init_device,
> > > +	.release = virtiovf_pci_core_release_dev,
> > > +	.open_device = virtiovf_pci_open_device,
> > > +	.close_device = virtiovf_pci_close_device,
> > > +	.ioctl = virtiovf_vfio_pci_core_ioctl,
> > > +	.read = virtiovf_pci_core_read,
> > > +	.write = virtiovf_pci_core_write,
> > > +	.mmap = vfio_pci_core_mmap,
> > > +	.request = vfio_pci_core_request,
> > > +	.match = vfio_pci_core_match,
> > > +	.bind_iommufd = vfio_iommufd_physical_bind,
> > > +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> > > +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> > > +};
> > > +
> > > +static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
> > > +	.name = "virtio-acc-vfio-pci",
> > > +	.init = vfio_pci_core_init_dev,
> > > +	.release = vfio_pci_core_release_dev,
> > > +	.open_device = virtiovf_pci_open_device,
> > > +	.close_device = virtiovf_pci_close_device,
> > > +	.ioctl = vfio_pci_core_ioctl,
> > > +	.device_feature = vfio_pci_core_ioctl_feature,
> > > +	.read = vfio_pci_core_read,
> > > +	.write = vfio_pci_core_write,
> > > +	.mmap = vfio_pci_core_mmap,
> > > +	.request = vfio_pci_core_request,
> > > +	.match = vfio_pci_core_match,
> > > +	.bind_iommufd = vfio_iommufd_physical_bind,
> > > +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> > > +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> > > +};
> > Why are we claiming devices that should just use vfio-pci instead?
> 
> 
> Upon probe we may choose to set those default vfio-pci ops in case the device
> is not legacy capable.
> This will eliminate any usage of the new driver functionality when it's not
> applicable.
> 
> > 
> > > +
> > > +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
> > > +{
> > > +	struct resource *res = pdev->resource;
> > > +
> > > +	return res->flags ? true : false;
> > > +}
> > > +
> > > +#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
> > > +	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
> > > +	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
> > > +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
> > > +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
> > > +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
> > > +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> > > +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> > > +
> > > +static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
> > > +{
> > > +	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
> > > +	u8 *buf;
> > > +	int ret;
> > > +
> > > +	/* Only virtio-net is supported/tested so far */
> > > +	if (pdev->device != 0x1041)
> > > +		return false;
> > Seems like the ID table should handle this, why are we preemptively
> > claiming all virtio devices... or actually all 0x1af4 devices, which
> > might not even be virtio, ex. the non-virtio ivshmem devices is 0x1110.
> 
> Makes sense, will change the ID table from PCI_ANY_ID to 0x1041 and
> clean up that code.
> 
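Presumably the ID table would then look roughly like this (sketch, with
0x1041 being the modern virtio-net device ID):

static const struct pci_device_id virtiovf_pci_table[] = {
	/* Only virtio-net (modern ID 0x1041) is supported/tested so far */
	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
	{}
};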
> > > +
> > > +	buf = kzalloc(buf_size, GFP_KERNEL);
> > > +	if (!buf)
> > > +		return false;
> > > +
> > > +	ret = virtiovf_cmd_list_query(pdev, buf, buf_size);
> > > +	if (ret)
> > > +		goto end;
> > > +
> > > +	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
> > > +		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
> > > +		ret = -EOPNOTSUPP;
> > > +		goto end;
> > > +	}
> > > +
> > > +	/* confirm the used commands */
> > > +	memset(buf, 0, buf_size);
> > > +	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
> > > +	ret = virtiovf_cmd_list_use(pdev, buf, buf_size);
> > > +
> > > +end:
> > > +	kfree(buf);
> > > +	return ret ? false : true;
> > > +}
> > > +
> > > +static int virtiovf_pci_probe(struct pci_dev *pdev,
> > > +			      const struct pci_device_id *id)
> > > +{
> > > +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> > > +	struct virtiovf_pci_core_device *virtvdev;
> > > +	int ret;
> > > +
> > > +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> > > +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> > > +		ops = &virtiovf_acc_vfio_pci_tran_ops;
> > > +
> > > +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> > > +				     &pdev->dev, ops);
> > > +	if (IS_ERR(virtvdev))
> > > +		return PTR_ERR(virtvdev);
> > > +
> > > +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> > > +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
> > > +	if (ret)
> > > +		goto out;
> > > +	return 0;
> > > +out:
> > > +	vfio_put_device(&virtvdev->core_device.vdev);
> > > +	return ret;
> > > +}
> > > +
> > > +static void virtiovf_pci_remove(struct pci_dev *pdev)
> > > +{
> > > +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> > > +
> > > +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> > > +	vfio_put_device(&virtvdev->core_device.vdev);
> > > +}
> > > +
> > > +static const struct pci_device_id virtiovf_pci_table[] = {
> > > +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
> > libvirt will blindly use this driver for all devices matching this as
> > we've discussed how it should make use of modules.alias.  I don't think
> > this driver should be squatting on devices where it doesn't add value
> > and it's not clear whether this is adding or subtracting value in all
> > cases for the one NIC that it modifies.
> 
> 
> When the device is not legacy capable, we choose the vfio-pci default ops as
> pointed out above; otherwise we may choose the new functionality to enable it in
> the guest.
> 
> >    How should libvirt choose when
> > and where to use this driver?  What regressions are we going to see
> > with VMs that previously saw "modern" virtio-net devices and now see a
> > legacy compatible device?  Thanks,
> We don't expect a regression here; a modern driver in the guest will
> continue using its direct access flow.
> 
> Do you see a real concern with enabling it by default, rather than requiring some
> pre-configuration before the probe phase to activate it?
> If so, any specific suggestion on how to manage that?
> 
> Thanks,
> Yishai

I would not claim that it can't happen.
For example, a transitional device
can not, in theory, be safely passed through to guest userspace, because
the guest might then try to use it through the legacy BAR
without acknowledging ACCESS_PLATFORM.
Do any guests check this and fail? Hard to say.

> > Alex
> > 
> > > +	{}
> > > +};
> > > +
> > > +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> > > +
> > > +static struct pci_driver virtiovf_pci_driver = {
> > > +	.name = KBUILD_MODNAME,
> > > +	.id_table = virtiovf_pci_table,
> > > +	.probe = virtiovf_pci_probe,
> > > +	.remove = virtiovf_pci_remove,
> > > +	.err_handler = &vfio_pci_core_err_handlers,
> > > +	.driver_managed_dma = true,
> > > +};
> > > +
> > > +module_pci_driver(virtiovf_pci_driver);
> > > +
> > > +MODULE_LICENSE("GPL");
> > > +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
> > > +MODULE_DESCRIPTION(
> > > +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");
> 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-09-26 17:00         ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-26 17:00 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: Alex Williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Tue, Sep 26, 2023 at 06:20:45PM +0300, Yishai Hadas wrote:
> On 21/09/2023 22:58, Alex Williamson wrote:
> > On Thu, 21 Sep 2023 15:40:40 +0300
> > Yishai Hadas <yishaih@nvidia.com> wrote:
> > 
> > > Introduce a vfio driver over virtio devices to support the legacy
> > > interface functionality for VFs.
> > > 
> > > Background, from the virtio spec [1].
> > > --------------------------------------------------------------------
> > > In some systems, there is a need to support a virtio legacy driver with
> > > a device that does not directly support the legacy interface. In such
> > > scenarios, a group owner device can provide the legacy interface
> > > functionality for the group member devices. The driver of the owner
> > > device can then access the legacy interface of a member device on behalf
> > > of the legacy member device driver.
> > > 
> > > For example, with the SR-IOV group type, group members (VFs) can not
> > > present the legacy interface in an I/O BAR in BAR0 as expected by the
> > > legacy pci driver. If the legacy driver is running inside a virtual
> > > machine, the hypervisor executing the virtual machine can present a
> > > virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> > > legacy driver accesses to this I/O BAR and forwards them to the group
> > > owner device (PF) using group administration commands.
> > > --------------------------------------------------------------------
> > > 
> > > Specifically, this driver adds support for a virtio-net VF to be exposed
> > > as a transitional device to a guest driver and allows the legacy IO BAR
> > > functionality on top.
> > > 
> > > This allows a VM which uses a legacy virtio-net driver in the guest to
> > > work transparently over a VF whose driver in the host is this new
> > > driver.
> > > 
> > > The driver can be extended easily to support some other types of virtio
> > > devices (e.g virtio-blk), by adding in a few places the specific type
> > > properties as was done for virtio-net.
> > > 
> > > For now, only the virtio-net use case was tested and as such we introduce
> > > the support only for such a device.
> > > 
> > > Practically,
> > > Upon probing a VF for a virtio-net device, in case its PF supports
> > > legacy access over the virtio admin commands and the VF doesn't have BAR
> > > 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> > > transitional device with I/O BAR in BAR 0.
> > > 
> > > The existence of the simulated I/O bar is reported later on by
> > > overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> > > exposes itself as a transitional device by overwriting some properties
> > > upon reading its config space.
> > > 
> > > Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> > > guest may use it via read/write calls according to the virtio
> > > specification.
> > > 
> > > Any read/write towards the control parts of the BAR will be captured by
> > > the new driver and will be translated into admin commands towards the
> > > device.
> > > 
> > > Any data path read/write access (i.e. virtio driver notifications) will
> > > be forwarded to the physical BAR whose properties were supplied by
> > > the command VIRTIO_PCI_QUEUE_NOTIFY upon the probing/init flow.
> > > 
> > > With that code in place a legacy driver in the guest has the look and
> > > feel as if having a transitional device with legacy support for both its
> > > control and data path flows.
> > > 
> > > [1]
> > > https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> > > 
> > > Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > > ---
> > >   MAINTAINERS                      |   6 +
> > >   drivers/vfio/pci/Kconfig         |   2 +
> > >   drivers/vfio/pci/Makefile        |   2 +
> > >   drivers/vfio/pci/virtio/Kconfig  |  15 +
> > >   drivers/vfio/pci/virtio/Makefile |   4 +
> > >   drivers/vfio/pci/virtio/cmd.c    |   4 +-
> > >   drivers/vfio/pci/virtio/cmd.h    |   8 +
> > >   drivers/vfio/pci/virtio/main.c   | 546 +++++++++++++++++++++++++++++++
> > >   8 files changed, 585 insertions(+), 2 deletions(-)
> > >   create mode 100644 drivers/vfio/pci/virtio/Kconfig
> > >   create mode 100644 drivers/vfio/pci/virtio/Makefile
> > >   create mode 100644 drivers/vfio/pci/virtio/main.c
> > > 
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index bf0f54c24f81..5098418c8389 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -22624,6 +22624,12 @@ L:	kvm@vger.kernel.org
> > >   S:	Maintained
> > >   F:	drivers/vfio/pci/mlx5/
> > > +VFIO VIRTIO PCI DRIVER
> > > +M:	Yishai Hadas <yishaih@nvidia.com>
> > > +L:	kvm@vger.kernel.org
> > > +S:	Maintained
> > > +F:	drivers/vfio/pci/virtio
> > > +
> > >   VFIO PCI DEVICE SPECIFIC DRIVERS
> > >   R:	Jason Gunthorpe <jgg@nvidia.com>
> > >   R:	Yishai Hadas <yishaih@nvidia.com>
> > > diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> > > index 8125e5f37832..18c397df566d 100644
> > > --- a/drivers/vfio/pci/Kconfig
> > > +++ b/drivers/vfio/pci/Kconfig
> > > @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
> > >   source "drivers/vfio/pci/pds/Kconfig"
> > > +source "drivers/vfio/pci/virtio/Kconfig"
> > > +
> > >   endmenu
> > > diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> > > index 45167be462d8..046139a4eca5 100644
> > > --- a/drivers/vfio/pci/Makefile
> > > +++ b/drivers/vfio/pci/Makefile
> > > @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
> > >   obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
> > >   obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> > > +
> > > +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> > > diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> > > new file mode 100644
> > > index 000000000000..89eddce8b1bd
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/Kconfig
> > > @@ -0,0 +1,15 @@
> > > +# SPDX-License-Identifier: GPL-2.0-only
> > > +config VIRTIO_VFIO_PCI
> > > +        tristate "VFIO support for VIRTIO PCI devices"
> > > +        depends on VIRTIO_PCI
> > > +        select VFIO_PCI_CORE
> > > +        help
> > > +          This provides support for exposing VIRTIO VF devices using the VFIO
> > > +          framework that can work with a legacy virtio driver in the guest.
> > > +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> > > +          not indicate I/O Space.
> > > +          Because of that, this driver emulates an I/O BAR in software to
> > > +          let a VF be seen as a transitional device in the guest and let it
> > > +          work with a legacy driver.
> > > +
> > > +          If you don't know what to do here, say N.
> > > diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
> > > new file mode 100644
> > > index 000000000000..584372648a03
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/Makefile
> > > @@ -0,0 +1,4 @@
> > > +# SPDX-License-Identifier: GPL-2.0-only
> > > +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
> > > +virtio-vfio-pci-y := main.o cmd.o
> > > +
> > > diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> > > index f068239cdbb0..aea9d25fbf1d 100644
> > > --- a/drivers/vfio/pci/virtio/cmd.c
> > > +++ b/drivers/vfio/pci/virtio/cmd.c
> > > @@ -44,7 +44,7 @@ int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > >   {
> > >   	struct virtio_device *virtio_dev =
> > >   		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > -	struct virtio_admin_cmd_data_lr_write *in;
> > > +	struct virtio_admin_cmd_legacy_wr_data *in;
> > >   	struct scatterlist in_sg;
> > >   	struct virtio_admin_cmd cmd = {};
> > >   	int ret;
> > > @@ -74,7 +74,7 @@ int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > >   {
> > >   	struct virtio_device *virtio_dev =
> > >   		virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > -	struct virtio_admin_cmd_data_lr_read *in;
> > > +	struct virtio_admin_cmd_legacy_rd_data *in;
> > >   	struct scatterlist in_sg, out_sg;
> > >   	struct virtio_admin_cmd cmd = {};
> > >   	int ret;
> > > diff --git a/drivers/vfio/pci/virtio/cmd.h b/drivers/vfio/pci/virtio/cmd.h
> > > index c2a3645f4b90..347b1dc85570 100644
> > > --- a/drivers/vfio/pci/virtio/cmd.h
> > > +++ b/drivers/vfio/pci/virtio/cmd.h
> > > @@ -13,7 +13,15 @@
> > >   struct virtiovf_pci_core_device {
> > >   	struct vfio_pci_core_device core_device;
> > > +	u8 bar0_virtual_buf_size;
> > > +	u8 *bar0_virtual_buf;
> > > +	/* synchronize access to the virtual buf */
> > > +	struct mutex bar_mutex;
> > >   	int vf_id;
> > > +	void __iomem *notify_addr;
> > > +	u32 notify_offset;
> > > +	u8 notify_bar;
> > > +	u8 pci_cmd_io :1;
> > >   };
> > >   int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> > > diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
> > > new file mode 100644
> > > index 000000000000..2486991c49f3
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/main.c
> > > @@ -0,0 +1,546 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> > > + */
> > > +
> > > +#include <linux/device.h>
> > > +#include <linux/module.h>
> > > +#include <linux/mutex.h>
> > > +#include <linux/pci.h>
> > > +#include <linux/pm_runtime.h>
> > > +#include <linux/types.h>
> > > +#include <linux/uaccess.h>
> > > +#include <linux/vfio.h>
> > > +#include <linux/vfio_pci_core.h>
> > > +#include <linux/virtio_pci.h>
> > > +#include <linux/virtio_net.h>
> > > +#include <linux/virtio_pci_modern.h>
> > > +
> > > +#include "cmd.h"
> > > +
> > > +#define VIRTIO_LEGACY_IO_BAR_HEADER_LEN 20
> > > +#define VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN 4
> > > +
> > > +static int virtiovf_issue_lr_cmd(struct virtiovf_pci_core_device *virtvdev,
> > > +				 loff_t pos, char __user *buf,
> > > +				 size_t count, bool read)
> > > +{
> > > +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
> > > +	u16 opcode;
> > > +	int ret;
> > > +
> > > +	mutex_lock(&virtvdev->bar_mutex);
> > > +	if (read) {
> > > +		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
> > > +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
> > > +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
> > > +		ret = virtiovf_cmd_lr_read(virtvdev, opcode, pos,
> > > +					   count, bar0_buf + pos);
> > > +		if (ret)
> > > +			goto out;
> > > +		if (copy_to_user(buf, bar0_buf + pos, count))
> > > +			ret = -EFAULT;
> > > +		goto out;
> > > +	}
> > > +
> > > +	if (copy_from_user(bar0_buf + pos, buf, count)) {
> > > +		ret = -EFAULT;
> > > +		goto out;
> > > +	}
> > > +
> > > +	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(true)) ?
> > > +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
> > > +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
> > > +	ret = virtiovf_cmd_lr_write(virtvdev, opcode, pos, count,
> > > +				    bar0_buf + pos);
> > > +out:
> > > +	mutex_unlock(&virtvdev->bar_mutex);
> > > +	return ret;
> > > +}
> > > +
> > > +static int
> > > +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
> > > +			    loff_t pos, char __user *buf,
> > > +			    size_t count, bool read)
> > > +{
> > > +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> > > +	u16 queue_notify;
> > > +	int ret;
> > > +
> > > +	if (pos + count > virtvdev->bar0_virtual_buf_size)
> > > +		return -EINVAL;
> > > +
> > > +	switch (pos) {
> > > +	case VIRTIO_PCI_QUEUE_NOTIFY:
> > > +		if (count != sizeof(queue_notify))
> > > +			return -EINVAL;
> > > +		if (read) {
> > > +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
> > > +						virtvdev->notify_addr);
> > > +			if (ret)
> > > +				return ret;
> > > +			if (copy_to_user(buf, &queue_notify,
> > > +					 sizeof(queue_notify)))
> > > +				return -EFAULT;
> > > +			break;
> > > +		}
> > > +
> > > +		if (copy_from_user(&queue_notify, buf, count))
> > > +			return -EFAULT;
> > > +
> > > +		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
> > > +					 virtvdev->notify_addr);
> > > +		break;
> > > +	default:
> > > +		ret = virtiovf_issue_lr_cmd(virtvdev, pos, buf, count, read);
> > > +	}
> > > +
> > > +	return ret ? ret : count;
> > > +}
> > > +
> > > +static bool range_contains_range(loff_t range1_start, size_t count1,
> > > +				 loff_t range2_start, size_t count2,
> > > +				 loff_t *start_offset)
> > > +{
> > > +	if (range1_start <= range2_start &&
> > > +	    range1_start + count1 >= range2_start + count2) {
> > > +		*start_offset = range2_start - range1_start;
> > > +		return true;
> > > +	}
> > > +	return false;
> > > +}
> > > +
> > > +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> > > +					char __user *buf, size_t count,
> > > +					loff_t *ppos)
> > > +{
> > > +	struct virtiovf_pci_core_device *virtvdev = container_of(
> > > +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> > > +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> > > +	loff_t copy_offset;
> > > +	__le32 val32;
> > > +	__le16 val16;
> > > +	u8 val8;
> > > +	int ret;
> > > +
> > > +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> > > +	if (ret < 0)
> > > +		return ret;
> > > +
> > > +	if (range_contains_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> > > +				 &copy_offset)) {
> > > +		val16 = cpu_to_le16(0x1000);
> > > +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> > > +			return -EFAULT;
> > > +	}
> > So we take a 0x1041 ("Virtio 1.0 network device") and turn it into a
> > 0x1000 ("Virtio network device").  Are there no features implied by the
> > device ID?  NB, a byte-wise access would read the real device ID.
> 
> From spec POV 0x1000 is a transitional device which covers the functionality
> of 0x1041 device and the legacy device, so we should be fine here.
> 
> Re the byte-wise access, do we have such an access from QEMU? I couldn't
> see a partial read of a config field.
> Because of that, I preferred to keep the code simple and not manage such a
> partial flow.
> However, if we are still concerned about it, I can allow that partial read
> as part of V1.
> 
> What do you think ?
> 
> > > +
> > > +	if (virtvdev->pci_cmd_io &&
> > > +	    range_contains_range(pos, count, PCI_COMMAND, sizeof(val16),
> > > +				 &copy_offset)) {
> > > +		if (copy_from_user(&val16, buf, sizeof(val16)))
> > > +			return -EFAULT;
> > > +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> > > +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> > > +			return -EFAULT;
> > > +	}
> > So we can't turn off I/O memory.
> 
> See virtiovf_pci_core_write() below: it can be turned off, and the next
> virtiovf_pci_read_config() won't turn it back on in that case.
> 
> This is what the 'virtvdev->pci_cmd_io' field is used for.
> 
> > 
> > > +
> > > +	if (range_contains_range(pos, count, PCI_REVISION_ID, sizeof(val8),
> > > +				 &copy_offset)) {
> > > +		/* Transitional needs to have revision 0 */
> > > +		val8 = 0;
> > > +		if (copy_to_user(buf + copy_offset, &val8, sizeof(val8)))
> > > +			return -EFAULT;
> > > +	}
> > Surely some driver cares about this, right?  How is this supposed to
> > work in a world where libvirt parses modules.alias and automatically
> > loads this driver rather than vfio-pci for all 0x1041 devices?  We'd
> > need to denylist this driver to ever see the device for what it is.

I think I'm missing something. What in this patch might make
libvirt load this driver automatically?



> 
> This was needed by the guest driver to support both modern and legacy
> access; it can still choose the modern one.
> 
> Please see below re libvirt.
> 
> > 
> > > +
> > > +	if (range_contains_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
> > > +				 &copy_offset)) {
> > > +		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);
> > > +		if (copy_to_user(buf + copy_offset, &val32, sizeof(val32)))
> > > +			return -EFAULT;
> > > +	}
> > Sloppy BAR emulation compared to the real BARs.  QEMU obviously doesn't
> > care.
> 
> From what I could see, QEMU needs the bit for 'PCI_BASE_ADDRESS_SPACE_IO'.
> 
> It doesn't really care about the address, as you wrote; this is why it was
> just left as zero here.
> Does that make sense to you?

I mean if all you care about is QEMU then you should just keep all this
code in QEMU. Once you have some behaviour in UAPI you can never take
it back; even if it's a bug, userspace will come to depend on it.


> > 
> > > +
> > > +	if (range_contains_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> > > +				 &copy_offset)) {
> > > +		/* Transitional devices use the PCI subsystem device id as
> > > +		 * virtio device id, same as legacy driver always did.
> > > +		 */
> > Non-networking multi-line comment style throughout please.
> 
> Sure, will handle as part of V1.
> 
> > 
> > > +		val16 = cpu_to_le16(VIRTIO_ID_NET);
> > > +		if (copy_to_user(buf + copy_offset, &val16, sizeof(val16)))
> > > +			return -EFAULT;
> > > +	}
> > > +
> > > +	return count;
> > > +}
> > > +
> > > +static ssize_t
> > > +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
> > > +		       size_t count, loff_t *ppos)
> > > +{
> > > +	struct virtiovf_pci_core_device *virtvdev = container_of(
> > > +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> > > +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> > > +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> > > +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> > > +	int ret;
> > > +
> > > +	if (!count)
> > > +		return 0;
> > > +
> > > +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
> > > +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
> > > +
> > > +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> > > +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
> > > +
> > > +	ret = pm_runtime_resume_and_get(&pdev->dev);
> > > +	if (ret) {
> > > +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
> > > +				     ret);
> > > +		return -EIO;
> > > +	}
> > > +
> > > +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
> > If the heart of this driver is simply pretending to have an I/O BAR
> > where I/O accesses into that BAR are translated to accesses in the MMIO
> > BAR, why can't this be done in the VMM, ie. QEMU?  Could I/O to MMIO
> > translation in QEMU improve performance (ex. if the MMIO is mmap'd and
> > can be accessed without bouncing back into kernel code)?

Hmm. Not this patch, but Jason tells me there are devices which actually do have
it implemented like this (with an MMIO BAR). You have to convert writes
into MMIO write+MMIO read to make it robust.


> The I/O BAR control register accesses are not converted to MMIO but into admin
> commands.
> Such admin commands are transported using an admin queue owned by the hypervisor
> driver.
> The hypervisor driver may in the future use the admin queue for other tasks such as
> device MSI-X config, feature provisioning, and device migration commands (dirty
> page tracking, device state read/write), and maybe more.
> Only the driver notification register (i.e. kick/doorbell register) is
> converted to the MMIO.
> Hence, the VFIO solution looks the better approach to match current UAPI.
> 
> > > +	pm_runtime_put(&pdev->dev);
> > > +	return ret;
> > > +}
> > > +
> > > +static ssize_t
> > > +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> > > +			size_t count, loff_t *ppos)
> > > +{
> > > +	struct virtiovf_pci_core_device *virtvdev = container_of(
> > > +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> > > +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> > > +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> > > +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> > > +	int ret;
> > > +
> > > +	if (!count)
> > > +		return 0;
> > > +
> > > +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> > > +		loff_t copy_offset;
> > > +		u16 cmd;
> > > +
> > > +		if (range_contains_range(pos, count, PCI_COMMAND, sizeof(cmd),
> > > +					 &copy_offset)) {
> > > +			if (copy_from_user(&cmd, buf + copy_offset, sizeof(cmd)))
> > > +				return -EFAULT;
> > > +			virtvdev->pci_cmd_io = (cmd & PCI_COMMAND_IO);
> > If we're tracking writes to PCI_COMMAND_IO, why did we statically
> > report I/O enabled in the read function previously?
> 
> In case it is turned off here, we won't turn it back on upon the
> read(); please see the note above in that area.
> 
> 
> > > +		}
> > > +	}
> > > +
> > > +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> > > +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
> > > +
> > > +	ret = pm_runtime_resume_and_get(&pdev->dev);
> > > +	if (ret) {
> > > +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
> > > +		return -EIO;
> > > +	}
> > > +
> > > +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
> > > +	pm_runtime_put(&pdev->dev);
> > > +	return ret;
> > > +}
> > > +
> > > +static int
> > > +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
> > > +				   unsigned int cmd, unsigned long arg)
> > > +{
> > > +	struct virtiovf_pci_core_device *virtvdev = container_of(
> > > +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> > > +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
> > > +	void __user *uarg = (void __user *)arg;
> > > +	struct vfio_region_info info = {};
> > > +
> > > +	if (copy_from_user(&info, uarg, minsz))
> > > +		return -EFAULT;
> > > +
> > > +	if (info.argsz < minsz)
> > > +		return -EINVAL;
> > > +
> > > +	switch (info.index) {
> > > +	case VFIO_PCI_BAR0_REGION_INDEX:
> > > +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
> > > +		info.size = virtvdev->bar0_virtual_buf_size;
> > > +		info.flags = VFIO_REGION_INFO_FLAG_READ |
> > > +			     VFIO_REGION_INFO_FLAG_WRITE;
> > > +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
> > > +	default:
> > > +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> > > +	}
> > > +}
> > > +
> > > +static long
> > > +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> > > +			     unsigned long arg)
> > > +{
> > > +	switch (cmd) {
> > > +	case VFIO_DEVICE_GET_REGION_INFO:
> > > +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
> > > +	default:
> > > +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> > > +	}
> > > +}
> > > +
> > > +static int
> > > +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
> > > +{
> > > +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> > > +	int ret;
> > > +
> > > +	/* Setup the BAR where the 'notify' exists to be used by vfio as well
> > > +	 * This will let us mmap it only once and use it when needed.
> > > +	 */
> > > +	ret = vfio_pci_core_setup_barmap(core_device,
> > > +					 virtvdev->notify_bar);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
> > > +			virtvdev->notify_offset;
> > > +	return 0;
> > > +}
> > > +
> > > +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
> > > +{
> > > +	struct virtiovf_pci_core_device *virtvdev = container_of(
> > > +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> > > +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
> > > +	int ret;
> > > +
> > > +	ret = vfio_pci_core_enable(vdev);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	if (virtvdev->bar0_virtual_buf) {
> > > +		/* upon close_device() the vfio_pci_core_disable() is called
> > > +		 * and will close all the previous mmaps, so it seems that the
> > > +		 * valid life cycle for the 'notify' addr is per open/close.
> > > +		 */
> > > +		ret = virtiovf_set_notify_addr(virtvdev);
> > > +		if (ret) {
> > > +			vfio_pci_core_disable(vdev);
> > > +			return ret;
> > > +		}
> > > +	}
> > > +
> > > +	vfio_pci_core_finish_enable(vdev);
> > > +	return 0;
> > > +}
> > > +
> > > +static void virtiovf_pci_close_device(struct vfio_device *core_vdev)
> > > +{
> > > +	vfio_pci_core_close_device(core_vdev);
> > > +}
> > Why does this function exist?
> 
> For symmetry reasons, as we have virtiovf_pci_open_device() I also put in
> the close() one.
> However, we can just set vfio_pci_core_close_device() on the ops and drop
> this code.
> > 
> > > +
> > > +static int virtiovf_get_device_config_size(unsigned short device)
> > > +{
> > > +	switch (device) {
> > > +	case 0x1041:
> > > +		/* network card */
> > > +		return offsetofend(struct virtio_net_config, status);
> > > +	default:
> > > +		return 0;
> > > +	}
> > > +}
> > > +
> > > +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
> > > +{
> > > +	u64 offset;
> > > +	int ret;
> > > +	u8 bar;
> > > +
> > > +	ret = virtiovf_cmd_lq_read_notify(virtvdev,
> > > +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
> > > +				&bar, &offset);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	virtvdev->notify_bar = bar;
> > > +	virtvdev->notify_offset = offset;
> > > +	return 0;
> > > +}
> > > +
> > > +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> > > +{
> > > +	struct virtiovf_pci_core_device *virtvdev = container_of(
> > > +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> > > +	struct pci_dev *pdev;
> > > +	int ret;
> > > +
> > > +	ret = vfio_pci_core_init_dev(core_vdev);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	pdev = virtvdev->core_device.pdev;
> > > +	virtvdev->vf_id = pci_iov_vf_id(pdev);
> > > +	if (virtvdev->vf_id < 0)
> > > +		return -EINVAL;
> > vf_id is never used.
> 
> It's used as part of the virtio commands, see the previous preparation
> patch.
> 
> > 
> > > +
> > > +	ret = virtiovf_read_notify_info(virtvdev);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	virtvdev->bar0_virtual_buf_size = VIRTIO_LEGACY_IO_BAR_HEADER_LEN +
> > > +		VIRTIO_LEGACY_IO_BAR_MSIX_HEADER_LEN +
> > > +		virtiovf_get_device_config_size(pdev->device);
> > > +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
> > > +					     GFP_KERNEL);
> > > +	if (!virtvdev->bar0_virtual_buf)
> > > +		return -ENOMEM;
> > > +	mutex_init(&virtvdev->bar_mutex);
> > > +	return 0;
> > > +}
> > > +
> > > +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
> > > +{
> > > +	struct virtiovf_pci_core_device *virtvdev = container_of(
> > > +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> > > +
> > > +	kfree(virtvdev->bar0_virtual_buf);
> > > +	vfio_pci_core_release_dev(core_vdev);
> > > +}
> > > +
> > > +static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
> > > +	.name = "virtio-transitional-vfio-pci",
> > > +	.init = virtiovf_pci_init_device,
> > > +	.release = virtiovf_pci_core_release_dev,
> > > +	.open_device = virtiovf_pci_open_device,
> > > +	.close_device = virtiovf_pci_close_device,
> > > +	.ioctl = virtiovf_vfio_pci_core_ioctl,
> > > +	.read = virtiovf_pci_core_read,
> > > +	.write = virtiovf_pci_core_write,
> > > +	.mmap = vfio_pci_core_mmap,
> > > +	.request = vfio_pci_core_request,
> > > +	.match = vfio_pci_core_match,
> > > +	.bind_iommufd = vfio_iommufd_physical_bind,
> > > +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> > > +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> > > +};
> > > +
> > > +static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
> > > +	.name = "virtio-acc-vfio-pci",
> > > +	.init = vfio_pci_core_init_dev,
> > > +	.release = vfio_pci_core_release_dev,
> > > +	.open_device = virtiovf_pci_open_device,
> > > +	.close_device = virtiovf_pci_close_device,
> > > +	.ioctl = vfio_pci_core_ioctl,
> > > +	.device_feature = vfio_pci_core_ioctl_feature,
> > > +	.read = vfio_pci_core_read,
> > > +	.write = vfio_pci_core_write,
> > > +	.mmap = vfio_pci_core_mmap,
> > > +	.request = vfio_pci_core_request,
> > > +	.match = vfio_pci_core_match,
> > > +	.bind_iommufd = vfio_iommufd_physical_bind,
> > > +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> > > +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> > > +};
> > Why are we claiming devices that should just use vfio-pci instead?
> 
> 
> Upon probe we may choose to set those default vfio-pci ops in case the device
> is not legacy capable.
> This will eliminate any usage of the new driver functionality when it's not
> applicable.
> 
> > 
> > > +
> > > +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
> > > +{
> > > +	struct resource *res = pdev->resource;
> > > +
> > > +	return res->flags ? true : false;
> > > +}
> > > +
> > > +#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
> > > +	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
> > > +	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
> > > +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
> > > +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
> > > +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
> > > +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> > > +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> > > +
> > > +static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
> > > +{
> > > +	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
> > > +	u8 *buf;
> > > +	int ret;
> > > +
> > > +	/* Only virtio-net is supported/tested so far */
> > > +	if (pdev->device != 0x1041)
> > > +		return false;
> > Seems like the ID table should handle this, why are we preemptively
> > claiming all virtio devices... or actually all 0x1af4 devices, which
> > might not even be virtio, ex. the non-virtio ivshmem devices is 0x1110.
> 
> Makes sense, will change the ID table from PCI_ANY_ID to 0x1041 and
> clean up that code.
> 
> > > +
> > > +	buf = kzalloc(buf_size, GFP_KERNEL);
> > > +	if (!buf)
> > > +		return false;
> > > +
> > > +	ret = virtiovf_cmd_list_query(pdev, buf, buf_size);
> > > +	if (ret)
> > > +		goto end;
> > > +
> > > +	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
> > > +		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
> > > +		ret = -EOPNOTSUPP;
> > > +		goto end;
> > > +	}
> > > +
> > > +	/* confirm the used commands */
> > > +	memset(buf, 0, buf_size);
> > > +	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
> > > +	ret = virtiovf_cmd_list_use(pdev, buf, buf_size);
> > > +
> > > +end:
> > > +	kfree(buf);
> > > +	return ret ? false : true;
> > > +}
> > > +
> > > +static int virtiovf_pci_probe(struct pci_dev *pdev,
> > > +			      const struct pci_device_id *id)
> > > +{
> > > +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> > > +	struct virtiovf_pci_core_device *virtvdev;
> > > +	int ret;
> > > +
> > > +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> > > +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> > > +		ops = &virtiovf_acc_vfio_pci_tran_ops;
> > > +
> > > +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> > > +				     &pdev->dev, ops);
> > > +	if (IS_ERR(virtvdev))
> > > +		return PTR_ERR(virtvdev);
> > > +
> > > +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> > > +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
> > > +	if (ret)
> > > +		goto out;
> > > +	return 0;
> > > +out:
> > > +	vfio_put_device(&virtvdev->core_device.vdev);
> > > +	return ret;
> > > +}
> > > +
> > > +static void virtiovf_pci_remove(struct pci_dev *pdev)
> > > +{
> > > +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> > > +
> > > +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> > > +	vfio_put_device(&virtvdev->core_device.vdev);
> > > +}
> > > +
> > > +static const struct pci_device_id virtiovf_pci_table[] = {
> > > +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
> > libvirt will blindly use this driver for all devices matching this as
> > we've discussed how it should make use of modules.alias.  I don't think
> > this driver should be squatting on devices where it doesn't add value
> > and it's not clear whether this is adding or subtracting value in all
> > cases for the one NIC that it modifies.
> 
> 
> When the device is not legacy capable, we choose the vfio-pci default ops as
> pointed out above; otherwise we may choose the new functionality to enable it in
> the guest.
> 
> >    How should libvirt choose when
> > and where to use this driver?  What regressions are we going to see
> > with VMs that previously saw "modern" virtio-net devices and now see a
> > legacy compatible device?  Thanks,
> We don't expect a regression here; a modern driver in the guest will
> continue using its direct access flow.
> 
> Do you see a real concern with enabling it by default, rather than requiring some
> pre-configuration before the probe phase to activate it?
> If so, any specific suggestion on how to manage that?
> 
> Thanks,
> Yishai

I would not claim that it can't happen.
For example, a transitional device
can not, in theory, be safely passed through to guest userspace, because
the guest might then try to use it through the legacy BAR
without acknowledging ACCESS_PLATFORM.
Do any guests check this and fail? Hard to say.

> > Alex
> > 
> > > +	{}
> > > +};
> > > +
> > > +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> > > +
> > > +static struct pci_driver virtiovf_pci_driver = {
> > > +	.name = KBUILD_MODNAME,
> > > +	.id_table = virtiovf_pci_table,
> > > +	.probe = virtiovf_pci_probe,
> > > +	.remove = virtiovf_pci_remove,
> > > +	.err_handler = &vfio_pci_core_err_handlers,
> > > +	.driver_managed_dma = true,
> > > +};
> > > +
> > > +module_pci_driver(virtiovf_pci_driver);
> > > +
> > > +MODULE_LICENSE("GPL");
> > > +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
> > > +MODULE_DESCRIPTION(
> > > +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");
> 


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 01/11] virtio-pci: Use virtio pci device layer vq info instead of generic one
  2023-09-21 13:46     ` Michael S. Tsirkin
@ 2023-09-26 19:13       ` Feng Liu via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Feng Liu @ 2023-09-26 19:13 UTC (permalink / raw)
  To: Michael S. Tsirkin, Yishai Hadas
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav, jiri,
	kevin.tian, joao.m.martins, leonro, maorg



On 2023-09-21 a.m.9:46, Michael S. Tsirkin wrote:
> 
> 
> On Thu, Sep 21, 2023 at 03:40:30PM +0300, Yishai Hadas wrote:
>> From: Feng Liu <feliu@nvidia.com>
>>
>> Currently VQ deletion callback vp_del_vqs() processes generic
>> virtio_device level VQ list instead of VQ information available at PCI
>> layer.
>>
>> To adhere to the layering, use the pci device level VQ information
>> stored in the virtqueues or vqs.
>>
>> This also prepares the code to handle PCI layer admin vq life cycle to
>> be managed within the pci layer and thereby avoid undesired deletion of
>> admin vq by upper layer drivers (net, console, vfio), in the del_vqs()
>> callback.
> 
>> Signed-off-by: Feng Liu <feliu@nvidia.com>
>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> ---
>>   drivers/virtio/virtio_pci_common.c | 12 +++++++++---
>>   drivers/virtio/virtio_pci_common.h |  1 +
>>   2 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
>> index c2524a7207cf..7a3e6edc4dd6 100644
>> --- a/drivers/virtio/virtio_pci_common.c
>> +++ b/drivers/virtio/virtio_pci_common.c
>> @@ -232,12 +232,16 @@ static void vp_del_vq(struct virtqueue *vq)
>>   void vp_del_vqs(struct virtio_device *vdev)
>>   {
>>        struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>> -     struct virtqueue *vq, *n;
>> +     struct virtqueue *vq;
>>        int i;
>>
>> -     list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
>> +     for (i = 0; i < vp_dev->nvqs; i++) {
>> +             if (!vp_dev->vqs[i])
>> +                     continue;
>> +
>> +             vq = vp_dev->vqs[i]->vq;
>>                if (vp_dev->per_vq_vectors) {
>> -                     int v = vp_dev->vqs[vq->index]->msix_vector;
>> +                     int v = vp_dev->vqs[i]->msix_vector;
>>
>>                        if (v != VIRTIO_MSI_NO_VECTOR) {
>>                                int irq = pci_irq_vector(vp_dev->pci_dev, v);
>> @@ -294,6 +298,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned int nvqs,
>>        vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
>>        if (!vp_dev->vqs)
>>                return -ENOMEM;
>> +     vp_dev->nvqs = nvqs;
>>
>>        if (per_vq_vectors) {
>>                /* Best option: one for change interrupt, one per vq. */
>> @@ -365,6 +370,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs,
>>        vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
>>        if (!vp_dev->vqs)
>>                return -ENOMEM;
>> +     vp_dev->nvqs = nvqs;
>>
>>        err = request_irq(vp_dev->pci_dev->irq, vp_interrupt, IRQF_SHARED,
>>                        dev_name(&vdev->dev), vp_dev);
>> diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
>> index 4b773bd7c58c..602021967aaa 100644
>> --- a/drivers/virtio/virtio_pci_common.h
>> +++ b/drivers/virtio/virtio_pci_common.h
>> @@ -60,6 +60,7 @@ struct virtio_pci_device {
>>
>>        /* array of all queues for house-keeping */
>>        struct virtio_pci_vq_info **vqs;
>> +     u32 nvqs;
> 
> I don't much like it that we are adding more duplicated info here.
> In fact, we tried removing the vqs array in
> 5c34d002dcc7a6dd665a19d098b4f4cd5501ba1a - there was some bug in that
> patch and the author didn't have the time to debug
> so I reverted but I don't really think we need to add to that.
> 

Hi Michael

As explained in the commit message, this patch mainly prepares for the
subsequent admin vq patches.

The admin vq is also established using the common vring mechanism, and is
added to vdev->vqs in __vring_new_virtqueue(). So vdev->vqs contains all
virtqueues, including rxq, txq, ctrlq and the admin vq.

The admin vq should be managed by the virtio_pci layer and should not be
created or deleted by an upper driver (net, blk).
When an upper driver is unloaded, it calls the del_vqs() interface, which
calls vp_del_vqs(), and vp_del_vqs() must not delete the admin vq, but
only the virtqueues created by the upper driver such as rxq, txq, and
ctrlq.

The vp_dev->vqs[] array only contains the virtqueues created by the upper
driver (rxq, txq, ctrlq). Traversing the vp_dev->vqs[] array therefore
deletes only the upper driver's virtqueues and leaves the admin vq alone.
Using the vdev->vqs linked list cannot meet that need.
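
To put the same thing in code form, this is roughly the invariant I am
relying on (a sketch only, using the naming of this series; not a drop-in
hunk):

	/*
	 * vdev->vqs (list)        : rxq, txq, ctrlq, ..., admin vq
	 * vp_dev->vqs[0..nvqs-1]  : rxq, txq, ctrlq, ...   (no admin vq)
	 *
	 * The admin vq is set up by the transport itself and its info is
	 * kept in vp_dev->admin->info, so it never lands in vp_dev->vqs[].
	 */
	static bool vp_is_avq(struct virtio_pci_device *vp_dev,
			      struct virtqueue *vq)
	{
		return vp_dev->admin && vp_dev->admin->info.vq == vq;
	}

Walking vp_dev->vqs[0..nvqs-1] in vp_del_vqs() therefore only ever touches
the upper driver's queues; the admin vq stays alive until the virtio_pci
layer tears it down in destroy_avq().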


Does this explanation make things clear? Or do you have any other
alternative suggestions?

>>
>>        /* MSI-X support */
>>        int msix_enabled;
>> --
>> 2.27.0
> 

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 03/11] virtio-pci: Introduce admin virtqueue
  2023-09-21 13:57     ` Michael S. Tsirkin
@ 2023-09-26 19:23       ` Feng Liu via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Feng Liu @ 2023-09-26 19:23 UTC (permalink / raw)
  To: Michael S. Tsirkin, Yishai Hadas
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav, jiri,
	kevin.tian, joao.m.martins, leonro, maorg



On 2023-09-21 a.m.9:57, Michael S. Tsirkin wrote:
> 
> 
> On Thu, Sep 21, 2023 at 03:40:32PM +0300, Yishai Hadas wrote:
>> From: Feng Liu <feliu@nvidia.com>
>>
>> Introduce support for the admin virtqueue. By negotiating
>> VIRTIO_F_ADMIN_VQ feature, driver detects capability and creates one
>> administration virtqueue. Administration virtqueue implementation in
>> virtio pci generic layer, enables multiple types of upper layer
>> drivers such as vfio, net, blk to utilize it.
>>
>> Signed-off-by: Feng Liu <feliu@nvidia.com>
>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> ---
>>   drivers/virtio/Makefile                |  2 +-
>>   drivers/virtio/virtio.c                | 37 +++++++++++++--
>>   drivers/virtio/virtio_pci_common.h     | 15 +++++-
>>   drivers/virtio/virtio_pci_modern.c     | 10 +++-
>>   drivers/virtio/virtio_pci_modern_avq.c | 65 ++++++++++++++++++++++++++
> 
> if you have a .c file without a .h file you know there's something
> fishy. Just add this inside drivers/virtio/virtio_pci_modern.c ?
> 
Will do.

>>   include/linux/virtio_config.h          |  4 ++
>>   include/linux/virtio_pci_modern.h      |  3 ++
>>   7 files changed, 129 insertions(+), 7 deletions(-)
>>   create mode 100644 drivers/virtio/virtio_pci_modern_avq.c
>>
>> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
>> index 8e98d24917cc..dcc535b5b4d9 100644
>> --- a/drivers/virtio/Makefile
>> +++ b/drivers/virtio/Makefile
>> @@ -5,7 +5,7 @@ obj-$(CONFIG_VIRTIO_PCI_LIB) += virtio_pci_modern_dev.o
>>   obj-$(CONFIG_VIRTIO_PCI_LIB_LEGACY) += virtio_pci_legacy_dev.o
>>   obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
>>   obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
>> -virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>> +virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o virtio_pci_modern_avq.o
>>   virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>>   obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>>   obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>> index 3893dc29eb26..f4080692b351 100644
>> --- a/drivers/virtio/virtio.c
>> +++ b/drivers/virtio/virtio.c
>> @@ -302,9 +302,15 @@ static int virtio_dev_probe(struct device *_d)
>>        if (err)
>>                goto err;
>>
>> +     if (dev->config->create_avq) {
>> +             err = dev->config->create_avq(dev);
>> +             if (err)
>> +                     goto err;
>> +     }
>> +
>>        err = drv->probe(dev);
>>        if (err)
>> -             goto err;
>> +             goto err_probe;
>>
>>        /* If probe didn't do it, mark device DRIVER_OK ourselves. */
>>        if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
>> @@ -316,6 +322,10 @@ static int virtio_dev_probe(struct device *_d)
>>        virtio_config_enable(dev);
>>
>>        return 0;
>> +
>> +err_probe:
>> +     if (dev->config->destroy_avq)
>> +             dev->config->destroy_avq(dev);
>>   err:
>>        virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>>        return err;
>> @@ -331,6 +341,9 @@ static void virtio_dev_remove(struct device *_d)
>>
>>        drv->remove(dev);
>>
>> +     if (dev->config->destroy_avq)
>> +             dev->config->destroy_avq(dev);
>> +
>>        /* Driver should have reset device. */
>>        WARN_ON_ONCE(dev->config->get_status(dev));
>>
>> @@ -489,13 +502,20 @@ EXPORT_SYMBOL_GPL(unregister_virtio_device);
>>   int virtio_device_freeze(struct virtio_device *dev)
>>   {
>>        struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
>> +     int ret;
>>
>>        virtio_config_disable(dev);
>>
>>        dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
>>
>> -     if (drv && drv->freeze)
>> -             return drv->freeze(dev);
>> +     if (drv && drv->freeze) {
>> +             ret = drv->freeze(dev);
>> +             if (ret)
>> +                     return ret;
>> +     }
>> +
>> +     if (dev->config->destroy_avq)
>> +             dev->config->destroy_avq(dev);
>>
>>        return 0;
>>   }
>> @@ -532,10 +552,16 @@ int virtio_device_restore(struct virtio_device *dev)
>>        if (ret)
>>                goto err;
>>
>> +     if (dev->config->create_avq) {
>> +             ret = dev->config->create_avq(dev);
>> +             if (ret)
>> +                     goto err;
>> +     }
>> +
>>        if (drv->restore) {
>>                ret = drv->restore(dev);
>>                if (ret)
>> -                     goto err;
>> +                     goto err_restore;
>>        }
>>
>>        /* If restore didn't do it, mark device DRIVER_OK ourselves. */
>> @@ -546,6 +572,9 @@ int virtio_device_restore(struct virtio_device *dev)
>>
>>        return 0;
>>
>> +err_restore:
>> +     if (dev->config->destroy_avq)
>> +             dev->config->destroy_avq(dev);
>>   err:
>>        virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>>        return ret;
>> diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
>> index 602021967aaa..9bffa95274b6 100644
>> --- a/drivers/virtio/virtio_pci_common.h
>> +++ b/drivers/virtio/virtio_pci_common.h
>> @@ -41,6 +41,14 @@ struct virtio_pci_vq_info {
>>        unsigned int msix_vector;
>>   };
>>
>> +struct virtio_avq {
> 
> admin_vq would be better. and this is pci specific yes? so virtio_pci_
> 

Will do.

>> +     /* Virtqueue info associated with this admin queue. */
>> +     struct virtio_pci_vq_info info;
>> +     /* Name of the admin queue: avq.$index. */
>> +     char name[10];
>> +     u16 vq_index;
>> +};
>> +
>>   /* Our device structure */
>>   struct virtio_pci_device {
>>        struct virtio_device vdev;
>> @@ -58,10 +66,13 @@ struct virtio_pci_device {
>>        spinlock_t lock;
>>        struct list_head virtqueues;
>>
>> -     /* array of all queues for house-keeping */
>> +     /* Array of all virtqueues reported in the
>> +      * PCI common config num_queues field
>> +      */
>>        struct virtio_pci_vq_info **vqs;
>>        u32 nvqs;
>>
>> +     struct virtio_avq *admin;
> 
> and this could be thinkably admin_vq.
> 
Will do.

>>        /* MSI-X support */
>>        int msix_enabled;
>>        int intx_enabled;
>> @@ -115,6 +126,8 @@ int vp_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>>                const char * const names[], const bool *ctx,
>>                struct irq_affinity *desc);
>>   const char *vp_bus_name(struct virtio_device *vdev);
>> +void vp_destroy_avq(struct virtio_device *vdev);
>> +int vp_create_avq(struct virtio_device *vdev);
>>
>>   /* Setup the affinity for a virtqueue:
>>    * - force the affinity for per vq vector
>> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
>> index d6bb68ba84e5..a72c87687196 100644
>> --- a/drivers/virtio/virtio_pci_modern.c
>> +++ b/drivers/virtio/virtio_pci_modern.c
>> @@ -37,6 +37,9 @@ static void vp_transport_features(struct virtio_device *vdev, u64 features)
>>
>>        if (features & BIT_ULL(VIRTIO_F_RING_RESET))
>>                __virtio_set_bit(vdev, VIRTIO_F_RING_RESET);
>> +
>> +     if (features & BIT_ULL(VIRTIO_F_ADMIN_VQ))
>> +             __virtio_set_bit(vdev, VIRTIO_F_ADMIN_VQ);
>>   }
>>
>>   /* virtio config->finalize_features() implementation */
>> @@ -317,7 +320,8 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>>        else
>>                notify = vp_notify;
>>
>> -     if (index >= vp_modern_get_num_queues(mdev))
>> +     if (!((index < vp_modern_get_num_queues(mdev) ||
>> +           (vp_dev->admin && vp_dev->admin->vq_index == index))))
>>                return ERR_PTR(-EINVAL);
>>
>>        /* Check if queue is either not available or already active. */
>> @@ -509,6 +513,8 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
>>        .get_shm_region  = vp_get_shm_region,
>>        .disable_vq_and_reset = vp_modern_disable_vq_and_reset,
>>        .enable_vq_after_reset = vp_modern_enable_vq_after_reset,
>> +     .create_avq = vp_create_avq,
>> +     .destroy_avq = vp_destroy_avq,
>>   };
>>
>>   static const struct virtio_config_ops virtio_pci_config_ops = {
>> @@ -529,6 +535,8 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
>>        .get_shm_region  = vp_get_shm_region,
>>        .disable_vq_and_reset = vp_modern_disable_vq_and_reset,
>>        .enable_vq_after_reset = vp_modern_enable_vq_after_reset,
>> +     .create_avq = vp_create_avq,
>> +     .destroy_avq = vp_destroy_avq,
>>   };
>>
>>   /* the PCI probing function */
>> diff --git a/drivers/virtio/virtio_pci_modern_avq.c b/drivers/virtio/virtio_pci_modern_avq.c
>> new file mode 100644
>> index 000000000000..114579ad788f
>> --- /dev/null
>> +++ b/drivers/virtio/virtio_pci_modern_avq.c
>> @@ -0,0 +1,65 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +
>> +#include <linux/virtio.h>
>> +#include "virtio_pci_common.h"
>> +
>> +static u16 vp_modern_avq_num(struct virtio_pci_modern_device *mdev)
>> +{
>> +     struct virtio_pci_modern_common_cfg __iomem *cfg;
>> +
>> +     cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;
>> +     return vp_ioread16(&cfg->admin_queue_num);
>> +}
>> +
>> +static u16 vp_modern_avq_index(struct virtio_pci_modern_device *mdev)
>> +{
>> +     struct virtio_pci_modern_common_cfg __iomem *cfg;
>> +
>> +     cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;
>> +     return vp_ioread16(&cfg->admin_queue_index);
>> +}
>> +
>> +int vp_create_avq(struct virtio_device *vdev)
>> +{
>> +     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>> +     struct virtio_avq *avq;
>> +     struct virtqueue *vq;
>> +     u16 admin_q_num;
>> +
>> +     if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
>> +             return 0;
>> +
>> +     admin_q_num = vp_modern_avq_num(&vp_dev->mdev);
>> +     if (!admin_q_num)
>> +             return -EINVAL;
>> +
>> +     vp_dev->admin = kzalloc(sizeof(*vp_dev->admin), GFP_KERNEL);
>> +     if (!vp_dev->admin)
>> +             return -ENOMEM;
>> +
>> +     avq = vp_dev->admin;
>> +     avq->vq_index = vp_modern_avq_index(&vp_dev->mdev);
>> +     sprintf(avq->name, "avq.%u", avq->vq_index);
>> +     vq = vp_dev->setup_vq(vp_dev, &vp_dev->admin->info, avq->vq_index, NULL,
>> +                           avq->name, NULL, VIRTIO_MSI_NO_VECTOR);
>> +     if (IS_ERR(vq)) {
>> +             dev_err(&vdev->dev, "failed to setup admin virtqueue");
>> +             kfree(vp_dev->admin);
>> +             return PTR_ERR(vq);
>> +     }
>> +
>> +     vp_dev->admin->info.vq = vq;
>> +     vp_modern_set_queue_enable(&vp_dev->mdev, avq->info.vq->index, true);
>> +     return 0;
>> +}
>> +
>> +void vp_destroy_avq(struct virtio_device *vdev)
>> +{
>> +     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>> +
>> +     if (!vp_dev->admin)
>> +             return;
>> +
>> +     vp_dev->del_vq(&vp_dev->admin->info);
>> +     kfree(vp_dev->admin);
>> +}
>> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
>> index 2b3438de2c4d..028c51ea90ee 100644
>> --- a/include/linux/virtio_config.h
>> +++ b/include/linux/virtio_config.h
>> @@ -93,6 +93,8 @@ typedef void vq_callback_t(struct virtqueue *);
>>    *   Returns 0 on success or error status
>>    *   If disable_vq_and_reset is set, then enable_vq_after_reset must also be
>>    *   set.
>> + * @create_avq: initialize admin virtqueue resource.
>> + * @destroy_avq: destroy admin virtqueue resource.
>>    */
>>   struct virtio_config_ops {
>>        void (*get)(struct virtio_device *vdev, unsigned offset,
>> @@ -120,6 +122,8 @@ struct virtio_config_ops {
>>                               struct virtio_shm_region *region, u8 id);
>>        int (*disable_vq_and_reset)(struct virtqueue *vq);
>>        int (*enable_vq_after_reset)(struct virtqueue *vq);
>> +     int (*create_avq)(struct virtio_device *vdev);
>> +     void (*destroy_avq)(struct virtio_device *vdev);
>>   };
>>
>>   /* If driver didn't advertise the feature, it will never appear. */
>> diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h
>> index 067ac1d789bc..f6cb13d858fd 100644
>> --- a/include/linux/virtio_pci_modern.h
>> +++ b/include/linux/virtio_pci_modern.h
>> @@ -10,6 +10,9 @@ struct virtio_pci_modern_common_cfg {
>>
>>        __le16 queue_notify_data;       /* read-write */
>>        __le16 queue_reset;             /* read-write */
>> +
>> +     __le16 admin_queue_index;       /* read-only */
>> +     __le16 admin_queue_num;         /* read-only */
>>   };
> 
> 
> ouch.
> actually there's a problem
> 
>          mdev->common = vp_modern_map_capability(mdev, common,
>                                        sizeof(struct virtio_pci_common_cfg), 4,
>                                        0, sizeof(struct virtio_pci_common_cfg),
>                                        NULL, NULL);
> 
> extending this structure means some calls will start failing on
> existing devices.
> 
> even more of an ouch, when we added queue_notify_data and queue_reset we
> also possibly broke some devices. well hopefully not since no one
> reported failures but we really need to fix that.
> 
> 
Hi Michael

I didn't see the failure in vp_modern_map_capability();
vp_modern_map_capability() only reads and maps PCI memory. The length of
the memory mapping will increase as struct virtio_pci_common_cfg
increases. No errors are seen.

According to the existing code, new PCI configuration space members can
only be added in struct virtio_pci_modern_common_cfg.

Every single entry added here is protected by a feature bit, so there is
no bug AFAIK.

Could you explain it in more detail?  Where and why will it fail if we
add a new member to struct virtio_pci_modern_common_cfg?
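
For reference, the check I assume you are referring to is this one (a
simplified sketch of today's vp_modern_map_capability(); please correct me
if you mean a different spot):

	/*
	 * 'length' comes from the device's PCI capability, 'minlen' is the
	 * size the caller asks to have mapped.
	 */
	if (length - start < minlen) {
		dev_err(&dev->dev,
			"virtio_pci: bad capability len %u (>=%zu expected)\n",
			length, minlen);
		return NULL;
	}

Since the common cfg mapping is still requested with
sizeof(struct virtio_pci_common_cfg) as minlen, a device that only exposes
the original layout keeps passing this check, and the new fields are only
touched once the corresponding feature bit has been negotiated.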


>>
>>   struct virtio_pci_modern_device {
>> --
>> 2.27.0
> 

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-09-26 11:41         ` Michael S. Tsirkin
  (?)
@ 2023-09-27 13:18         ` Jason Gunthorpe
  2023-09-27 21:30             ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-27 13:18 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Yishai Hadas, kvm, maorg, virtualization, jiri, leonro

On Tue, Sep 26, 2023 at 07:41:44AM -0400, Michael S. Tsirkin wrote:

> > By the way, this follows what was already done between the vfio/mlx5 and
> > mlx5_core modules, where mlx5_core exposes generic APIs to execute a
> > command and to get the PF from a given mlx5 VF.
> 
> This is up to mlx5 maintainers. In particular they only need to worry
> that their patches work with specific hardware which they likely have.
> virtio has to work with multiple vendors - hardware and software -
> and exposing a low level API that I can't test on my laptop
> is not at all my ideal.

mlx5 has a reasonable API from the lower level that allows the vfio
driver to safely issue commands. The API provides all the safety and
locking you have been questioning here.

Then the vfio driver can form the commands directly and in the way it
needs. This avoids spewing code into the core modules that is only
used by vfio - which has been a key design consideration for our
driver layering.

I suggest following the same design here as it has been well proven.
Provide a solid API to operate the admin queue and let VFIO use
it. One of the main purposes of the admin queue is to deliver commands
on behalf of the VF driver, so this is a logical and reasonable place
to put an API.
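
Roughly the shape I have in mind (a sketch with hypothetical names, based
on the spec's legacy admin commands; not the exact signatures of this
series):

	/*
	 * Exported by virtio/virtio_pci, which owns the admin vq, its
	 * locking and its completion handling.
	 */
	int virtio_admin_cmd_exec(struct virtio_device *pf_vdev,
				  struct virtio_admin_cmd *cmd);

	/*
	 * vfio/virtio then forms the legacy-access command itself and only
	 * submits it, e.g. something like:
	 */
	static int virtiovf_legacy_cfg_write(struct virtio_device *pf_vdev,
					     u64 vf_id,
					     struct scatterlist *data_sg)
	{
		struct virtio_admin_cmd cmd = {
			.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE),
			.group_member_id = cpu_to_le64(vf_id),
			.data_sg = data_sg,
		};

		return virtio_admin_cmd_exec(pf_vdev, &cmd);
	}

That keeps the command formats out of the core module while still giving
the core full control over the queue itself.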

> > This way, we can enable further commands to be added/extended
> > easily/cleanly.
> 
> Something for vfio maintainer to consider in case it was
> assumed that it's just this one weird thing
> but otherwise it's all generic vfio. It's not going to stop there,
> will it? The duplication of functionality with vdpa will continue :(

VFIO live migration is expected to come as well once OASIS completes
its work.

Parav, are there other things?

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 01/11] virtio-pci: Use virtio pci device layer vq info instead of generic one
  2023-09-26 19:13       ` Feng Liu via Virtualization
@ 2023-09-27 18:09         ` Feng Liu via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Feng Liu @ 2023-09-27 18:09 UTC (permalink / raw)
  To: Michael S. Tsirkin, Yishai Hadas
  Cc: kvm, maorg, virtualization, jgg, jiri, leonro



On 2023-09-26 p.m.3:13, Feng Liu via Virtualization wrote:
> 
> 
> On 2023-09-21 a.m.9:46, Michael S. Tsirkin wrote:
>>
>>
>> On Thu, Sep 21, 2023 at 03:40:30PM +0300, Yishai Hadas wrote:
>>> From: Feng Liu <feliu@nvidia.com>
>>>

>>> pci_irq_vector(vp_dev->pci_dev, v);
>>> @@ -294,6 +298,7 @@ static int vp_find_vqs_msix(struct virtio_device 
>>> *vdev, unsigned int nvqs,
>>>        vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
>>>        if (!vp_dev->vqs)
>>>                return -ENOMEM;
>>> +     vp_dev->nvqs = nvqs;
>>>
>>>        if (per_vq_vectors) {
>>>                /* Best option: one for change interrupt, one per vq. */
>>> @@ -365,6 +370,7 @@ static int vp_find_vqs_intx(struct virtio_device 
>>> *vdev, unsigned int nvqs,
>>>        vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
>>>        if (!vp_dev->vqs)
>>>                return -ENOMEM;
>>> +     vp_dev->nvqs = nvqs;
>>>
>>>        err = request_irq(vp_dev->pci_dev->irq, vp_interrupt, 
>>> IRQF_SHARED,
>>>                        dev_name(&vdev->dev), vp_dev);
>>> diff --git a/drivers/virtio/virtio_pci_common.h 
>>> b/drivers/virtio/virtio_pci_common.h
>>> index 4b773bd7c58c..602021967aaa 100644
>>> --- a/drivers/virtio/virtio_pci_common.h
>>> +++ b/drivers/virtio/virtio_pci_common.h
>>> @@ -60,6 +60,7 @@ struct virtio_pci_device {
>>>
>>>        /* array of all queues for house-keeping */
>>>        struct virtio_pci_vq_info **vqs;
>>> +     u32 nvqs;
>>
>> I don't much like it that we are adding more duplicated info here.
>> In fact, we tried removing the vqs array in
>> 5c34d002dcc7a6dd665a19d098b4f4cd5501ba1a - there was some bug in that
>> patch and the author didn't have the time to debug
>> so I reverted but I don't really think we need to add to that.
>>
> 
> Hi Michael
> 
> As explained in the commit message, this patch mainly prepares for the
> subsequent admin vq patches.
> 
> The admin vq is also established using the common vring mechanism, and
> is added to vdev->vqs in __vring_new_virtqueue(). So vdev->vqs contains
> all virtqueues, including rxq, txq, ctrlq and the admin vq.
> 
> The admin vq should be managed by the virtio_pci layer and should not
> be created or deleted by an upper driver (net, blk).
> When an upper driver is unloaded, it calls the del_vqs() interface,
> which calls vp_del_vqs(), and vp_del_vqs() must not delete the admin
> vq, but only the virtqueues created by the upper driver such as rxq,
> txq, and ctrlq.
> 
> The vp_dev->vqs[] array only contains the virtqueues created by the
> upper driver (rxq, txq, ctrlq). Traversing the vp_dev->vqs[] array
> therefore deletes only the upper driver's virtqueues and leaves the
> admin vq alone. Using the vdev->vqs linked list cannot meet that need.
> 
> Does this explanation make things clear? Or do you have any other
> alternative suggestions?
> 

Hi, Michael
	Is the above explanation OK with you?

Thanks
Feng

>>>
>>>        /* MSI-X support */
>>>        int msix_enabled;
>>> -- 
>>> 2.27.0
>>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 03/11] virtio-pci: Introduce admin virtqueue
  2023-09-26 19:23       ` Feng Liu via Virtualization
@ 2023-09-27 18:12         ` Feng Liu via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Feng Liu @ 2023-09-27 18:12 UTC (permalink / raw)
  To: Michael S. Tsirkin, Yishai Hadas
  Cc: kvm, maorg, virtualization, jgg, jiri, leonro



On 2023-09-26 p.m.3:23, Feng Liu via Virtualization wrote:
> 
> 
> On 2023-09-21 a.m.9:57, Michael S. Tsirkin wrote:
>>
>>
>> On Thu, Sep 21, 2023 at 03:40:32PM +0300, Yishai Hadas wrote:
>>> From: Feng Liu <feliu@nvidia.com>


>>>   drivers/virtio/virtio_pci_modern_avq.c | 65 ++++++++++++++++++++++++++
>>
>> if you have a .c file without a .h file you know there's something
>> fishy. Just add this inside drivers/virtio/virtio_pci_modern.c ?
>>
> Will do.
> 

>>> +struct virtio_avq {
>>
>> admin_vq would be better. and this is pci specific yes? so virtio_pci_
>>
> 
> Will do.
> 

>>>
>>> +     struct virtio_avq *admin;
>>
>> and this could be thinkably admin_vq.
>>
> Will do.
> 

>>>
>>>   /* If driver didn't advertise the feature, it will never appear. */
>>> diff --git a/include/linux/virtio_pci_modern.h 
>>> b/include/linux/virtio_pci_modern.h
>>> index 067ac1d789bc..f6cb13d858fd 100644
>>> --- a/include/linux/virtio_pci_modern.h
>>> +++ b/include/linux/virtio_pci_modern.h
>>> @@ -10,6 +10,9 @@ struct virtio_pci_modern_common_cfg {
>>>
>>>        __le16 queue_notify_data;       /* read-write */
>>>        __le16 queue_reset;             /* read-write */
>>> +
>>> +     __le16 admin_queue_index;       /* read-only */
>>> +     __le16 admin_queue_num;         /* read-only */
>>>   };
>>
>>
>> ouch.
>> actually there's a problem
>>
>>          mdev->common = vp_modern_map_capability(mdev, common,
>>                                        sizeof(struct 
>> virtio_pci_common_cfg), 4,
>>                                        0, sizeof(struct 
>> virtio_pci_common_cfg),
>>                                        NULL, NULL);
>>
>> extending this structure means some calls will start failing on
>> existing devices.
>>
>> even more of an ouch, when we added queue_notify_data and queue_reset we
>> also possibly broke some devices. well hopefully not since no one
>> reported failures but we really need to fix that.
>>
>>
> Hi Michael
> 
> I didn't see the failure in vp_modern_map_capability();
> vp_modern_map_capability() only reads and maps PCI memory. The length
> of the memory mapping will increase as struct virtio_pci_common_cfg
> increases. No errors are seen.
> 
> According to the existing code, new PCI configuration space members can
> only be added in struct virtio_pci_modern_common_cfg.
> 
> Every single entry added here is protected by a feature bit, so there
> is no bug AFAIK.
> 
> Could you explain it in more detail?  Where and why will it fail if we
> add a new member to struct virtio_pci_modern_common_cfg?
> 
> 
Hi, Michael
	Any comments about this?
Thanks
Feng

>>>
>>>   struct virtio_pci_modern_device {
>>> -- 
>>> 2.27.0
>>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 01/11] virtio-pci: Use virtio pci device layer vq info instead of generic one
  2023-09-27 18:09         ` Feng Liu via Virtualization
@ 2023-09-27 21:24           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-27 21:24 UTC (permalink / raw)
  To: Feng Liu; +Cc: Yishai Hadas, kvm, maorg, virtualization, jgg, jiri, leonro

On Wed, Sep 27, 2023 at 02:09:43PM -0400, Feng Liu wrote:
> 
> 
> On 2023-09-26 p.m.3:13, Feng Liu via Virtualization wrote:
> > 
> > 
> > On 2023-09-21 a.m.9:46, Michael S. Tsirkin wrote:
> > > On Thu, Sep 21, 2023 at 03:40:30PM +0300, Yishai Hadas wrote:
> > > > From: Feng Liu <feliu@nvidia.com>
> > > > 
> 
> > > > pci_irq_vector(vp_dev->pci_dev, v);
> > > > @@ -294,6 +298,7 @@ static int vp_find_vqs_msix(struct
> > > > virtio_device *vdev, unsigned int nvqs,
> > > >        vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
> > > >        if (!vp_dev->vqs)
> > > >                return -ENOMEM;
> > > > +     vp_dev->nvqs = nvqs;
> > > > 
> > > >        if (per_vq_vectors) {
> > > >                /* Best option: one for change interrupt, one per vq. */
> > > > @@ -365,6 +370,7 @@ static int vp_find_vqs_intx(struct
> > > > virtio_device *vdev, unsigned int nvqs,
> > > >        vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
> > > >        if (!vp_dev->vqs)
> > > >                return -ENOMEM;
> > > > +     vp_dev->nvqs = nvqs;
> > > > 
> > > >        err = request_irq(vp_dev->pci_dev->irq, vp_interrupt,
> > > > IRQF_SHARED,
> > > >                        dev_name(&vdev->dev), vp_dev);
> > > > diff --git a/drivers/virtio/virtio_pci_common.h
> > > > b/drivers/virtio/virtio_pci_common.h
> > > > index 4b773bd7c58c..602021967aaa 100644
> > > > --- a/drivers/virtio/virtio_pci_common.h
> > > > +++ b/drivers/virtio/virtio_pci_common.h
> > > > @@ -60,6 +60,7 @@ struct virtio_pci_device {
> > > > 
> > > >        /* array of all queues for house-keeping */
> > > >        struct virtio_pci_vq_info **vqs;
> > > > +     u32 nvqs;
> > > 
> > > I don't much like it that we are adding more duplicated info here.
> > > In fact, we tried removing the vqs array in
> > > 5c34d002dcc7a6dd665a19d098b4f4cd5501ba1a - there was some bug in that
> > > patch and the author didn't have the time to debug
> > > so I reverted but I don't really think we need to add to that.
> > > 
> > 
> > Hi Michael
> > 
> > As explained in commit message, this patch is mainly to prepare for the
> > subsequent admin vq patches.
> > 
> > The admin vq is also established using the common mechanism of vring,
> > and is added to vdev->vqs in __vring_new_virtqueue(). So vdev->vqs
> > contains all virtqueues, including rxq, txq, ctrlvq and admin vq.
> > 
> > The admin vq should be managed by the virtio_pci layer and should not
> > be created or deleted by an upper driver (net, blk).
> > When the upper driver is unloaded, it calls the del_vqs() interface,
> > which calls vp_del_vqs(), and vp_del_vqs() should not delete the
> > admin vq, but only the virtqueues created by the upper driver,
> > such as rxq, txq, and ctrlq.
> > 
> > 
> > The vp_dev->vqs[] array only contains virtqueues created by the upper
> > driver, such as rxq, txq and ctrlq. Traversing the vp_dev->vqs array
> > can therefore delete only the upper driver's virtqueues, without
> > touching the admin vq. Using the vdev->vqs linked list cannot meet
> > this need.
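> > 
> > Roughly, the idea (an illustrative sketch only, not the exact patch;
> > MSI-X and interrupt teardown are omitted) is that vp_del_vqs() walks
> > only the array the transport allocated, so a transport-owned admin vq
> > is never touched:
> > 
> > 	for (i = 0; i < vp_dev->nvqs; i++) {
> > 		struct virtio_pci_vq_info *info = vp_dev->vqs[i];
> > 
> > 		if (!info)
> > 			continue;
> > 		vp_dev->del_vq(info);	/* vector release omitted */
> > 		kfree(info);
> > 	}
> > 	kfree(vp_dev->vqs);
> > 	vp_dev->vqs = NULL;
> > 	vp_dev->nvqs = 0;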
> > 
> > 
> > Does this explanation make the need clear? Or do you have any
> > alternative method in mind?
> > 
> 
> Hi, Michael
> 	Are the above explanations OK with you?
> 
> Thanks
> Feng

First, the patch only addresses PCI. Second, yes, driver unload calls
del_vqs, but doesn't it also reset the device? If this happens while
vfio tries to send commands to it, then you have other problems.
And, for the baroque need of an admin vq, which most devices don't
have, you are duplicating logic and wasting memory for everyone.

What is a sane solution? The virtio core was never designed to
allow two drivers to access the same device. So don't try; add the
device-access logic to the virtio core.  I feel the problem won't even
exist if, instead of just exposing the device pointer, you expose a
sane interface.


> > > > 
> > > >        /* MSI-X support */
> > > >        int msix_enabled;
> > > > -- 
> > > > 2.27.0
> > > 


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 03/11] virtio-pci: Introduce admin virtqueue
  2023-09-27 18:12         ` Feng Liu via Virtualization
@ 2023-09-27 21:27           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-27 21:27 UTC (permalink / raw)
  To: Feng Liu; +Cc: kvm, leonro, virtualization, jgg, jiri, maorg

On Wed, Sep 27, 2023 at 02:12:24PM -0400, Feng Liu wrote:
> 
> 
> On 2023-09-26 p.m.3:23, Feng Liu via Virtualization wrote:
> > On 2023-09-21 a.m.9:57, Michael S. Tsirkin wrote:
> > > On Thu, Sep 21, 2023 at 03:40:32PM +0300, Yishai Hadas wrote:
> > > > From: Feng Liu <feliu@nvidia.com>
> 
> 
> > > >   drivers/virtio/virtio_pci_modern_avq.c | 65 ++++++++++++++++++++++++++
> > > 
> > > if you have a .c file without a .h file you know there's something
> > > fishy. Just add this inside drivers/virtio/virtio_pci_modern.c ?
> > > 
> > Will do.
> > 
> 
> > > > +struct virtio_avq {
> > > 
> > > admin_vq would be better. and this is pci specific yes? so virtio_pci_
> > > 
> > 
> > Will do.
> > 
> 
> > > > 
> > > > +     struct virtio_avq *admin;
> > > 
> > > and this could be thinkably admin_vq.
> > > 
> > Will do.
> > 
> 
> > > > 
> > > >   /* If driver didn't advertise the feature, it will never appear. */
> > > > diff --git a/include/linux/virtio_pci_modern.h
> > > > b/include/linux/virtio_pci_modern.h
> > > > index 067ac1d789bc..f6cb13d858fd 100644
> > > > --- a/include/linux/virtio_pci_modern.h
> > > > +++ b/include/linux/virtio_pci_modern.h
> > > > @@ -10,6 +10,9 @@ struct virtio_pci_modern_common_cfg {
> > > > 
> > > >        __le16 queue_notify_data;       /* read-write */
> > > >        __le16 queue_reset;             /* read-write */
> > > > +
> > > > +     __le16 admin_queue_index;       /* read-only */
> > > > +     __le16 admin_queue_num;         /* read-only */
> > > >   };
> > > 
> > > 
> > > ouch.
> > > actually there's a problem
> > > 
> > >          mdev->common = vp_modern_map_capability(mdev, common,
> > >                                        sizeof(struct
> > > virtio_pci_common_cfg), 4,
> > >                                        0, sizeof(struct
> > > virtio_pci_common_cfg),
> > >                                        NULL, NULL);
> > > 
> > > extending this structure means some calls will start failing on
> > > existing devices.
> > > 
> > > even more of an ouch, when we added queue_notify_data and queue_reset we
> > > also possibly broke some devices. well hopefully not since no one
> > > reported failures but we really need to fix that.
> > > 
> > > 
> > Hi Michael
> > 
> > I didn't see the failure in vp_modern_map_capability();
> > vp_modern_map_capability() only reads and maps PCI memory. The length
> > of the memory mapping will increase as struct virtio_pci_common_cfg
> > increases. No errors are seen.
> > 
> > And according to the existing code, new PCI configuration space
> > members can only be added to struct virtio_pci_modern_common_cfg.
> > 
> > Every single entry added here is protected by a feature bit, so there
> > is no bug AFAIK.
> > 
> > Could you explain it in more detail? Where and why will it fail if we
> > add a new member to struct virtio_pci_modern_common_cfg?
> > 
> > 
> Hi, Michael
> 	Any comments about this?
> Thanks
> Feng

If an existing device exposes a small
capability matching the old size, and you then increase the size, the
length check will fail on that existing device and the driver won't load.

All this happens way before feature bit checks.
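
To illustrate (paraphrasing the mapping helper, not quoting the exact
upstream code): vp_modern_map_capability() reads the capability length
from config space and rejects anything shorter than the minlen the
caller asks for, so requiring the size of the extended structure would
make probe fail on a device that only exposes the original
struct virtio_pci_common_cfg:

	pci_read_config_dword(dev, off + offsetof(struct virtio_pci_cap, length),
			      &length);
	if (length - start < minlen) {
		dev_err(&dev->dev,
			"virtio_pci: bad capability len %u (>=%zu expected)\n",
			length, minlen);
		return NULL;	/* common cfg mapping fails, probe aborts */
	}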


> > > > 
> > > >   struct virtio_pci_modern_device {
> > > > -- 
> > > > 2.27.0
> > > 

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-09-27 13:18         ` Jason Gunthorpe
@ 2023-09-27 21:30             ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-27 21:30 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Yishai Hadas, kvm, maorg, virtualization, jiri, leonro

On Wed, Sep 27, 2023 at 10:18:17AM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 26, 2023 at 07:41:44AM -0400, Michael S. Tsirkin wrote:
> 
> > > By the way, this follows what was done already between vfio/mlx5 to
> > > mlx5_core modules where mlx5_core exposes generic APIs to execute a command
> > > and to get a PF from a given mlx5 VF.
> > 
> > This is up to mlx5 maintainers. In particular they only need to worry
> > that their patches work with specific hardware which they likely have.
> > virtio has to work with multiple vendors - hardware and software -
> > and exposing a low level API that I can't test on my laptop
> > is not at all my ideal.
> 
> mlx5 has a reasonable API from the lower level that allows the vfio
> driver to safely issue commands. The API provides all the safety and
> locking you have been questioning here.
> 
> Then the vfio driver can form the commands directly and in the way it
> needs. This avoids spewing code into the core modules that is only
> used by vfio - which has been a key design consideration for our
> driver layering.
> 
> I suggest following the same design here as it has been well proven.
> Provide a solid API to operate the admin queue and let VFIO use
> it. One of the main purposes of the admin queue is to deliver commands
> on behalf of the VF driver, so this is a logical and reasonable place
> to put an API.

Not the way virtio is designed now. I guess mlx5 is designed in
a way that makes it safe.

> > > This way, we can enable further commands to be added/extended
> > > easily/cleanly.
> > 
> > Something for vfio maintainer to consider in case it was
> > assumed that it's just this one weird thing
> > but otherwise it's all generic vfio. It's not going to stop there,
> > will it? The duplication of functionality with vdpa will continue :(
> 
> VFIO live migration is expected to come as well once OASIS completes
> its work.

Exactly. Is there doubt vdpa will want to support live migration?
Put this code in a library please.

> Parav, are there other things?
> 
> Jason


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-26 13:50                                 ` Jason Gunthorpe
@ 2023-09-27 21:38                                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-27 21:38 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Tue, Sep 26, 2023 at 10:50:57AM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 26, 2023 at 01:42:52AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Sep 25, 2023 at 09:40:59PM -0300, Jason Gunthorpe wrote:
> > > On Mon, Sep 25, 2023 at 03:44:11PM -0400, Michael S. Tsirkin wrote:
> > > > > VDPA is very different from this. You might call them both mediation,
> > > > > sure, but then you need another word to describe the additional
> > > > > changes VDPA is doing.
> > > > 
> > > > Sorry about hijacking the thread a little bit, but could you
> > > > call out some of the changes that are the most problematic
> > > > for you?
> > > 
> > > I don't really know these details.
> > 
> > Maybe, you then should desist from saying things like "It entirely fails
> > to achieve the most important thing it needs to do!" You are not making
> > any new friends with saying this about a piece of software without
> > knowing the details.
> 
> I can't tell you what cloud operators are doing, but I can say with
> confidence that it is not the same as VDPA. As I said, if you want to
> know more details you need to ask a cloud operator.
> 
> Jason

So it's not the changes that are problematic, it's that you have
customers who are not using vdpa. The "most important thing" that vdpa
fails at is simply converting your customers from vfio to vdpa.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-09-27 21:30             ` Michael S. Tsirkin
  (?)
@ 2023-09-27 23:16             ` Jason Gunthorpe
  2023-09-28  5:26                 ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-27 23:16 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Yishai Hadas, kvm, maorg, virtualization, jiri, leonro

On Wed, Sep 27, 2023 at 05:30:04PM -0400, Michael S. Tsirkin wrote:
> On Wed, Sep 27, 2023 at 10:18:17AM -0300, Jason Gunthorpe wrote:
> > On Tue, Sep 26, 2023 at 07:41:44AM -0400, Michael S. Tsirkin wrote:
> > 
> > > > By the way, this follows what was done already between vfio/mlx5 to
> > > > mlx5_core modules where mlx5_core exposes generic APIs to execute a command
> > > > and to get a PF from a given mlx5 VF.
> > > 
> > > This is up to mlx5 maintainers. In particular they only need to worry
> > > that their patches work with specific hardware which they likely have.
> > > virtio has to work with multiple vendors - hardware and software -
> > > and exposing a low level API that I can't test on my laptop
> > > is not at all my ideal.
> > 
> > mlx5 has a reasonable API from the lower level that allows the vfio
> > driver to safely issue commands. The API provides all the safety and
> > locking you have been questioning here.
> > 
> > Then the vfio driver can form the commands directly and in the way it
> > needs. This avoids spewing code into the core modules that is only
> > used by vfio - which has been a key design consideration for our
> > driver layering.
> > 
> > I suggest following the same design here as it has been well proven.
> > Provide a solid API to operate the admin queue and let VFIO use
> > it. One of the main purposes of the admin queue is to deliver commands
> > on behalf of the VF driver, so this is a logical and reasonable place
> > to put an API.
> 
> Not the way virtio is designed now. I guess mlx5 is designed in
> a way that makes it safe.

If you can't reliably issue commands from the VF at all, it doesn't
really matter where you put the code. Once that is established, then
an admin command execution interface is a nice cut point for
modularity.
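
Hypothetically (the names below are invented for illustration, they are
not the exact exports proposed in this series), the cut point could be
as small as two entry points; the vfio driver then forms the admin
commands itself and submits them through the second one:

	/* hypothetical sketch of the module boundary */
	struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *vf_pdev);
	int virtio_admin_cmd_exec(struct virtio_device *pf_vdev,
				  struct virtio_admin_cmd *cmd);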

The locking in mlx5 to make this safe is not too complex; if Feng
missed some items for virtio then he can work to fix them up.

> > VFIO live migration is expected to come as well once OASIS completes
> > its work.
> 
> Exactly. Is there doubt vdpa will want to support live migration?
> Put this code in a library please.

I have a doubt, you both said vdpa already does live migration, so
what will it even do with a live migration interface to a PCI
function?

It already has to use full mediation to operate a physical virtio
function, so it seems like it shouldn't need the migration interface?

Regardless, it is better kernel development hygiene to put the code
where it is used and wait for a second user to consolidate it than to
guess.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-27 21:38                                     ` Michael S. Tsirkin
  (?)
@ 2023-09-27 23:20                                     ` Jason Gunthorpe
  2023-09-28  5:31                                         ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-09-27 23:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Yishai Hadas, alex.williamson, kvm, virtualization,
	parav, feliu, jiri, kevin.tian, joao.m.martins, leonro, maorg

On Wed, Sep 27, 2023 at 05:38:55PM -0400, Michael S. Tsirkin wrote:
> On Tue, Sep 26, 2023 at 10:50:57AM -0300, Jason Gunthorpe wrote:
> > On Tue, Sep 26, 2023 at 01:42:52AM -0400, Michael S. Tsirkin wrote:
> > > On Mon, Sep 25, 2023 at 09:40:59PM -0300, Jason Gunthorpe wrote:
> > > > On Mon, Sep 25, 2023 at 03:44:11PM -0400, Michael S. Tsirkin wrote:
> > > > > > VDPA is very different from this. You might call them both mediation,
> > > > > > sure, but then you need another word to describe the additional
> > > > > > changes VDPA is doing.
> > > > > 
> > > > > Sorry about hijacking the thread a little bit, but could you
> > > > > call out some of the changes that are the most problematic
> > > > > for you?
> > > > 
> > > > I don't really know these details.
> > > 
> > > Maybe, you then should desist from saying things like "It entirely fails
> > > to achieve the most important thing it needs to do!" You are not making
> > > any new friends with saying this about a piece of software without
> > > knowing the details.
> > 
> > I can't tell you what cloud operators are doing, but I can say with
> > confidence that it is not the same as VDPA. As I said, if you want to
> > know more details you need to ask a cloud operator.
>
> So it's not the changes that are problematic, it's that you have
> customers who are not using vdpa. The "most important thing" that vdpa
> fails at is simply converting your customers from vfio to vdpa.

I said the most important thing was that VFIO presents exactly the
same virtio device to the VM as the baremetal. Do you dispute that,
technically, VDPA does not actually achieve that?

Then why is it so surprising that people don't want a solution that
changes the vPCI ABI they worked hard to create in the first place?

I'm still baffled why you think everyone should use vdpa..

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-09-27 23:16             ` Jason Gunthorpe
@ 2023-09-28  5:26                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-28  5:26 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Wed, Sep 27, 2023 at 08:16:00PM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 27, 2023 at 05:30:04PM -0400, Michael S. Tsirkin wrote:
> > On Wed, Sep 27, 2023 at 10:18:17AM -0300, Jason Gunthorpe wrote:
> > > On Tue, Sep 26, 2023 at 07:41:44AM -0400, Michael S. Tsirkin wrote:
> > > 
> > > > > By the way, this follows what was done already between vfio/mlx5 to
> > > > > mlx5_core modules where mlx5_core exposes generic APIs to execute a command
> > > > > and to get a PF from a given mlx5 VF.
> > > > 
> > > > This is up to mlx5 maintainers. In particular they only need to worry
> > > > that their patches work with specific hardware which they likely have.
> > > > virtio has to work with multiple vendors - hardware and software -
> > > > and exposing a low level API that I can't test on my laptop
> > > > is not at all my ideal.
> > > 
> > > mlx5 has a reasonable API from the lower level that allows the vfio
> > > driver to safely issue commands. The API provides all the safety and
> > > locking you have been questioning here.
> > > 
> > > Then the vfio driver can form the commands directly and in the way it
> > > needs. This avoids spewing code into the core modules that is only
> > > used by vfio - which has been a key design consideration for our
> > > driver layering.
> > > 
> > > I suggest following the same design here as it has been well proven.
> > > Provide a solid API to operate the admin queue and let VFIO use
> > > it. One of the main purposes of the admin queue is to deliver commands
> > > on behalf of the VF driver, so this is a logical and reasonable place
> > > to put an API.
> > 
> > Not the way virtio is designed now. I guess mlx5 is designed in
> > a way that makes it safe.
> 
> If you can't reliably issue commands from the VF at all, it doesn't
> really matter where you put the code. Once that is established, then
> an admin command execution interface is a nice cut point for
> modularity.
> 
> The locking in mlx5 to make this safe is not too complex, if Feng
> missed some items for virtio then he can work to fix it up.

The above two paragraphs don't make sense to me at all. The VF issues
no commands and there's no locking.

> > > VFIO live migration is expected to come as well once OASIS completes
> > > its work.
> > 
> > Exactly. Is there doubt vdpa will want to support live migration?
> > Put this code in a library please.
> 
> I have a doubt, you both said vdpa already does live migration, so
> what will it even do with a live migration interface to a PCI
> function?

This is not the thread to explain how vdpa live migration works now and
why it needs new interfaces, sorry. Suffice it to say that right now,
on the virtio TC, Parav from NVIDIA is arguing for vdpa to use admin
commands for migration.

> It already has to use full mediation to operate a physical virtio
> function, so it seems like it shouldn't need the migration interface?
> 
> Regardless, it is better kernel development hygiene to put the code
> where it is used and wait for a second user to consolidate it than to
> guess.
> 
> Jason

Sorry, no time right now to argue philosophy. I gave some hints on how
to make the virtio changes behave in a way that I'm OK with
maintaining. Hope they help.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-27 23:20                                     ` Jason Gunthorpe
@ 2023-09-28  5:31                                         ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-09-28  5:31 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, maorg, virtualization, jiri, leonro

On Wed, Sep 27, 2023 at 08:20:05PM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 27, 2023 at 05:38:55PM -0400, Michael S. Tsirkin wrote:
> > On Tue, Sep 26, 2023 at 10:50:57AM -0300, Jason Gunthorpe wrote:
> > > On Tue, Sep 26, 2023 at 01:42:52AM -0400, Michael S. Tsirkin wrote:
> > > > On Mon, Sep 25, 2023 at 09:40:59PM -0300, Jason Gunthorpe wrote:
> > > > > On Mon, Sep 25, 2023 at 03:44:11PM -0400, Michael S. Tsirkin wrote:
> > > > > > > VDPA is very different from this. You might call them both mediation,
> > > > > > > sure, but then you need another word to describe the additional
> > > > > > > changes VDPA is doing.
> > > > > > 
> > > > > > Sorry about hijacking the thread a little bit, but could you
> > > > > > call out some of the changes that are the most problematic
> > > > > > for you?
> > > > > 
> > > > > I don't really know these details.
> > > > 
> > > > Maybe, you then should desist from saying things like "It entirely fails
> > > > to achieve the most important thing it needs to do!" You are not making
> > > > any new friends with saying this about a piece of software without
> > > > knowing the details.
> > > 
> > > I can't tell you what cloud operators are doing, but I can say with
> > > confidence that it is not the same as VDPA. As I said, if you want to
> > > know more details you need to ask a cloud operator.
> >
> > So it's not the changes that are problematic, it's that you have
> > customers who are not using vdpa. The "most important thing" that vdpa
> > fails at is simply converting your customers from vfio to vdpa.
> 
> I said the most important thing was that VFIO presents exactly the
> same virtio device to the VM as the baremetal. Do you dispute that,
> technically, VDPA does not actually achieve that?

I dispute that it is the most important. The important thing is to have
guests work.

> Then why is it so surprising that people don't want a solution that
> changes the vPCI ABI they worked hard to create in the first place?
> 
> I'm still baffled why you think everyone should use vdpa..
> 
> Jason

They shouldn't. If you want proprietary extensions then vfio is the way
to go; I don't think vdpa will support that.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-26 17:00         ` Michael S. Tsirkin
@ 2023-10-02  4:38           ` Parav Pandit via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Parav Pandit @ 2023-10-02  4:38 UTC (permalink / raw)
  To: Michael S. Tsirkin, Yishai Hadas
  Cc: Alex Williamson, jasowang, Jason Gunthorpe, kvm, virtualization,
	Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	Leon Romanovsky, Maor Gottlieb



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, September 26, 2023 10:30 PM

> For example, a transitional device
> must not in theory be safely passed through to guest userspace, because guest
> then might try to use it through the legacy BAR without acknowledging
> ACCESS_PLATFORM.
> Do any guests check this and fail? Hard to say.
>
ACCESS_PLATFORM is not offered on the legacy interface because the legacy interface spec 0.9.5 didn't have it.
Whether the guest VM maps the device to user space and uses GIOVA is completely unknown to the device.
And all of this is just fine, because the IOMMU, through vfio, takes care of the necessary translation whether or not the transitional device is mapped to guest user space.

Hence, it is not a compat problem.
Anyway, only users who care to expose a transitional device in the guest will attach a virtio device to the vfio-virtio driver.

I can see that in the future, when a user wants to do this optionally, a devlink/sysfs knob will be added; at that point, one needs a disable_transitional flag.
So it may be worth optionally enabling transitional support on user request, as Michael suggested.

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-09-26 11:41         ` Michael S. Tsirkin
@ 2023-10-02  6:28           ` Christoph Hellwig
  -1 siblings, 0 replies; 321+ messages in thread
From: Christoph Hellwig @ 2023-10-02  6:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yishai Hadas, alex.williamson, jasowang, jgg, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Tue, Sep 26, 2023 at 07:41:44AM -0400, Michael S. Tsirkin wrote:
> 
> Except, there's no reasonable way for virtio to know what is done with
> the device then. You are not using just 2 symbols at all, instead you
> are using the rich vq API which was explicitly designed for the driver
> running the device being responsible for serializing accesses. Which is
> actually loaded and running. And I *think* your use won't conflict ATM
> mostly by luck. Witness the hack in patch 01 as exhibit 1 - nothing
> at all even hints at the fact that the reason for the complicated
> dance is because another driver pokes at some of the vqs.

Fully agreed.  The smart NIC vendors are trying to make the same mess
in NVMe, and we really need to stop them and agree on proper
standardized live migration features implemented in the core
virtio/nvme code.


^ permalink raw reply	[flat|nested] 321+ messages in thread

* RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-22 15:53     ` Michael S. Tsirkin
@ 2023-10-02 11:23       ` Parav Pandit via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Parav Pandit @ 2023-10-02 11:23 UTC (permalink / raw)
  To: Michael S. Tsirkin, Yishai Hadas
  Cc: alex.williamson, jasowang, Jason Gunthorpe, kvm, virtualization,
	Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	Leon Romanovsky, Maor Gottlieb



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Friday, September 22, 2023 9:23 PM

> > +static int virtiovf_pci_probe(struct pci_dev *pdev,
> > +			      const struct pci_device_id *id) {
> > +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> > +	struct virtiovf_pci_core_device *virtvdev;
> > +	int ret;
> > +
> > +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> > +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> 
> I see this is the reason you set MSIX to true. But I think it's a misunderstanding -
> that true means MSIX is enabled by guest, not that it exists.

The MSI-X check here just looks like a sanity check to make sure that the guest can enable MSI-X.
The MSI-X enable check should be in the read()/write() calls to decide which AQ command to use,
i.e. whether to access the common config or the device config, as written in the virtio spec.

Yishai, please fix the read()/write() calls to dynamically consider the offset of 24/20 based on the MSI-X enabled state.
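
For reference, the 24/20 split is the legacy layout behind
VIRTIO_PCI_CONFIG_OFF() in include/uapi/linux/virtio_pci.h: the legacy
device-specific config starts at byte 24 when MSI-X is enabled and at
byte 20 otherwise. A rough sketch of the translation in the vfio
read()/write() path (variable names here are illustrative, not taken
from this series):

	/* from include/uapi/linux/virtio_pci.h */
	#define VIRTIO_PCI_CONFIG_OFF(msix_enabled)	((msix_enabled) ? 24 : 20)

	/* sketch: map a legacy I/O offset to the right admin queue target */
	bool is_dev_cfg = pos >= VIRTIO_PCI_CONFIG_OFF(msix_enabled);
	u16 target_off = is_dev_cfg ? pos - VIRTIO_PCI_CONFIG_OFF(msix_enabled)
				    : pos;
	/* is_dev_cfg selects the legacy device-cfg vs common-cfg AQ command */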

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-02  6:28           ` Christoph Hellwig
  (?)
@ 2023-10-02 15:13           ` Jason Gunthorpe
  2023-10-05  8:49               ` Christoph Hellwig
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-10-02 15:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Yishai Hadas, alex.williamson, jasowang, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Sun, Oct 01, 2023 at 11:28:26PM -0700, Christoph Hellwig wrote:
> On Tue, Sep 26, 2023 at 07:41:44AM -0400, Michael S. Tsirkin wrote:
> > 
> > Except, there's no reasonable way for virtio to know what is done with
> > the device then. You are not using just 2 symbols at all, instead you
> > are using the rich vq API which was explicitly designed for the driver
> > running the device being responsible for serializing accesses. Which is
> > actually loaded and running. And I *think* your use won't conflict ATM
> > mostly by luck. Witness the hack in patch 01 as exhibit 1 - nothing
> > at all even hints at the fact that the reason for the complicated
> > dance is because another driver pokes at some of the vqs.
> 
> Fully agreed.  The smart nic vendors are trying to do the same mess
> in nvme, and we really need to stop them and agree on proper standarized
> live migration features implemented in the core virtio/nvme code.

??? This patch series is an implementation of changes that OASIS
approved.

The live migration work is going to OASIS first; no patches have been
presented.

This thread is arguing about how to split up the code for the
implementation of the standard, given that VFIO owns the VF and the
virtio core owns the PF. The standard defined that PF admin queue
operations are needed to do operations on behalf of the VF.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 03/11] virtio-pci: Introduce admin virtqueue
  2023-09-27 21:27           ` Michael S. Tsirkin
@ 2023-10-02 18:07             ` Feng Liu via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Feng Liu @ 2023-10-02 18:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yishai Hadas, kvm, maorg, virtualization, jgg, jiri, leonro



On 2023-09-27 p.m.5:27, Michael S. Tsirkin wrote:

> 
> If an existing device exposes a small
> capability matching the old size, and you then increase the size, the
> length check will fail on that existing device and the driver won't load.
> 
> All this happens way before feature bit checks.
> 
> 
Will do

Thanks
Feng

>>>>>
>>>>>    struct virtio_pci_modern_device {
>>>>> --
>>>>> 2.27.0
>>>>
> 

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-02 15:13           ` Jason Gunthorpe
@ 2023-10-05  8:49               ` Christoph Hellwig
  0 siblings, 0 replies; 321+ messages in thread
From: Christoph Hellwig @ 2023-10-05  8:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Michael S. Tsirkin, maorg, virtualization,
	Christoph Hellwig, jiri, leonro

On Mon, Oct 02, 2023 at 12:13:20PM -0300, Jason Gunthorpe wrote:
> ??? This patch series is an implementation of changes that OASIS
> approved.

I think you are fundamentally missing my point.  This is not about
who publishes a spec, but how we structure Linux code.

And the problem is that we treat vfio as a separate thing, and not an
integral part of the driver.  vfio being separate totally makes sense
for the original purpose of vfio, that is a no-op passthrough of
a device to userspace.

But for all the augmented vfio use cases it doesn't, for them the
augmented vfio functionality is an integral part of the core driver.
That is true for nvme, virtio and I'd argue mlx5 as well.

So we need to stop registering separate pci_drivers for this kind
of functionality, and instead have an interface to the driver to
switch to certain functionalities.

E.g. for this case there should be no new vfio-virtio device, but
instead you should be able to switch the virtio device to a
fake-legacy vfio mode.

Assuming the whole thing actually makes sense, as the use case seems
a bit fishy to start with, but I'll leave that argument to the virtio
maintainers.

Similarly for nvme.  We'll never accept a separate nvme live migration
vfio driver.  This functionality needs to be part of the nvme driver,
probed there and fully controlled there.

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-05  8:49               ` Christoph Hellwig
  (?)
@ 2023-10-05 11:10               ` Jason Gunthorpe
  2023-10-06 13:09                   ` Christoph Hellwig
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-10-05 11:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Yishai Hadas, alex.williamson, jasowang, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Thu, Oct 05, 2023 at 01:49:54AM -0700, Christoph Hellwig wrote:

> But for all the augmented vfio use cases it doesn't, for them the
> augmented vfio functionality is an integral part of the core driver.
> That is true for nvme, virtio and I'd argue mlx5 as well.

I don't agree with this. I see the extra functionality as being an
integral part of the VF and VFIO. The PF driver is only providing a
proxied communication channel.

It is a limitation of PCI that the PF must act as a proxy.

> So we need to stop registering separate pci_drivers for this kind
> of functionality, and instead have an interface to the driver to
> switch to certain functionalities.

?? We must bind something to the VF's pci_driver, what do you imagine
that is?

> E.g. for this case there should be no new vfio-virtio device, but
> instead you should be able to switch the virtio device to an
> fake-legacy vfio mode.

Are you arguing about how we reach vfio_register_XX() and what
directory the file lives in?

I don't know what "fake-legacy" even means, VFIO is not legacy.

There is a lot of code in VFIO and the VMM side to take a VF and turn
it into a vPCI function. You can't just trivially duplicate VFIO in a
dozen drivers without creating a giant mess.

Further, userspace wants consistent ways to operate this stuff. If we
need a dozen ways to activate VFIO for every kind of driver that is
not a positive direction.

Basically, I don't know what you are suggesting here. We talked about
this before, and my position is still the same. Continuing to have
/dev/vfio/XX be the kernel uAPI for the VMM to work with non-mediated
vPCI functions with live migration is the technically correct thing to
do.

Why wouldn't it be?

> Similarly for nvme.  We'll never accept a separate nvme-live migration
> vfio driver.  This functionality needs to be part of the nvme driver,
> probed there and fully controlled there.

We can debate where to put the files when the standard is done, but
at the end of the day it needs to create /dev/vfio/XXX.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-05 11:10               ` Jason Gunthorpe
@ 2023-10-06 13:09                   ` Christoph Hellwig
  0 siblings, 0 replies; 321+ messages in thread
From: Christoph Hellwig @ 2023-10-06 13:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Michael S. Tsirkin, Yishai Hadas,
	alex.williamson, jasowang, kvm, virtualization, parav, feliu,
	jiri, kevin.tian, joao.m.martins, leonro, maorg

On Thu, Oct 05, 2023 at 08:10:04AM -0300, Jason Gunthorpe wrote:
> > But for all the augmented vfio use cases it doesn't, for them the
> > augmented vfio functionality is an integral part of the core driver.
> > That is true for nvme, virtio and I'd argue mlx5 as well.
> 
> I don't agree with this. I see the extra functionality as being an
> integral part of the VF and VFIO. The PF driver is only providing a
> proxied communication channel.
> 
> It is a limitation of PCI that the PF must act as a proxy.

For anything live migration related it very fundamentally is not, as a function
that is visible to a guest by definition can't drive the migration
itself.  That isn't really a limitation in PCI, but follows from the
fact that something else must control a live migration that is
transparent to the guest.

> 
> > So we need to stop registering separate pci_drivers for this kind
> > of functionality, and instead have an interface to the driver to
> > switch to certain functionalities.
> 
> ?? We must bind something to the VF's pci_driver, what do you imagine
> that is?

The driver that knows this hardware.  In this case the virtio subsystem,
in case of nvme the nvme driver, and in case of mlx5 the mlx5 driver.

> > E.g. for this case there should be no new vfio-virtio device, but
> > instead you should be able to switch the virtio device to an
> > fake-legacy vfio mode.
> 
> Are you arguing about how we reach to vfio_register_XX() and what
> directory the file lives?

No.  That layout logically follows from what codebase the functionality
is part of, though.

> I don't know what "fake-legacy" even means, VFIO is not legacy.

The driver we're talking about in this thread fakes up a virtio_pci
legacy device to the guest on top of a "modern" virtio_pci device.

> There is a lot of code in VFIO and the VMM side to take a VF and turn
> it into a vPCI function. You can't just trivially duplicate VFIO in a
> dozen drivers without creating a giant mess.

I do not advocate for duplicating it.  But the code that calls this
functionality belongs in the driver that deals with the compound
device that we're doing this work for.

> Further, userspace wants consistent ways to operate this stuff. If we
> need a dozen ways to activate VFIO for every kind of driver that is
> not a positive direction.

We don't need a dozen ways.  We just need a single attribute on the
pci (or $OTHERBUS) device that switches it to vfio mode.

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-26 11:49                                       ` Michael S. Tsirkin
@ 2023-10-08  4:28                                         ` Jason Wang
  -1 siblings, 0 replies; 321+ messages in thread
From: Jason Wang @ 2023-10-08  4:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, Maor Gottlieb, virtualization, Jason Gunthorpe, Jiri Pirko,
	Leon Romanovsky

On Tue, Sep 26, 2023 at 7:49 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Sep 26, 2023 at 10:32:39AM +0800, Jason Wang wrote:
> > It's the implementation details in legacy. The device needs to make
> > sure (reset) the driver can work (is done before get_status return).
>
> I think that there's no way to make it reliably work for all legacy drivers.

Yes, we may have ancient drivers.

>
> They just assumed a software backend and did not bother with DMA
> ordering. You can try to avoid resets, they are not that common so
> things will tend to mostly work if you don't stress them too much with
> things like hot plug/unplug in a loop.  Or you can try to use a driver
> after 2011 which is more aware of hardware ordering and flushes the
> reset write with a read.  One of these two tricks, I think, is the magic
> behind the device exposing memory bar 0 that you mention.

Right, this is what I see for hardware legacy devices shipped by some
cloud vendors.

Thanks

>
> --
> MST
>


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-06 13:09                   ` Christoph Hellwig
  (?)
@ 2023-10-10 13:10                   ` Jason Gunthorpe
  2023-10-10 13:56                       ` Michael S. Tsirkin
  2023-10-11  6:26                       ` Christoph Hellwig
  -1 siblings, 2 replies; 321+ messages in thread
From: Jason Gunthorpe @ 2023-10-10 13:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Yishai Hadas, alex.williamson, jasowang, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Fri, Oct 06, 2023 at 06:09:09AM -0700, Christoph Hellwig wrote:
> On Thu, Oct 05, 2023 at 08:10:04AM -0300, Jason Gunthorpe wrote:
> > > But for all the augmented vfio use cases it doesn't, for them the
> > > augmented vfio functionality is an integral part of the core driver.
> > > That is true for nvme, virtio and I'd argue mlx5 as well.
> > 
> > I don't agree with this. I see the extra functionality as being an
> > integral part of the VF and VFIO. The PF driver is only providing a
> > proxied communication channel.
> > 
> > It is a limitation of PCI that the PF must act as a proxy.
> 
> For anything live migration it very fundamentally is not, as a function
> that is visible to a guest by definition can't drive the migration
> itself.  That isn't really a limitation in PCI, but follows form the
> fact that something else must control a live migration that is
> transparent to the guest.

We've talked around ideas like allowing the VF config space to do some
of the work. For simple devices we could get away with 1 VF config
space register. (VF config space is owned by the hypervisor, not the
guest)

Devices that need DMA as part of their migration could be imagined to
co-opt a VF PASID or something. eg using ENQCMD.

SIOVr2 is discussing a more flexible RID mapping - there is a possible
route where a "VF" could actually have two RIDs, a hypervisor RID and a
guest RID.

It really is PCI limitations that force this design of making a PF
driver do dual duty as a fully functional normal device and act as a
communication channel proxy to make a back channel into an SRIOV VF.

My view has always been that the VFIO live migration operations are
executed logically within the VF as they only affect the VF.

So we have a logical design separation where the VFIO world owns the
commands and the PF driver supplies the communication channel. This
works well for devices that already have a robust RPC interface to
their device FW.

> > ?? We must bind something to the VF's pci_driver, what do you imagine
> > that is?
> 
> The driver that knows this hardware.  In this case the virtio subsystem,
> in case of nvme the nvme driver, and in case of mlx5 the mlx5 driver.

But those are drivers operating the HW to create kernel devices. Here
we need a VFIO device. They can't co-exist; if you switch mlx5 from
normal to vfio you have to tear down the entire normal driver.

> > > E.g. for this case there should be no new vfio-virtio device, but
> > > instead you should be able to switch the virtio device to an
> > > fake-legacy vfio mode.
> > 
> > Are you arguing about how we reach to vfio_register_XX() and what
> > directory the file lives?
> 
> No.  That layout logically follows from what codebase the functionality
> is part of, though.

I don't understand what we are talking about really. Where do you
imagine the vfio_register_XX() goes?

> > I don't know what "fake-legacy" even means, VFIO is not legacy.
> 
> The driver we're talking about in this thread fakes up a virtio_pci
> legacy device to the guest on top of a "modern" virtio_pci device.

I'm not sure I'd use the word fake; inb/outb are always trapped
operations in VMs. If the device provided a real IO BAR then VFIO
common code would trap and relay inb/outb to the device.

All this is doing is changing the inb/outb relay from using a physical
IO BAR to a DMA command ring.

The motivation is simply because normal IO BAR space is incredibly
limited and you can't get enough SRIOV functions when using it.

> > There is a lot of code in VFIO and the VMM side to take a VF and turn
> > it into a vPCI function. You can't just trivially duplicate VFIO in a
> > dozen drivers without creating a giant mess.
> 
> I do not advocate for duplicating it.  But the code that calls this
> functionality belongs into the driver that deals with the compound
> device that we're doing this work for.

On one hand, I don't really care - we can put the code where people
like.

However - the Intel GPU VFIO driver is such a bad experience I don't
want to encourage people to make VFIO drivers, or code that is only
used by VFIO drivers, that are not under drivers/vfio review.

> > Further, userspace wants consistent ways to operate this stuff. If we
> > need a dozen ways to activate VFIO for every kind of driver that is
> > not a positive direction.
> 
> We don't need a dozen ways.  We just need a single attribute on the
> pci (or $OTHERBUS) devide that switches it to vfio mode.

Well, we sort of do these days, it is just a convoluted bind thing.

Realistically switching modes requires unprobing the entire normal VF
driver. Having this be linked to the driver core probe/unprobe flows
is a good code reuse thing, IMHO.

We already spent a lot of effort making this quite general from the
userspace perspective. Nobody yet came up with an idea to avoid the
ugly unbind/bind flow.

Be aware, there is a significant performance concern here. If you want
to create 1000 VFIO devices (this is a real thing), we *can't* probe a
normal driver first; it is too slow. We need a path that goes directly
from creating the RIDs to turning those RIDs into VFIO.

mlx5 takes *seconds* to complete its normal probe. We must avoid this.

Looking a few years into the future, with SIOVr1/2, the flow I want to
target is some uAPI commands:
  'create a PCI RID with params XYZ and attach a normal/VFIO/etc driver'
  'destroy a PCI RID'

We need to get away from this scheme of SRIOV where you bulk create a
bunch of empty VFs at one time and then have to somehow provision
them.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 13:10                   ` Jason Gunthorpe
@ 2023-10-10 13:56                       ` Michael S. Tsirkin
  2023-10-11  6:26                       ` Christoph Hellwig
  1 sibling, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-10 13:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Yishai Hadas, alex.williamson, jasowang, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Tue, Oct 10, 2023 at 10:10:31AM -0300, Jason Gunthorpe wrote:
> > > There is alot of code in VFIO and the VMM side to take a VF and turn
> > > it into a vPCI function. You can't just trivially duplicate VFIO in a
> > > dozen drivers without creating a giant mess.
> > 
> > I do not advocate for duplicating it.  But the code that calls this
> > functionality belongs into the driver that deals with the compound
> > device that we're doing this work for.
> 
> On one hand, I don't really care - we can put the code where people
> like.
> 
> However - the Intel GPU VFIO driver is such a bad experience I don't
> want to encourage people to make VFIO drivers, or code that is only
> used by VFIO drivers, that are not under drivers/vfio review.

So if Alex feels it makes sense to add some virtio functionality
to vfio and is happy to maintain or let you maintain the UAPI
then why would I say no? But we never expected devices to have
two drivers like this does, so just exposing device pointer
and saying "use regular virtio APIs for the rest" does not
cut it; the new APIs have to make sense
so virtio drivers can develop normally without fear of stepping
on the toes of this admin driver.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 13:56                       ` Michael S. Tsirkin
  (?)
@ 2023-10-10 14:08                       ` Jason Gunthorpe
  2023-10-10 14:54                           ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-10-10 14:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Yishai Hadas, alex.williamson, jasowang, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Tue, Oct 10, 2023 at 09:56:00AM -0400, Michael S. Tsirkin wrote:

> > However - the Intel GPU VFIO driver is such a bad experience I don't
> > want to encourage people to make VFIO drivers, or code that is only
> > used by VFIO drivers, that are not under drivers/vfio review.
> 
> So if Alex feels it makes sense to add some virtio functionality
> to vfio and is happy to maintain or let you maintain the UAPI
> then why would I say no? But we never expected devices to have
> two drivers like this does, so just exposing device pointer
> and saying "use regular virtio APIs for the rest" does not
> cut it, the new APIs have to make sense
> so virtio drivers can develop normally without fear of stepping
> on the toes of this admin driver.

Please work with Yishai to get something that makes sense to you. He
can post a v2 with the accumulated comments addressed so far and then
go over what the API between the drivers is.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 14:08                       ` Jason Gunthorpe
@ 2023-10-10 14:54                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-10 14:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, maorg, virtualization, Christoph Hellwig, jiri, leonro

On Tue, Oct 10, 2023 at 11:08:49AM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 10, 2023 at 09:56:00AM -0400, Michael S. Tsirkin wrote:
> 
> > > However - the Intel GPU VFIO driver is such a bad experiance I don't
> > > want to encourage people to make VFIO drivers, or code that is only
> > > used by VFIO drivers, that are not under drivers/vfio review.
> > 
> > So if Alex feels it makes sense to add some virtio functionality
> > to vfio and is happy to maintain or let you maintain the UAPI
> > then why would I say no? But we never expected devices to have
> > two drivers like this does, so just exposing device pointer
> > and saying "use regular virtio APIs for the rest" does not
> > cut it, the new APIs have to make sense
> > so virtio drivers can develop normally without fear of stepping
> > on the toes of this admin driver.
> 
> Please work with Yishai to get something that make sense to you. He
> can post a v2 with the accumulated comments addressed so far and then
> go over what the API between the drivers is.
> 
> Thanks,
> Jason

/me shrugs. I pretty much posted suggestions already. Should not be hard.
Anything unclear - post on list.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 14:54                           ` Michael S. Tsirkin
@ 2023-10-10 15:09                             ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas @ 2023-10-10 15:09 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Gunthorpe
  Cc: Christoph Hellwig, alex.williamson, jasowang, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On 10/10/2023 17:54, Michael S. Tsirkin wrote:
> On Tue, Oct 10, 2023 at 11:08:49AM -0300, Jason Gunthorpe wrote:
>> On Tue, Oct 10, 2023 at 09:56:00AM -0400, Michael S. Tsirkin wrote:
>>
>>>> However - the Intel GPU VFIO driver is such a bad experiance I don't
>>>> want to encourage people to make VFIO drivers, or code that is only
>>>> used by VFIO drivers, that are not under drivers/vfio review.
>>> So if Alex feels it makes sense to add some virtio functionality
>>> to vfio and is happy to maintain or let you maintain the UAPI
>>> then why would I say no? But we never expected devices to have
>>> two drivers like this does, so just exposing device pointer
>>> and saying "use regular virtio APIs for the rest" does not
>>> cut it, the new APIs have to make sense
>>> so virtio drivers can develop normally without fear of stepping
>>> on the toes of this admin driver.
>> Please work with Yishai to get something that make sense to you. He
>> can post a v2 with the accumulated comments addressed so far and then
>> go over what the API between the drivers is.
>>
>> Thanks,
>> Jason
> /me shrugs. I pretty much posted suggestions already. Should not be hard.
> Anything unclear - post on list.
>
Yes, this is the plan.

We are working to address the comments that we got so far in both VFIO &
VIRTIO, retest, and send the next version.

Regarding the API between the modules, it looks like we have the below
alternatives.

1) Proceed with the current approach, where we expose a generic API to
execute any admin command, but make it much more solid inside VIRTIO.
2) Expose extra APIs from VIRTIO for commands that we expect future
clients to use, such as LIST_QUERY/LIST_USE, and still keep the generic
execute-admin-command API for the others.
3) Expose an API per command from VIRTIO and fully drop the generic
execute-admin-command API.

A few notes:
Option #1 looks the most generic one; it drops the need to expose
multiple symbols / APIs per command, and for now we have a single client
for them (i.e. VFIO).
Options #2 & #3 may still require exposing the
virtio_pci_vf_get_pf_dev() API to let VFIO get the VIRTIO PF (struct
virtio_device *) from its PCI device; each command will get it as its
first argument.

Michael,
What do you suggest here?

Thanks,
Yishai


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 15:09                             ` Yishai Hadas via Virtualization
@ 2023-10-10 15:14                               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-10 15:14 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: kvm, maorg, virtualization, Christoph Hellwig, Jason Gunthorpe,
	jiri, leonro

On Tue, Oct 10, 2023 at 06:09:44PM +0300, Yishai Hadas wrote:
> On 10/10/2023 17:54, Michael S. Tsirkin wrote:
> > On Tue, Oct 10, 2023 at 11:08:49AM -0300, Jason Gunthorpe wrote:
> > > On Tue, Oct 10, 2023 at 09:56:00AM -0400, Michael S. Tsirkin wrote:
> > > 
> > > > > However - the Intel GPU VFIO driver is such a bad experiance I don't
> > > > > want to encourage people to make VFIO drivers, or code that is only
> > > > > used by VFIO drivers, that are not under drivers/vfio review.
> > > > So if Alex feels it makes sense to add some virtio functionality
> > > > to vfio and is happy to maintain or let you maintain the UAPI
> > > > then why would I say no? But we never expected devices to have
> > > > two drivers like this does, so just exposing device pointer
> > > > and saying "use regular virtio APIs for the rest" does not
> > > > cut it, the new APIs have to make sense
> > > > so virtio drivers can develop normally without fear of stepping
> > > > on the toes of this admin driver.
> > > Please work with Yishai to get something that make sense to you. He
> > > can post a v2 with the accumulated comments addressed so far and then
> > > go over what the API between the drivers is.
> > > 
> > > Thanks,
> > > Jason
> > /me shrugs. I pretty much posted suggestions already. Should not be hard.
> > Anything unclear - post on list.
> > 
> Yes, this is the plan.
> 
> We are working to address the comments that we got so far in both VFIO &
> VIRTIO, retest and send the next version.
> 
> Re the API between the modules, It looks like we have the below
> alternatives.
> 
> 1) Proceed with current approach where we exposed a generic API to execute
> any admin command, however, make it much more solid inside VIRTIO.
> 2) Expose extra APIs from VIRTIO for commands that we can consider future
> client usage of them as of LIST_QUERY/LIST_USE, however still have the
> generic execute admin command for others.
> 3) Expose API per command from VIRTIO and fully drop the generic execute
> admin command.
> 
> Few notes:
> Option #1 looks the most generic one, it drops the need to expose multiple
> symbols / APIs per command and for now we have a single client for them
> (i.e. VFIO).
> Options #2 & #3, may still require to expose the virtio_pci_vf_get_pf_dev()
> API to let VFIO get the VIRTIO PF (struct virtio_device *) from its PCI
> device, each command will get it as its first argument.
> 
> Michael,
> What do you suggest here ?
> 
> Thanks,
> Yishai

I suggest 3, but call it on the VF. Commands will switch to the PF
internally as needed. For example, Intel might be interested in exposing
admin commands through a memory BAR of the VF itself.
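
As a rough sketch of that suggestion (the exported name, the opcode define
and the submission helper below are made-up placeholders, not the series'
actual API), a per-command export called on the VF could resolve the owner
internally:

#include <linux/pci.h>
#include <linux/virtio.h>

#define MY_CMD_LEGACY_COMMON_CFG_WRITE  2       /* placeholder opcode */

/* made-up placeholder for the PF admin queue submission path */
int my_submit_admin_cmd(struct virtio_device *owner, u64 group_member_id,
                        u16 opcode, u8 offset, u8 size, const void *buf);

int virtio_admin_legacy_common_cfg_write(struct pci_dev *vf_dev,
                                         u8 offset, u8 size, const void *buf)
{
        /* virtio_pci_vf_get_pf_dev() is the series' helper for getting
         * the owner (PF) virtio device from the VF's pci_dev
         */
        struct virtio_device *owner = virtio_pci_vf_get_pf_dev(vf_dev);
        int vf_id = pci_iov_vf_id(vf_dev);

        if (!owner || vf_id < 0)
                return -ENODEV;

        /* SR-IOV group member ids are the 1-based VF numbers */
        return my_submit_admin_cmd(owner, vf_id + 1,
                                   MY_CMD_LEGACY_COMMON_CFG_WRITE,
                                   offset, size, buf);
}

The transport detail (the PF's admin queue today, possibly a VF memory BAR
on other implementations) then stays hidden behind the per-VF call.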

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 15:14                               ` Michael S. Tsirkin
@ 2023-10-10 15:43                                 ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas @ 2023-10-10 15:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Gunthorpe, Christoph Hellwig, alex.williamson, jasowang,
	kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, leonro, maorg

On 10/10/2023 18:14, Michael S. Tsirkin wrote:
> On Tue, Oct 10, 2023 at 06:09:44PM +0300, Yishai Hadas wrote:
>> On 10/10/2023 17:54, Michael S. Tsirkin wrote:
>>> On Tue, Oct 10, 2023 at 11:08:49AM -0300, Jason Gunthorpe wrote:
>>>> On Tue, Oct 10, 2023 at 09:56:00AM -0400, Michael S. Tsirkin wrote:
>>>>
>>>>>> However - the Intel GPU VFIO driver is such a bad experiance I don't
>>>>>> want to encourage people to make VFIO drivers, or code that is only
>>>>>> used by VFIO drivers, that are not under drivers/vfio review.
>>>>> So if Alex feels it makes sense to add some virtio functionality
>>>>> to vfio and is happy to maintain or let you maintain the UAPI
>>>>> then why would I say no? But we never expected devices to have
>>>>> two drivers like this does, so just exposing device pointer
>>>>> and saying "use regular virtio APIs for the rest" does not
>>>>> cut it, the new APIs have to make sense
>>>>> so virtio drivers can develop normally without fear of stepping
>>>>> on the toes of this admin driver.
>>>> Please work with Yishai to get something that make sense to you. He
>>>> can post a v2 with the accumulated comments addressed so far and then
>>>> go over what the API between the drivers is.
>>>>
>>>> Thanks,
>>>> Jason
>>> /me shrugs. I pretty much posted suggestions already. Should not be hard.
>>> Anything unclear - post on list.
>>>
>> Yes, this is the plan.
>>
>> We are working to address the comments that we got so far in both VFIO &
>> VIRTIO, retest and send the next version.
>>
>> Re the API between the modules, It looks like we have the below
>> alternatives.
>>
>> 1) Proceed with current approach where we exposed a generic API to execute
>> any admin command, however, make it much more solid inside VIRTIO.
>> 2) Expose extra APIs from VIRTIO for commands that we can consider future
>> client usage of them as of LIST_QUERY/LIST_USE, however still have the
>> generic execute admin command for others.
>> 3) Expose API per command from VIRTIO and fully drop the generic execute
>> admin command.
>>
>> Few notes:
>> Option #1 looks the most generic one, it drops the need to expose multiple
>> symbols / APIs per command and for now we have a single client for them
>> (i.e. VFIO).
>> Options #2 & #3, may still require to expose the virtio_pci_vf_get_pf_dev()
>> API to let VFIO get the VIRTIO PF (struct virtio_device *) from its PCI
>> device, each command will get it as its first argument.
>>
>> Michael,
>> What do you suggest here ?
>>
>> Thanks,
>> Yishai
> I suggest 3 but call it on the VF. commands will switch to PF
> internally as needed. For example, intel might be interested in exposing
> admin commands through a memory BAR of VF itself.
>
The driver that owns the VF is VFIO, not a VIRTIO one.

The ability to get the VIRTIO PF is from the PCI device (i.e. struct
pci_dev).

In addition,
virtio_pci_vf_get_pf_dev() was implemented for now in virtio-pci, as it
works on a pci_dev.
Assuming that we'll put each command inside virtio as the generic layer,
we won't be able to call/use this API internally to get the PF because of
cyclic dependencies between the modules; the link will fail.

So in option #3 we may still need to get the VIRTIO PF outside, in VFIO,
and pass it as a pointer to VIRTIO upon each command.

Does that work for you?

Yishai


^ permalink raw reply	[flat|nested] 321+ messages in thread

* RE: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 15:43                                 ` Yishai Hadas via Virtualization
@ 2023-10-10 15:58                                   ` Parav Pandit via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Parav Pandit @ 2023-10-10 15:58 UTC (permalink / raw)
  To: Yishai Hadas, Michael S. Tsirkin
  Cc: Jason Gunthorpe, Christoph Hellwig, alex.williamson, jasowang,
	kvm, virtualization, Feng Liu, Jiri Pirko, kevin.tian,
	joao.m.martins, Leon Romanovsky, Maor Gottlieb



> From: Yishai Hadas <yishaih@nvidia.com>
> Sent: Tuesday, October 10, 2023 9:14 PM
> 
> On 10/10/2023 18:14, Michael S. Tsirkin wrote:
> > On Tue, Oct 10, 2023 at 06:09:44PM +0300, Yishai Hadas wrote:
> >> On 10/10/2023 17:54, Michael S. Tsirkin wrote:
> >>> On Tue, Oct 10, 2023 at 11:08:49AM -0300, Jason Gunthorpe wrote:
> >>>> On Tue, Oct 10, 2023 at 09:56:00AM -0400, Michael S. Tsirkin wrote:
> >>>>
> >>>>>> However - the Intel GPU VFIO driver is such a bad experiance I
> >>>>>> don't want to encourage people to make VFIO drivers, or code that
> >>>>>> is only used by VFIO drivers, that are not under drivers/vfio review.
> >>>>> So if Alex feels it makes sense to add some virtio functionality
> >>>>> to vfio and is happy to maintain or let you maintain the UAPI then
> >>>>> why would I say no? But we never expected devices to have two
> >>>>> drivers like this does, so just exposing device pointer and saying
> >>>>> "use regular virtio APIs for the rest" does not cut it, the new
> >>>>> APIs have to make sense so virtio drivers can develop normally
> >>>>> without fear of stepping on the toes of this admin driver.
> >>>> Please work with Yishai to get something that make sense to you. He
> >>>> can post a v2 with the accumulated comments addressed so far and
> >>>> then go over what the API between the drivers is.
> >>>>
> >>>> Thanks,
> >>>> Jason
> >>> /me shrugs. I pretty much posted suggestions already. Should not be hard.
> >>> Anything unclear - post on list.
> >>>
> >> Yes, this is the plan.
> >>
> >> We are working to address the comments that we got so far in both
> >> VFIO & VIRTIO, retest and send the next version.
> >>
> >> Re the API between the modules, It looks like we have the below
> >> alternatives.
> >>
> >> 1) Proceed with current approach where we exposed a generic API to
> >> execute any admin command, however, make it much more solid inside
> VIRTIO.
> >> 2) Expose extra APIs from VIRTIO for commands that we can consider
> >> future client usage of them as of LIST_QUERY/LIST_USE, however still
> >> have the generic execute admin command for others.
> >> 3) Expose API per command from VIRTIO and fully drop the generic
> >> execute admin command.
> >>
> >> Few notes:
> >> Option #1 looks the most generic one, it drops the need to expose
> >> multiple symbols / APIs per command and for now we have a single
> >> client for them (i.e. VFIO).
> >> Options #2 & #3, may still require to expose the
> >> virtio_pci_vf_get_pf_dev() API to let VFIO get the VIRTIO PF (struct
> >> virtio_device *) from its PCI device, each command will get it as its first
> argument.
> >>
> >> Michael,
> >> What do you suggest here ?
> >>
> >> Thanks,
> >> Yishai
> > I suggest 3 but call it on the VF. commands will switch to PF
> > internally as needed. For example, intel might be interested in
> > exposing admin commands through a memory BAR of VF itself.
> >
> The driver who owns the VF is VFIO, it's not a VIRTIO one.
> 
> The ability to get the VIRTIO PF is from the PCI device (i.e. struct pci_dev).
> 
> In addition,
> virtio_pci_vf_get_pf_dev() was implemented for now in virtio-pci as it worked
> on pci_dev.
> Assuming that we'll put each command inside virtio as the generic layer, we
> won't be able to call/use this API internally to get the PF as of cyclic
> dependencies between the modules, link will fail.
> 
> So in option #3 we may still need to get outside into VFIO the VIRTIO PF and
> give it as pointer to VIRTIO upon each command.
>
I think,
for #3 the virtio level API signature should be:

virtio_admin_legacy_xyz_cmd(struct virtio_device *dev, u64 group_member_id, ....);

This maintains the right abstraction needed between the vfio, generic virtio and virtio transport (pci) layers.
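
Purely as an illustration of that shape (the concrete command name, the
virtiovf state structure and its fields are assumptions made up for this
sketch), the VFIO side would resolve the owner once and then pass it,
together with the member id, on every command:

#include <linux/pci.h>
#include <linux/virtio.h>

/* exported by the generic virtio layer; how the admin queue is reached is
 * a transport (virtio-pci) detail hidden behind this call
 */
int virtio_admin_legacy_device_cfg_write(struct virtio_device *pf_vdev,
                                         u64 group_member_id,
                                         u8 offset, u8 size, const void *buf);

struct my_virtiovf_dev {
        struct pci_dev *pdev;
        struct virtio_device *pf_vdev;  /* from virtio_pci_vf_get_pf_dev() */
        u64 member_id;                  /* this VF's group member id */
};

static int my_legacy_dev_cfg_write(struct my_virtiovf_dev *vdev,
                                   u8 offset, u8 size, const void *buf)
{
        if (!vdev->pf_vdev)
                return -EOPNOTSUPP;

        return virtio_admin_legacy_device_cfg_write(vdev->pf_vdev,
                                                    vdev->member_id,
                                                    offset, size, buf);
}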

^ permalink raw reply	[flat|nested] 321+ messages in thread


* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 15:43                                 ` Yishai Hadas via Virtualization
@ 2023-10-10 15:58                                   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-10 15:58 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: Jason Gunthorpe, Christoph Hellwig, alex.williamson, jasowang,
	kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, leonro, maorg

On Tue, Oct 10, 2023 at 06:43:32PM +0300, Yishai Hadas wrote:
> On 10/10/2023 18:14, Michael S. Tsirkin wrote:
> > On Tue, Oct 10, 2023 at 06:09:44PM +0300, Yishai Hadas wrote:
> > > On 10/10/2023 17:54, Michael S. Tsirkin wrote:
> > > > On Tue, Oct 10, 2023 at 11:08:49AM -0300, Jason Gunthorpe wrote:
> > > > > On Tue, Oct 10, 2023 at 09:56:00AM -0400, Michael S. Tsirkin wrote:
> > > > > 
> > > > > > > However - the Intel GPU VFIO driver is such a bad experiance I don't
> > > > > > > want to encourage people to make VFIO drivers, or code that is only
> > > > > > > used by VFIO drivers, that are not under drivers/vfio review.
> > > > > > So if Alex feels it makes sense to add some virtio functionality
> > > > > > to vfio and is happy to maintain or let you maintain the UAPI
> > > > > > then why would I say no? But we never expected devices to have
> > > > > > two drivers like this does, so just exposing device pointer
> > > > > > and saying "use regular virtio APIs for the rest" does not
> > > > > > cut it, the new APIs have to make sense
> > > > > > so virtio drivers can develop normally without fear of stepping
> > > > > > on the toes of this admin driver.
> > > > > Please work with Yishai to get something that make sense to you. He
> > > > > can post a v2 with the accumulated comments addressed so far and then
> > > > > go over what the API between the drivers is.
> > > > > 
> > > > > Thanks,
> > > > > Jason
> > > > /me shrugs. I pretty much posted suggestions already. Should not be hard.
> > > > Anything unclear - post on list.
> > > > 
> > > Yes, this is the plan.
> > > 
> > > We are working to address the comments that we got so far in both VFIO &
> > > VIRTIO, retest and send the next version.
> > > 
> > > Re the API between the modules, It looks like we have the below
> > > alternatives.
> > > 
> > > 1) Proceed with current approach where we exposed a generic API to execute
> > > any admin command, however, make it much more solid inside VIRTIO.
> > > 2) Expose extra APIs from VIRTIO for commands that we can consider future
> > > client usage of them as of LIST_QUERY/LIST_USE, however still have the
> > > generic execute admin command for others.
> > > 3) Expose API per command from VIRTIO and fully drop the generic execute
> > > admin command.
> > > 
> > > Few notes:
> > > Option #1 looks the most generic one, it drops the need to expose multiple
> > > symbols / APIs per command and for now we have a single client for them
> > > (i.e. VFIO).
> > > Options #2 & #3, may still require to expose the virtio_pci_vf_get_pf_dev()
> > > API to let VFIO get the VIRTIO PF (struct virtio_device *) from its PCI
> > > device, each command will get it as its first argument.
> > > 
> > > Michael,
> > > What do you suggest here ?
> > > 
> > > Thanks,
> > > Yishai
> > I suggest 3 but call it on the VF. commands will switch to PF
> > internally as needed. For example, intel might be interested in exposing
> > admin commands through a memory BAR of VF itself.
> > 
> The driver who owns the VF is VFIO, it's not a VIRTIO one.
> 
> The ability to get the VIRTIO PF is from the PCI device (i.e. struct
> pci_dev).
> 
> In addition,
> virtio_pci_vf_get_pf_dev() was implemented for now in virtio-pci as it
> worked on pci_dev.

On the pci_dev of the VF, yes? So again, just move this into each command,
that's all, i.e. pass the pci_dev to each.

> Assuming that we'll put each command inside virtio as the generic layer, we
> won't be able to call/use this API internally to get the PF as of cyclic
> dependencies between the modules, link will fail.
> 
> So in option #3 we may still need to get outside into VFIO the VIRTIO PF and
> give it as pointer to VIRTIO upon each command.
> 
> Does it work for you ?
> 
> Yishai


^ permalink raw reply	[flat|nested] 321+ messages in thread


* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 15:14                               ` Michael S. Tsirkin
  (?)
  (?)
@ 2023-10-10 15:59                               ` Jason Gunthorpe
  2023-10-10 16:03                                   ` Michael S. Tsirkin
  2023-10-11  6:13                                   ` Christoph Hellwig
  -1 siblings, 2 replies; 321+ messages in thread
From: Jason Gunthorpe @ 2023-10-10 15:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yishai Hadas, Christoph Hellwig, alex.williamson, jasowang, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Tue, Oct 10, 2023 at 11:14:56AM -0400, Michael S. Tsirkin wrote:

> I suggest 3 but call it on the VF. commands will switch to PF
> internally as needed. For example, intel might be interested in exposing
> admin commands through a memory BAR of VF itself.

FWIW, we have been pushing back on such things in VFIO, so it will
have to be very carefully security justified.

Probably, since that is not standard, it should just live under some
intel-only vfio driver behavior, not in virtio land.

It is also costly to switch between PF and VF; it should not be done
pointlessly on the fast path.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 15:59                               ` Jason Gunthorpe
@ 2023-10-10 16:03                                   ` Michael S. Tsirkin
  2023-10-11  6:13                                   ` Christoph Hellwig
  1 sibling, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-10 16:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, maorg, virtualization, Christoph Hellwig, jiri, leonro

On Tue, Oct 10, 2023 at 12:59:37PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 10, 2023 at 11:14:56AM -0400, Michael S. Tsirkin wrote:
> 
> > I suggest 3 but call it on the VF. commands will switch to PF
> > internally as needed. For example, intel might be interested in exposing
> > admin commands through a memory BAR of VF itself.
> 
> FWIW, we have been pushing back on such things in VFIO, so it will
> have to be very carefully security justified.
> 
> Probably since that is not standard it should just live in under some
> intel-only vfio driver behavior, not in virtio land.
> 
> It is also costly to switch between pf/vf, it should not be done
> pointlessly on the fast path.
> 
> Jason

Currently, the switch seems to be just a cast of private data.
I am suggesting keeping that cast inside virtio. Why is that
expensive?


^ permalink raw reply	[flat|nested] 321+ messages in thread


* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 16:03                                   ` Michael S. Tsirkin
  (?)
@ 2023-10-10 16:07                                   ` Jason Gunthorpe
  2023-10-10 16:21                                       ` Parav Pandit via Virtualization
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-10-10 16:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yishai Hadas, Christoph Hellwig, alex.williamson, jasowang, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Tue, Oct 10, 2023 at 12:03:29PM -0400, Michael S. Tsirkin wrote:
> On Tue, Oct 10, 2023 at 12:59:37PM -0300, Jason Gunthorpe wrote:
> > On Tue, Oct 10, 2023 at 11:14:56AM -0400, Michael S. Tsirkin wrote:
> > 
> > > I suggest 3 but call it on the VF. commands will switch to PF
> > > internally as needed. For example, intel might be interested in exposing
> > > admin commands through a memory BAR of VF itself.
> > 
> > FWIW, we have been pushing back on such things in VFIO, so it will
> > have to be very carefully security justified.
> > 
> > Probably since that is not standard it should just live in under some
> > intel-only vfio driver behavior, not in virtio land.
> > 
> > It is also costly to switch between pf/vf, it should not be done
> > pointlessly on the fast path.
> > 
> > Jason
> 
> Currently, the switch seems to be just a cast of private data.
> I am suggesting keeping that cast inside virtio. Why is that
> expensive?

pci_iov_get_pf_drvdata() does a bunch of sanity checks and function
calls. It was not intended to be used on a fast path.

Jason 
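
For reference, a lookup helper like virtio_pci_vf_get_pf_dev() would
typically sit on top of pci_iov_get_pf_drvdata().  A rough sketch, assuming
virtio-pci keeps its usual struct virtio_pci_device as the PF drvdata (the
helper actually posted in the series may differ):

static struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
{
	struct virtio_pci_device *vp_dev;

	/*
	 * pci_iov_get_pf_drvdata() verifies that pdev really is a VF and
	 * that its PF is currently bound to the expected driver before
	 * handing back the PF's drvdata; those checks are what make it a
	 * poor fit for a per-I/O fast path.
	 */
	vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
	if (IS_ERR(vp_dev))
		return NULL;

	return &vp_dev->vdev;
}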

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 15:58                                   ` Michael S. Tsirkin
@ 2023-10-10 16:09                                     ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas @ 2023-10-10 16:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Gunthorpe, Christoph Hellwig, alex.williamson, jasowang,
	kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, leonro, maorg

On 10/10/2023 18:58, Michael S. Tsirkin wrote:
> On Tue, Oct 10, 2023 at 06:43:32PM +0300, Yishai Hadas wrote:
>> On 10/10/2023 18:14, Michael S. Tsirkin wrote:
>>> On Tue, Oct 10, 2023 at 06:09:44PM +0300, Yishai Hadas wrote:
>>>> On 10/10/2023 17:54, Michael S. Tsirkin wrote:
>>>>> On Tue, Oct 10, 2023 at 11:08:49AM -0300, Jason Gunthorpe wrote:
>>>>>> On Tue, Oct 10, 2023 at 09:56:00AM -0400, Michael S. Tsirkin wrote:
>>>>>>
>>>>>>>> However - the Intel GPU VFIO driver is such a bad experiance I don't
>>>>>>>> want to encourage people to make VFIO drivers, or code that is only
>>>>>>>> used by VFIO drivers, that are not under drivers/vfio review.
>>>>>>> So if Alex feels it makes sense to add some virtio functionality
>>>>>>> to vfio and is happy to maintain or let you maintain the UAPI
>>>>>>> then why would I say no? But we never expected devices to have
>>>>>>> two drivers like this does, so just exposing device pointer
>>>>>>> and saying "use regular virtio APIs for the rest" does not
>>>>>>> cut it, the new APIs have to make sense
>>>>>>> so virtio drivers can develop normally without fear of stepping
>>>>>>> on the toes of this admin driver.
>>>>>> Please work with Yishai to get something that make sense to you. He
>>>>>> can post a v2 with the accumulated comments addressed so far and then
>>>>>> go over what the API between the drivers is.
>>>>>>
>>>>>> Thanks,
>>>>>> Jason
>>>>> /me shrugs. I pretty much posted suggestions already. Should not be hard.
>>>>> Anything unclear - post on list.
>>>>>
>>>> Yes, this is the plan.
>>>>
>>>> We are working to address the comments that we got so far in both VFIO &
>>>> VIRTIO, retest and send the next version.
>>>>
>>>> Re the API between the modules, It looks like we have the below
>>>> alternatives.
>>>>
>>>> 1) Proceed with current approach where we exposed a generic API to execute
>>>> any admin command, however, make it much more solid inside VIRTIO.
>>>> 2) Expose extra APIs from VIRTIO for commands that we can consider future
>>>> client usage of them as of LIST_QUERY/LIST_USE, however still have the
>>>> generic execute admin command for others.
>>>> 3) Expose API per command from VIRTIO and fully drop the generic execute
>>>> admin command.
>>>>
>>>> Few notes:
>>>> Option #1 looks the most generic one, it drops the need to expose multiple
>>>> symbols / APIs per command and for now we have a single client for them
>>>> (i.e. VFIO).
>>>> Options #2 & #3, may still require to expose the virtio_pci_vf_get_pf_dev()
>>>> API to let VFIO get the VIRTIO PF (struct virtio_device *) from its PCI
>>>> device, each command will get it as its first argument.
>>>>
>>>> Michael,
>>>> What do you suggest here ?
>>>>
>>>> Thanks,
>>>> Yishai
>>> I suggest 3 but call it on the VF. commands will switch to PF
>>> internally as needed. For example, intel might be interested in exposing
>>> admin commands through a memory BAR of VF itself.
>>>
>> The driver who owns the VF is VFIO, it's not a VIRTIO one.
>>
>> The ability to get the VIRTIO PF is from the PCI device (i.e. struct
>> pci_dev).
>>
>> In addition,
>> virtio_pci_vf_get_pf_dev() was implemented for now in virtio-pci as it
>> worked on pci_dev.
> On pci_dev of vf, yes? So again just move this into each command,
> that's all. I.e. pass pci_dev to each.

What about the cyclic dependency issue inside VIRTIO that I mentioned
below?

In my suggestion it's fine: VFIO will get the PF and pass it to VIRTIO
per command.

Yishai
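
For illustration only, the flow described here would look roughly like the
following on the VFIO side.  The helper and command names are the
assumptions from the earlier sketch in this thread, not the series code:

/*
 * VFIO relays a legacy config write by resolving the group owner (PF)
 * from the VF's pci_dev and handing it to the virtio layer together
 * with the group member id.
 */
static int virtiovf_legacy_cfg_write(struct pci_dev *vf_pdev, u8 offset,
				     u8 size, const void *buf)
{
	struct virtio_device *pf_vdev = virtio_pci_vf_get_pf_dev(vf_pdev);
	int vf_id = pci_iov_vf_id(vf_pdev);

	if (!pf_vdev || vf_id < 0)
		return -ENODEV;

	/*
	 * The SR-IOV group member id in the spec is the 1-based VF number,
	 * while pci_iov_vf_id() is 0-based, hence the + 1.  The generic
	 * virtio layer never has to look up the PF itself.
	 */
	return virtio_admin_legacy_common_cfg_write(pf_vdev, vf_id + 1,
						    offset, size, buf);
}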

>> Assuming that we'll put each command inside virtio as the generic layer, we
>> won't be able to call/use this API internally to get the PF as of cyclic
>> dependencies between the modules, link will fail.
>>
>> So in option #3 we may still need to get outside into VFIO the VIRTIO PF and
>> give it as pointer to VIRTIO upon each command.
>>
>> Does it work for you ?
>>
>> Yishai



^ permalink raw reply	[flat|nested] 321+ messages in thread


* RE: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 16:07                                   ` Jason Gunthorpe
@ 2023-10-10 16:21                                       ` Parav Pandit via Virtualization
  0 siblings, 0 replies; 321+ messages in thread
From: Parav Pandit @ 2023-10-10 16:21 UTC (permalink / raw)
  To: Jason Gunthorpe, Michael S. Tsirkin
  Cc: Yishai Hadas, Christoph Hellwig, alex.williamson, jasowang, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	Leon Romanovsky, Maor Gottlieb


> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, October 10, 2023 9:37 PM
> 
> On Tue, Oct 10, 2023 at 12:03:29PM -0400, Michael S. Tsirkin wrote:
> > On Tue, Oct 10, 2023 at 12:59:37PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Oct 10, 2023 at 11:14:56AM -0400, Michael S. Tsirkin wrote:
> > >
> > > > I suggest 3 but call it on the VF. commands will switch to PF
> > > > internally as needed. For example, intel might be interested in
> > > > exposing admin commands through a memory BAR of VF itself.

If, in the future, one issues admin commands on the VF memory BAR, there is no need for the cast either.
The vfio-virtio-pci driver can issue them on the PCI VF device directly.

(though per-VF memory registers would be an anti-scale design for real hw; to discuss in another forum).

^ permalink raw reply	[flat|nested] 321+ messages in thread


* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 16:21                                       ` Parav Pandit via Virtualization
@ 2023-10-10 20:38                                         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-10 20:38 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jason Gunthorpe, Yishai Hadas, Christoph Hellwig,
	alex.williamson, jasowang, kvm, virtualization, Feng Liu,
	Jiri Pirko, kevin.tian, joao.m.martins, Leon Romanovsky,
	Maor Gottlieb

On Tue, Oct 10, 2023 at 04:21:15PM +0000, Parav Pandit wrote:
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, October 10, 2023 9:37 PM
> > 
> > On Tue, Oct 10, 2023 at 12:03:29PM -0400, Michael S. Tsirkin wrote:
> > > On Tue, Oct 10, 2023 at 12:59:37PM -0300, Jason Gunthorpe wrote:
> > > > On Tue, Oct 10, 2023 at 11:14:56AM -0400, Michael S. Tsirkin wrote:
> > > >
> > > > > I suggest 3 but call it on the VF. commands will switch to PF
> > > > > internally as needed. For example, intel might be interested in
> > > > > exposing admin commands through a memory BAR of VF itself.
> 
> If in the future if one does admin command on the VF memory BAR, there is no need of cast either.
> vfio-virtio-pci driver can do on the pci vf device directly.

This is why I want the API to take the VF pci device as a parameter.
I don't get what is cyclic about it, yet.

> (though per VF memory registers would be anti-scale design for real hw; to discuss in other forum).

Up to the hardware vendor, really.


^ permalink raw reply	[flat|nested] 321+ messages in thread


* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 16:09                                     ` Yishai Hadas via Virtualization
@ 2023-10-10 20:42                                       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-10 20:42 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: Jason Gunthorpe, Christoph Hellwig, alex.williamson, jasowang,
	kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, leonro, maorg

On Tue, Oct 10, 2023 at 07:09:08PM +0300, Yishai Hadas wrote:
> 
> > > Assuming that we'll put each command inside virtio as the generic layer, we
> > > won't be able to call/use this API internally to get the PF as of cyclic
> > > dependencies between the modules, link will fail.

I just mean:
virtio_admin_legacy_io_write(struct pci_dev *,  ....)


Internally it starts from the VF, gets the PF (or the VF itself, or
whatever the transport is), sends the command, gets the status and
returns.

What is cyclic here?

-- 
MST
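
Spelled out, the suggestion is roughly the shape below.  Where such a
wrapper can live is exactly the layering question being discussed, and the
names are illustrative, not the series code:

int virtio_admin_legacy_io_write(struct pci_dev *vf_pdev, u8 offset,
				 u8 size, const void *buf)
{
	struct virtio_device *owner = virtio_pci_vf_get_pf_dev(vf_pdev);
	int vf_id = pci_iov_vf_id(vf_pdev);

	if (!owner || vf_id < 0)
		return -ENODEV;

	/*
	 * If this wrapper sits in the core virtio module it has to call
	 * into virtio-pci for the owner lookup, while virtio-pci already
	 * depends on the core module; that is the circular module
	 * dependency raised above.  Keeping it in virtio-pci, or having
	 * VFIO pass the owner in (option #3), avoids it.
	 *
	 * The legacy-write admin command would be built and submitted on
	 * the owner's admin virtqueue here.
	 */
	return 0;
}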


^ permalink raw reply	[flat|nested] 321+ messages in thread


* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 15:43                                 ` Yishai Hadas via Virtualization
@ 2023-10-11  6:12                                   ` Christoph Hellwig
  -1 siblings, 0 replies; 321+ messages in thread
From: Christoph Hellwig @ 2023-10-11  6:12 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: Michael S. Tsirkin, Jason Gunthorpe, Christoph Hellwig,
	alex.williamson, jasowang, kvm, virtualization, parav, feliu,
	jiri, kevin.tian, joao.m.martins, leonro, maorg

On Tue, Oct 10, 2023 at 06:43:32PM +0300, Yishai Hadas wrote:
> > I suggest 3 but call it on the VF. commands will switch to PF
> > internally as needed. For example, intel might be interested in exposing
> > admin commands through a memory BAR of VF itself.
> > 
> The driver who owns the VF is VFIO, it's not a VIRTIO one.

And to loop back into my previous discussion: that's the fundamental
problem here.  If it were owned by the virtio subsystem, which just
calls into vfio, we would not have this problem, including the
circular loops and exposed APIs.


^ permalink raw reply	[flat|nested] 321+ messages in thread


* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 15:59                               ` Jason Gunthorpe
@ 2023-10-11  6:13                                   ` Christoph Hellwig
  2023-10-11  6:13                                   ` Christoph Hellwig
  1 sibling, 0 replies; 321+ messages in thread
From: Christoph Hellwig @ 2023-10-11  6:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Michael S. Tsirkin, Yishai Hadas, Christoph Hellwig,
	alex.williamson, jasowang, kvm, virtualization, parav, feliu,
	jiri, kevin.tian, joao.m.martins, leonro, maorg

On Tue, Oct 10, 2023 at 12:59:37PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 10, 2023 at 11:14:56AM -0400, Michael S. Tsirkin wrote:
> 
> > I suggest 3 but call it on the VF. commands will switch to PF
> > internally as needed. For example, intel might be interested in exposing
> > admin commands through a memory BAR of VF itself.
> 
> FWIW, we have been pushing back on such things in VFIO, so it will
> have to be very carefully security justified.
> 
> Probably since that is not standard it should just live in under some
> intel-only vfio driver behavior, not in virtio land.

Btw, what is that intel thing everyone is talking about?  And why
would the virtio core support vendor specific behavior like that?


^ permalink raw reply	[flat|nested] 321+ messages in thread


* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 13:10                   ` Jason Gunthorpe
@ 2023-10-11  6:26                       ` Christoph Hellwig
  2023-10-11  6:26                       ` Christoph Hellwig
  1 sibling, 0 replies; 321+ messages in thread
From: Christoph Hellwig @ 2023-10-11  6:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Michael S. Tsirkin, Yishai Hadas,
	alex.williamson, jasowang, kvm, virtualization, parav, feliu,
	jiri, kevin.tian, joao.m.martins, leonro, maorg

On Tue, Oct 10, 2023 at 10:10:31AM -0300, Jason Gunthorpe wrote:
> We've talked around ideas like allowing the VF config space to do some
> of the work. For simple devices we could get away with 1 VF config
> space register. (VF config space is owned by the hypervisor, not the
> guest)

Which assumes you're actually using VFs and not multiple PFs, which
is a very limiting assumption.  It also keeps you from actually
using DMA during the live migration process, which again is a major
limitation once you have a non-trivial amount of state.

> SIOVr2 is discussing more a flexible RID mapping - there is a possible
> route where a "VF" could actually have two RIDs, a hypervisor RID and a
> guest RID.

Well, then you go down the SIOV route, which requires a complex driver
actually presenting the guest visible device anyway.

> It really is PCI limitations that force this design of making a PF
> driver do dual duty as a fully functionally normal device and act as a
> communication channel proxy to make a back channel into a SRIOV VF.
> 
> My view has always been that the VFIO live migration operations are
> executed logically within the VF as they only effect the VF.
> 
> So we have a logical design seperation where VFIO world owns the
> commands and the PF driver supplies the communication channel. This
> works well for devices that already have a robust RPC interface to
> their device FW.

Independent of my above points on the doubts about VF-controlled live
migration for PCIe devices, I absolutely agree with you that the Linux
abstraction and user interface should be VF-based.  Which further
reinforces my point that the VFIO driver for the controlled function
(PF or VF) and the Linux driver for the controlling function (better
be a PF in practice) must be very tightly integrated.  And the best
way to do that is to export the vfio nodes from the Linux driver
that knows the hardware and not split them out into a separate one.

> > The driver that knows this hardware.  In this case the virtio subsystem,
> > in case of nvme the nvme driver, and in case of mlx5 the mlx5 driver.
> 
> But those are drivers operating the HW to create kernel devices. Here
> we need a VFIO device. They can't co-exist, if you switch mlx5 from
> normal to vfio you have to tear down the entire normal driver.

Yes, absolutely.  And if we're smart enough we structure it in a way
that we never even initialize the bits of the driver that are only
needed for the normal kernel consumers.

> > No.  That layout logically follows from what codebase the functionality
> > is part of, though.
> 
> I don't understand what we are talking about really. Where do you
> imagine the vfio_register_XX() goes?

In the driver controlling the hardware.  E.g. for virtio in
drivers/virtio/, for nvme in drivers/nvme/, and for mlx5
in the mlx5 driver directory.

> > > I don't know what "fake-legacy" even means, VFIO is not legacy.
> > 
> > The driver we're talking about in this thread fakes up a virtio_pci
> > legacy devie to the guest on top of a "modern" virtio_pci device.
> 
> I'm not sure I'd use the word fake, inb/outb are always trapped
> operations in VMs. If the device provided a real IO BAR then VFIO
> common code would trap and relay inb/outb to the device.
> 
> All this is doing is changing the inb/outb relay from using a physical
> IO BAR to a DMA command ring.
> 
> The motivation is simply because normal IO BAR space is incredibly
> limited and you can't get enough SRIOV functions when using it.

The fake is not meant as a judgement.  But it creates a virtio-legacy
device that in this form does not exist in hardware.  That's what
I call fake.  If you prefer a different term that's fine with me too.

> > > There is alot of code in VFIO and the VMM side to take a VF and turn
> > > it into a vPCI function. You can't just trivially duplicate VFIO in a
> > > dozen drivers without creating a giant mess.
> > 
> > I do not advocate for duplicating it.  But the code that calls this
> > functionality belongs into the driver that deals with the compound
> > device that we're doing this work for.
> 
> On one hand, I don't really care - we can put the code where people
> like.
> 
> However - the Intel GPU VFIO driver is such a bad experiance I don't
> want to encourage people to make VFIO drivers, or code that is only
> used by VFIO drivers, that are not under drivers/vfio review.

We can and should require vfio review for users of the vfio API.
But to be honest code placement was not the problem with i915.  The
problem was that the mdev APIs (under drivers/vfio) were a complete
trainwreck when they were written, and that the driver had a horrible
hypervisor API abstraction.

> Be aware, there is a significant performance concern here. If you want
> to create 1000 VFIO devices (this is a real thing), we *can't* probe a
> normal driver first, it is too slow. We need a path that goes directly
> from creating the RIDs to turning those RIDs into VFIO.

And by calling the vfio functions from mlx5 you get this easily.

But I think you're totally mixing things up here anyway.

For mdev/SIOV-like flows you must call vfio APIs from the main
driver anyway, as there is no pci_dev to probe on.  That's
what i915 does, btw.

For "classic" vfio that requires a pci_dev (or $otherbus_dev) we need
to have a similar flow.  And I think the best way is to have the
bus-level attribute on the device and/or a device-specific side band
protocol to device how new functions are probed.  With that you
avoid all the duplicate PCI IDs for the binding, and actually allow to
sanely establush a communication channel between the functions.
Because without that there is no way to know how any two functions
related.  The driver might think they know, but there's all kinds of
whacky PCI passthough schemes that will break such a logic.


^ permalink raw reply	[flat|nested] 321+ messages in thread


* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11  6:13                                   ` Christoph Hellwig
@ 2023-10-11  6:43                                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-11  6:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kvm, maorg, virtualization, Jason Gunthorpe, jiri, leonro

On Tue, Oct 10, 2023 at 11:13:30PM -0700, Christoph Hellwig wrote:
> On Tue, Oct 10, 2023 at 12:59:37PM -0300, Jason Gunthorpe wrote:
> > On Tue, Oct 10, 2023 at 11:14:56AM -0400, Michael S. Tsirkin wrote:
> > 
> > > I suggest 3 but call it on the VF. commands will switch to PF
> > > internally as needed. For example, intel might be interested in exposing
> > > admin commands through a memory BAR of VF itself.
> > 
> > FWIW, we have been pushing back on such things in VFIO, so it will
> > have to be very carefully security justified.
> > 
> > Probably since that is not standard it should just live in under some
> > intel-only vfio driver behavior, not in virtio land.
> 
> Btw, what is that intel thing everyone is talking about?  And why
> would the virtio core support vendor specific behavior like that?

It's not a thing, it's Zhu Lingshan :) intel is just one of the vendors
that implemented vdpa support, and so Zhu Lingshan from intel is working
on vdpa and has also proposed virtio spec extensions for migration.
intel's driver is called ifcvf.  vdpa composes in userspace all this
stuff that is being added to vfio here, so it's a different approach.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread


* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11  6:43                                     ` Michael S. Tsirkin
@ 2023-10-11  6:59                                       ` Christoph Hellwig
  -1 siblings, 0 replies; 321+ messages in thread
From: Christoph Hellwig @ 2023-10-11  6:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Jason Gunthorpe, Yishai Hadas,
	alex.williamson, jasowang, kvm, virtualization, parav, feliu,
	jiri, kevin.tian, joao.m.martins, leonro, maorg

On Wed, Oct 11, 2023 at 02:43:37AM -0400, Michael S. Tsirkin wrote:
> > Btw, what is that intel thing everyone is talking about?  And why
> > would the virtio core support vendor specific behavior like that?
> 
> It's not a thing it's Zhu Lingshan :) intel is just one of the vendors
> that implemented vdpa support and so Zhu Lingshan from intel is working
> on vdpa and has also proposed virtio spec extensions for migration.
> intel's driver is called ifcvf.  vdpa composes all this stuff that is
> added to vfio in userspace, so it's a different approach.

Well, so let's call it virtio live migration instead of intel.

And please all work together in the virtio committee so that you have
one way of communication between controlling and controlled functions.
If one extension does it one way and the other a different way, that's
just creating a giant mess.


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-10 20:42                                       ` Michael S. Tsirkin
@ 2023-10-11  7:44                                         ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas @ 2023-10-11  7:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Gunthorpe, Christoph Hellwig, alex.williamson, jasowang,
	kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, leonro, maorg

On 10/10/2023 23:42, Michael S. Tsirkin wrote:
> On Tue, Oct 10, 2023 at 07:09:08PM +0300, Yishai Hadas wrote:
>>>> Assuming that we'll put each command inside virtio as the generic layer, we
>>>> won't be able to call/use this API internally to get the PF as of cyclic
>>>> dependencies between the modules, link will fail.
> I just mean:
> virtio_admin_legacy_io_write(sruct pci_device *,  ....)
>
>
> internally it starts from vf gets the pf (or vf itself or whatever
> the transport is) sends command gets status returns.
>
> what is cyclic here?
>
virtio-pci depends on virtio [1].

If we put the commands in the generic layer as we expect them to be 
(i.e. virtio), then an internal call to virtio_pci_vf_get_pf_dev() to 
get the PF from the VF will end up with a cyclic module dependency and 
the link will fail, as shown below [2].

Given that, someone could suggest putting the commands in virtio-pci, 
however this would fully bypass the generic layer of virtio and future 
clients won't be able to use it.

In addition, passing in the VF PCI pointer instead of the VF group 
member ID + the VIRTIO PF device will require duplicating each command 
in the future once we use SIOV devices.

Instead, we suggest the below API for the above example.

virtio_admin_legacy_io_write(virtio_device *virtio_dev, u64 group_member_id, ....)

[1]

[yishaih@reg-l-vrt-209 linux]$ modinfo virtio-pci
filename: /lib/modules/6.6.0-rc2+/kernel/drivers/virtio/virtio_pci.ko
version:        1
license:        GPL
description:    virtio-pci
author:         Anthony Liguori <aliguori@us.ibm.com>
srcversion:     7355EAC9408D38891938391
alias:          pci:v00001AF4d*sv*sd*bc*sc*i*
depends: virtio_pci_modern_dev,virtio,virtio_ring,virtio_pci_legacy_dev
retpoline:      Y
intree:         Y
name:           virtio_pci
vermagic:       6.6.0-rc2+ SMP preempt mod_unload modversions
parm:           force_legacy:Force legacy mode for transitional virtio 1 
devices (bool)

[2]

depmod: ERROR: Cycle detected: virtio -> virtio_pci -> virtio
depmod: ERROR: Found 2 modules in dependency cycles!
make[2]: *** [scripts/Makefile.modinst:128: depmod] Error 1
make[1]: *** [/images/yishaih/src/kernel/linux/Makefile:1821: 
modules_install] Error 2

Yishai


^ permalink raw reply	[flat|nested] 321+ messages in thread

* RE: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11  6:59                                       ` Christoph Hellwig
@ 2023-10-11  8:00                                         ` Parav Pandit via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Parav Pandit @ 2023-10-11  8:00 UTC (permalink / raw)
  To: Christoph Hellwig, Michael S. Tsirkin
  Cc: Jason Gunthorpe, Yishai Hadas, alex.williamson, jasowang, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	Leon Romanovsky, Maor Gottlieb

Hi Christoph,

> From: Christoph Hellwig <hch@infradead.org>
> Sent: Wednesday, October 11, 2023 12:29 PM
> 
> On Wed, Oct 11, 2023 at 02:43:37AM -0400, Michael S. Tsirkin wrote:
> > > Btw, what is that intel thing everyone is talking about?  And why
> > > would the virtio core support vendor specific behavior like that?
> >
> > It's not a thing it's Zhu Lingshan :) intel is just one of the vendors
> > that implemented vdpa support and so Zhu Lingshan from intel is
> > working on vdpa and has also proposed virtio spec extensions for migration.
> > intel's driver is called ifcvf.  vdpa composes all this stuff that is
> > added to vfio in userspace, so it's a different approach.
> 
> Well, so let's call it virtio live migration instead of intel.
> 
> And please work all together in the virtio committee that you have one way of
> communication between controlling and controlled functions.
> If one extension does it one way and the other a different way that's just
> creating a giant mess.

We in virtio committee are working on VF device migration where:
VF = controlled function
PF = controlling function

The second proposal is what Michael mentioned from Intel, which somehow combines the controlled and controlling functions as a single entity on the VF.

The main reasons I find it weird are:
1. it must always do mediation to fake the device reset and FLR flows
2. DMA cannot work, as you explained, for complex device state
3. it needs constant knowledge of every tiny thing for each virtio device type

Such a single entity appears very weird to me, but maybe it is just me.

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11  7:44                                         ` Yishai Hadas via Virtualization
@ 2023-10-11  8:02                                           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-11  8:02 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: kvm, maorg, virtualization, Christoph Hellwig, Jason Gunthorpe,
	jiri, leonro

On Wed, Oct 11, 2023 at 10:44:49AM +0300, Yishai Hadas wrote:
> On 10/10/2023 23:42, Michael S. Tsirkin wrote:
> > On Tue, Oct 10, 2023 at 07:09:08PM +0300, Yishai Hadas wrote:
> > > > > Assuming that we'll put each command inside virtio as the generic layer, we
> > > > > won't be able to call/use this API internally to get the PF as of cyclic
> > > > > dependencies between the modules, link will fail.
> > I just mean:
> > virtio_admin_legacy_io_write(sruct pci_device *,  ....)
> > 
> > 
> > internally it starts from vf gets the pf (or vf itself or whatever
> > the transport is) sends command gets status returns.
> > 
> > what is cyclic here?
> > 
> virtio-pci depends on virtio [1].
> 
> If we put the commands in the generic layer as we expect it to be (i.e.
> virtio), then trying to call internally call for virtio_pci_vf_get_pf_dev()
> to get the PF from the VF will end-up by a linker cyclic error as of below
> [2].
> 
> As of that, someone can suggest to put the commands in virtio-pci, however
> this will fully bypass the generic layer of virtio and future clients won't
> be able to use it.

virtio_pci would get the pci device.
virtio-pci converts that to the virtio device of the owner + group member id and calls virtio.

no cycles and minimal transport specific code, right?
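
To make that split concrete, a minimal sketch could look like the
following (virtio_pci_vf_get_pf_dev() and the legacy-write parameters are
taken from elsewhere in this thread; the function names and bodies here
are only illustrative, not part of the patches):

/* virtio core: knows nothing about PCI, so no dependency cycle */
int virtio_admin_legacy_io_write(struct virtio_device *owner_vdev,
				 u64 group_member_id, u16 opcode,
				 u8 offset, u8 size, u8 *buf);

/* virtio-pci: thin transport glue that resolves owner + member id */
int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
				     u8 offset, u8 size, u8 *buf)
{
	struct virtio_device *owner = virtio_pci_vf_get_pf_dev(pdev);
	int vf_id = pci_iov_vf_id(pdev);

	if (!owner || vf_id < 0)
		return -ENODEV;

	/* SR-IOV group member ids are 1-based */
	return virtio_admin_legacy_io_write(owner, vf_id + 1, opcode,
					    offset, size, buf);
}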

> In addition, passing in the VF PCI pointer instead of the VF group member ID
> + the VIRTIO PF device, will require in the future to duplicate each command
> once we'll use SIOV devices.

I don't think anyone knows how SIOV will look. But shuffling
APIs around is not a big deal. We'll see.

> Instead, we suggest the below API for the above example.
> 
> virtio_admin_legacy_io_write(virtio_device *virtio_dev,  u64
> group_member_id,  ....)
> 
> [1]

> [yishaih@reg-l-vrt-209 linux]$ modinfo virtio-pci
> filename: /lib/modules/6.6.0-rc2+/kernel/drivers/virtio/virtio_pci.ko
> version:        1
> license:        GPL
> description:    virtio-pci
> author:         Anthony Liguori <aliguori@us.ibm.com>
> srcversion:     7355EAC9408D38891938391
> alias:          pci:v00001AF4d*sv*sd*bc*sc*i*
> depends: virtio_pci_modern_dev,virtio,virtio_ring,virtio_pci_legacy_dev
> retpoline:      Y
> intree:         Y
> name:           virtio_pci
> vermagic:       6.6.0-rc2+ SMP preempt mod_unload modversions
> parm:           force_legacy:Force legacy mode for transitional virtio 1
> devices (bool)
> 
> [2]
> 
> depmod: ERROR: Cycle detected: virtio -> virtio_pci -> virtio
> depmod: ERROR: Found 2 modules in dependency cycles!
> make[2]: *** [scripts/Makefile.modinst:128: depmod] Error 1
> make[1]: *** [/images/yishaih/src/kernel/linux/Makefile:1821:
> modules_install] Error 2
> 
> Yishai

virtio absolutely must not depend on virtio-pci; it is used on
systems without pci at all.

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11  8:00                                         ` Parav Pandit via Virtualization
@ 2023-10-11  8:10                                           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-11  8:10 UTC (permalink / raw)
  To: Parav Pandit
  Cc: kvm, Maor Gottlieb, virtualization, Christoph Hellwig,
	Jason Gunthorpe, Jiri Pirko, Leon Romanovsky

On Wed, Oct 11, 2023 at 08:00:57AM +0000, Parav Pandit wrote:
> Hi Christoph,
> 
> > From: Christoph Hellwig <hch@infradead.org>
> > Sent: Wednesday, October 11, 2023 12:29 PM
> > 
> > On Wed, Oct 11, 2023 at 02:43:37AM -0400, Michael S. Tsirkin wrote:
> > > > Btw, what is that intel thing everyone is talking about?  And why
> > > > would the virtio core support vendor specific behavior like that?
> > >
> > > It's not a thing it's Zhu Lingshan :) intel is just one of the vendors
> > > that implemented vdpa support and so Zhu Lingshan from intel is
> > > working on vdpa and has also proposed virtio spec extensions for migration.
> > > intel's driver is called ifcvf.  vdpa composes all this stuff that is
> > > added to vfio in userspace, so it's a different approach.
> > 
> > Well, so let's call it virtio live migration instead of intel.
> > 
> > And please work all together in the virtio committee that you have one way of
> > communication between controlling and controlled functions.
> > If one extension does it one way and the other a different way that's just
> > creating a giant mess.
> 
> We in virtio committee are working on VF device migration where:
> VF = controlled function
> PF = controlling function
> 
> The second proposal is what Michael mentioned from Intel that somehow combine controlled and controlling function as single entity on VF.
> 
> The main reasons I find it weird are:
> 1. it must always need to do mediation to do fake the device reset, and flr flows
> 2. dma cannot work as you explained for complex device state
> 3. it needs constant knowledge of each tiny things for each virtio device type
> 
> Such single entity appears a bit very weird to me but maybe it is just me.

Yeah, it appears to include everyone from nvidia. Others are used to it -
this is exactly what happens with virtio generally. E.g. vhost
processes the fast path in the kernel and the control path is in userspace.
vdpa has been largely modeled after that, for better or worse.
-- 
MST

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11  6:59                                       ` Christoph Hellwig
@ 2023-10-11  8:12                                         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-11  8:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kvm, maorg, virtualization, Jason Gunthorpe, jiri, leonro

On Tue, Oct 10, 2023 at 11:59:26PM -0700, Christoph Hellwig wrote:
> On Wed, Oct 11, 2023 at 02:43:37AM -0400, Michael S. Tsirkin wrote:
> > > Btw, what is that intel thing everyone is talking about?  And why
> > > would the virtio core support vendor specific behavior like that?
> > 
> > It's not a thing it's Zhu Lingshan :) intel is just one of the vendors
> > that implemented vdpa support and so Zhu Lingshan from intel is working
> > on vdpa and has also proposed virtio spec extensions for migration.
> > intel's driver is called ifcvf.  vdpa composes all this stuff that is
> > added to vfio in userspace, so it's a different approach.
> 
> Well, so let's call it virtio live migration instead of intel.
> 
> And please work all together in the virtio committee that you have
> one way of communication between controlling and controlled functions.
> If one extension does it one way and the other a different way that's
> just creating a giant mess.

Absolutely, this is exactly what I keep suggesting. Thanks for
bringing this up, will help me drive the point home!

-- 
MST

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11  8:02                                           ` Michael S. Tsirkin
@ 2023-10-11  8:58                                             ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas @ 2023-10-11  8:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Gunthorpe, Christoph Hellwig, alex.williamson, jasowang,
	kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, leonro, maorg

On 11/10/2023 11:02, Michael S. Tsirkin wrote:
> On Wed, Oct 11, 2023 at 10:44:49AM +0300, Yishai Hadas wrote:
>> On 10/10/2023 23:42, Michael S. Tsirkin wrote:
>>> On Tue, Oct 10, 2023 at 07:09:08PM +0300, Yishai Hadas wrote:
>>>>>> Assuming that we'll put each command inside virtio as the generic layer, we
>>>>>> won't be able to call/use this API internally to get the PF as of cyclic
>>>>>> dependencies between the modules, link will fail.
>>> I just mean:
>>> virtio_admin_legacy_io_write(sruct pci_device *,  ....)
>>>
>>>
>>> internally it starts from vf gets the pf (or vf itself or whatever
>>> the transport is) sends command gets status returns.
>>>
>>> what is cyclic here?
>>>
>> virtio-pci depends on virtio [1].
>>
>> If we put the commands in the generic layer as we expect it to be (i.e.
>> virtio), then trying to call internally call for virtio_pci_vf_get_pf_dev()
>> to get the PF from the VF will end-up by a linker cyclic error as of below
>> [2].
>>
>> As of that, someone can suggest to put the commands in virtio-pci, however
>> this will fully bypass the generic layer of virtio and future clients won't
>> be able to use it.
> virtio_pci would get pci device.
> virtio pci convers that to virtio device of owner + group member id and calls virtio.

Do you suggest another set of exported symbols (i.e. per command) in 
virtio which will get the owner device + group member + the extra 
command-specific parameters?

This will end up with a separate exported symbol per command.

> no cycles and minimal transport specific code, right?

See my above note: if we may just call virtio without any further work 
on the command's input, then YES.

If so, virtio will prepare the command by setting the relevant SG lists 
and other data and finally will call:

vdev->config->exec_admin_cmd(vdev, cmd);

Was that your plan?
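
Just to make the shape concrete, that would roughly mean adding something
like the following to struct virtio_config_ops (illustrative only;
exec_admin_cmd is merely the name floated above, not an existing op):

	/* hypothetical op: the transport executes a prepared admin command */
	int (*exec_admin_cmd)(struct virtio_device *vdev,
			      struct virtio_admin_cmd *cmd);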

>
>> In addition, passing in the VF PCI pointer instead of the VF group member ID
>> + the VIRTIO PF device, will require in the future to duplicate each command
>> once we'll use SIOV devices.
> I don't think anyone knows how will SIOV look. But shuffling
> APIs around is not a big deal. We'll see.

As you are the maintainer it's up to you; we just need to consider the 
further duplication here.

Yishai

>
>> Instead, we suggest the below API for the above example.
>>
>> virtio_admin_legacy_io_write(virtio_device *virtio_dev,  u64
>> group_member_id,  ....)
>>
>> [1]
>> [yishaih@reg-l-vrt-209 linux]$ modinfo virtio-pci
>> filename: /lib/modules/6.6.0-rc2+/kernel/drivers/virtio/virtio_pci.ko
>> version:        1
>> license:        GPL
>> description:    virtio-pci
>> author:         Anthony Liguori <aliguori@us.ibm.com>
>> srcversion:     7355EAC9408D38891938391
>> alias:          pci:v00001AF4d*sv*sd*bc*sc*i*
>> depends: virtio_pci_modern_dev,virtio,virtio_ring,virtio_pci_legacy_dev
>> retpoline:      Y
>> intree:         Y
>> name:           virtio_pci
>> vermagic:       6.6.0-rc2+ SMP preempt mod_unload modversions
>> parm:           force_legacy:Force legacy mode for transitional virtio 1
>> devices (bool)
>>
>> [2]
>>
>> depmod: ERROR: Cycle detected: virtio -> virtio_pci -> virtio
>> depmod: ERROR: Found 2 modules in dependency cycles!
>> make[2]: *** [scripts/Makefile.modinst:128: depmod] Error 1
>> make[1]: *** [/images/yishaih/src/kernel/linux/Makefile:1821:
>> modules_install] Error 2
>>
>> Yishai
> virtio absolutely must not depend on virtio pci, it is used on
> systems without pci at all.
>


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11  8:58                                             ` Yishai Hadas via Virtualization
@ 2023-10-11  9:03                                               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-11  9:03 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: kvm, maorg, virtualization, Christoph Hellwig, Jason Gunthorpe,
	jiri, leonro

On Wed, Oct 11, 2023 at 11:58:11AM +0300, Yishai Hadas wrote:
> On 11/10/2023 11:02, Michael S. Tsirkin wrote:
> > On Wed, Oct 11, 2023 at 10:44:49AM +0300, Yishai Hadas wrote:
> > > On 10/10/2023 23:42, Michael S. Tsirkin wrote:
> > > > On Tue, Oct 10, 2023 at 07:09:08PM +0300, Yishai Hadas wrote:
> > > > > > > Assuming that we'll put each command inside virtio as the generic layer, we
> > > > > > > won't be able to call/use this API internally to get the PF as of cyclic
> > > > > > > dependencies between the modules, link will fail.
> > > > I just mean:
> > > > virtio_admin_legacy_io_write(sruct pci_device *,  ....)
> > > > 
> > > > 
> > > > internally it starts from vf gets the pf (or vf itself or whatever
> > > > the transport is) sends command gets status returns.
> > > > 
> > > > what is cyclic here?
> > > > 
> > > virtio-pci depends on virtio [1].
> > > 
> > > If we put the commands in the generic layer as we expect it to be (i.e.
> > > virtio), then trying to call internally call for virtio_pci_vf_get_pf_dev()
> > > to get the PF from the VF will end-up by a linker cyclic error as of below
> > > [2].
> > > 
> > > As of that, someone can suggest to put the commands in virtio-pci, however
> > > this will fully bypass the generic layer of virtio and future clients won't
> > > be able to use it.
> > virtio_pci would get pci device.
> > virtio pci convers that to virtio device of owner + group member id and calls virtio.
> 
> Do you suggest another set of exported symbols (i.e per command ) in virtio
> which will get the owner device + group member + the extra specific command
> parameters ?
> 
> This will end-up duplicating the number of export symbols per command.

Or make them inline.
Or maybe actually even the specific commands should live inside virtio-pci -
they are pci specific after all.

> > no cycles and minimal transport specific code, right?
> 
> See my above note, if we may just call virtio without any further work on
> the command's input, than YES.
> 
> If so, virtio will prepare the command by setting the relevant SG lists and
> other data and finally will call:
> 
> vdev->config->exec_admin_cmd(vdev, cmd);
> 
> Was that your plan ?

is vdev the pf? then it won't support the transport where commands
are submitted through bar0 of vf itself.

> > 
> > > In addition, passing in the VF PCI pointer instead of the VF group member ID
> > > + the VIRTIO PF device, will require in the future to duplicate each command
> > > once we'll use SIOV devices.
> > I don't think anyone knows how will SIOV look. But shuffling
> > APIs around is not a big deal. We'll see.
> 
> As you are the maintainer it's up-to-you, just need to consider another
> further duplication here.
> 
> Yishai
> 
> > 
> > > Instead, we suggest the below API for the above example.
> > > 
> > > virtio_admin_legacy_io_write(virtio_device *virtio_dev,  u64
> > > group_member_id,  ....)
> > > 
> > > [1]
> > > [yishaih@reg-l-vrt-209 linux]$ modinfo virtio-pci
> > > filename: /lib/modules/6.6.0-rc2+/kernel/drivers/virtio/virtio_pci.ko
> > > version:        1
> > > license:        GPL
> > > description:    virtio-pci
> > > author:         Anthony Liguori <aliguori@us.ibm.com>
> > > srcversion:     7355EAC9408D38891938391
> > > alias:          pci:v00001AF4d*sv*sd*bc*sc*i*
> > > depends: virtio_pci_modern_dev,virtio,virtio_ring,virtio_pci_legacy_dev
> > > retpoline:      Y
> > > intree:         Y
> > > name:           virtio_pci
> > > vermagic:       6.6.0-rc2+ SMP preempt mod_unload modversions
> > > parm:           force_legacy:Force legacy mode for transitional virtio 1
> > > devices (bool)
> > > 
> > > [2]
> > > 
> > > depmod: ERROR: Cycle detected: virtio -> virtio_pci -> virtio
> > > depmod: ERROR: Found 2 modules in dependency cycles!
> > > make[2]: *** [scripts/Makefile.modinst:128: depmod] Error 1
> > > make[1]: *** [/images/yishaih/src/kernel/linux/Makefile:1821:
> > > modules_install] Error 2
> > > 
> > > Yishai
> > virtio absolutely must not depend on virtio pci, it is used on
> > systems without pci at all.
> > 

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11  9:03                                               ` Michael S. Tsirkin
@ 2023-10-11 11:25                                                 ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas @ 2023-10-11 11:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Gunthorpe, Christoph Hellwig, alex.williamson, jasowang,
	kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, leonro, maorg

On 11/10/2023 12:03, Michael S. Tsirkin wrote:
> On Wed, Oct 11, 2023 at 11:58:11AM +0300, Yishai Hadas wrote:
>> On 11/10/2023 11:02, Michael S. Tsirkin wrote:
>>> On Wed, Oct 11, 2023 at 10:44:49AM +0300, Yishai Hadas wrote:
>>>> On 10/10/2023 23:42, Michael S. Tsirkin wrote:
>>>>> On Tue, Oct 10, 2023 at 07:09:08PM +0300, Yishai Hadas wrote:
>>>>>>>> Assuming that we'll put each command inside virtio as the generic layer, we
>>>>>>>> won't be able to call/use this API internally to get the PF as of cyclic
>>>>>>>> dependencies between the modules, link will fail.
>>>>> I just mean:
>>>>> virtio_admin_legacy_io_write(sruct pci_device *,  ....)
>>>>>
>>>>>
>>>>> internally it starts from vf gets the pf (or vf itself or whatever
>>>>> the transport is) sends command gets status returns.
>>>>>
>>>>> what is cyclic here?
>>>>>
>>>> virtio-pci depends on virtio [1].
>>>>
>>>> If we put the commands in the generic layer as we expect it to be (i.e.
>>>> virtio), then trying to call internally call for virtio_pci_vf_get_pf_dev()
>>>> to get the PF from the VF will end-up by a linker cyclic error as of below
>>>> [2].
>>>>
>>>> As of that, someone can suggest to put the commands in virtio-pci, however
>>>> this will fully bypass the generic layer of virtio and future clients won't
>>>> be able to use it.
>>> virtio_pci would get pci device.
>>> virtio pci convers that to virtio device of owner + group member id and calls virtio.
>> Do you suggest another set of exported symbols (i.e per command ) in virtio
>> which will get the owner device + group member + the extra specific command
>> parameters ?
>>
>> This will end-up duplicating the number of export symbols per command.
> Or make them inline.
> Or maybe actually even the specific commands should live inside virtio pci
> they are pci specific after all.

OK, let's leave them in virtio-pci as you suggested here.

You can see below [1] a scheme of how a specific command will look.

A few notes:
- virtio_pci_vf_get_pf_dev() will become a static function.

- The commands will be placed inside virtio_pci_common.c and will use 
the above static function locally to get the owner PF.

- After preparing the command we may call vp_avq_cmd_exec() directly, 
which is part of virtio-pci and not of the generic virtio layer.

- vp_avq_cmd_exec() will be part of virtio_pci_modern.c as you asked in 
the ML.

- The AQ creation/destruction will still be called upon probing virtio 
as it was in V0; it will use the underlying config->create/destroy_avq() 
ops if they exist.

- virtio_admin_cmd_exec() won't be exported outside virtio any more; 
we'll have an exported symbol in virtio-pci per command.

Is the above fine for you?

By the way, from an API namespace POV, are you fine with 
virtio_admin_legacy_io_write(), or should we have '_pci' as part of the 
name? (i.e. virtio_pci_admin_legacy_io_write)

[1]

int virtio_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
				 u8 offset, u8 size, u8 *buf)
{
	/* Resolve the owner (PF) virtio device of this VF */
	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
	struct virtio_admin_cmd_legacy_wr_data *in;
	struct virtio_admin_cmd cmd = {};
	struct scatterlist in_sg;
	int ret;
	int vf_id;

	if (!virtio_dev)
		return -ENODEV;

	vf_id = pci_iov_vf_id(pdev);
	if (vf_id < 0)
		return -EINVAL;

	in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
	if (!in)
		return -ENOMEM;

	in->offset = offset;
	memcpy(in->registers, buf, size);
	sg_init_one(&in_sg, in, sizeof(*in) + size);
	cmd.opcode = opcode;
	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
	/* SR-IOV group member ids are 1-based (VF vf_id maps to vf_id + 1) */
	cmd.group_member_id = vf_id + 1;
	cmd.data_sg = &in_sg;
	ret = vp_avq_cmd_exec(virtio_dev, &cmd);

	kfree(in);
	return ret;
}
EXPORT_SYMBOL_GPL(virtio_admin_legacy_io_write);
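
With such an export in place, a caller like the vfio-virtio driver only
needs the VF's pci_dev. For example, forwarding a guest write of the
legacy device status could look roughly like this (sketch only; the
opcode name follows the legacy admin command definitions and the
surrounding variables are hypothetical):

	u8 status = VIRTIO_CONFIG_S_ACKNOWLEDGE;
	int ret;

	ret = virtio_admin_legacy_io_write(pdev,
				VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE,
				VIRTIO_PCI_STATUS, sizeof(status), &status);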

>
>>> no cycles and minimal transport specific code, right?
>> See my above note, if we may just call virtio without any further work on
>> the command's input, than YES.
>>
>> If so, virtio will prepare the command by setting the relevant SG lists and
>> other data and finally will call:
>>
>> vdev->config->exec_admin_cmd(vdev, cmd);
>>
>> Was that your plan ?
> is vdev the pf? then it won't support the transport where commands
> are submitted through bar0 of vf itself.

Yes, it's a PF.
Based on the current spec, for the existing admin commands we issue 
commands only on the PF.

In any case, by moving to the above suggested scheme, handling each 
command separately and taking the VF PCI device as the first argument, 
we now have full control for any future command.

Yishai

>>>> In addition, passing in the VF PCI pointer instead of the VF group member ID
>>>> + the VIRTIO PF device, will require in the future to duplicate each command
>>>> once we'll use SIOV devices.
>>> I don't think anyone knows how will SIOV look. But shuffling
>>> APIs around is not a big deal. We'll see.
>> As you are the maintainer it's up-to-you, just need to consider another
>> further duplication here.
>>
>> Yishai
>>
>>>> Instead, we suggest the below API for the above example.
>>>>
>>>> virtio_admin_legacy_io_write(virtio_device *virtio_dev,  u64
>>>> group_member_id,  ....)
>>>>
>>>> [1]
>>>> [yishaih@reg-l-vrt-209 linux]$ modinfo virtio-pci
>>>> filename: /lib/modules/6.6.0-rc2+/kernel/drivers/virtio/virtio_pci.ko
>>>> version:        1
>>>> license:        GPL
>>>> description:    virtio-pci
>>>> author:         Anthony Liguori <aliguori@us.ibm.com>
>>>> srcversion:     7355EAC9408D38891938391
>>>> alias:          pci:v00001AF4d*sv*sd*bc*sc*i*
>>>> depends: virtio_pci_modern_dev,virtio,virtio_ring,virtio_pci_legacy_dev
>>>> retpoline:      Y
>>>> intree:         Y
>>>> name:           virtio_pci
>>>> vermagic:       6.6.0-rc2+ SMP preempt mod_unload modversions
>>>> parm:           force_legacy:Force legacy mode for transitional virtio 1
>>>> devices (bool)
>>>>
>>>> [2]
>>>>
>>>> depmod: ERROR: Cycle detected: virtio -> virtio_pci -> virtio
>>>> depmod: ERROR: Found 2 modules in dependency cycles!
>>>> make[2]: *** [scripts/Makefile.modinst:128: depmod] Error 1
>>>> make[1]: *** [/images/yishaih/src/kernel/linux/Makefile:1821:
>>>> modules_install] Error 2
>>>>
>>>> Yishai
>>> virtio absolutely must not depend on virtio pci, it is used on
>>> systems without pci at all.
>>>


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
@ 2023-10-11 11:25                                                 ` Yishai Hadas via Virtualization
  0 siblings, 0 replies; 321+ messages in thread
From: Yishai Hadas via Virtualization @ 2023-10-11 11:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, maorg, virtualization, Christoph Hellwig, Jason Gunthorpe,
	jiri, leonro

On 11/10/2023 12:03, Michael S. Tsirkin wrote:
> On Wed, Oct 11, 2023 at 11:58:11AM +0300, Yishai Hadas wrote:
>> On 11/10/2023 11:02, Michael S. Tsirkin wrote:
>>> On Wed, Oct 11, 2023 at 10:44:49AM +0300, Yishai Hadas wrote:
>>>> On 10/10/2023 23:42, Michael S. Tsirkin wrote:
>>>>> On Tue, Oct 10, 2023 at 07:09:08PM +0300, Yishai Hadas wrote:
>>>>>>>> Assuming that we'll put each command inside virtio as the generic layer, we
>>>>>>>> won't be able to call/use this API internally to get the PF as of cyclic
>>>>>>>> dependencies between the modules, link will fail.
>>>>> I just mean:
>>>>> virtio_admin_legacy_io_write(sruct pci_device *,  ....)
>>>>>
>>>>>
>>>>> internally it starts from vf gets the pf (or vf itself or whatever
>>>>> the transport is) sends command gets status returns.
>>>>>
>>>>> what is cyclic here?
>>>>>
>>>> virtio-pci depends on virtio [1].
>>>>
>>>> If we put the commands in the generic layer as we expect it to be (i.e.
>>>> virtio), then trying to call internally call for virtio_pci_vf_get_pf_dev()
>>>> to get the PF from the VF will end-up by a linker cyclic error as of below
>>>> [2].
>>>>
>>>> As of that, someone can suggest to put the commands in virtio-pci, however
>>>> this will fully bypass the generic layer of virtio and future clients won't
>>>> be able to use it.
>>> virtio_pci would get pci device.
>>> virtio pci convers that to virtio device of owner + group member id and calls virtio.
>> Do you suggest another set of exported symbols (i.e per command ) in virtio
>> which will get the owner device + group member + the extra specific command
>> parameters ?
>>
>> This will end-up duplicating the number of export symbols per command.
> Or make them inline.
> Or maybe actually even the specific commands should live inside virtio pci
> they are pci specific after all.

OK, let's leave them in virtio-pci as you suggested here.

You can see below [1] a scheme of how a specific command will look.

A few notes:
- virtio_pci_vf_get_pf_dev() will become a static function.

- The commands will be placed inside virtio_pci_common.c and will use 
the above static function locally to get the owner PF.

- After preparing the command we may call vp_avq_cmd_exec() directly, 
which is part of virtio-pci, rather than going through virtio.

- vp_avq_cmd_exec() will be part of virtio_pci_modern.c as you asked on 
the mailing list.

- The AQ creation/destruction will still be done upon virtio probing, 
as in V0; it will use the underlying config->create/destroy_avq() ops 
if they exist.

- virtio_admin_cmd_exec() will no longer be exported outside virtio; 
instead we'll have one exported symbol in virtio-pci per command.

Is the above fine with you?

By the way, from an API namespace POV, are you fine with 
virtio_admin_legacy_io_write(), or should '_pci' be part of the 
name (i.e. virtio_pci_admin_legacy_io_write())?

[1]

int virtio_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
				 u8 offset, u8 size, u8 *buf)
{
	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
	struct virtio_admin_cmd_legacy_wr_data *in;
	struct virtio_admin_cmd cmd = {};
	struct scatterlist in_sg;
	int vf_id;
	int ret;

	/* Resolve the owner (PF) virtio device of this VF */
	if (!virtio_dev)
		return -ENODEV;

	vf_id = pci_iov_vf_id(pdev);
	if (vf_id < 0)
		return -EINVAL;

	in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
	if (!in)
		return -ENOMEM;

	/* Build the legacy write payload and submit it on the PF's admin queue */
	in->offset = offset;
	memcpy(in->registers, buf, size);
	sg_init_one(&in_sg, in, sizeof(*in) + size);
	cmd.opcode = opcode;
	cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
	cmd.group_member_id = vf_id + 1;
	cmd.data_sg = &in_sg;
	ret = vp_avq_cmd_exec(virtio_dev, &cmd);

	kfree(in);
	return ret;
}
EXPORT_SYMBOL_GPL(virtio_admin_legacy_io_write);
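
Just to illustrate the consumer side (this is not part of the series, 
and the opcode define below is only assumed to come from cmd.h), a 
vfio/virtio caller could then be as simple as:

static int virtiovf_legacy_common_cfg_write(struct pci_dev *vf_pdev,
					    u8 offset, u8 size, u8 *buf)
{
	/* All the PF/AQ details stay hidden behind the virtio-pci export */
	return virtio_admin_legacy_io_write(vf_pdev,
			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE,
			offset, size, buf);
}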

>
>>> no cycles and minimal transport specific code, right?
>> See my above note: if we may just call virtio without any further work on
>> the command's input, then YES.
>>
>> If so, virtio will prepare the command by setting the relevant SG lists and
>> other data and finally will call:
>>
>> vdev->config->exec_admin_cmd(vdev, cmd);
>>
>> Was that your plan ?
> is vdev the pf? then it won't support the transport where commands
> are submitted through bar0 of vf itself.

Yes, it's a PF.
Based on the current spec, for the existing admin commands we issue 
commands only on the PF.

In any case, by moving to the above suggested per-command scheme and 
taking the VF PCI device as the first argument, we now have full 
control over any future command.
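
And just to recap the transport hooks mentioned above in code form 
(purely illustrative, the member names are placeholders and not a 
final API):

struct virtio_config_ops {
	/* ... existing ops ... */
	/* create/destroy the admin virtqueue, if the transport has one */
	int (*create_avq)(struct virtio_device *vdev);
	void (*destroy_avq)(struct virtio_device *vdev);
};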

Yishai

>>>> In addition, passing in the VF PCI pointer instead of the VF group member ID
>>>> + the VIRTIO PF device will require duplicating each command in the future
>>>> once we use SIOV devices.
>>> I don't think anyone knows how SIOV will look. But shuffling
>>> APIs around is not a big deal. We'll see.
>> As you are the maintainer it's up to you; just consider yet another
>> duplication here.
>>
>> Yishai
>>
>>>> Instead, we suggest the below API for the above example.
>>>>
>>>> virtio_admin_legacy_io_write(virtio_device *virtio_dev,  u64
>>>> group_member_id,  ....)
>>>>
>>>> [1]
>>>> [yishaih@reg-l-vrt-209 linux]$ modinfo virtio-pci
>>>> filename: /lib/modules/6.6.0-rc2+/kernel/drivers/virtio/virtio_pci.ko
>>>> version:        1
>>>> license:        GPL
>>>> description:    virtio-pci
>>>> author:         Anthony Liguori <aliguori@us.ibm.com>
>>>> srcversion:     7355EAC9408D38891938391
>>>> alias:          pci:v00001AF4d*sv*sd*bc*sc*i*
>>>> depends: virtio_pci_modern_dev,virtio,virtio_ring,virtio_pci_legacy_dev
>>>> retpoline:      Y
>>>> intree:         Y
>>>> name:           virtio_pci
>>>> vermagic:       6.6.0-rc2+ SMP preempt mod_unload modversions
>>>> parm:           force_legacy:Force legacy mode for transitional virtio 1
>>>> devices (bool)
>>>>
>>>> [2]
>>>>
>>>> depmod: ERROR: Cycle detected: virtio -> virtio_pci -> virtio
>>>> depmod: ERROR: Found 2 modules in dependency cycles!
>>>> make[2]: *** [scripts/Makefile.modinst:128: depmod] Error 1
>>>> make[1]: *** [/images/yishaih/src/kernel/linux/Makefile:1821:
>>>> modules_install] Error 2
>>>>
>>>> Yishai
>>> virtio absolutely must not depend on virtio pci, it is used on
>>> systems without pci at all.
>>>


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11  8:10                                           ` Michael S. Tsirkin
  (?)
@ 2023-10-11 12:18                                           ` Jason Gunthorpe
  2023-10-11 17:03                                               ` Michael S. Tsirkin
  2023-10-11 17:05                                               ` Michael S. Tsirkin
  -1 siblings, 2 replies; 321+ messages in thread
From: Jason Gunthorpe @ 2023-10-11 12:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Christoph Hellwig, Yishai Hadas, alex.williamson,
	jasowang, kvm, virtualization, Feng Liu, Jiri Pirko, kevin.tian,
	joao.m.martins, Leon Romanovsky, Maor Gottlieb

On Wed, Oct 11, 2023 at 04:10:58AM -0400, Michael S. Tsirkin wrote:
> On Wed, Oct 11, 2023 at 08:00:57AM +0000, Parav Pandit wrote:
> > Hi Christoph,
> > 
> > > From: Christoph Hellwig <hch@infradead.org>
> > > Sent: Wednesday, October 11, 2023 12:29 PM
> > > 
> > > On Wed, Oct 11, 2023 at 02:43:37AM -0400, Michael S. Tsirkin wrote:
> > > > > Btw, what is that intel thing everyone is talking about?  And why
> > > > > would the virtio core support vendor specific behavior like that?
> > > >
> > > > It's not a thing it's Zhu Lingshan :) intel is just one of the vendors
> > > > that implemented vdpa support and so Zhu Lingshan from intel is
> > > > working on vdpa and has also proposed virtio spec extensions for migration.
> > > > intel's driver is called ifcvf.  vdpa composes all this stuff that is
> > > > added to vfio in userspace, so it's a different approach.
> > > 
> > > Well, so let's call it virtio live migration instead of intel.
> > > 
> > > And please work all together in the virtio committee that you have one way of
> > > communication between controlling and controlled functions.
> > > If one extension does it one way and the other a different way that's just
> > > creating a giant mess.
> > 
> > We in virtio committee are working on VF device migration where:
> > VF = controlled function
> > PF = controlling function
> > 
> > The second proposal is what Michael mentioned from Intel that somehow combine controlled and controlling function as single entity on VF.
> > 
> > The main reasons I find it weird are:
> > 1. it always needs to do mediation to fake the device reset and FLR flows
> > 2. dma cannot work, as you explained, for complex device state
> > 3. it needs constant knowledge of each tiny thing for each virtio device type
> >
> > Such a single entity appears very weird to me, but maybe it is just me.
> 
> Yea it appears to include everyone from nvidia. Others are used to it -
> this is exactly what happens with virtio generally. E.g. vhost
> processes fast path in the kernel and control path is in userspace.
> vdpa has been largely modeled after that, for better or worse.

As Parav says, you can't use DMA for any migration flows, and you open
a single VF scheme up to PCI P2P attacks from the VM. It is a pretty
bad design.

vfio reviewers will reject things like this that are not secure - we
just did for Intel E800, for instance.

With VDPA doing the same stuff as vfio I'm not sure who is auditing it
for security.

The simple way to be sure is to never touch the PCI function that has
DMA assigned to a VM from the hypervisor, except through config space.

Beyond that.. Well, think carefully about security.

IMHO the single-VF approach is not suitable for standardization.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11  6:26                       ` Christoph Hellwig
  (?)
@ 2023-10-11 13:57                       ` Jason Gunthorpe
  2023-10-11 14:17                           ` Christoph Hellwig
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-10-11 13:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Yishai Hadas, alex.williamson, jasowang, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Tue, Oct 10, 2023 at 11:26:42PM -0700, Christoph Hellwig wrote:
> On Tue, Oct 10, 2023 at 10:10:31AM -0300, Jason Gunthorpe wrote:
> > We've talked around ideas like allowing the VF config space to do some
> > of the work. For simple devices we could get away with 1 VF config
> > space register. (VF config space is owned by the hypervisor, not the
> > guest)
> 
> Which assumes you're actually using VFs and not multiple PFs, which
> is a very limiting assumption.  

? It doesn't matter VF/PF, the same functions config space could do
simple migration.
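
(Purely as a thought experiment, not something in this series: the 
hypervisor-side helper poking such a register could be as dumb as the 
below, where the offset and bit are made up.)

#include <linux/pci.h>

#define FOO_MIG_CTRL	0x100	/* assumed vendor-defined config space offset */
#define FOO_MIG_STOP	0x1	/* assumed "quiesce and save state" bit */

static int foo_migration_stop(struct pci_dev *pdev)
{
	u32 ctrl;
	int ret;

	/* the hypervisor only ever touches config space of the function */
	ret = pci_read_config_dword(pdev, FOO_MIG_CTRL, &ctrl);
	if (ret)
		return ret;
	return pci_write_config_dword(pdev, FOO_MIG_CTRL, ctrl | FOO_MIG_STOP);
}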

> It also limits you from actually
> using DMA during the live migration process, which again is a major
> limitation once you have a non-trivial amount of state.

Yes, this is a dealbreaker for big cases. But we do see several
smaller/simpler devices that don't use DMA in their migration.

> > SIOVr2 is discussing more a flexible RID mapping - there is a possible
> > route where a "VF" could actually have two RIDs, a hypervisor RID and a
> > guest RID.
> 
> Well, then you go down the SIOV route, which requires a complex driver
> actually presenting the guest visible device anyway.

Yep
 
> Independent of my above points on the doubts on VF-controlled live
> migration for PCIe devices, I absolutely agree with you that the Linux
> abstraction and user interface should be VF based.  Which further
> reinforces my point that the VFIO driver for the controlled function
> (PF or VF) and the Linux driver for the controlling function (better
> be a PF in practice) must be very tightly integrated.  And the best
> way to do that is to export the vfio nodes from the Linux driver
> that knows the hardware and not split it out into a separate one.

I'm not sure how we get to "very tightly integrated". We have many
examples of live migration vfio drivers now and they do not seem to
require tight integration. The PF driver only has to provide a way to
execute a small number of proxied operations.

Regardless, I'm not too fussed about what directory the implementation
lives in, though I do prefer the current arrangement where VFIO only
stuff is in drivers/vfio. I like the process we have where subsystems
are responsible for the code that implements the subsystem ops.

> > However - the Intel GPU VFIO driver is such a bad experience I don't
> > want to encourage people to make VFIO drivers, or code that is only
> > used by VFIO drivers, that are not under drivers/vfio review.
> 
> We can and should require vfio review for users of the vfio API.
> But to be honest code placement was not the problem with i915.  The
> problem was that the mdev APIs (under drivers/vfio) were a complete
> trainwreck when they were written, and that the driver had a horrible
> hypervisor API abstraction.

E800 also made some significant security mistakes that the VFIO side
caught. I think they would have been missed if it went into a netdev
tree.

Even unrelated to mdev, Intel GPU is still not using the vfio side
properly, and the way it hacked into KVM to try to get page tracking
is totally logically wrong (but Works For Me (tm))

Aside from technical concerns, I do have a big process worry
here. vfio is responsible for the security side of the review of
things implementing its ops.

> > Be aware, there is a significant performance concern here. If you want
> > to create 1000 VFIO devices (this is a real thing), we *can't* probe a
> > normal driver first, it is too slow. We need a path that goes directly
> > from creating the RIDs to turning those RIDs into VFIO.
> 
> And by calling the vfio functions from mlx5 you get this easily.

"easily" I don't know about that :)

> For mdev/SIOV-like flows you must call vfio APIs from the main
> driver anyway, as there is no pci_dev to probe on.  That's
> what i915 does btw.

IMHO i915 is not a good example to copy.

mlx5 is already much closer to your ideal, and I would hold it up as the
right general direction for SIOV/mdev/etc, as we basically already do
a lot of the SIOV ideas.

mlx5 is a multi-subsystem device. It has driver components in net,
VDPA and infiniband. It can create non-PCI "functions".

It is not feasible, process-wise, for all of this to live under one
directory. We *want* the driver split up by subsystem and subsystem
maintainer.

So, we created the auxiliary_device stuff to manage this. It can do
what you are imagining, I think.

The core PF/VF driver is in charge of what to carve off to a
subsystem driver. IIRC mlx5 uses netlink to deliver commands to trigger
this (e.g. create a VDPA device). An auxiliary_device is created and the
target subsystem driver probes to that and autoloads, e.g. see
drivers/vdpa/mlx5/net/mlx5_vnet.c

They are not 'tightly coupled', the opposite really. The
auxiliary_device comes along with an mlx5 API that allows all the
subsystems to do what they need on the HW mostly independently. For
mlx5 this is mostly a way to execute FW RPC commands.

So, if you want to turn the VFIO stuff inside out, I'd still suggest
having the VFIO driver part under drivers/vfio and probing to an
auxiliary_device that represents the aspect of the HW to turn into VFIO
(or VDPA, or whatever). The 'core' driver can provide an appropriate
API between its VFIO part and its core part.
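
As a rough sketch of that shape (names invented, assuming the core
module is called "foo", error paths trimmed to the essentials):

#include <linux/auxiliary_bus.h>
#include <linux/slab.h>

struct foo_core;	/* the shared core driver state */

struct foo_vfio_adev {
	struct auxiliary_device adev;
	struct foo_core *core;	/* handle the VFIO part uses for proxied ops */
};

static void foo_vfio_adev_release(struct device *dev)
{
	kfree(container_of(dev, struct foo_vfio_adev, adev.dev));
}

/* Core driver side: publish the "vfio" aspect of the HW */
static int foo_core_publish_vfio(struct foo_core *core, struct device *parent)
{
	struct foo_vfio_adev *fadev = kzalloc(sizeof(*fadev), GFP_KERNEL);
	int ret;

	if (!fadev)
		return -ENOMEM;
	fadev->core = core;
	fadev->adev.name = "vfio";
	fadev->adev.dev.parent = parent;
	fadev->adev.dev.release = foo_vfio_adev_release;

	ret = auxiliary_device_init(&fadev->adev);
	if (ret) {
		kfree(fadev);
		return ret;
	}
	ret = auxiliary_device_add(&fadev->adev);
	if (ret)
		auxiliary_device_uninit(&fadev->adev);
	return ret;
}

/* VFIO driver side, under drivers/vfio, matches "foo.vfio" and autoloads */
static int foo_vfio_probe(struct auxiliary_device *adev,
			  const struct auxiliary_device_id *id)
{
	struct foo_vfio_adev *fadev =
		container_of(adev, struct foo_vfio_adev, adev);

	dev_set_drvdata(&adev->dev, fadev);
	/* register a vfio_device here, driving the HW through fadev->core */
	return 0;
}

static const struct auxiliary_device_id foo_vfio_id_table[] = {
	{ .name = "foo.vfio" },
	{}
};

static struct auxiliary_driver foo_vfio_driver = {
	.probe = foo_vfio_probe,
	.id_table = foo_vfio_id_table,
};
module_auxiliary_driver(foo_vfio_driver);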

We lack a common uAPI to trigger this creation, but otherwise the
infrastructure exists and works well now. It allows subsystems to
remain together and complex devices to spread their functionality to
multiple subsystems.

The current pci_iov_get_pf_drvdata() hack in VFIO is really a
shortcut to doing the auxiliary_device stuff. (Actually we tried to build
this with auxiliary_device first; it did not work out, it needs more
driver core infrastructure.)

I can easily imagine all the current VFIO drivers probing to an
auxiliary_device and obtaining the VF pci_device and the handle for the
core functionality directly, without the pci_iov_get_pf_drvdata()
approach.
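
For contrast, the shortcut as used today looks roughly like this
(illustrative names again; foo_pf_driver stands for the PF's pci_driver):

#include <linux/pci.h>

extern struct pci_driver foo_pf_driver;

static int foo_vfio_pci_probe(struct pci_dev *vf_pdev,
			      const struct pci_device_id *id)
{
	struct foo_core *core;

	/* Only works when vf_pdev is a VF whose PF is bound to foo_pf_driver */
	core = pci_iov_get_pf_drvdata(vf_pdev, &foo_pf_driver);
	if (IS_ERR(core))
		return PTR_ERR(core);

	/* register the vfio_device, using 'core' for the proxied operations */
	return 0;
}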

> For "classic" vfio that requires a pci_dev (or $otherbus_dev) we need
> to have a similar flow.  And I think the best way is to have the
> bus-level attribute on the device and/or a device-specific side band
> protocol to device how new functions are probed.  With that you
> avoid all the duplicate PCI IDs for the binding, and actually allow to
> sanely establush a communication channel between the functions.
> Because without that there is no way to know how any two functions
> related.  The driver might think they know, but there's all kinds of
> whacky PCI passthough schemes that will break such a logic.

Yes, if things are not simple PF/VF then Linux struggles at the driver
core level. auxiliary_devices are a way out of that since one spot can
figure out how to assemble the multi-component device and then
delegate portions of the HW to other subsystems.

If something wants to probe its own driver to a PF/VF to assemble the
components it can do that and then bundle it up into an aux device and
trigger a VFIO/etc driver to run on that bundle of resources.

We don't *need* to put all the VFIO code someplace else in order to put
the control over slicing the HW into a shared core driver. mlx5 and
several other drivers already demonstrate all of this.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11 13:57                       ` Jason Gunthorpe
@ 2023-10-11 14:17                           ` Christoph Hellwig
  0 siblings, 0 replies; 321+ messages in thread
From: Christoph Hellwig @ 2023-10-11 14:17 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Michael S. Tsirkin, Yishai Hadas,
	alex.williamson, jasowang, kvm, virtualization, parav, feliu,
	jiri, kevin.tian, joao.m.martins, leonro, maorg

On Wed, Oct 11, 2023 at 10:57:09AM -0300, Jason Gunthorpe wrote:
> > Independent of my above points on the doubts on VF-controlled live
> > migration for PCIe devices, I absolutely agree with you that the Linux
> > abstraction and user interface should be VF based.  Which further
> > reinforces my point that the VFIO driver for the controlled function
> > (PF or VF) and the Linux driver for the controlling function (better
> > be a PF in practice) must be very tightly integrated.  And the best
> > way to do that is to export the vfio nodes from the Linux driver
> > that knows the hardware and not split it out into a separate one.
> 
> I'm not sure how we get to "very tightly integrated". We have many
> examples of live migration vfio drivers now and they do not seem to
> require tight integration. The PF driver only has to provide a way to
> execute a small number of proxied operations.

Yes.  And for that I need to know what VF it actually is dealing
with.  Which is tight integration in my book.

> Regardless, I'm not too fussed about what directory the implementation
> lives in, though I do prefer the current arrangement where VFIO only
> stuff is in drivers/vfio. I like the process we have where subsystems
> are responsible for the code that implements the subsystem ops.

I really don't care about where the code lives (in the directory tree)
either.  But as you see with virtio trying to split it out into
an arbitrary module causes all kinds of pain.

> 
> E800 also made some significant security mistakes that VFIO side
> caught. I think would have been missed if it went into a netdev
> tree.
> 
> Even unrelated to mdev, Intel GPU is still not using the vfio side
> properly, and the way it hacked into KVM to try to get page tracking
> is totally logically wrong (but Works For Me (tm))
> 
> Aside from technical concerns, I do have a big process worry
> here. vfio is responsible for the security side of the review of
> things implementing its ops.

Yes, anything exposing a vfio node needs vfio review, period.  And
I don't think where the code lived was the i915 problem.  The problem
was that they were the first open user of the mdev API, which was
just a badly designed hook for never-published code at that time, and
they then shoehorned it into a weird hypervisor abstraction.  There's
no good way to succeed with that.

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
@ 2023-10-11 14:17                           ` Christoph Hellwig
  0 siblings, 0 replies; 321+ messages in thread
From: Christoph Hellwig @ 2023-10-11 14:17 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Michael S. Tsirkin, maorg, virtualization,
	Christoph Hellwig, jiri, leonro

On Wed, Oct 11, 2023 at 10:57:09AM -0300, Jason Gunthorpe wrote:
> > Independent of my above points on the doubts on VF-controlled live
> > migration for PCIe devices, I absolutely agree with you that the Linux
> > abstraction and user interface should be VF based.  Which further
> > reinforces my point that the VFIO driver for the controlled function
> > (PF or VF) and the Linux driver for the controlling function (better
> > be a PF in practice) must be very tightly integrated.  And the best
> > way to do that is to export the vfio nodes from the Linux driver
> > that knows the hardware and not split it out into a separate one.
> 
> I'm not sure how we get to "very tightly integrated". We have many
> examples of live migration vfio drivers now and they do not seem to
> require tight integration. The PF driver only has to provide a way to
> execute a small number of proxied operations.

Yes.  And for that I need to know what VF it actually is dealing
with.  Which is tight integration in my book.

> Regardless, I'm not too fussed about what directory the implementation
> lives in, though I do prefer the current arrangement where VFIO only
> stuff is in drivers/vfio. I like the process we have where subsystems
> are responsible for the code that implements the subsystem ops.

I really don't care about where the code lives (in the directory tree)
either.  But as you see with virtio trying to split it out into
an arbitrary module causes all kinds of pain.

> 
> E800 also made some significant security mistakes that VFIO side
> caught. I think would have been missed if it went into a netdev
> tree.
> 
> Even unrelated to mdev, Intel GPU is still not using the vfio side
> properly, and the way it hacked into KVM to try to get page tracking
> is totally logically wrong (but Works For Me (tm))
> 
> Aside from technical concerns, I do have a big process worry
> here. vfio is responsible for the security side of the review of
> things implementing its ops.

Yes, anything exposing a vfio node needs vfio review, period.  And
I don't think where the code lived was the i915 problem.  The problem
was that they were the first open user of the mdev API, which was
just a badly designed hook for never-published code at that time, and
they then shoehorned it into a weird hypervisor abstraction.  There's
no good way to succeed with that.

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11 14:17                           ` Christoph Hellwig
  (?)
@ 2023-10-11 14:58                           ` Jason Gunthorpe
  2023-10-11 16:59                               ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-10-11 14:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Yishai Hadas, alex.williamson, jasowang, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Wed, Oct 11, 2023 at 07:17:25AM -0700, Christoph Hellwig wrote:
> On Wed, Oct 11, 2023 at 10:57:09AM -0300, Jason Gunthorpe wrote:
> > > Independent of my above points on the doubts on VF-controlled live
> > > migration for PCIe devices, I absolutely agree with you that the Linux
> > > abstraction and user interface should be VF based.  Which further
> > > reinforces my point that the VFIO driver for the controlled function
> > > (PF or VF) and the Linux driver for the controlling function (better
> > > be a PF in practice) must be very tightly integrated.  And the best
> > > way to do that is to export the vfio nodes from the Linux driver
> > > that knows the hardware and not split it out into a separate one.
> > 
> > I'm not sure how we get to "very tightly integrated". We have many
> > examples of live migration vfio drivers now and they do not seem to
> > require tight integration. The PF driver only has to provide a way to
> > execute a small number of proxied operations.
> 
> Yes.  And for that I need to know what VF it actually is dealing
> with.  Which is tight integration in my book.

Well, I see two modalities here

Simple devices with a fixed PF/VF relationship use a VF pci_device for
VFIO and pci_iov_get_pf_drvdata()/related APIs to assemble their
parts. This is very limited (and kind of hacky).

Complex devices can use an auxiliary_device for VFIO and assemble
their parts however they like.

After probe is done the VFIO code operates effectively identically
regardless of how the components were found.

Intel is going to submit their IDXD SIOV driver "soon" and I'd like to
pause there and have a real discussion about how to manage VFIO
lifecycle and dynamic "function" creation in this brave new world.

Ideally we can get a lifecycle API that works uniformly for PCI VFs
too. Then maybe this gets more resolved.

In my mind at least, definitely no mdevs and that sysfs GUID junk. :(

> I really don't care about where the code lives (in the directory tree)
> either.  But as you see with virtio trying to split it out into
> an arbitrary module causes all kinds of pain.

Trying to put VFIO-only code in virtio is what causes all the
issues. If you mis-design the API boundary everything will be painful,
no matter where you put the code.

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11 14:58                           ` Jason Gunthorpe
@ 2023-10-11 16:59                               ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-11 16:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Yishai Hadas, alex.williamson, jasowang, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Wed, Oct 11, 2023 at 11:58:10AM -0300, Jason Gunthorpe wrote:
> Trying to put VFIO-only code in virtio is what causes all the
> issues. If you mis-design the API boundary everything will be painful,
> no matter where you put the code.

Are you implying the whole idea of adding these legacy virtio admin
commands to virtio spec was a design mistake?
It was nvidia guys who proposed it, so I'm surprised to hear you say this.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
@ 2023-10-11 16:59                               ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-11 16:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, maorg, virtualization, Christoph Hellwig, jiri, leonro

On Wed, Oct 11, 2023 at 11:58:10AM -0300, Jason Gunthorpe wrote:
> Trying to put VFIO-only code in virtio is what causes all the
> issues. If you mis-design the API boundary everything will be painful,
> no matter where you put the code.

Are you implying the whole idea of adding these legacy virtio admin
commands to virtio spec was a design mistake?
It was nvidia guys who proposed it, so I'm surprised to hear you say this.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11 12:18                                           ` Jason Gunthorpe
@ 2023-10-11 17:03                                               ` Michael S. Tsirkin
  2023-10-11 17:05                                               ` Michael S. Tsirkin
  1 sibling, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-11 17:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Maor Gottlieb, virtualization, Christoph Hellwig,
	Jiri Pirko, Leon Romanovsky

On Wed, Oct 11, 2023 at 09:18:49AM -0300, Jason Gunthorpe wrote:
> The simple way to be sure is to never touch the PCI function that has
> DMA assigned to a VM from the hypervisor, except through config space.

What makes config space different that it's safe though?
Isn't this more of a "we can't avoid touching config space" than
that it's safe? The line doesn't look that bright to me -
if there's e.g. a memory area designed explicitly for
hypervisor to poke at, that seems fine.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
@ 2023-10-11 17:03                                               ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-11 17:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Parav Pandit, Christoph Hellwig, Yishai Hadas, alex.williamson,
	jasowang, kvm, virtualization, Feng Liu, Jiri Pirko, kevin.tian,
	joao.m.martins, Leon Romanovsky, Maor Gottlieb

On Wed, Oct 11, 2023 at 09:18:49AM -0300, Jason Gunthorpe wrote:
> The simple way to be sure is to never touch the PCI function that has
> DMA assigned to a VM from the hypervisor, except through config space.

What makes config space different that it's safe though?
Isn't this more of a "we can't avoid touching config space" than
that it's safe? The line doesn't look that bright to me -
if there's e.g. a memory area designed explicitly for
hypervisor to poke at, that seems fine.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11 12:18                                           ` Jason Gunthorpe
@ 2023-10-11 17:05                                               ` Michael S. Tsirkin
  2023-10-11 17:05                                               ` Michael S. Tsirkin
  1 sibling, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-11 17:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Maor Gottlieb, virtualization, Christoph Hellwig,
	Jiri Pirko, Leon Romanovsky

On Wed, Oct 11, 2023 at 09:18:49AM -0300, Jason Gunthorpe wrote:
> With VDPA doing the same stuff as vfio I'm not sure who is auditing it
> for security.

Check the signed off tags and who sends the pull requests if you want to
know.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
@ 2023-10-11 17:05                                               ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-11 17:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Parav Pandit, Christoph Hellwig, Yishai Hadas, alex.williamson,
	jasowang, kvm, virtualization, Feng Liu, Jiri Pirko, kevin.tian,
	joao.m.martins, Leon Romanovsky, Maor Gottlieb

On Wed, Oct 11, 2023 at 09:18:49AM -0300, Jason Gunthorpe wrote:
> With VDPA doing the same stuff as vfio I'm not sure who is auditing it
> for security.

Check the signed off tags and who sends the pull requests if you want to
know.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11 16:59                               ` Michael S. Tsirkin
  (?)
@ 2023-10-11 17:19                               ` Jason Gunthorpe
  2023-10-11 20:20                                   ` Michael S. Tsirkin
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-10-11 17:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Yishai Hadas, alex.williamson, jasowang, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Wed, Oct 11, 2023 at 12:59:30PM -0400, Michael S. Tsirkin wrote:
> On Wed, Oct 11, 2023 at 11:58:10AM -0300, Jason Gunthorpe wrote:
> > Trying to put VFIO-only code in virtio is what causes all the
> > issues. If you mis-design the API boundary everything will be painful,
> > no matter where you put the code.
> 
> Are you implying the whole idea of adding these legacy virtio admin
> commands to virtio spec was a design mistake?

No, I'm saying again that trying to relocate all the vfio code into
drivers/virtio is a mistake

Jason

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11 17:03                                               ` Michael S. Tsirkin
  (?)
@ 2023-10-11 17:20                                               ` Jason Gunthorpe
  -1 siblings, 0 replies; 321+ messages in thread
From: Jason Gunthorpe @ 2023-10-11 17:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Christoph Hellwig, Yishai Hadas, alex.williamson,
	jasowang, kvm, virtualization, Feng Liu, Jiri Pirko, kevin.tian,
	joao.m.martins, Leon Romanovsky, Maor Gottlieb

On Wed, Oct 11, 2023 at 01:03:09PM -0400, Michael S. Tsirkin wrote:
> On Wed, Oct 11, 2023 at 09:18:49AM -0300, Jason Gunthorpe wrote:
> > The simple way to be sure is to never touch the PCI function that has
> > DMA assigned to a VM from the hypervisor, except through config space.
> 
> What makes config space different that it's safe though?

Hypervisor fully mediates it and it is not accessible to P2P attacks.

> Isn't this more of a "we can't avoid touching config space" than
> that it's safe? The line doesn't look that bright to me -
> if there's e.g. a memory area designed explicitly for
> hypervisor to poke at, that seems fine.

It is not.

Jason 

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11 17:19                               ` Jason Gunthorpe
@ 2023-10-11 20:20                                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-11 20:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Yishai Hadas, alex.williamson, jasowang, kvm,
	virtualization, parav, feliu, jiri, kevin.tian, joao.m.martins,
	leonro, maorg

On Wed, Oct 11, 2023 at 02:19:44PM -0300, Jason Gunthorpe wrote:
> On Wed, Oct 11, 2023 at 12:59:30PM -0400, Michael S. Tsirkin wrote:
> > On Wed, Oct 11, 2023 at 11:58:10AM -0300, Jason Gunthorpe wrote:
> > > Trying to put VFIO-only code in virtio is what causes all the
> > > issues. If you mis-design the API boundary everything will be painful,
> > > no matter where you put the code.
> > 
> > Are you implying the whole idea of adding these legacy virtio admin
> > commands to virtio spec was a design mistake?
> 
> No, I'm saying again that trying to relocate all the vfio code into
> drivers/virtio is a mistake
> 
> Jason

Yea please don't. And by the same token, please do not put
implementations of virtio spec under drivers/vfio.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
@ 2023-10-11 20:20                                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-11 20:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, maorg, virtualization, Christoph Hellwig, jiri, leonro

On Wed, Oct 11, 2023 at 02:19:44PM -0300, Jason Gunthorpe wrote:
> On Wed, Oct 11, 2023 at 12:59:30PM -0400, Michael S. Tsirkin wrote:
> > On Wed, Oct 11, 2023 at 11:58:10AM -0300, Jason Gunthorpe wrote:
> > > Trying to put VFIO-only code in virtio is what causes all the
> > > issues. If you mis-design the API boundary everything will be painful,
> > > no matter where you put the code.
> > 
> > Are you implying the whole idea of adding these legacy virtio admin
> > commands to virtio spec was a design mistake?
> 
> No, I'm saying again that trying to relocate all the vfio code into
> drivers/virtio is a mistake
> 
> Jason

Yea please don't. And by the same token, please do not put
implementations of virtio spec under drivers/vfio.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11  8:00                                         ` Parav Pandit via Virtualization
@ 2023-10-12 10:29                                           ` Zhu, Lingshan
  -1 siblings, 0 replies; 321+ messages in thread
From: Zhu, Lingshan @ 2023-10-12 10:29 UTC (permalink / raw)
  To: Parav Pandit, Christoph Hellwig, Michael S. Tsirkin, Jason Wang
  Cc: kvm, Maor Gottlieb, virtualization, Jason Gunthorpe, Jiri Pirko,
	Leon Romanovsky



On 10/11/2023 4:00 PM, Parav Pandit via Virtualization wrote:
> Hi Christoph,
>
>> From: Christoph Hellwig <hch@infradead.org>
>> Sent: Wednesday, October 11, 2023 12:29 PM
>>
>> On Wed, Oct 11, 2023 at 02:43:37AM -0400, Michael S. Tsirkin wrote:
>>>> Btw, what is that intel thing everyone is talking about?  And why
>>>> would the virtio core support vendor specific behavior like that?
>>> It's not a thing it's Zhu Lingshan :) intel is just one of the vendors
>>> that implemented vdpa support and so Zhu Lingshan from intel is
>>> working on vdpa and has also proposed virtio spec extensions for migration.
>>> intel's driver is called ifcvf.  vdpa composes all this stuff that is
>>> added to vfio in userspace, so it's a different approach.
>> Well, so let's call it virtio live migration instead of intel.
>>
>> And please work all together in the virtio committee that you have one way of
>> communication between controlling and controlled functions.
>> If one extension does it one way and the other a different way that's just
>> creating a giant mess.
> We in virtio committee are working on VF device migration where:
> VF = controlled function
> PF = controlling function
>
> The second proposal is what Michael mentioned from Intel that somehow combine controlled and controlling function as single entity on VF.
>
> The main reasons I find it weird are:
> 1. it always needs to do mediation to fake the device reset and FLR flows
> 2. dma cannot work, as you explained, for complex device state
> 3. it needs constant knowledge of each tiny thing for each virtio device type
>
> Such a single entity appears very weird to me, but maybe it is just me.
Sorry for the late reply; we have discussed this for weeks on the virtio 
mailing list.
I have proposed a live migration solution which is a config space solution.

We (Jason, Eugenio and I) have been working on this solution for more 
than two years,
and we are implementing the basic virtio live migration facilities.

The implementation is transport specific, e.g., for PCI we implement new 
registers or extend existing ones, which
work as other config space registers do.

The reason we are arguing is:
I am not sure an admin vq based live migration solution is a good choice, 
because:
1) it does not work for nested
2) it does not work for bare metal
3) QoS problems
4) security leaks.

Sorry to spread the discussion here.

Thanks,
Zhu Lingshan
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
@ 2023-10-12 10:29                                           ` Zhu, Lingshan
  0 siblings, 0 replies; 321+ messages in thread
From: Zhu, Lingshan @ 2023-10-12 10:29 UTC (permalink / raw)
  To: Parav Pandit, Christoph Hellwig, Michael S. Tsirkin, Jason Wang
  Cc: kvm, Maor Gottlieb, virtualization, Jason Gunthorpe, Jiri Pirko,
	Leon Romanovsky



On 10/11/2023 4:00 PM, Parav Pandit via Virtualization wrote:
> Hi Christoph,
>
>> From: Christoph Hellwig <hch@infradead.org>
>> Sent: Wednesday, October 11, 2023 12:29 PM
>>
>> On Wed, Oct 11, 2023 at 02:43:37AM -0400, Michael S. Tsirkin wrote:
>>>> Btw, what is that intel thing everyone is talking about?  And why
>>>> would the virtio core support vendor specific behavior like that?
>>> It's not a thing it's Zhu Lingshan :) intel is just one of the vendors
>>> that implemented vdpa support and so Zhu Lingshan from intel is
>>> working on vdpa and has also proposed virtio spec extensions for migration.
>>> intel's driver is called ifcvf.  vdpa composes all this stuff that is
>>> added to vfio in userspace, so it's a different approach.
>> Well, so let's call it virtio live migration instead of intel.
>>
>> And please work all together in the virtio committee that you have one way of
>> communication between controlling and controlled functions.
>> If one extension does it one way and the other a different way that's just
>> creating a giant mess.
> We in virtio committee are working on VF device migration where:
> VF = controlled function
> PF = controlling function
>
> The second proposal is what Michael mentioned from Intel that somehow combine controlled and controlling function as single entity on VF.
>
> The main reasons I find it weird are:
> 1. it always needs to do mediation to fake the device reset and FLR flows
> 2. dma cannot work, as you explained, for complex device state
> 3. it needs constant knowledge of each tiny thing for each virtio device type
>
> Such a single entity appears very weird to me, but maybe it is just me.
Sorry for the late reply; we have discussed this for weeks on the virtio 
mailing list.
I have proposed a live migration solution which is a config space solution.

We (Jason, Eugenio and I) have been working on this solution for more 
than two years,
and we are implementing the basic virtio live migration facilities.

The implementation is transport specific, e.g., for PCI we implement new 
registers or extend existing ones, which
work as other config space registers do.

The reason we are arguing is:
I am not sure an admin vq based live migration solution is a good choice, 
because:
1) it does not work for nested
2) it does not work for bare metal
3) QoS problems
4) security leaks.

Sorry to spread the discussion here.

Thanks,
Zhu Lingshan
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-11  6:59                                       ` Christoph Hellwig
@ 2023-10-12 10:30                                         ` Zhu, Lingshan
  -1 siblings, 0 replies; 321+ messages in thread
From: Zhu, Lingshan @ 2023-10-12 10:30 UTC (permalink / raw)
  To: Christoph Hellwig, Michael S. Tsirkin
  Cc: kvm, maorg, virtualization, Jason Gunthorpe, jiri, leonro



On 10/11/2023 2:59 PM, Christoph Hellwig wrote:
> On Wed, Oct 11, 2023 at 02:43:37AM -0400, Michael S. Tsirkin wrote:
>>> Btw, what is that intel thing everyone is talking about?  And why
>>> would the virtio core support vendor specific behavior like that?
>> It's not a thing it's Zhu Lingshan :) intel is just one of the vendors
>> that implemented vdpa support and so Zhu Lingshan from intel is working
>> on vdpa and has also proposed virtio spec extensions for migration.
>> intel's driver is called ifcvf.  vdpa composes all this stuff that is
>> added to vfio in userspace, so it's a different approach.
> Well, so let's call it virtio live migration instead of intel.
>
> And please work all together in the virtio committee that you have
> one way of communication between controlling and controlled functions.
> If one extension does it one way and the other a different way that's
> just creating a giant mess.
I hope so. Jason Wang has proposed a solution to cooperate, but it was
sadly rejected...
>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
@ 2023-10-12 10:30                                         ` Zhu, Lingshan
  0 siblings, 0 replies; 321+ messages in thread
From: Zhu, Lingshan @ 2023-10-12 10:30 UTC (permalink / raw)
  To: Christoph Hellwig, Michael S. Tsirkin
  Cc: kvm, maorg, virtualization, Jason Gunthorpe, jiri, leonro



On 10/11/2023 2:59 PM, Christoph Hellwig wrote:
> On Wed, Oct 11, 2023 at 02:43:37AM -0400, Michael S. Tsirkin wrote:
>>> Btw, what is that intel thing everyone is talking about?  And why
>>> would the virtio core support vendor specific behavior like that?
>> It's not a thing it's Zhu Lingshan :) intel is just one of the vendors
>> that implemented vdpa support and so Zhu Lingshan from intel is working
>> on vdpa and has also proposed virtio spec extensions for migration.
>> intel's driver is called ifcvf.  vdpa composes all this stuff that is
>> added to vfio in userspace, so it's a different approach.
> Well, so let's call it virtio live migration instead of intel.
>
> And please work all together in the virtio committee that you have
> one way of communication between controlling and controlled functions.
> If one extension does it one way and the other a different way that's
> just creating a giant mess.
I hope so. Jason Wang has proposed a solution to cooperate, but it was
sadly rejected...
>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-09-26  3:45                                       ` Parav Pandit via Virtualization
@ 2023-10-12 10:52                                         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-12 10:52 UTC (permalink / raw)
  To: Parav Pandit
  Cc: kvm, Maor Gottlieb, virtualization, Jason Gunthorpe, Jiri Pirko,
	Leon Romanovsky

On Tue, Sep 26, 2023 at 03:45:36AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, September 26, 2023 12:06 AM
> 
> > One can thinkably do that wait in hardware, though. Just defer completion until
> > read is done.
> >
> Once OASIS does such a new interface, and if some hw vendor _actually_ wants to do such complex hw, maybe the vfio driver can adapt to it.

The reset behaviour I describe is already in the spec. What else do you
want OASIS to standardize? Virtio currently is just a register map; it
does not yet include suggestions on how exactly pci express
transactions look. Do you feel we should add that?

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-10-12 10:52                                         ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-12 10:52 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jason Wang, Jason Gunthorpe, Alex Williamson, Yishai Hadas, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	Leon Romanovsky, Maor Gottlieb

On Tue, Sep 26, 2023 at 03:45:36AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, September 26, 2023 12:06 AM
> 
> > One can thinkably do that wait in hardware, though. Just defer completion until
> > read is done.
> >
> Once OASIS does such a new interface, and if some hw vendor _actually_ wants to do such complex hw, maybe the vfio driver can adapt to it.

The reset behaviour I describe is already in the spec. What else do you
want OASIS to standardize? Virtio currently is just a register map; it
does not yet include suggestions on how exactly pci express
transactions look. Do you feel we should add that?

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-12 10:52                                         ` Michael S. Tsirkin
@ 2023-10-12 11:11                                           ` Parav Pandit via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Parav Pandit @ 2023-10-12 11:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Jason Gunthorpe, Alex Williamson, Yishai Hadas, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	Leon Romanovsky, Maor Gottlieb


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, October 12, 2023 4:23 PM
> 
> On Tue, Sep 26, 2023 at 03:45:36AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Tuesday, September 26, 2023 12:06 AM
> >
> > > One can thinkably do that wait in hardware, though. Just defer
> > > completion until read is done.
> > >
> > Once OASIS does such new interface and if some hw vendor _actually_ wants
> to do such complex hw, may be vfio driver can adopt to it.
> 
> The reset behaviour I describe is already in the spec. What else do you want
> OASIS to standardize? Virtio currently is just a register map it does not yet
> include suggestions on how exactly do pci express transactions look. You feel we
> should add that?

The reset behavior in the spec for modern devices, as listed in [1] and [2], is just fine.

What I meant is in the context of having MMIO based legacy registers "defer completion until read is done".
I think you meant, "Just defer read completion, until reset is done".
This means the hw needs to finish the device reset for thousands of devices within the read completion timeout of the pci.
So if OASIS does such standardization, someone can implement it.

What I recollect is that OASIS did not standardize such an anti-scale approach and took the admin command approach which achieves better scale.
Hope I clarified.

I am not expecting OASIS to do anything extra for legacy registers.

[1] The device MUST reset when 0 is written to device_status, and present a 0 in device_status once that is done.
[2] After writing 0 to device_status, the driver MUST wait for a read of device_status to return 0 before reinitializing
the device.
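
To make the contrast explicit (illustrative C only, not kernel code; 
'status' stands for the mapped device_status register): the modern 
driver polls for completion per [1]/[2], while a legacy driver does 
not, so with MMIO legacy registers the wait would have to move into 
the device's read completion.

/* Legacy sequence: write 0 and immediately reinitialize, no wait */
static void legacy_reset(volatile unsigned char *status)
{
	*status = 0;	/* device starts resetting; driver does not poll */
}

/* Modern sequence per [1]/[2]: write 0, then poll until it reads back 0 */
static void modern_reset(volatile unsigned char *status)
{
	*status = 0;
	while (*status != 0)
		;	/* wait for the device to report reset completion */
}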

^ permalink raw reply	[flat|nested] 321+ messages in thread

* RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-10-12 11:11                                           ` Parav Pandit via Virtualization
  0 siblings, 0 replies; 321+ messages in thread
From: Parav Pandit via Virtualization @ 2023-10-12 11:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, Maor Gottlieb, virtualization, Jason Gunthorpe, Jiri Pirko,
	Leon Romanovsky


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, October 12, 2023 4:23 PM
> 
> On Tue, Sep 26, 2023 at 03:45:36AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Tuesday, September 26, 2023 12:06 AM
> >
> > > One can thinkably do that wait in hardware, though. Just defer
> > > completion until read is done.
> > >
> > Once OASIS does such new interface and if some hw vendor _actually_ wants
> to do such complex hw, may be vfio driver can adopt to it.
> 
> The reset behaviour I describe is already in the spec. What else do you want
> OASIS to standardize? Virtio currently is just a register map it does not yet
> include suggestions on how exactly do pci express transactions look. You feel we
> should add that?

The reset behavior in the spec for modern devices, as listed in [1] and [2], is just fine.

What I meant is in the context of having MMIO based legacy registers "defer completion until read is done".
I think you meant, "Just defer read completion, until reset is done".
This means the hw needs to finish the device reset for thousands of devices within the read completion timeout of the pci.
So if OASIS does such standardization, someone can implement it.

What I recollect is that OASIS did not standardize such an anti-scale approach and took the admin command approach which achieves better scale.
Hope I clarified.

I am not expecting OASIS to do anything extra for legacy registers.

[1] The device MUST reset when 0 is written to device_status, and present a 0 in device_status once that is done.
[2] After writing 0 to device_status, the driver MUST wait for a read of device_status to return 0 before reinitializing
the device.

^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-12 11:11                                           ` Parav Pandit via Virtualization
@ 2023-10-12 11:30                                             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-12 11:30 UTC (permalink / raw)
  To: Parav Pandit
  Cc: kvm, Maor Gottlieb, virtualization, Jason Gunthorpe, Jiri Pirko,
	Leon Romanovsky

On Thu, Oct 12, 2023 at 11:11:20AM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, October 12, 2023 4:23 PM
> > 
> > On Tue, Sep 26, 2023 at 03:45:36AM +0000, Parav Pandit wrote:
> > >
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Tuesday, September 26, 2023 12:06 AM
> > >
> > > > One can thinkably do that wait in hardware, though. Just defer
> > > > completion until read is done.
> > > >
> > > Once OASIS does such new interface and if some hw vendor _actually_ wants
> > to do such complex hw, may be vfio driver can adopt to it.
> > 
> > The reset behaviour I describe is already in the spec. What else do you want
> > OASIS to standardize? Virtio currently is just a register map it does not yet
> > include suggestions on how exactly do pci express transactions look. You feel we
> > should add that?
> 
> The reset behavior in the spec for modern as listed in [1] and [2] is just fine.
> 
> What I meant is in context of having MMIO based legacy registers to "defer completion until read is done".
> I think you meant, "Just differ read completion, until reset is done".

yes

> This means the hw needs to finish the device reset for thousands of devices within the read completion timeout of the pci.

no, each device does its own reset.

> So when if OASIS does such standardization, someone can implement it.
> 
> What I recollect, is OASIS didn't not standardize such anti-scale approach and took the admin command approach which achieve better scale.
> Hope I clarified.

You are talking about the extension for trap and emulate.
I am instead talking about devices that work with
existing legacy linux drivers with no traps.

> I am not expecting OASIS to do anything extra for legacy registers.
> 
> [1] The device MUST reset when 0 is written to device_status, and present a 0 in device_status once that is done.
> [2] After writing 0 to device_status, the driver MUST wait for a read of device_status to return 0 before reinitializing
> the device.

We can add a note explaining that legacy drivers do not wait
after doing reset, that is not a problem.
If someone wants to make a device that works with existing
legacy linux drivers, they can do that.
Won't work with all drivers though, which is why oasis did not
want to standardize this.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-10-12 11:30                                             ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-12 11:30 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jason Wang, Jason Gunthorpe, Alex Williamson, Yishai Hadas, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	Leon Romanovsky, Maor Gottlieb

On Thu, Oct 12, 2023 at 11:11:20AM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, October 12, 2023 4:23 PM
> > 
> > On Tue, Sep 26, 2023 at 03:45:36AM +0000, Parav Pandit wrote:
> > >
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Tuesday, September 26, 2023 12:06 AM
> > >
> > > > One can thinkably do that wait in hardware, though. Just defer
> > > > completion until read is done.
> > > >
> > > Once OASIS does such new interface and if some hw vendor _actually_ wants
> > to do such complex hw, may be vfio driver can adopt to it.
> > 
> > The reset behaviour I describe is already in the spec. What else do you want
> > OASIS to standardize? Virtio currently is just a register map it does not yet
> > include suggestions on how exactly do pci express transactions look. You feel we
> > should add that?
> 
> The reset behavior in the spec for modern as listed in [1] and [2] is just fine.
> 
> What I meant is in context of having MMIO based legacy registers to "defer completion until read is done".
> I think you meant, "Just differ read completion, until reset is done".

yes

> This means the hw needs to finish the device reset for thousands of devices within the read completion timeout of the pci.

no, each device does it's own reset.

> So when if OASIS does such standardization, someone can implement it.
> 
> What I recollect, is OASIS didn't not standardize such anti-scale approach and took the admin command approach which achieve better scale.
> Hope I clarified.

You are talking about the extension for trap and emulate.
I am instead talking about devices that work with
existing legacy linux drivers with no traps.

> I am not expecting OASIS to do anything extra for legacy registers.
> 
> [1] The device MUST reset when 0 is written to device_status, and present a 0 in device_status once that is done.
> [2] After writing 0 to device_status, the driver MUST wait for a read of device_status to return 0 before reinitializing
> the device.

We can add a note explaining that legacy drivers do not wait
after doing reset, that is not a problem.
If someone wants to make a device that works with existing
legacy linux drivers, they can do that.
Won't work with all drivers though, which is why oasis did not
want to standardize this.

-- 
MST


^ permalink raw reply	[flat|nested] 321+ messages in thread

* RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-12 11:30                                             ` Michael S. Tsirkin
@ 2023-10-12 11:40                                               ` Parav Pandit via Virtualization
  -1 siblings, 0 replies; 321+ messages in thread
From: Parav Pandit @ 2023-10-12 11:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Jason Gunthorpe, Alex Williamson, Yishai Hadas, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	Leon Romanovsky, Maor Gottlieb


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, October 12, 2023 5:00 PM

> I am instead talking about devices that work with existing legacy linux drivers
> with no traps.
> 
Yep, I understood.

> > I am not expecting OASIS to do anything extra for legacy registers.
> >
> > [1] The device MUST reset when 0 is written to device_status, and present a 0
> in device_status once that is done.
> > [2] After writing 0 to device_status, the driver MUST wait for a read
> > of device_status to return 0 before reinitializing the device.
> 
> We can add a note explaining that legacy drivers do not wait after doing reset;
> that is not a problem.
> If someone wants to make a device that works with existing legacy Linux drivers,
> they can do that.
> Won't work with all drivers though, which is why OASIS did not want to
> standardize this.

Ok. thanks.


* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-12 10:29                                           ` Zhu, Lingshan
  (?)
@ 2023-10-12 13:27                                           ` Jason Gunthorpe
  2023-10-13 10:28                                             ` Zhu, Lingshan
  -1 siblings, 1 reply; 321+ messages in thread
From: Jason Gunthorpe @ 2023-10-12 13:27 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: Parav Pandit, Christoph Hellwig, Michael S. Tsirkin, Jason Wang,
	kvm, Maor Gottlieb, virtualization, Jiri Pirko, Leon Romanovsky

On Thu, Oct 12, 2023 at 06:29:47PM +0800, Zhu, Lingshan wrote:

> sorry for the late reply, we have discussed this for weeks in virtio mailing
> list. I have proposed a live migration solution which is a config space solution.

I'm sorry that can't be a serious proposal - config space can't do
DMA, it is not suitable.

Jason


* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-12 13:27                                           ` Jason Gunthorpe
@ 2023-10-13 10:28                                             ` Zhu, Lingshan
  2023-10-13 13:50                                                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 321+ messages in thread
From: Zhu, Lingshan @ 2023-10-13 10:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Michael S. Tsirkin, Leon Romanovsky, virtualization,
	Christoph Hellwig, Jiri Pirko, Maor Gottlieb



On 10/12/2023 9:27 PM, Jason Gunthorpe wrote:
> On Thu, Oct 12, 2023 at 06:29:47PM +0800, Zhu, Lingshan wrote:
>
>> sorry for the late reply, we have discussed this for weeks in virtio mailing
>> list. I have proposed a live migration solution which is a config space solution.
> I'm sorry that can't be a serious proposal - config space can't do
> DMA, it is not suitable.
config space only controls the live migration process and config the 
related facilities.
We don't use config space to transfer data.

The new added registers work like queue_enable or features.

For example, we use DMA to report dirty pages and MMIO to fetch the 
dirty data.

I remember in another thread you said:"you can't use DMA for any 
migration flows"

And I agree to that statement, so we use config space registers to 
control the flow.

Thanks,
Zhu Lingshan
>
> Jason
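
A rough sketch of the split Lingshan describes, with control and state in
config space (MMIO) and DMA used only for the dirty-page bitmap in host
memory. The register names and layout below are hypothetical illustrations,
not the actual layout proposed on virtio-comment:

#include <stdint.h>

/* Hypothetical config-space registers, for illustration only. */
struct lm_config {
        uint8_t  lm_ctrl;           /* driver writes: e.g. run / suspend / resume */
        uint8_t  lm_status;         /* device reports the current migration state */
        uint16_t reserved;
        uint32_t dirty_bitmap_len;  /* size of the bitmap in bytes */
        uint64_t dirty_bitmap_addr; /* host memory the device fills by DMA */
};

In such a scheme the migration flow itself (lm_ctrl/lm_status) never travels
over DMA; the only DMA is the device setting bits in the driver-provided
bitmap.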


* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-13 10:28                                             ` Zhu, Lingshan
@ 2023-10-13 13:50                                                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-13 13:50 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: kvm, Leon Romanovsky, virtualization, Christoph Hellwig,
	Jason Gunthorpe, Jiri Pirko, Maor Gottlieb

On Fri, Oct 13, 2023 at 06:28:34PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 10/12/2023 9:27 PM, Jason Gunthorpe wrote:
> 
>     On Thu, Oct 12, 2023 at 06:29:47PM +0800, Zhu, Lingshan wrote:
> 
> 
>         sorry for the late reply, we have discussed this for weeks in virtio mailing
>         list. I have proposed a live migration solution which is a config space solution.
> 
>     I'm sorry that can't be a serious proposal - config space can't do
>     DMA, it is not suitable.
> 
> config space only controls the live migration process and config the related
> facilities.
> We don't use config space to transfer data.
> 
> The new added registers work like queue_enable or features.
> 
> For example, we use DMA to report dirty pages and MMIO to fetch the dirty data.
> 
> I remember in another thread you said:"you can't use DMA for any migration
> flows"
> 
> And I agree to that statement, so we use config space registers to control the
> flow.
> 
> Thanks,
> Zhu Lingshan
> 
> 
>     Jason
> 

If you are using dma then I don't see what's wrong with admin vq.
dma is all it does.
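
Roughly speaking, an admin VQ command is a device-readable request buffer plus
a device-writable result buffer placed on the virtqueue, both of which the
device accesses by DMA. A simplified sketch follows; the field names are
approximate, not the exact layout from the spec or from this series:

#include <stdint.h>

/* Device-readable part of an admin command (simplified). */
struct admin_cmd_request {
        uint16_t opcode;           /* which admin command to run */
        uint16_t group_type;       /* e.g. the SR-IOV group */
        uint64_t group_member_id;  /* which group member (VF) it targets */
        /* command-specific request data follows */
};

/* Device-writable part of an admin command (simplified). */
struct admin_cmd_result {
        uint16_t status;           /* success or an error code */
        /* command-specific result data follows */
};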


* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-13 13:50                                                 ` Michael S. Tsirkin
@ 2023-10-16  8:33                                                   ` Zhu, Lingshan
  -1 siblings, 0 replies; 321+ messages in thread
From: Zhu, Lingshan @ 2023-10-16  8:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Gunthorpe, Parav Pandit, Christoph Hellwig, Jason Wang,
	kvm, Maor Gottlieb, virtualization, Jiri Pirko, Leon Romanovsky



On 10/13/2023 9:50 PM, Michael S. Tsirkin wrote:
> On Fri, Oct 13, 2023 at 06:28:34PM +0800, Zhu, Lingshan wrote:
>>
>> On 10/12/2023 9:27 PM, Jason Gunthorpe wrote:
>>
>>      On Thu, Oct 12, 2023 at 06:29:47PM +0800, Zhu, Lingshan wrote:
>>
>>
>>          sorry for the late reply, we have discussed this for weeks in virtio mailing
>>          list. I have proposed a live migration solution which is a config space solution.
>>
>>      I'm sorry that can't be a serious proposal - config space can't do
>>      DMA, it is not suitable.
>>
>> config space only controls the live migration process and config the related
>> facilities.
>> We don't use config space to transfer data.
>>
>> The new added registers work like queue_enable or features.
>>
>> For example, we use DMA to report dirty pages and MMIO to fetch the dirty data.
>>
>> I remember in another thread you said:"you can't use DMA for any migration
>> flows"
>>
>> And I agree to that statement, so we use config space registers to control the
>> flow.
>>
>> Thanks,
>> Zhu Lingshan
>>
>>
>>      Jason
>>
> If you are using dma then I don't see what's wrong with admin vq.
> dma is all it does.
dma != admin vq,

and I think we have discussed many details, pros and cons,
in the admin vq live migration proposal in virtio-comment.
I am not sure we should repeat the discussion here all over again.

Thanks
>



* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-16  8:33                                                   ` Zhu, Lingshan
@ 2023-10-16  8:52                                                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 321+ messages in thread
From: Michael S. Tsirkin @ 2023-10-16  8:52 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: kvm, Leon Romanovsky, virtualization, Christoph Hellwig,
	Jason Gunthorpe, Jiri Pirko, Maor Gottlieb

On Mon, Oct 16, 2023 at 04:33:10PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 10/13/2023 9:50 PM, Michael S. Tsirkin wrote:
> > On Fri, Oct 13, 2023 at 06:28:34PM +0800, Zhu, Lingshan wrote:
> > > 
> > > On 10/12/2023 9:27 PM, Jason Gunthorpe wrote:
> > > 
> > >      On Thu, Oct 12, 2023 at 06:29:47PM +0800, Zhu, Lingshan wrote:
> > > 
> > > 
> > >          sorry for the late reply, we have discussed this for weeks in virtio mailing
> > >          list. I have proposed a live migration solution which is a config space solution.
> > > 
> > >      I'm sorry that can't be a serious proposal - config space can't do
> > >      DMA, it is not suitable.
> > > 
> > > config space only controls the live migration process and config the related
> > > facilities.
> > > We don't use config space to transfer data.
> > > 
> > > The new added registers work like queue_enable or features.
> > > 
> > > For example, we use DMA to report dirty pages and MMIO to fetch the dirty data.
> > > 
> > > I remember in another thread you said:"you can't use DMA for any migration
> > > flows"
> > > 
> > > And I agree to that statement, so we use config space registers to control the
> > > flow.
> > > 
> > > Thanks,
> > > Zhu Lingshan
> > > 
> > > 
> > >      Jason
> > > 
> > If you are using dma then I don't see what's wrong with admin vq.
> > dma is all it does.
> dma != admin vq,

Well they share the same issue that they don't work for nesting
because DMA can not be intercepted.

> and I think we have discussed many details, pros and cons,
> in the admin vq live migration proposal in virtio-comment.
> I am not sure we should repeat the discussion here all over again.
> 
> Thanks
> > 

Yea let's not.

-- 
MST


* Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
  2023-10-16  8:52                                                     ` Michael S. Tsirkin
@ 2023-10-16  9:53                                                       ` Zhu, Lingshan
  -1 siblings, 0 replies; 321+ messages in thread
From: Zhu, Lingshan @ 2023-10-16  9:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Gunthorpe, Parav Pandit, Christoph Hellwig, Jason Wang,
	kvm, Maor Gottlieb, virtualization, Jiri Pirko, Leon Romanovsky



On 10/16/2023 4:52 PM, Michael S. Tsirkin wrote:
> On Mon, Oct 16, 2023 at 04:33:10PM +0800, Zhu, Lingshan wrote:
>>
>> On 10/13/2023 9:50 PM, Michael S. Tsirkin wrote:
>>> On Fri, Oct 13, 2023 at 06:28:34PM +0800, Zhu, Lingshan wrote:
>>>> On 10/12/2023 9:27 PM, Jason Gunthorpe wrote:
>>>>
>>>>       On Thu, Oct 12, 2023 at 06:29:47PM +0800, Zhu, Lingshan wrote:
>>>>
>>>>
>>>>           sorry for the late reply, we have discussed this for weeks in virtio mailing
>>>>           list. I have proposed a live migration solution which is a config space solution.
>>>>
>>>>       I'm sorry that can't be a serious proposal - config space can't do
>>>>       DMA, it is not suitable.
>>>>
>>>> config space only controls the live migration process and config the related
>>>> facilities.
>>>> We don't use config space to transfer data.
>>>>
>>>> The new added registers work like queue_enable or features.
>>>>
>>>> For example, we use DMA to report dirty pages and MMIO to fetch the dirty data.
>>>>
>>>> I remember in another thread you said:"you can't use DMA for any migration
>>>> flows"
>>>>
>>>> And I agree to that statement, so we use config space registers to control the
>>>> flow.
>>>>
>>>> Thanks,
>>>> Zhu Lingshan
>>>>
>>>>
>>>>       Jason
>>>>
>>> If you are using dma then I don't see what's wrong with admin vq.
>>> dma is all it does.
>> dma != admin vq,
> Well they share the same issue that they don't work for nesting
> because DMA can not be intercepted.
(hope this is not spam to the virtualization list; I will try to keep this
short)
only use DMA for host memory access, e.g., the dirty page bitmap; there is
no need to intercept it.
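
A small sketch of that usage, assuming 4 KiB pages and one bit per guest page
(names are hypothetical): the device's only DMA write is setting bits in a
driver-provided bitmap, which the driver then scans from ordinary host memory.

#include <stdint.h>

#define LM_PAGE_SHIFT 12 /* assuming 4 KiB pages */

/* Conceptually the device side: mark a guest-physical page dirty in
 * the bitmap the driver handed over. This write is the only DMA. */
static void mark_page_dirty(uint8_t *bitmap, uint64_t gpa)
{
        uint64_t pfn = gpa >> LM_PAGE_SHIFT;

        bitmap[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}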
>
>> and I think we have discussed many details, pros and cons,
>> in the admin vq live migration proposal in virtio-comment.
>> I am not sure we should repeat the discussion here all over again.
>>
>> Thanks
> Yea let's not.
>



end of thread

Thread overview: 321+ messages
2023-09-21 12:40 [PATCH vfio 00/11] Introduce a vfio driver over virtio devices Yishai Hadas via Virtualization
2023-09-21 12:40 ` Yishai Hadas
2023-09-21 12:40 ` [PATCH vfio 01/11] virtio-pci: Use virtio pci device layer vq info instead of generic one Yishai Hadas via Virtualization
2023-09-21 12:40   ` Yishai Hadas
2023-09-21 13:46   ` Michael S. Tsirkin
2023-09-21 13:46     ` Michael S. Tsirkin
2023-09-26 19:13     ` Feng Liu
2023-09-26 19:13       ` Feng Liu via Virtualization
2023-09-27 18:09       ` Feng Liu
2023-09-27 18:09         ` Feng Liu via Virtualization
2023-09-27 21:24         ` Michael S. Tsirkin
2023-09-27 21:24           ` Michael S. Tsirkin
2023-09-21 12:40 ` [PATCH vfio 02/11] virtio: Define feature bit for administration virtqueue Yishai Hadas via Virtualization
2023-09-21 12:40   ` Yishai Hadas
2023-09-21 12:40 ` [PATCH vfio 03/11] virtio-pci: Introduce admin virtqueue Yishai Hadas via Virtualization
2023-09-21 12:40   ` Yishai Hadas
2023-09-21 13:57   ` Michael S. Tsirkin
2023-09-21 13:57     ` Michael S. Tsirkin
2023-09-26 19:23     ` Feng Liu
2023-09-26 19:23       ` Feng Liu via Virtualization
2023-09-27 18:12       ` Feng Liu
2023-09-27 18:12         ` Feng Liu via Virtualization
2023-09-27 21:27         ` Michael S. Tsirkin
2023-09-27 21:27           ` Michael S. Tsirkin
2023-10-02 18:07           ` Feng Liu
2023-10-02 18:07             ` Feng Liu via Virtualization
2023-09-21 12:40 ` [PATCH vfio 04/11] virtio: Expose the synchronous command helper function Yishai Hadas via Virtualization
2023-09-21 12:40   ` Yishai Hadas
2023-09-21 12:40 ` [PATCH vfio 05/11] virtio-pci: Introduce admin command sending function Yishai Hadas via Virtualization
2023-09-21 12:40   ` Yishai Hadas
2023-09-21 12:40 ` [PATCH vfio 06/11] virtio-pci: Introduce API to get PF virtio device from VF PCI device Yishai Hadas via Virtualization
2023-09-21 12:40   ` Yishai Hadas
2023-09-21 12:40 ` [PATCH vfio 07/11] virtio-pci: Introduce admin commands Yishai Hadas via Virtualization
2023-09-21 12:40   ` Yishai Hadas
2023-09-24  5:18   ` kernel test robot
2023-09-24  5:18     ` kernel test robot
2023-09-25  3:18   ` kernel test robot
2023-09-25  3:18     ` kernel test robot
2023-09-21 12:40 ` [PATCH vfio 08/11] vfio/pci: Expose vfio_pci_core_setup_barmap() Yishai Hadas via Virtualization
2023-09-21 12:40   ` Yishai Hadas
2023-09-21 16:35   ` Alex Williamson
2023-09-21 16:35     ` Alex Williamson
2023-09-26  9:45     ` Yishai Hadas
2023-09-26  9:45       ` Yishai Hadas via Virtualization
2023-09-21 12:40 ` [PATCH vfio 09/11] vfio/pci: Expose vfio_pci_iowrite/read##size() Yishai Hadas via Virtualization
2023-09-21 12:40   ` Yishai Hadas
2023-09-21 12:40 ` [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device Yishai Hadas via Virtualization
2023-09-21 12:40   ` Yishai Hadas
2023-09-21 13:08   ` Michael S. Tsirkin
2023-09-21 13:08     ` Michael S. Tsirkin
2023-09-21 20:34   ` Michael S. Tsirkin
2023-09-21 20:34     ` Michael S. Tsirkin
2023-09-26 10:51     ` Yishai Hadas
2023-09-26 10:51       ` Yishai Hadas via Virtualization
2023-09-26 11:25       ` Michael S. Tsirkin
2023-09-26 11:25         ` Michael S. Tsirkin
2023-09-22  9:54   ` Michael S. Tsirkin
2023-09-22  9:54     ` Michael S. Tsirkin
2023-09-26 11:14     ` Yishai Hadas
2023-09-26 11:14       ` Yishai Hadas via Virtualization
2023-09-26 11:41       ` Michael S. Tsirkin
2023-09-26 11:41         ` Michael S. Tsirkin
2023-09-27 13:18         ` Jason Gunthorpe
2023-09-27 21:30           ` Michael S. Tsirkin
2023-09-27 21:30             ` Michael S. Tsirkin
2023-09-27 23:16             ` Jason Gunthorpe
2023-09-28  5:26               ` Michael S. Tsirkin
2023-09-28  5:26                 ` Michael S. Tsirkin
2023-10-02  6:28         ` Christoph Hellwig
2023-10-02  6:28           ` Christoph Hellwig
2023-10-02 15:13           ` Jason Gunthorpe
2023-10-05  8:49             ` Christoph Hellwig
2023-10-05  8:49               ` Christoph Hellwig
2023-10-05 11:10               ` Jason Gunthorpe
2023-10-06 13:09                 ` Christoph Hellwig
2023-10-06 13:09                   ` Christoph Hellwig
2023-10-10 13:10                   ` Jason Gunthorpe
2023-10-10 13:56                     ` Michael S. Tsirkin
2023-10-10 13:56                       ` Michael S. Tsirkin
2023-10-10 14:08                       ` Jason Gunthorpe
2023-10-10 14:54                         ` Michael S. Tsirkin
2023-10-10 14:54                           ` Michael S. Tsirkin
2023-10-10 15:09                           ` Yishai Hadas
2023-10-10 15:09                             ` Yishai Hadas via Virtualization
2023-10-10 15:14                             ` Michael S. Tsirkin
2023-10-10 15:14                               ` Michael S. Tsirkin
2023-10-10 15:43                               ` Yishai Hadas
2023-10-10 15:43                                 ` Yishai Hadas via Virtualization
2023-10-10 15:58                                 ` Parav Pandit
2023-10-10 15:58                                   ` Parav Pandit via Virtualization
2023-10-10 15:58                                 ` Michael S. Tsirkin
2023-10-10 15:58                                   ` Michael S. Tsirkin
2023-10-10 16:09                                   ` Yishai Hadas
2023-10-10 16:09                                     ` Yishai Hadas via Virtualization
2023-10-10 20:42                                     ` Michael S. Tsirkin
2023-10-10 20:42                                       ` Michael S. Tsirkin
2023-10-11  7:44                                       ` Yishai Hadas
2023-10-11  7:44                                         ` Yishai Hadas via Virtualization
2023-10-11  8:02                                         ` Michael S. Tsirkin
2023-10-11  8:02                                           ` Michael S. Tsirkin
2023-10-11  8:58                                           ` Yishai Hadas
2023-10-11  8:58                                             ` Yishai Hadas via Virtualization
2023-10-11  9:03                                             ` Michael S. Tsirkin
2023-10-11  9:03                                               ` Michael S. Tsirkin
2023-10-11 11:25                                               ` Yishai Hadas
2023-10-11 11:25                                                 ` Yishai Hadas via Virtualization
2023-10-11  6:12                                 ` Christoph Hellwig
2023-10-11  6:12                                   ` Christoph Hellwig
2023-10-10 15:59                               ` Jason Gunthorpe
2023-10-10 16:03                                 ` Michael S. Tsirkin
2023-10-10 16:03                                   ` Michael S. Tsirkin
2023-10-10 16:07                                   ` Jason Gunthorpe
2023-10-10 16:21                                     ` Parav Pandit
2023-10-10 16:21                                       ` Parav Pandit via Virtualization
2023-10-10 20:38                                       ` Michael S. Tsirkin
2023-10-10 20:38                                         ` Michael S. Tsirkin
2023-10-11  6:13                                 ` Christoph Hellwig
2023-10-11  6:13                                   ` Christoph Hellwig
2023-10-11  6:43                                   ` Michael S. Tsirkin
2023-10-11  6:43                                     ` Michael S. Tsirkin
2023-10-11  6:59                                     ` Christoph Hellwig
2023-10-11  6:59                                       ` Christoph Hellwig
2023-10-11  8:00                                       ` Parav Pandit
2023-10-11  8:00                                         ` Parav Pandit via Virtualization
2023-10-11  8:10                                         ` Michael S. Tsirkin
2023-10-11  8:10                                           ` Michael S. Tsirkin
2023-10-11 12:18                                           ` Jason Gunthorpe
2023-10-11 17:03                                             ` Michael S. Tsirkin
2023-10-11 17:03                                               ` Michael S. Tsirkin
2023-10-11 17:20                                               ` Jason Gunthorpe
2023-10-11 17:05                                             ` Michael S. Tsirkin
2023-10-11 17:05                                               ` Michael S. Tsirkin
2023-10-12 10:29                                         ` Zhu, Lingshan
2023-10-12 10:29                                           ` Zhu, Lingshan
2023-10-12 13:27                                           ` Jason Gunthorpe
2023-10-13 10:28                                             ` Zhu, Lingshan
2023-10-13 13:50                                               ` Michael S. Tsirkin
2023-10-13 13:50                                                 ` Michael S. Tsirkin
2023-10-16  8:33                                                 ` Zhu, Lingshan
2023-10-16  8:33                                                   ` Zhu, Lingshan
2023-10-16  8:52                                                   ` Michael S. Tsirkin
2023-10-16  8:52                                                     ` Michael S. Tsirkin
2023-10-16  9:53                                                     ` Zhu, Lingshan
2023-10-16  9:53                                                       ` Zhu, Lingshan
2023-10-11  8:12                                       ` Michael S. Tsirkin
2023-10-11  8:12                                         ` Michael S. Tsirkin
2023-10-12 10:30                                       ` Zhu, Lingshan
2023-10-12 10:30                                         ` Zhu, Lingshan
2023-10-11  6:26                     ` Christoph Hellwig
2023-10-11  6:26                       ` Christoph Hellwig
2023-10-11 13:57                       ` Jason Gunthorpe
2023-10-11 14:17                         ` Christoph Hellwig
2023-10-11 14:17                           ` Christoph Hellwig
2023-10-11 14:58                           ` Jason Gunthorpe
2023-10-11 16:59                             ` Michael S. Tsirkin
2023-10-11 16:59                               ` Michael S. Tsirkin
2023-10-11 17:19                               ` Jason Gunthorpe
2023-10-11 20:20                                 ` Michael S. Tsirkin
2023-10-11 20:20                                   ` Michael S. Tsirkin
2023-09-21 12:40 ` [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices Yishai Hadas via Virtualization
2023-09-21 12:40   ` Yishai Hadas
2023-09-21 13:16   ` Michael S. Tsirkin
2023-09-21 13:16     ` Michael S. Tsirkin
2023-09-21 14:11     ` Jason Gunthorpe
2023-09-21 14:16       ` Michael S. Tsirkin
2023-09-21 14:16         ` Michael S. Tsirkin
2023-09-21 16:41         ` Jason Gunthorpe
2023-09-21 16:53           ` Michael S. Tsirkin
2023-09-21 16:53             ` Michael S. Tsirkin
2023-09-21 18:39             ` Jason Gunthorpe
2023-09-21 19:13               ` Michael S. Tsirkin
2023-09-21 19:13                 ` Michael S. Tsirkin
2023-09-21 19:49                 ` Jason Gunthorpe
2023-09-21 20:45                   ` Michael S. Tsirkin
2023-09-21 20:45                     ` Michael S. Tsirkin
2023-09-21 22:55                     ` Jason Gunthorpe
2023-09-22  3:02                       ` Jason Wang
2023-09-22  3:02                         ` Jason Wang
2023-09-22 11:23                       ` Michael S. Tsirkin
2023-09-22 11:23                         ` Michael S. Tsirkin
2023-09-22 12:15                         ` Jason Gunthorpe
2023-09-22  3:01                   ` Jason Wang
2023-09-22  3:01                     ` Jason Wang
2023-09-22 12:11                     ` Jason Gunthorpe
2023-09-25  2:34                       ` Jason Wang
2023-09-25  2:34                         ` Jason Wang
2023-09-25 12:26                         ` Jason Gunthorpe
2023-09-25 19:44                           ` Michael S. Tsirkin
2023-09-25 19:44                             ` Michael S. Tsirkin
2023-09-26  0:40                             ` Jason Gunthorpe
2023-09-26  5:34                               ` Michael S. Tsirkin
2023-09-26  5:34                                 ` Michael S. Tsirkin
2023-09-26  5:42                               ` Michael S. Tsirkin
2023-09-26  5:42                                 ` Michael S. Tsirkin
2023-09-26 13:50                                 ` Jason Gunthorpe
2023-09-27 21:38                                   ` Michael S. Tsirkin
2023-09-27 21:38                                     ` Michael S. Tsirkin
2023-09-27 23:20                                     ` Jason Gunthorpe
2023-09-28  5:31                                       ` Michael S. Tsirkin
2023-09-28  5:31                                         ` Michael S. Tsirkin
2023-09-26  4:37                           ` Jason Wang
2023-09-26  4:37                             ` Jason Wang
2023-09-26  5:33                             ` Parav Pandit
2023-09-26  5:33                               ` Parav Pandit via Virtualization
2023-09-21 19:17               ` Michael S. Tsirkin
2023-09-21 19:17                 ` Michael S. Tsirkin
2023-09-21 19:51                 ` Jason Gunthorpe
2023-09-21 20:55                   ` Michael S. Tsirkin
2023-09-21 20:55                     ` Michael S. Tsirkin
2023-09-21 23:08                     ` Jason Gunthorpe
2023-09-25  4:44                     ` Zhu, Lingshan
2023-09-25  4:44                       ` Zhu, Lingshan
2023-09-22  3:45               ` Zhu, Lingshan
2023-09-22  3:45                 ` Zhu, Lingshan
2023-09-21 13:33   ` Michael S. Tsirkin
2023-09-21 13:33     ` Michael S. Tsirkin
2023-09-21 16:43   ` Alex Williamson
2023-09-21 16:43     ` Alex Williamson
2023-09-21 16:52     ` Jason Gunthorpe
2023-09-21 17:01       ` Michael S. Tsirkin
2023-09-21 17:01         ` Michael S. Tsirkin
2023-09-21 17:07         ` Jason Gunthorpe
2023-09-21 17:21           ` Michael S. Tsirkin
2023-09-21 17:21             ` Michael S. Tsirkin
2023-09-21 17:44             ` Jason Gunthorpe
2023-09-21 17:55               ` Michael S. Tsirkin
2023-09-21 17:55                 ` Michael S. Tsirkin
2023-09-21 18:16                 ` Jason Gunthorpe
2023-09-21 19:34                   ` Michael S. Tsirkin
2023-09-21 19:34                     ` Michael S. Tsirkin
2023-09-21 19:53                     ` Jason Gunthorpe
2023-09-21 20:16                       ` Michael S. Tsirkin
2023-09-21 20:16                         ` Michael S. Tsirkin
2023-09-21 22:48                         ` Jason Gunthorpe
2023-09-22  9:47                           ` Michael S. Tsirkin
2023-09-22  9:47                             ` Michael S. Tsirkin
2023-09-22 12:23                             ` Jason Gunthorpe
2023-09-22 15:45                               ` Michael S. Tsirkin
2023-09-22 15:45                                 ` Michael S. Tsirkin
2023-09-22  3:02                         ` Jason Wang
2023-09-22  3:02                           ` Jason Wang
2023-09-22 12:22                           ` Jason Gunthorpe
2023-09-22 12:25                             ` Parav Pandit
2023-09-22 12:25                               ` Parav Pandit via Virtualization
2023-09-22 15:13                               ` Michael S. Tsirkin
2023-09-22 15:13                                 ` Michael S. Tsirkin
2023-09-22 15:15                                 ` Jason Gunthorpe
2023-09-22 15:40                                   ` Michael S. Tsirkin
2023-09-22 15:40                                     ` Michael S. Tsirkin
2023-09-22 16:22                                     ` Jason Gunthorpe
2023-09-25 17:36                                       ` Michael S. Tsirkin
2023-09-25 17:36                                         ` Michael S. Tsirkin
2023-09-25  2:30                               ` Jason Wang
2023-09-25  2:30                                 ` Jason Wang
2023-09-25  8:26                                 ` Parav Pandit
2023-09-25  8:26                                   ` Parav Pandit via Virtualization
2023-09-25 18:36                                   ` Michael S. Tsirkin
2023-09-25 18:36                                     ` Michael S. Tsirkin
2023-09-26  2:34                                     ` Zhu, Lingshan
2023-09-26  2:34                                       ` Zhu, Lingshan
2023-09-26  3:45                                     ` Parav Pandit
2023-09-26  3:45                                       ` Parav Pandit via Virtualization
2023-09-26  4:37                                       ` Jason Wang
2023-09-26  4:37                                         ` Jason Wang
2023-10-12 10:52                                       ` Michael S. Tsirkin
2023-10-12 10:52                                         ` Michael S. Tsirkin
2023-10-12 11:11                                         ` Parav Pandit
2023-10-12 11:11                                           ` Parav Pandit via Virtualization
2023-10-12 11:30                                           ` Michael S. Tsirkin
2023-10-12 11:30                                             ` Michael S. Tsirkin
2023-10-12 11:40                                             ` Parav Pandit
2023-10-12 11:40                                               ` Parav Pandit via Virtualization
2023-09-26  2:32                                   ` Jason Wang
2023-09-26  2:32                                     ` Jason Wang
2023-09-26  4:01                                     ` Parav Pandit
2023-09-26  4:01                                       ` Parav Pandit via Virtualization
2023-09-26  4:37                                       ` Jason Wang
2023-09-26  4:37                                         ` Jason Wang
2023-09-26  5:27                                         ` Parav Pandit
2023-09-26  5:27                                           ` Parav Pandit via Virtualization
2023-09-26 11:49                                     ` Michael S. Tsirkin
2023-09-26 11:49                                       ` Michael S. Tsirkin
2023-10-08  4:28                                       ` Jason Wang
2023-10-08  4:28                                         ` Jason Wang
2023-09-22  3:02                       ` Jason Wang
2023-09-22  3:02                         ` Jason Wang
2023-09-22 12:25                         ` Jason Gunthorpe
2023-09-22 15:39                           ` Michael S. Tsirkin
2023-09-22 15:39                             ` Michael S. Tsirkin
2023-09-22 16:19                             ` Jason Gunthorpe
2023-09-25 18:16                               ` Michael S. Tsirkin
2023-09-25 18:16                                 ` Michael S. Tsirkin
2023-09-25 18:53                                 ` Jason Gunthorpe
2023-09-25 19:52                                   ` Michael S. Tsirkin
2023-09-25 19:52                                     ` Michael S. Tsirkin
2023-09-21 17:09         ` Parav Pandit via Virtualization
2023-09-21 17:09           ` Parav Pandit
2023-09-21 17:24           ` Michael S. Tsirkin
2023-09-21 17:24             ` Michael S. Tsirkin
2023-09-21 19:58   ` Alex Williamson
2023-09-21 19:58     ` Alex Williamson
2023-09-21 20:01     ` Jason Gunthorpe
2023-09-21 20:20       ` Michael S. Tsirkin
2023-09-21 20:20         ` Michael S. Tsirkin
2023-09-21 20:59         ` Alex Williamson
2023-09-21 20:59           ` Alex Williamson
2023-09-22 12:37     ` Jason Gunthorpe
2023-09-22 12:59       ` Parav Pandit
2023-09-22 12:59         ` Parav Pandit via Virtualization
2023-09-26 15:20     ` Yishai Hadas
2023-09-26 15:20       ` Yishai Hadas via Virtualization
2023-09-26 17:00       ` Michael S. Tsirkin
2023-09-26 17:00         ` Michael S. Tsirkin
2023-10-02  4:38         ` Parav Pandit
2023-10-02  4:38           ` Parav Pandit via Virtualization
2023-09-22 10:10   ` Michael S. Tsirkin
2023-09-22 10:10     ` Michael S. Tsirkin
2023-09-22 15:53   ` Michael S. Tsirkin
2023-09-22 15:53     ` Michael S. Tsirkin
2023-10-02 11:23     ` Parav Pandit
2023-10-02 11:23       ` Parav Pandit via Virtualization
